This month we will be ramping up our coverage of DataOps and data engineering, so we can all learn how to streamline data operations.

ISSUE 245: Automating the Data Pipeline

“Open source will gradually take over and become the standard way to do security.”

___
Loris Degioanni, the CTO and founder of Sysdig.

FaaS Adoption of Organizations Using Serverless via Hosted Platform or Installable Software

Despite proclamations from Amazon.com Chief Technology Officer Werner Vogels that usage of Amazon Web Services’ Lambda has risen dramatically, recent surveys of Kubernetes communities have found relative declines in the adoption of the serverless service. The number of cloud native organizations continues to grow; they are just slightly less likely to include the AWS serverless compute technology in their plans.

Two-thirds of respondents to the Cloud Native Computing Foundation’s (CNCF) June 2020 survey of its Kubernetes-centric community are using or considering using serverless technology. But strip away the evaluations and theoretical plans, and only 30% are using serverless in production, a far cry from the 92% of respondents with containers deployed in production.

Overall, 46% use at least one hosted platform (e.g., AWS Lambda, Google Cloud Functions) or installable software (e.g., Knative, OpenFaaS). That is exactly the same level as in the 2019 survey according to our calculations. AWS Lambda adoption dropped from 67% in 2019 to 55% in 2020 as organizations evaluating serverless are somewhat less likely to have it on their roadmaps. Meanwhile, Microsoft’s Azure Functions made significant progress, going from 16% to 24% year over year.
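
To keep percentage-point changes separate from relative changes in the figures above, the arithmetic can be sketched as follows. This is only an illustrative calculation: the helper name `adoption_change` is hypothetical, and the inputs are the survey shares quoted in the paragraph above.

```python
from typing import Tuple


def adoption_change(before: float, after: float) -> Tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    point_change = after - before
    relative_change = (after - before) / before * 100
    return point_change, relative_change


# Survey shares quoted in the text (percent of respondents), 2019 then 2020.
shares = {
    "AWS Lambda": (67, 55),
    "Azure Functions": (16, 24),
}

for name, (y2019, y2020) in shares.items():
    points, relative = adoption_change(y2019, y2020)
    print(f"{name}: {points:+.0f} points, {relative:+.1f}% relative")
```

A drop of 12 percentage points for Lambda is therefore a relative decline of roughly 18%, while the 8-point gain for Azure Functions is a 50% relative increase from a smaller base.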

Following VMware’s purchase of SaltStack earlier this year, SaltStack’s configuration management portfolio has been merged with VMware’s suite of offerings.

SaltStack’s Salt is a leading automation and security platform for configuration management in on-premises and cloud native environments. Written in Python, Salt is used by Juniper, Cisco, Cloudflare, Nutanix, SUSE and Tieto, as well as by banks and a number of other Fortune 500 technology companies. SaltStack also offers a suite of tools, including SaltStack Enterprise for Salt, Plugin Oriented Programming (POP) and Tiamat.

In this episode of The New Stack Makers podcast, Salt creator Thomas Hatch, who is also the founder and chief technology officer of SaltStack, and Janae Andrus, community manager for Salt at VMware, discuss SaltStack’s roots, evolution and integration with VMware platforms and technologies, as well as the future of SaltStack’s open source projects.

What Happens to SaltStack Now Under VMware

Automating the Data Pipeline

This month we will be ramping up our coverage of DataOps and data engineering, so we can all learn how to streamline data operations in much the same way code development and infrastructure management are being automated.

In the field of software development, we have tools like Git to allow for distributed collaboration. And with GitOps, we can automatically push that code into production. On the operations side, we have the emergence of programmable infrastructure, which, with tools from HashiCorp, Pulumi and others, automates the process of rolling out deployments without manual intervention.

But while the processes around code and computers are rapidly becoming automated, the management of data (or “Big Data” if you have a lot of data) still seems stuck in the 20th century. In a contributed post to The New Stack this week, Lenses.io co-founder and Chief Technology Officer Andrew Stevenson writes that “Businesses in every industry are data driven, and data professionals are feeling increasing pressure to work more efficiently and accelerate time to market for their products. The last thing a data professional wants is to become a bottleneck in the process.”

The good news is that help is on the way. Stevenson argues for instituting a form of DataOps, in which analysts, developers and other business-focused users can work with the data in a self-service fashion. A good DataOps system (Lenses.io offers a DataOps platform itself) should also accommodate corporate compliance and governance needs.

Other companies are tackling this issue as well.

The Irish startup TerminusDB is just one of the projects working on creating a “git for data” system, as we learned from Susan Hall this week. TerminusDB is an open source in-memory graph database that allows different people to work on different versions of the same project at the same time.

For machine learning (ML) data, we learned from KubeCon that Microsoft has been pitching the idea of MLOps, or a complete pipeline for managing the data from training to the model stage itself. This automates the management of data and, not incidentally, provides a baseline for security. “The truths you can’t avoid here: your models will be attacked, your pipelines will have issues. And the game is all about mitigation of harms and quick recovery. And you can do that using an MLOps pipeline,” Microsoft’s David Aronchick said.

Also at KubeCon, we got a preview of a new concept, called a “Feature Store,” which was also created to automate the data pipeline. A collaboration between Google and Indonesian startup Gojek, called Feast, provides a way for companies to organize commonly used data sets and formats so they can be easily accessed by developers and data scientists (and more easily managed in a uniform way behind the scenes).

“A typical flow is for data scientists to either push data into a feature store for storage, or to register transformations with a feature store that will generate data to be stored within the feature store. Once the data is available within the feature store, another team can consume those features for training a model, and can also retrieve features from the feature store for online serving,” Feast creator Willem Pienaar explained to TNS writer Kimberly Mok.
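
The flow Pienaar describes can be sketched with a toy in-memory store. This is a minimal illustration under stated assumptions, not Feast’s actual API: the class `InMemoryFeatureStore`, its method names, and the driver and fare data are all hypothetical, chosen only to show the two write paths (push and registered transformation) and the two read paths (training and online serving).

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class InMemoryFeatureStore:
    """Toy feature store. NOT Feast's API; it only illustrates the flow."""

    def __init__(self) -> None:
        # feature name -> {entity id -> stored value}
        self._features: Dict[str, Dict[str, float]] = {}
        # feature name -> transformation that derives it from a raw row
        self._transforms: Dict[str, Callable[[Row], float]] = {}

    def push(self, feature: str, entity_id: str, value: float) -> None:
        """Write path 1: push a precomputed feature value for storage."""
        self._features.setdefault(feature, {})[entity_id] = value

    def register_transformation(self, feature: str, fn: Callable[[Row], float]) -> None:
        """Write path 2: register a transformation the store will run."""
        self._transforms[feature] = fn

    def ingest(self, entity_id: str, raw: Row) -> None:
        """Run every registered transformation on raw data and store the results."""
        for feature, fn in self._transforms.items():
            self.push(feature, entity_id, fn(raw))

    def get_training_rows(self, features: List[str], entity_ids: List[str]) -> List[List[float]]:
        """Batch read: one row of feature values per entity, for training."""
        return [[self._features[f][e] for f in features] for e in entity_ids]

    def get_online_features(self, features: List[str], entity_id: str) -> Dict[str, float]:
        """Single-entity lookup, as used for online serving."""
        return {f: self._features[f][entity_id] for f in features}


# Hypothetical data: one pushed feature and one derived by a transformation.
store = InMemoryFeatureStore()
store.push("trips_last_7d", "driver_1", 42.0)
store.push("trips_last_7d", "driver_2", 17.0)
store.register_transformation("avg_fare", lambda raw: raw["fare_total"] / raw["trip_count"])
store.ingest("driver_1", {"fare_total": 840.0, "trip_count": 42.0})
store.ingest("driver_2", {"fare_total": 255.0, "trip_count": 17.0})

features = ["trips_last_7d", "avg_fare"]
print(store.get_training_rows(features, ["driver_1", "driver_2"]))
print(store.get_online_features(features, "driver_2"))
```

The point of the design is that the training job and the serving path read the same stored values, so a feature is defined once rather than reimplemented by each team.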

It’s a rapidly evolving field, and like GitOps or programmable infrastructure, one that demands automation. In an excellent QCon talk on data engineering a year ago, software engineer Chris Riccomini passed on an insightful quote from a Google SRE, namely that “If a human operator needs to touch your system during normal operations, you have a bug.”

The future is automation — for the code, infrastructure AND data.

KubeCon+CloudNativeCon: Service Mesh Battle Stories and Fixes

As more organizations implement service meshes, they are finding what works and what needs more work, and they are creating new management practices around this knowledge. A few tried-and-tested best practices were detailed last month during KubeCon+CloudNativeCon.

Tomorrow’s 5G ‘Killer Apps’ Will Need a Strong Foundation in CI/CD

Communication service providers (CSPs) have grappled with one central question through every generation of wireless networking: What’s the “killer app”? The 5G networks being rolled out now open up many new opportunities in untapped enterprise markets. But whatever approach a CSP takes, it will need to adhere to cloud native development and deployment principles to stay competitive in what will no doubt be a fierce and rapidly evolving market, argues Nokia Software’s Shweta Kapur in this contributed post.

A Deep Dive into Kubernetes Scheduling

The Kubernetes Scheduler is one of the core components of the Kubernetes control plane. It’s possible that you’ve never checked Kubernetes Scheduler’s logs or configuration parameters because, for the most part, the tool works well for the majority of development, testing, and production cases. In this excellent contributed post, Granulate’s Ron Sobol takes a deep dive into this technology, starting with an overview of scheduling in general and also discussing the scheduler’s bottlenecks and the issues that you may run into in production.

Puppet CTO Deepak Giridharagopal discussed automation and Puppet's support for hybrid multicloud and on-premises environments during his Puppetize Digital 2020 keynote.

Puppet CEO Yvonne Wassenaar, during her Puppetize Digital 2020 keynote, said integrating "security into the DevOps chain" remains paramount.

Simone Van Cleve, senior marketing programs manager for Puppet, gave a demo of Puppet Comply during Puppetize Digital 2020.

During Puppetize Digital 2020, Kenaz Kwa, director of product for Puppet, described how Relay’s event-driven automation can now be integrated with Puppet and Puppet Enterprise.

At AWS re:Invent, JPMorgan Chase CTO Lori Beer: "Leveraging AWS, and modern engineering practices..." we are relying on "more advanced AI and analytics than we ever have before."

The New Stack Makers podcast is available on:

SoundCloud — Fireside.fm — Pocket Casts — Stitcher — Apple Podcasts — Overcast — Spotify — TuneIn

Technologists building and managing new stack architectures join us for short conversations on the tech conference circuit. These are the people defining how applications are developed and managed at scale.

Pre-register to get the new second edition of the Kubernetes ebook!

A lot has changed since we published the original Kubernetes Ecosystem ebook in 2017. Kubernetes has become the de facto standard platform for container orchestration and market adoption is strong. We now see Kubernetes as the operating system for the cloud — evolving into a universal control plane for compute, networking and storage that spans public, private and hybrid clouds. In this ebook you’ll learn:

Kubernetes architecture.
Options for running Kubernetes across a host of environments.
Key open source projects in the Kubernetes ecosystem.
Adoption patterns of cloud native infrastructure and tools.

