Continuous Chaos but in a Good Way

What’s the most challenging stage of the machine learning (ML) life cycle? Data gathering and cleaning has traditionally been the most time-consuming aspect of data scientists’ and analytics practitioners’ jobs. But as solutions have popped up to address this issue, the bottleneck has moved to creating and deploying ML models.

The jury is still out on how many stages make up the standard ML life cycle, but getting started is clearly not the problem: executives are throwing money at projects. Although many projects have failed at the proof-of-concept phase, others have succeeded in identifying real business goals and establishing data science teams.

Actually building and evaluating machine learning models is the core stage of the ML life cycle. But according to Algorithmia’s “2021 Enterprise Trends in Machine Learning,” once a use case is defined, 66% of organizations take more than a month to develop an ML model, and 64% take at least another month to deploy it. Per the report, most data scientists spend at least 25% of their time deploying models. The machine learning engineers described in these charts often deploy into test environments as well as into production, so when assessing other studies, keep in mind that respondents may not understand the distinction between these two types of deployment.

Once a model is deployed to production, it is monitored not only against DevOps and IT performance metrics but also to make sure its accuracy doesn’t degrade over time. Retraining models, audits, tracking security and governance, and live A/B tests are all iterative steps that can feed back into earlier stages of the life cycle.

At the end of the day, data scientists analyze and understand data to influence decisions. In recent years, they have been less likely to “waste” their time acquiring or cleaning data, but they have become more proficient at writing their own software to automate deployments and workflows. If ML platforms and popular projects can abstract away some of this work, then data scientists and machine learning engineers can spend more time creating value for their organizations.

Continuous Chaos but in a Good Way

We always enjoy a good talk from Amazon Web Services’ resident cloud guru Adrian Cockcroft, and his keynote at MayaData’s Chaos Carnival this week did not disappoint.

He offered a useful way of thinking about system resilience. When a system goes down, we tend to look for a single root culprit: a failed router, a faulty configuration script. But that is the wrong way of looking at the problem, he advised.

Think of a rope that breaks. “The last strand breaking isn’t the cause of the failure. The real cause is that the rope got too frayed,” he said. Instead, he advised, when building out systems we should understand how much margin for failure we have, that is, how much extra capacity a system has. We should understand how frayed we can let our rope get.
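Cockcroft’s rope metaphor can be made concrete with a little capacity arithmetic. The sketch below is our own illustration, not a method from the talk: it assumes a pool of identical replicas (the strands), each serving a fixed request rate, and asks how many can fail before the survivors can no longer cover peak demand. The function name and parameters are hypothetical.

```python
import math


def failure_margin(replicas: int, capacity_per_replica: float, peak_demand: float) -> int:
    """Hypothetical headroom check: how many replicas can fail while the
    remaining ones still cover peak demand?

    Assumes identical replicas with a fixed capacity each (an illustration,
    not a real capacity-planning model).
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    # Minimum number of replicas needed to serve the peak load.
    needed = math.ceil(peak_demand / capacity_per_replica)
    # Anything beyond that is margin; a negative value means the
    # "rope" is already too frayed to carry the load.
    return replicas - needed


# Ten replicas at 100 requests/second each, with a 720 requests/second peak.
print(failure_margin(replicas=10, capacity_per_replica=100.0, peak_demand=720.0))
```

With those numbers, eight replicas are needed, so the rope can lose two strands before it breaks.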

To this end, Cockcroft recommended Sidney Dekker’s book “Drift into Failure.” As Cockcroft summarized Dekker’s argument, “even if everyone does everything correctly, at every step along the way, you can still get a catastrophic failure, because people are optimizing locally, rather than optimizing for the big picture outcome.” And, he said, “If you never have a failure, you start believing it can’t happen.”

Before Cockcroft worked at AWS, he was an architect at Netflix, where the concept of chaos testing was pioneered. In chaos testing, resources are randomly taken offline to test the system’s resilience. It seemed like a radical idea a few years back, but over time companies like Gremlin productized the tools for knocking resources offline and then recording the response. Increasingly, chaos testing looks like standard practice for site reliability engineers (SREs). Now Cockcroft wants to take your systems to the next level of resilience with continuous chaos testing.

“What this is going to do is it's going to harden the patterns,” he said. “We're taking what has been traditionally a pretty scary annual experience, that's a big pain in the neck to do, to something that's automated, continuous.”
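To illustrate what “automated, continuous” might look like, here is a toy simulation. It is a sketch under stated assumptions, not Cockcroft’s, Netflix’s or Gremlin’s tooling: a hypothetical in-memory `Pool` stands in for a real service, and each round randomly takes replicas offline, checks whether the remaining capacity still meets demand (the steady-state check), and then heals the pool, the way a scheduled chaos job might run after every deploy rather than once a year.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical toy service: identical replicas with a fixed capacity each."""
    replicas: int
    capacity_per_replica: float
    healthy: set[int] = field(init=False)

    def __post_init__(self) -> None:
        # All replicas start out healthy.
        self.healthy = set(range(self.replicas))

    def capacity(self) -> float:
        return len(self.healthy) * self.capacity_per_replica

    def heal(self) -> None:
        # Stand-in for self-healing or auto-scaling that replaces lost replicas.
        self.healthy = set(range(self.replicas))


def run_chaos_rounds(pool: Pool, demand: float, rounds: int,
                     kills_per_round: int, seed: int = 0) -> list[bool]:
    """Each round: randomly take replicas offline, check the steady state, heal.

    Returns one bool per round: True if the surviving capacity met demand.
    """
    rng = random.Random(seed)  # seeded so a failing round can be replayed
    results = []
    for _ in range(rounds):
        k = min(kills_per_round, len(pool.healthy))
        victims = rng.sample(sorted(pool.healthy), k=k)  # inject the failure
        pool.healthy.difference_update(victims)
        results.append(pool.capacity() >= demand)      # steady-state check
        pool.heal()                                     # recover before next round
    return results


pool = Pool(replicas=10, capacity_per_replica=100.0)
print(run_chaos_rounds(pool, demand=720.0, rounds=3, kills_per_round=2))
print(run_chaos_rounds(pool, demand=720.0, rounds=3, kills_per_round=3))
```

Because the random generator is seeded, any failing round can be replayed exactly. In a real system the kill step would go through an infrastructure API or a fault-injection tool, and the check would read live service metrics rather than a computed capacity.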

Why Open Source Project Maintainers are Reluctant to Use Digital Signatures, Two-Factor Authentication

Open source can still be abused by unscrupulous developers. So why don’t we use two-factor authentication (2FA) or digital signatures to make sure that programmers who attempt to merge code into a project really are who they say they are? Good question. In this post, TNS open source security reporter Steven J. Vaughan-Nichols delves into the reasons behind the seemingly industry-wide reluctance to embrace these supply chain security best practices.

Alphabet Workers Union Tests Tech Industry Appetite for Unionization

On Jan. 4, the Alphabet Workers Union (AWU) launched with 230 members, instantly becoming the biggest tech union. Within a few weeks, membership of this American and Canadian union, which includes permanent, contract and vendor-company employees of Google, its subsidiaries and other Alphabet Inc. brands, had quadrupled. But Google employees aren’t the first tech workers to push for change at their companies, and we are pretty sure they won’t be the last. In this post, TNS London culture journalist Jennifer Riggins reflects on the growing mobilization of people in tech and offers some predictions of where it’s headed.

Microsoft Excel Becomes a Proper Programming Language

Microsoft’s researchers believe they’ve finally transformed Excel into a full-fledged programming language, thanks to a new feature called LAMBDA, which makes the spreadsheet Turing complete. This means that, in theory anyway, you can compose any computation in the Excel formula language, TNS lifestyle correspondent David Cassel reports.

ISSUE 253: Continuous Chaos but in a Good Way

“Docker is an instructive example of why simply making the code available publicly is insufficient. The project had one foot in and one foot out of the community-driven model. The code was open, but the design was not.”