To hear Charity Majors tell it, the success of Facebook’s internally developed Scuba performance-monitoring software was nearly a fluke.
The New Stack Update

ISSUE 73: The Accidental Genius of Facebook’s Scuba 

Talk Talk Talk

“Most dashboards are artifacts of past failures — if you think that serves your current and future situations, you are blinding yourself.”

___
Charity Majors, Honeycomb.io
Add It Up
47% of Surveyed Companies' Employees Are Open Source Contributors
BHPCs Are the Tip of the Iceberg. HPE is by far the leading manufacturer associated with the world’s fastest, commercially available computing systems in the most recent TOP500 list. Behind it, four of the next six companies are Chinese (Lenovo, Sugon, Inspur, Huawei). So what? Supercomputers can be super boring, but the market is relevant because it shows where hardware technology companies are investing their R&D budgets. Since R&D leads to long-term corporate success, it is not surprising that Sugon and Inspur have recently jumped onto IDC’s server-market leaderboards. But servers are a dying market, right? The market is shrinking, but it is also strategically important both for economic growth and for its connection to the fast-growing demand for artificial intelligence.
 
Machine learning (ML) and artificial intelligence (AI) are now common use cases for high-performance computing (HPC). Once the domain of academia running scientific workloads, HPC is moving beyond its niche. Of course, many ML and AI workloads can be handled at the edge, or at least don’t need HPC. Still, the model in which these workloads are off-loaded to a third-party cloud service is one we hear about often. Intel, Nvidia and ARM are all focused on optimizing hardware for AI and ML. Along with the big cloud providers, The New Stack expects the supercomputer manufacturers to be among the leading buyers of a new wave of CPU innovation.
What's Happening

“Equinix was a company that was historically peer-to-peer networking,” explained Equinix’s director of global solutions marketing, Lauren Cooney, in an interview with TNS founder Alex Williams, “where data centers would be the place where all the traffic would come. And they would exchange capacity, and things along those lines."

Hear more about how Equinix is assembling its softer, more abstract data center connection strategy in the latest episode of The New Stack Makers podcast, recorded at the last Cloud Foundry Summit in Santa Clara.

Equinix: Building Cloud Connection Points Inside the Data Center

The Accidental Genius of Facebook’s Scuba

To hear Charity Majors tell it, the success of Facebook’s internally developed Scuba performance-monitoring software was nearly a fluke. Majors used the software while at Facebook and found it invaluable despite itself, she noted in her talk at the latest New York “Papers We Love” gathering. Facebook built Scuba, a distributed in-memory database, to handle most of its real-time analysis and to watch over its tens of thousands of apps. For Majors, it was obvious that the paper describing Scuba was written by infrastructure engineers, not computer scientists. The ways in which Scuba violates computer science “are so many,” she said. Nonetheless, it contains “lots of awesome clues” on how to do event-driven debugging, or system debugging at scale. “The stuff that Facebook was doing in 2011 is pretty much exactly applicable to the center of the market today,” she said.

The paper and the technology feel like the engineers arrived at their design decisions at 3 a.m., yet all of those decisions, however loopy they might seem, were the correct ones. There are no pre-built schemas; they are generated on the fly, since you can’t determine what schemas you’ll need when you’re not sure what questions you’ll ask. Too much data? Scuba simply deletes the excess, using sampling to catch notable events. “This is like the anti-CAP theorem. When in doubt, just toss it,” Majors said, laughing.
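The sampling trick described above can be sketched in a few lines. This is a toy illustration of the general idea (keep each event with some probability and weight the survivors so estimates stay roughly unbiased), not Facebook’s actual implementation; the class and method names are hypothetical.

```python
import random

class SampledStore:
    """Toy event store that sheds load by sampling: each event is kept
    with probability `rate` and carries a weight of 1/rate, so weighted
    aggregates over the sample approximate the true totals."""

    def __init__(self, rate=1.0):
        self.rate = rate          # fraction of events to keep
        self.events = []          # list of (event, weight) pairs

    def ingest(self, event):
        if random.random() < self.rate:
            self.events.append((event, 1.0 / self.rate))

    def weighted_count(self, predicate):
        # Estimate the true count of matching events from the sample.
        return sum(w for e, w in self.events if predicate(e))

store = SampledStore(rate=0.1)    # keep roughly 10% of traffic
for i in range(10_000):
    store.ingest({"endpoint": "/home", "latency_ms": i % 200})
est = store.weighted_count(lambda e: e["endpoint"] == "/home")
# est lands near 10,000 even though only ~1,000 events were stored
```

The trade-off is exactly the one Majors jokes about: you give up exact counts in exchange for bounded storage, and for debugging aggregates that is usually a fine deal.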

Most performance-monitoring (perfmon) tools we use today are knocking on death’s door, Majors told the audience. They were made for an earlier, simpler era, and were aimed at answering specific questions. But when debugging today’s stacks, you’re not even sure what the question is in the first place. This is what Scuba does so well, and why it is so widely loved at Facebook despite all its shortcomings. The software can ingest millions of rows (or “events”) per second, and can be queried in such a way that it returns a string of related events, rather than the individual bits of operational data other perfmon tools typically deliver. This approach “is so much more intuitive. It is the way our brains think,” Majors said. And this is why Majors and Christine Yen founded Honeycomb.io: to recreate the benefits of a Scuba-like technology for the enterprise.
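The difference between isolated metrics and “a string of related events” is easy to see in miniature. In this hedged sketch (the field names and the flat-dict event shape are our own assumptions, not Scuba’s schema), filtering on a shared request ID recovers the whole story of one request:

```python
# Hypothetical wide, schema-less events, one per unit of work.
events = [
    {"request_id": "r1", "service": "web",   "latency_ms": 12},
    {"request_id": "r2", "service": "web",   "latency_ms": 9},
    {"request_id": "r1", "service": "db",    "latency_ms": 40},
    {"request_id": "r1", "service": "cache", "latency_ms": 2},
]

def related(events, key, value):
    """Return every event sharing key == value: the full trail of one
    request, rather than a single disconnected gauge or counter."""
    return [e for e in events if e.get(key) == value]

trace = related(events, "request_id", "r1")
# trace holds the web, db and cache events for request r1
```

A traditional perfmon tool would show you average latency per service; the event-oriented query instead shows that request r1 spent most of its time in the database, which is the question a debugger actually has.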

To hear more about Honeycomb.io and Majors’ work, be sure to check out our Q&A with her that ran this week.

Intel Cues New Xeon Chips for an AI Future

Should software define how chips are designed? Or vice versa? Intel has struggled with that question for decades, but now is downplaying its CPUs and treating the data center like one large computer. That was the underlying theme at Intel’s launch of new Xeon Scalable processors on Tuesday at an event in Brooklyn, New York. Intel called the new chips the “biggest data center advancement in a decade,” and a general-purpose compute engine that would drive artificial intelligence, networking, storage and the cloud.

Benchmarking Serverless: IBM Scientists Devise a Test Suite to Quantify Performance

Serverless technologies promise to simplify scalability. But while delegating the job of running functions to a cloud provider and letting it decide how to manage execution sounds like a good idea, the developer needs to know what to expect in terms of baseline performance. Now, a pair of IBM researchers is developing a test suite to better understand and compare the performance characteristics of serverless platforms.
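A benchmark of this kind boils down to separating cold starts from steady-state calls and reporting percentiles rather than a single average. Here is a minimal sketch of that pattern; it is illustrative only and has no connection to the IBM researchers’ actual test suite, and the function and parameter names are our own:

```python
import statistics
import time

def benchmark(fn, warmups=3, runs=20):
    """Time a callable: discard warm-up invocations (which absorb
    cold-start effects), then report p50/p95 latency in milliseconds
    over the remaining steady-state runs."""
    for _ in range(warmups):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }

stats = benchmark(lambda: sum(range(10_000)))
```

Against a real serverless platform, `fn` would be an HTTPS invocation of a deployed function, and the interesting numbers are the gap between the warm-up calls and the steady-state percentiles.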

Dissecting the Stack for Place, Reddit’s Collaborative Pixel Art Project

This year, for April Fools’ Day, Reddit created a canvas for a collaborative pixel-art project that thousands of its users could join, in whatever way they collectively or individually saw fit. The idea was to offer users a big page to which each of them could add a single pixel of color only every five minutes or so. It sounds simple, but in practice the effort could only have succeeded on top of Reddit’s heavyweight infrastructure. We take a look behind the project to find out what technologies Reddit used to make this gigantic collective art project happen.
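The core rule of the project, one pixel per user per five-minute cooldown, fits in a few lines. This in-memory sketch is ours, not Reddit’s production stack, which had to enforce the same rule across shared server-side infrastructure for millions of users:

```python
import time

COOLDOWN = 5 * 60  # seconds a user must wait between placements

class Canvas:
    """Minimal single-process model of the Place cooldown rule."""

    def __init__(self, width, height):
        self.pixels = [[0] * width for _ in range(height)]
        self.last_placed = {}  # user -> timestamp of last placement

    def place(self, user, x, y, color, now=None):
        now = time.time() if now is None else now
        # A user who has never placed is treated as fully cooled down.
        if now - self.last_placed.get(user, -COOLDOWN) < COOLDOWN:
            return False  # still cooling down; pixel rejected
        self.pixels[y][x] = color
        self.last_placed[user] = now
        return True

canvas = Canvas(1000, 1000)
ok1 = canvas.place("alice", 3, 4, color=7, now=0)     # accepted
ok2 = canvas.place("alice", 5, 6, color=2, now=10)    # rejected, too soon
ok3 = canvas.place("alice", 5, 6, color=2, now=301)   # accepted again
```

The hard part Reddit actually solved was not this logic but doing it at scale: keeping one consistent canvas and one consistent cooldown clock for every user at once.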

FREE EBOOK: Learn about patterns and deployment use cases for Kubernetes.
Kubernetes is a container management platform designed to run enterprise-class, cloud-enabled and web-scalable IT workloads. For both new users and recent adopters, it’s important to highlight how Kubernetes is being utilized by teams and organizations.

This ebook begins with an overview of the Kubernetes platform and how it functions, then covers the usage patterns and key deployment scenarios of customers using Kubernetes in production. We’ll also take a look at the projects and companies, such as CoreOS, Intel and Red Hat, working to push Kubernetes forward for the entire ecosystem.
Download The Ebook
Upcoming ebook series: Kubernetes, Node.js, Serverless
Copyright © 2017 The New Stack, All rights reserved.


Want to change how you receive these emails?
You can unsubscribe from this newsletter