Breaking Software, Building Software

Last week, my colleagues Michael Specter, Daniel Weitzner, and I released our work finding vulnerabilities in software being used in the US 2020 primaries, as reported by the New York Times and quite a few other media outlets.

It's been years since my last reverse-engineering project of this magnitude, so I did a lot of reflecting along the way: what lessons about software design can we draw from reverse-engineering? I found that many of the lessons come precisely from how the two activities differ.

A common mistake that I see even experienced engineers making when reading new codebases is to start at the boundary. For instance, I've repeatedly watched many engineers start reading the git source code by looking for the source of the "git add" command, unaware of the layers of parsing and indirection that divide it from the core of the software.

However, when reverse-engineering binaries, this is exactly the right thing to do. In reverse-engineering, one turns an unknown component into a known one by seeing how it interacts with known information. The starting point is typically strings, UI, inputs: in other words, the boundary. This leads to the idea of reading code as propagating information from knowns to unknowns. In a well-designed codebase, there are central aspects that affect everything; understand them, and you're fluent. In a poorly-designed one, you might as well be reading a decompiled binary.

But, for the most part, reading code to break it was a different beast from reading code to make it. I was much more free to ignore most of it and just look at a few critical paths. It was a totally different kind of thinking from ordinary programming. And that leads me to...

Research Corner: Incorrectness Logic

Here's a fact: Most security flaws cannot actually be exploited.

If you're not a security person, this might be confusing. What's a security flaw if not something that can let an attacker take control of your system or steal data?

Look at the Secure Coding Standard, and you'll see that the things a security expert will flag are a lot of little pieces that can be spotted at a glance. Do all of them correctly, they claim, and your software will very likely be secure. Writing correct software is about building modular pieces that can be reasoned about individually.

Yet constructing exploits, or reproducing bugs, is about finding long chains of events across the entire program that result in it entering a bad state. And saying that one can't guarantee a buffer doesn't overflow is far from being able to show that a malformed value in a packet will actually be used in a way that lets an attacker read data (e.g., Heartbleed).
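To make that gap concrete, here is a toy model of a Heartbleed-style over-read. It is only a sketch: the names (SECRET, make_memory, heartbeat, heartbeat_checked) are hypothetical, and the "memory" is just a Python bytes object standing in for a heap, not anything from OpenSSL. The local flaw is one missing bounds check; the exploit additionally depends on what happens to sit next to the payload.

```python
# Toy model of a Heartbleed-style over-read. All names are hypothetical;
# this simulates the idea and is not taken from any real TLS library.

SECRET = b"hunter2"  # unrelated data that happens to live next to the payload


def make_memory(payload: bytes) -> bytes:
    """Simulated heap: the request payload is followed by unrelated secrets."""
    return payload + SECRET


def heartbeat(payload: bytes, claimed_len: int) -> bytes:
    """Echo back claimed_len bytes, trusting the length the client sent."""
    memory = make_memory(payload)
    # Local flaw: claimed_len is never compared against len(payload).
    return memory[:claimed_len]


def heartbeat_checked(payload: bytes, claimed_len: int) -> bytes:
    """Same handler with the small, locally checkable fix."""
    if claimed_len > len(payload):
        raise ValueError("claimed length exceeds payload")
    return make_memory(payload)[:claimed_len]


if __name__ == "__main__":
    print(heartbeat(b"hi", 2))  # honest request: b'hi'
    print(heartbeat(b"hi", 9))  # malformed request: b'hihunter2'
```

The reviewer's job ends at the missing check in heartbeat. The attacker's job is the whole chain: a length field they control, a read that uses it, and something valuable sitting in the bytes that follow.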

In a groundbreaking paper presented last month, Peter O'Hearn introduced incorrectness logic, a new way of reasoning about programs. As I teach in my web course, when you're writing a program, your thought process mimics some form of classical Hoare logic, where you track a small amount of information at each line, and deduce that each line takes the program from one desirable state to another. The goal is to determine that, no matter what, the program will have correct behavior, while also minimizing the amount of information needed to show this. Incorrectness logic, by contrast, mimics the mind of an exploit developer. The attacker has to remember information about a path through the entire program, but is free to ignore all but the path of interest.
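A rough way to feel the difference, in Python. This is an informal illustration and not O'Hearn's formal system: the functions clamp_index, buggy_clamp, and find_witness are hypothetical names invented for this sketch. The first function is annotated the way a builder reasons, with a small fact carried across each line. The last function plays the attacker, who only needs one concrete input that reaches a bad state.

```python
from typing import Callable, Iterable, Optional


def clamp_index(i: int, n: int) -> int:
    """Builder's view: each comment is a fact that holds at that point."""
    # Precondition: n > 0
    if i < 0:
        i = 0
    # Here: i >= 0
    if i >= n:
        i = n - 1
    # Postcondition: 0 <= i < n, for every input
    return i


def buggy_clamp(i: int, n: int) -> int:
    """Same idea with an off-by-one: i == n slips past the second check."""
    if i < 0:
        i = 0
    if i > n:
        i = n - 1
    return i


def find_witness(
    f: Callable[[int, int], int], n: int, candidates: Iterable[int]
) -> Optional[int]:
    """Attacker's view: return one input that provably ends in a bad state."""
    for i in candidates:
        result = f(i, n)
        if not (0 <= result < n):
            return i  # a real bug, with the input that triggers it
    return None  # nothing found; this does NOT show that f is correct


if __name__ == "__main__":
    print(find_witness(buggy_clamp, 4, range(-2, 8)))  # 4
    print(find_witness(clamp_index, 4, range(-2, 8)))  # None
```

Note the asymmetry: a witness from find_witness is never a false positive, but a None result says nothing about the inputs that were not tried. The comments in clamp_index make the opposite trade, covering every input while saying nothing about any particular path.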

Bug-finding tools have always suffered from false positives, and so the main application of this work will be as a theoretical foundation for building tools that only find real bugs (but are not guaranteed to find all of them). But, for the non-tool-builders, it gives a rigorous footing to the idea that building and attacking a program need different and incompatible kinds of reasoning.

Guest Blog Post on Defunctionalization

SIGPLAN, the ACM Special Interest Group on Programming Languages, has a blog now. I wrote a guest post for it, which quickly became its second most popular post. The content is similar to "The Best Refactoring You've Never Heard Of," but less detailed, and features a few more applications of defunctionalization.

Defunctionalization: Everybody Does It, Nobody Talks About It

Advanced Software Design Web Course: Now Open

The next run of my Advanced Software Design Web Course, starting 3/4, is now accepting applications. Students continue to report massive benefits to their software engineering abilities, often starting from the first week, helping them both in jobs and in interviews.

I'm pleased that, with the hiring of my new TA Gabriel Giordano, the number of slots has been increased to 20. Even so, I already have people lined up for 2/3 of the slots, so get in fast.