Arch-Engineer
Code better

Why Programmers Should(n't) Learn Theory

 
Part 3 of 3


Over the past two newsletters, I've discussed the utility of three of the five theoretical fields of software. Today, we'll examine the last two: the rabbit hole and the abstract nonsense.

 

Formal Verification

 

Formal verification means using a tool to rigorously prove some property (usually correctness) of a program, protocol, or other system. It can broadly be split into two categories: mechanized and automated. In mechanized verification/theorem-proving, engineers attempt to prove high-level statements about a program’s properties by typing commands into an interactive “proof assistant” such as Coq, Isabelle, or Lean. In automated verification, they instead pose a query to the tool, which returns an answer after much computation. Both require extensive expertise to model a program and its properties.
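To give a flavor of the mechanized side, here's a tiny proof in Lean (one of the assistants named above). Each tactic line is a separate command: the tool checks it and shows you whatever goals remain.

    -- Prove that "p and q" can be flipped to "q and p".
    example (p q : Prop) (h : p ∧ q) : q ∧ p := by
      constructor       -- split the goal q ∧ p into two subgoals
      · exact h.right   -- first subgoal, q: use the right half of h
      · exact h.left    -- second subgoal, p: use the left half of h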

Mechanized verification can provide deep insights about everyday software engineering. For instance, one criterion for choosing test inputs in unit testing is “one input per distinct case,” and doing mechanized verification teaches you exactly what a “case” is. Its downside: in my personal experience, mechanized verification outclasses even addictive video games in its ability to make hours disappear. As much as programming can suck people into a state of flow and consume evenings, doing proofs in Coq takes this to another level. There’s something incredibly addictive about having a computer tell you every few seconds that your next tiny step of a proof is valid. Coq is unfortunately also full of gotchas that are hard to learn about without expert guidance. I remember a classmate who once made a subtle mistake early on in a proof and then spent 10 hours working on that dead end.

I do frequently draw on concepts from this kind of verification. Most notably, I teach students Hoare logic, the simplest technique for proving facts about imperative programs. But I do it mostly from the perspective of showing that it’s possible to rigorously think about the flow of assumptions and guarantees in a program. I tell students to handwave after the first week in lieu of finding an encoding of “The system has been initialized” in formal logic, and even leave off the topic of loop invariants, which are harder than the rest of the logic. Alas, this means students lose the experience of watching a machine show them exactly how rule-based and mechanical this kind of reasoning is.
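For concreteness, here's roughly what that first week looks like, written in standard Hoare-triple notation (the particular numbers are my own toy example):

    {x >= 1}          (precondition: what we are allowed to assume)
    y := x + 1
    {y >= 2}          (postcondition: what we must guarantee)

    Assignment rule: {Q[e/v]}  v := e  {Q}. Substituting x + 1 for y in the
    postcondition gives (x + 1) >= 2, which the precondition x >= 1 implies.

The point is less the arithmetic than the discipline: every guarantee gets traced back to an assumption by a mechanical rule.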

Automated tools take many forms. Three categories are solvers such as Z3, model checkers such as TLA+ and CBMC, and languages/tools such as Dafny that automatically verify program contracts.
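To give a feel for the push-button style, here's a sketch using the Data.SBV Haskell bindings, which hand queries to an SMT solver such as Z3 (this assumes you have SBV and a solver installed; it's an illustration, not a tutorial):

    import Data.SBV

    -- Pose a query: is integer addition commutative? 'prove' ships the
    -- question to the solver, which answers with a proof or a counterexample.
    commutes :: IO ThmResult
    commutes = prove $ \x y -> x + y .== y + (x :: SInteger)

You state the property; the tool does the search.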

I’ve argued before that software design is largely about hidden concepts underlying a program, concepts that have a many-to-many relationship with the actual code and therefore are not directly derivable from it. It follows that push-button tools are limited in the depth of properties they can discuss, and therefore also in their relevance to actual design. Of these, the least push-button is TLA+, which I’ve already written about; I concluded there were only a couple of generalizable software-design insights, though just learning to write abstracted models can be useful as a thinking exercise.

In short, learning mechanized verification can be quite deep and insightful, but is also a rabbit hole. Automated verification tools are easier to use, but are less deep in the questions they can ask, and less insightful unless one is actually trying to verify a program.

I’ve thought about trying to create a “pedagogical theorem prover” in which programmers can try to prove statements like “This function is never called unless the system has been initialized” without having to give a complicated logical formula describing how “being initialized” corresponds to an exact setting of bits. Until then, I’m still on the lookout for instructional materials that provide a nice on-ramp to these insights without requiring weeks spent learning Coq’s distinction between propositions and types.

 

Category Theory

 

Category theory is a branch of mathematics which attempts to find commonalities between many other branches of mathematics by turning concepts from disparate fields into common objects called categories, which are essentially graphs with a composition rule. In the last decade, category theory in programming has escaped the ivory tower and become a buzzword, to the point where a talk titled “Categories for the Working Hacker” can receive tens of thousands of views. And at MIT last year, David Spivak taught a one-month category theory course and had over 100 people show up, primarily undergrads but also several local software engineers.
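For the programmers in the audience, one concrete rendering of that “graphs with a composition rule” definition is Haskell's Control.Category class. This is only a sketch of the shape of a category, not the full lawful story, and the Pipe type is a made-up name for illustration:

    import Prelude hiding (id, (.))
    import Control.Category

    -- A category needs identity arrows and an associative composition rule.
    -- Wrapped functions give the most familiar instance.
    newtype Pipe a b = Pipe (a -> b)

    instance Category Pipe where
      id              = Pipe (\x -> x)           -- identity arrow
      Pipe f . Pipe g = Pipe (\x -> f (g x))     -- compose two arrows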

I studied a lot of category theory 8 years ago, but have only found it somewhat useful, even though my research in generic programming touches a lot of topics heavily steeped in category theory. The chief place it comes up when teaching software engineering is a unit on turning programs into equivalent alternatives, such as why a function with a boolean parameter is equivalent to two functions, and how the laws justifying this look identical to rules taught in high school algebra. The reason for this comes from category theory (functions are “exponentials,” booleans are a “sum”), but it doesn’t take category theory to understand.
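Here is that equivalence sketched in Haskell (the render functions are invented for illustration). A Bool parameter can always be traded for two functions and back, which is the high-school rule x^(a+b) = x^a · x^b with Bool playing the sum 1 + 1:

    -- One function with a boolean parameter...
    render :: Bool -> String -> String
    render compact s = if compact then s else s ++ "\n"

    -- ...is interchangeable with two functions, one per case...
    renderCompact, renderFull :: String -> String
    renderCompact = render True
    renderFull    = render False

    -- ...and we can reassemble the original from the pair.
    render' :: Bool -> String -> String
    render' True  = renderCompact
    render' False = renderFull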

Lately, I’ve soured on category theory as a useful topic of study, and I became fully disillusioned last year after studying game semantics. Game semantics is a way of defining the meaning of logical statements. Imagine trying to prove a universally-quantified statement like “Scruffles is the cutest dog in the world.” Traditional model-based semantics would say this statement is true if, for all dogs in the world, that dog is either Scruffles or is less cute than Scruffles. Game semantics casts this as a game between you (the “Verifier”) and another party (the “Falsifier”). It goes like this: they present you with another dog. If you can show that dog is less cute than Scruffles (or rather, if they are unable to find a dog for which you cannot do so), then the statement is true and you win. Otherwise, the statement is false.
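To make the Verifier/Falsifier framing concrete, here is a toy sketch; the Dog type and its cuteness score are invented for the example:

    -- The Falsifier's move is to present a challenger dog. The Verifier
    -- wins the round if the challenger is Scruffles or is less cute.
    data Dog = Dog { name :: String, cuteness :: Int }

    verifierWins :: Dog -> Dog -> Bool
    verifierWins scruffles challenger =
      name challenger == name scruffles
        || cuteness challenger < cuteness scruffles

    -- The universal statement is true exactly when the Verifier can win
    -- against every move the Falsifier might make.
    cutestInWorld :: Dog -> [Dog] -> Bool
    cutestInWorld scruffles = all (verifierWins scruffles)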

(I admit that having this kind of conversation has been my main application of game semantics thus far.)

I started reading the original game semantics paper. The first few sections explained pretty much what I just told you in semi-rigorous terms, and were enlightening. The next section gave category-theoretic definitions of the key concepts. I found this section extremely hard to follow; it would have been easier had they laid it out directly rather than shoehorning it into a category. And while a chief benefit of category theory is the use of universal constructions that allow insights to be transported across disciplines, the definitions here were far too specialized for that to be plausible.

There’s a style of programming called point-free programming which involves coding without variables. So, instead of writing an absolute value function as if x > 0 then x else -x, you write it as sqrt . square, where the . operator is function composition. Category theory is like doing everything in the point-free style. It can sometimes lead to beautifully short definitions that enable a lot of insight, but it can also serve to obscure needlessly.
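In Haskell, the two versions of that example look like this (with Doubles, so the square-root trick is only morally an absolute value, ignoring rounding and overflow):

    -- Pointful: the argument x is named explicitly.
    absVal :: Double -> Double
    absVal x = if x > 0 then x else -x

    -- Point-free: no variables at all, just composed functions.
    square :: Double -> Double
    square x = x * x

    absVal' :: Double -> Double
    absVal' = sqrt . square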

I’m still open to the idea that there may be a lot of potential in learning category theory. David Spivak told me “A lot of what people do in databases is really Kan extensions” and said his collaborators were able to create an extremely powerful database engine in a measly 5000 lines of Java. Someday, I’ll read his book and find out. Until then, I don’t recommend programmers study category theory unless they like learning math for its own sake.

 

 

Conclusion

 

So, here’s the overall tally of fields and their usefulness in terms of lessons for software design:

  • Type theory: Very useful.
  • Program analysis: Not useful, other than the definition of “abstraction.”
  • Program synthesis: Old-school derivational synthesis is useful; modern approaches less so.
  • Formal verification: Mechanized verification is very useful; automated, not so much.
  • Category theory: Not useful, except a small subset which requires no category theory to explain.

So, we have a win for type theory, a win for the part of verification that intersects type theory (by dealing with types so fancy that they become theorems), and a wash for everything else. So, go type theory, I guess.

In conclusion, you can improve your software engineering skills a lot by studying theoretical topics. But most of the benefit comes from the disciplines that study how programs are constructed, not those that focus on how to build tools.

Copyright © 2021 Mirdin, All rights reserved.