For quite some time I’ve been bombarded with statements like “this new neural network is great for software engineering productivity”. And sure enough, it piqued my curiosity. A new shiny toy – what could be more interesting to try?
This post isn’t about Cursor or Claude Code. But it is about numbers and comparisons. It’s a reminder – we need to know what we’re optimising for.
It started with yet another sales pitch – “Our AI tools will make you 25% more productive”. I’ve been working on engineering efficiency long enough to understand how hard it is to measure productivity. There isn’t a single metric that can be considered a good enough approximation.
Yet there is a good one when it comes to maintenance cost, and it is as simple as codebase size. This has been proven many times and has been the subject of multiple scientific studies. One of the best examples comes from Ericsson: their AXD301 switch, written in Erlang, was 4 to 10 times smaller than comparable systems written in C++, Java and PLEX, while the fault rate per thousand lines of code remained constant.
Motorola Labs presented a similar outcome in their excellent presentation:

And that’s just scratching the surface! The real difference in support cost becomes much larger once we commit to maintaining the codebase for many years.
Coincidentally, the world’s most famous equation happens to be applicable to software engineering.
E = mc²
Errors = more * (code)²
So far, this metric has been the best approximation of engineering productivity. The smaller the codebase, the more productive the engineers are. On all fronts: it’s easier to read, navigate and debug when there is less code. A codebase isn’t an asset, it’s a liability. The fundamental maintenance cost grows with its size.
What does this mean, and how is it connected to AI-enabled software development? Pretty simple: we use neural networks to write code, and the marketing pitches are all about “25% more Pull Requests” and “20% more lines of code checked into source control”.
While the actual goal should be the opposite. For example: 20% less code (duplicates removed!), 30% less time debugging someone’s “AI-suggested” test case, a 50% faster test suite (because timer:sleep was replaced with a proper trigger).
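The timer:sleep point generalises beyond Erlang. Here is a minimal sketch of the same idea in Python (the worker, delays and function names are illustrative, not from the post): a sleep-based test pays a fixed worst-case delay on every run, while a trigger-based test returns as soon as the work actually completes.

```python
import threading
import time

def start_worker(done: threading.Event) -> None:
    """Start a background task that signals completion via an event."""
    def work():
        time.sleep(0.05)   # stand-in for the real work
        done.set()         # explicit completion trigger
    threading.Thread(target=work, daemon=True).start()

def test_with_sleep() -> None:
    # Sleep-based wait: a delay that "should be long enough" –
    # the test costs 2 seconds on every run, and is still flaky.
    done = threading.Event()
    start_worker(done)
    time.sleep(2)
    assert done.is_set()

def test_with_trigger() -> float:
    # Trigger-based wait: returns as soon as the worker signals;
    # the 2-second timeout is only an upper bound, not a fixed cost.
    done = threading.Event()
    start_worker(done)
    start = time.monotonic()
    assert done.wait(timeout=2)
    return time.monotonic() - start
```

The second version is both faster and more reliable: it synchronises on the event itself rather than on a guessed delay, which is exactly the kind of change that shrinks test-suite wall-clock time without touching coverage.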
I’d definitely give such AI a go.