Genie Tarpit
Genies give you code that’s a degraded facsimile of the mediocre code they were trained on. How can we get the genie to give us valuable code?
“Valuable” lives on 2 axes:
Features—what the code does now.
Futures—what we can get the code to do once we learn the lessons of this set of features.
As an engineer I am constantly juggling these two dimensions so I wanted a way to visualize them. On the features axis software either works or it doesn’t (more or less) & the region of working is rather small.
(Note that this puts features on the opposite axis from where it appears on Features Versus Futures. I tried it both ways & I like the layout best with the axes this way.)
The other axis is flexibility—can we make changes that:
Work as expected
Don’t break anything that was already there.
I’ve called this “optionality” or “futures” recently. Still looking for the right words.
Flexibility/optionality/futures has a wider operating range than whether the software works or not. You can skimp on flexibility for a while & not really feel it.
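To make the two flexibility checks concrete, here is a minimal sketch of one change judged both ways. Everything in it is hypothetical and invented for illustration: the total_price function, the discount parameter, and the numbers are not from any real code base.

```python
# Hypothetical example: every name and number here is illustrative only.

def total_price(items, discount=0.0):
    """Sum item prices. `discount` (a fraction from 0.0 to 1.0) is the new feature."""
    subtotal = sum(items)
    return subtotal * (1.0 - discount)


def change_works_as_expected():
    # Check 1: the new behavior does what we wanted (25% off 200 is 150).
    return total_price([100, 100], discount=0.25) == 150.0


def change_breaks_nothing():
    # Check 2: callers that never pass a discount see the same totals as before.
    return total_price([100, 100]) == 200 and total_price([]) == 0


if __name__ == "__main__":
    print("works as expected:", change_works_as_expected())
    print("breaks nothing:", change_breaks_nothing())
```

In flexible code both checks are cheap to write and to pass. As flexibility erodes, the second check is usually the one that starts failing in surprising places.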
Orientation
Where was development in ancient times, you know, six months ago? Well, you had some teams who had high standards for the behavior of the system—tests, frequent integration, observability, zero defect tolerance, retrospectives. They also had high standards on flexibility—readable code, mutual comprehension (pairing, mobbing, thorough reviews).
Treating the two axes as orthogonal is simplistic. If you have fewer interruptions from defects you have (or can choose to have) more time to invest in flexibility. Also, if you have flexibility you are more likely to create features that work.
Muddling
Most teams weren’t in the upper right. Instead, they muddled along with mostly-working software that was quite difficult to change. (We can talk later about why folks would choose to stay in this region when they could be further up & to the right.)
Genie
Here’s what I’ve observed—genies naturally live down & to the left of muddling. The “plausible deniability” task orientation of the genie leaves it claiming success even though the code doesn’t work at all. And complexity piles on complexity until even the genie can’t pretend to make progress any more.
Solution?
You probably saw this one coming: nobody knows. Does the model need to be trained on better code? Trained on good commits? Better harnesses? Tests? Which tests? When? Better prompting? Or grasp the nettle of the Bitter Lesson & let the model develop its own style of development, even if it turns out to be incomprehensible to us rapidly-obsolescing humans?
Awareness is the first step. Where are you? Where do you want to be?

This is so relevant! In our team right now we have a very clearly defined code base. We have made specific design choices and when we stick to them the code is clean, extensible, and enjoyable to work on. It's great!... but we have to write all the code (sadface). Enter AI: we have specific AI instructions written with all our practices in mind, and there are examples of gold-standard code for a multitude of scenarios. We also have skills and workflows that help us go from nothing, to a technical plan, to a delivery plan with options and progressions that help us keep in touch and in line.
Here are the issues. If we let the AI go alone it creates a whole load of crap. It's a mess. It cuts corners and forgets things and, to be honest, that's expected as it's a big code base. But if we pair with the AI one task at a time it's brilliant. We almost always have to tweak the code with the first few tasks but then the rest becomes fluent. We almost always have to remind it that we need tests and ask it to look again at its skills about TDD and the types of tests we like and don't like.
The lessons right now:
1. You can only pair with AI like you pair with anyone else. Enjoy it! But you now mostly read code rather than physically write it (which I'm quite happy about).
2. All those people vibe coding and letting agents do all the work are screwed.
This will all likely change by July...
I keep feeling that "A Deepness in the Sky" (A Deepness in the Sky - Wikipedia https://share.google/PTkGmUL6jeJWwXhw4) is the best reference point for this moment. Spoiler: the bad guys first succeed, then fail due to quickly writing spaghetti code.