One of my experimental projects is to see if I can write a library-quality, performance-competitive basic data structure with a genie. This started when I was building an experimental database project for an experimental programming language (are you sensing a theme here?) project. If you can’t get the data structures right, none of the rest of those higher layers are going to work either.
This isn’t vibe coding. I care very much about the internal details of these projects because the moment I miss something, the genie does something that hamstrings us later. Well, the genie or I, but I’ll get to that later.
Thanks so much to sponsor Augment Code (their products appear in this post). Congratulations on the GA release of Remote Agents. I was skeptical when I heard of a horde of free genies running around my code, but I’m warming to the idea. There’s subtlety, nuance, & skill to picking tasks for a remote agent but that’s the joy of working on rapidly changing technology—learning those skills & sharing them. Which is what this community is so good at.
B+ Tree
The data structure I’m building is an efficient, scalable map. The goals are:
Fast retrieval—O(log n) where n is the number of keys in the map.
Fast insertion—O(log n)
Fast deletion—O(log n)
Crucially, fast range scanning. We want to be able to iterate from key 1 to key 2 in O(m + log n) where m is the number of keys between key 1 & key 2.
The basic operations are put (sometimes called insert), get, delete, & scan. The data is stored in a tree of nodes.
Leaf nodes map from keys to values.
Internal, or branch, nodes, map from keys to nodes, either branch nodes or leaf nodes.
Nodes have a fixed capacity.
The empty tree is a header object & an empty leaf node:
After we insert data at keys 1, 2, 3, & 4, the leaf is full:
Here’s where the B+ Tree gets tricky. What happens when we insert key 5?
We need to split the leaf into two leaves & introduce a branch node with the leaves as children:
There are several tricky edge cases to get right in insertion & splitting, especially when we have a multi-level tree & leaf splitting triggers branch splitting. And insertion isn’t even the tricky bit. Deletion is a nightmare.
Supporting the fast scan operation requires that the leaf nodes are all at the same (bottom) level of the tree & that they are linked together.
Experiment
This is an experiment in augmented coding not a data structures tutorial so I’ll save the B+ Tree details for another day (it’s so cool though y’all!) I’ve been working on this project off and on for the better part of a month, including starting over twice when things got too complicated for the genie to make further progress. For educational purposes I’m implementing it in Rust, a language I hadn’t touched until I started the project.
The problem I kept running into was runaway complexity. The genie would implement the next feature, but add excess complexity in the process. The next feature took longer & had more stumbles along the way. Eventually the genie just couldn’t make further progress.
The particular problem in my project was that two sources of complexity compound:
Rust’s ownership system meant that many otherwise-sensible choices weren’t available, & keeping the ownership system happy in the face of changes was increasingly difficult.
The algorithmic complexity of the the problem, especially the deletion code.
Eventually the genie would code itself out of options & I started over. I don’t (or at least didn’t) know enough about Rust to keep it out of trouble or get it out of trouble.
Again?
I started the third iteration with diminished expectations. I gave the genie explicit guidance to work one test case at a time, to continually refactor, to separate commits into behavior or structure changes, the whole packaged. Things went better at first but then I could feel the tar start to grab my boots.
“Not again!” That’s when I really noticed the compounding effect of the 2 sources of complexity. Maybe the problem is over-constrained? Can I tackle the complexity separately?
So I asked the genie to write a Python implementation of the B+ Tree, using the same system prompt I had been using. I guided it into roughly the design I expected. In a day I had comprehensive test cases passing with tidy code. (Along with a fuzz tester to discover new corner cases. Pro tip—your genie is great at coming up with special purpose testing tools!)
That’s when I thought, “This is a perfect task for an autonomous agent!” I had early access to Augment’s Remote Agent (disclosure—they sponsor the newsletter). So, as an experiment, I prompted, “Throw away the current Rust code. Start over & copy the Python code. Translate a test & make it work. Keep going until you have translated all the tests.”
Results!
I intervened a few times in the next hour or so, but what came out was idiomatic Rust that actually passed all the tests. The complexity was much lower than what I had before. And, while it was running, I used Codex to write a C extension for the Python code to get the performance of the Python to within spitting distance of the standard Python SortedDict. Because the autonomous agent was working in a different environment, I could have both genies going at the same time without having to worry about conflicts.
I think the keys to the successful outcome of the experiment were:
Limited complexity. I haven’t tried the experiment (but now I will) of throwing the autonomous genie at the problem of writing a B+ Tree from scratch & see what it comes up with. Seems like not dealing with all that complexity at once is a key.
Clear outcome. I wasn’t having the genie come up with the tests & the code. Here are the tests. Make them work.
Clear scope. Again, the tests limit the scope. Make them work. With tidy code. Don’t make up any new requirements. Don’t volunteer any scope. This. Only this. (Can you tell I’ve gotten burned by genies over-doing scope?)
I’ve since tried autonomous agents on other tasks that seem to meet the above criteria. For example, I wanted to use the Option combinator to simplify code everywhere in the code base. It worked pretty well.
The free genie holds great promise, but it needs to be on tasks that don’t tickle the genie’s weak spots.
I would enjoy reading in more details about your process doing TDD with the genie. Most speakers about augmented / vide coding err towards waterfall-y techniques (PRDs, large specifications,...)
Based on the description it sounded like the genie did better with python. The genie had an easier time in a “freer” language it sounds like. Do you think that was the case?