Very interesting read. I wonder if the LLM is getting tripped up by the system prompt statement to not refactor code. Modifying failing code could be considered refactoring to it, since it lacks understanding of the code (it only mimics understanding…). I wonder if an extended thinking model may thrive because of this.
I wonder how the system's performance might change if the restrictive “begging” to avoid refactoring were modified: would it allow the use of a simpler/cheaper model, and would it have helped the failed implementations?
I've started following this augmented process at work and it's already paying dividends... Our principal lead is using the term genie already! (Fully attributed, of course.)
At the moment I'm iterating over the genie guide to find the sweet spot for our internal guidelines, but watching the genies do TDD in action is extremely satisfying.
Very interesting!
About "intruding more on the design", how do you:
a. avoid the review fatigue?
b. avoid being slightly steered into the wall whenever you let your guard down?
---
My alternative — Charted Coding, but is it an alternative? or is it just Augmented Coding? 🤔 — starts with a design doc, and with the help of an MCP Server, the workflow goes like:
1. Human writes the design doc
2. Genie helps with minor tasks like drawing diagrams if necessary
3. Genie reviews the design doc — this improves the design and also raises alerts in terms of what the genie (mis)understood.
4. Genie(s) goes TDD
5. Human reviews and asks genie(s) for tidyings
(moving from one step to the next is human-driven on purpose)
Note: The design doc could include a plan of what should be tidied first to prepare the runway for the next behavior. This could even indicate tasks that can be parallelized on different genies etc...
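To make the "human-driven on purpose" part concrete, here is a minimal sketch of the five steps above as a gated state machine. The names (`Step`, `ChartedWorkflow`, `advance`) are hypothetical and only for illustration; this is not how the MCP Server mentioned above is implemented, just one way to picture the gating.

```python
from enum import Enum


class Step(Enum):
    """The five workflow steps, in order (names are illustrative)."""
    HUMAN_WRITES_DESIGN_DOC = 1
    GENIE_HELPS_WITH_MINOR_TASKS = 2
    GENIE_REVIEWS_DESIGN_DOC = 3
    GENIE_GOES_TDD = 4
    HUMAN_REVIEWS_AND_ASKS_FOR_TIDYINGS = 5


class ChartedWorkflow:
    """Hypothetical model: the workflow never advances on its own."""

    def __init__(self) -> None:
        self.step = Step.HUMAN_WRITES_DESIGN_DOC

    def advance(self, approved_by_human: bool) -> Step:
        """Move to the next step only when a human explicitly approves."""
        if not approved_by_human:
            return self.step  # stay put: the genie cannot self-promote
        if self.step is Step.HUMAN_REVIEWS_AND_ASKS_FOR_TIDYINGS:
            return self.step  # already at the last step
        self.step = Step(self.step.value + 1)
        return self.step


if __name__ == "__main__":
    workflow = ChartedWorkflow()
    workflow.advance(approved_by_human=False)  # no approval, no movement
    print(workflow.step.name)
    workflow.advance(approved_by_human=True)   # human says go
    print(workflow.step.name)
```

The only design choice worth noting is that `advance` takes the human's approval as an explicit argument, so there is no code path where a genie moves itself to the next step.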
---
c. 😅 Rereading myself, this sounds like a shameless plug, but I always wondered what your take is on design docs? (Obviously, they come with a considerable risk of falling into mini-waterfall)
90s Distilled video: https://youtu.be/oWYnuz2dI7I
Full video: https://youtu.be/8z9tUsSoros
Simple diagram: https://bsky.app/profile/younesjd.dev/post/3lqwtil3bj22u
Design docs have the problem that I address in the Tidying books: when do you make design decisions? If you're not going to learn anything & nothing external is going to change, then go ahead and make those decisions today. The greater the pace of change, the greater the value of deferring decisions.
That said, we are all on unstable new ground with augmented coding, so try all the things.
Actually, vibe coding comes in handy for spikes and learning input.
Spike with vibe coding
=> learn and throw away
=> write design doc to make sure everybody’s on the same page, including the genies
=> TDD
=> Tidy when necessary
=> if you hit the wall, no problem. Edit design doc and let genies adapt or throw away everything except the learnings in design doc and ask genies to try again
In the video, it actually happened to me. When the genie integrated the paginator, there were too many changes for my little brain to follow.
I needed a tidying first (adapting the data fetching services before using them)
I could have thought about the timing of the tidying and I didn’t, but it’s OK: the cost of change is so cheap and ego-less that reverting and taking another path is even easier than before.
I am curious about the dynamics between humans and genies in such workflows. Will this isolate devs even more? Will devs discuss design more?
Very cool! Thank you for sharing your system prompt as well.
I've had pretty limited success in doing anything complex this way, but it's been helpful for getting "nice to have" / low time-value stuff done that otherwise would have lingered indefinitely in the todo list.
Babysitting it to stop it from going down rabbit trails or cheating is absolutely necessary. Left to its own devices it'll often do decent work, then destroy it, and even destroy previously working, committed code.
Between the babysitting and guidance it only gets these small tasks done about 2-3x as fast as doing it manually, but that's still sometimes fast enough to justify a feature that otherwise wouldn't make the cut.
Having it prototype in a simpler language and then translating to the production language is an interesting idea. The next time I feel like throwing 5 bucks at an "agent" I'll have to give that a try.