12 Comments
Sebastian Sigl's avatar

Thanks for sharing!

Claude Code is a game-changer with clear instructions, especially using TDD/TCR. I believe once agentic coding advances, the IDE won't matter as much.

And yes, 100% agree that the best approach today won't be the best next month.

Things are changing fast!

Robert Matsuoka's avatar

💯

Uncanny Valley's avatar

Can you elaborate on the TDD setup you are finding useful, please?

Sebastian Sigl's avatar

Agents benefit from fast feedback cycles. Instruct your agent to always write tests first, give it feedback on those tests, and then let the LLM handle the implementation.

This way, it can verify its work and ensure future changes don’t introduce regressions.

Outside-in TDD is particularly effective, as behavior-focused tests support refactoring towards your intended design.
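As a rough illustration of the tests-first loop described above, here is a minimal sketch. The `slugify` function and its expected behavior are hypothetical, chosen only for illustration: the idea is that the two test functions are written and reviewed first, and the implementation is filled in afterwards until they pass.

```python
def slugify(title: str) -> str:
    """Hypothetical function under test, implemented only after the tests below were agreed on."""
    # Replace every non-alphanumeric character with a space, lowercase the rest,
    # then split on whitespace and join the words with hyphens.
    words = "".join(ch.lower() if ch.isalnum() else " " for ch in title).split()
    return "-".join(words)


# Behavior-focused tests: they describe what callers observe, not how
# slugify works internally, so the implementation can be refactored freely.
def test_lowercases_and_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"


def test_drops_punctuation():
    assert slugify("Fast feedback, always!") == "fast-feedback-always"
```

A runner such as pytest would pick up the `test_` functions; the agent can rerun them after every change to check for regressions.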

Matteo's avatar

"Uses Claude so the decisions are only so good" <- I thought Claude was pretty good; which other models are better?

Josh Wand's avatar

To me there are (now) only 3 classes of agents:

- IDE: Cursor, Windsurf, Copilot, Augment, Roo/Cline, etc

- Interactive Terminal: Aider, Claude Code, OpenAI Codex (local version)

- Cloud "--accept-all"* PR Agent: Devin, Codex, Jules, Cursor background agents.

The only real difference inside a class is pricing and rules/system prompts--otherwise they are all more or less the same set of capabilities, and same set of weaknesses inherent in the models.

*: the "autonomous" agents are just a set of rules that tell the same agent to always hit "accept all" until there's nothing more to accept.
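A minimal sketch of that "accept all" idea, under loud assumptions: `Proposal`, `run_accept_all`, and `scripted_agent` are hypothetical names invented here, and no real product's internals are implied. The loop simply accepts whatever the agent proposes until it proposes nothing more.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Proposal:
    """A single change the agent wants to make (hypothetical type)."""
    description: str


def run_accept_all(
    propose: Callable[[List[Proposal]], Optional[Proposal]],
    max_steps: int = 50,
) -> List[Proposal]:
    """Accept every proposal until the agent has nothing left to propose."""
    accepted: List[Proposal] = []
    for _ in range(max_steps):  # safety cap so a looping agent cannot run forever
        proposal = propose(accepted)
        if proposal is None:  # nothing more to accept
            break
        accepted.append(proposal)  # the "accept all" rule: never reject
    return accepted


def scripted_agent(steps: List[str]) -> Callable[[List[Proposal]], Optional[Proposal]]:
    """Stand-in for a real agent: proposes the given steps in order, then stops."""
    def propose(accepted: List[Proposal]) -> Optional[Proposal]:
        if len(accepted) < len(steps):
            return Proposal(steps[len(accepted)])
        return None
    return propose
```

For example, `run_accept_all(scripted_agent(["write test", "implement", "open PR"]))` accepts all three steps in order and then stops.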

(post on this to come)

Erika Rice Scherpelz's avatar

If you like Claude Code, Amp Code is like Claude Code but in VS Code (almost literally): http://ampcode.com/

(Disclaimer, I work at Sourcegraph)

Alex Jukes's avatar

Thanks for this list, very helpful!

I’m currently leaning into Windsurf, which feels like the most novel in terms of UX, in that it’s essentially prompting models via a chat interface but in your IDE, so you can jump into the code easily at any time. Also nice that you can flip between models being used at will.

I see it kind of as a spectrum at the moment between ‘autocomplete on steroids’ (Copilot) on one side and cascade coding (Windsurf) on the other, with Cursor somewhere in the middle. Will be fascinating to see how the spectrum evolves, expands, and very likely moves into completely different dimensions.

Thanks for your posts on augmented coding; they gave me the impetus to just start coding with AI, which has helped me massively in getting over the anxieties I’ve been experiencing recently around the technology. As Matthew McConaughey says, I’m less impressed, more involved, which is a healthy way to be.

Ben Christel's avatar

Augment Code just changed their pricing, so if you are creating a new account, the $30/mo plan is no longer an option. https://www.augmentcode.com/blog/new-simpler-pricing-with-user-messages

> We chose User Messages as our usage measure to make things simple for our customers. Every interaction consumes a variety of tool calls, context windows, and token counts—details we believe should be our responsibility to optimize, not yours.

Methinks the incentives are not aligned here. This pricing plan is likely to push Augment in the direction of cutting token counts at the expense of quality, requiring users to send more messages (and thus pay more) to accomplish what they want.

Sean Corfield's avatar

I've only used GitHub Copilot, in VS Code, and we have a subscription so it's a flat $19/month, with access to a lot of models: Claude 3.5 Sonnet, 3.7 Sonnet, 3.7 Sonnet Thinking, Gemini 2.0 Flash, 2.5 Pro (preview), GPT 4.1, 4o, o1 (preview), o3-mini, o4-mini (preview). I've only tried a couple so far, and generally haven't gotten far enough into vibe coding to draw many conclusions about them.

I work with Clojure and, historically, the models haven't been great about balancing parens so it can be frustrating. A new Clojure extension for VS Code just appeared that provides MCP over the REPL, so the agent can evaluate code and run tests in the REPL automatically. That was fun on a small, standalone project but I haven't tried it on larger projects yet.

Felix Neumann's avatar

So you're not happy with Claude. Which model have you had the best experience with? In which situations did it outperform Claude?

Rob's avatar

I like your article. I used Gemini, Copilot, and ChatGPT to get the framework of the project I am working on. Once I have exhausted all three, I transfer the project to Replit for the final fine-tuning. Question: what are your thoughts on Replit?
