This post puts forward the intriguing argument that TDD is a hill climbing algorithm & therefore subject to getting stuck in a local maximum. That is, according to this argument, there are valid states of code that can’t be reached one test case at a time. (The post uses the clickbait title “TDD Cannot Work”, which is over-reach on the author’s part.)
The argument as put forward is full of logical holes, but it points to an interesting question—can we characterize kinds of problems best approached some other way than TDD? If we could identify such problems up front, we could save time & effort.
Statement & Question
I have a statement & a question. Start with the statement. When we say, “TDD is a hill climbing algorithm,” we are making an analogy. As with all analogies, insight flows from the play of “is” and “is not”. TDD kind of is a hill climbing algorithm. TDD is also kind of not a hill climbing algorithm. The ways it’s similar & the ways it’s different may teach us something about TDD (and maybe a little something about hill climbing).
First, the argument is flawed as presented. One might just as well say, “Coding the whole algorithm at once is hill climbing. We write one line first & then all the rest of the lines. There are valid program states we cannot reach this way.” This is clearly specious.
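For readers who haven’t met the algorithm the OP invokes, here is a minimal hill climbing sketch (my own illustration, not from the OP’s post). A climber that only accepts uphill steps stalls at the nearest peak, even when a higher one exists elsewhere. The two-peak fitness function is invented for the demonstration.

```python
# Minimal hill climbing sketch: accept a neighboring step only if it
# improves fitness. The climber stalls on the nearer, lower peak.
def fitness(x):
    # Two peaks: a local maximum at x=2 (height 4) and a global
    # maximum at x=8 (height 9).
    return max(4 - (x - 2) ** 2, 9 - (x - 8) ** 2)

def hill_climb(x, step=0.5):
    while True:
        best = max((x - step, x, x + step), key=fitness)
        if best == x:  # no neighbor is better: a (possibly local) maximum
            return x
        x = best

print(hill_climb(0.0))  # starts near the lower peak; gets stuck at 2.0
```

Starting at 0.0, the climber ratchets up to x = 2.0 and stops, never seeing the taller peak at x = 8. That is the “stuck in a local maximum” scenario the OP maps onto TDD.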
The analogy between TDD as a way to move between program states & a text editor as a way to move between program states is inexact. The text editor [editor: why are we still manipulating code as a 1 dimensional array of characters wrapped into a 2 dimensional array of characters?] doesn’t constrain what we do next. We could type the program in back-to-front if we wanted to.
TDD, on the other hand, imposes constraints on motion between program states. If we have a red test we can only (in canonical TDD) change the code to make the test pass. If all the tests pass, we can either:
Write a test that will fail or,
Refactor.
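Those constraints can be sketched as a tiny state machine (my own illustration of the canonical red/green/refactor cycle, not any standard tooling):

```python
# Sketch of canonical TDD's move constraints as a two-state machine.
# States: "red" (a failing test exists) and "green" (all tests pass).
ALLOWED_MOVES = {
    "red":   {"make_test_pass"},                  # the only legal move from red
    "green": {"write_failing_test", "refactor"},  # the two legal moves from green
}

def next_state(state, move):
    if move not in ALLOWED_MOVES[state]:
        raise ValueError(f"{move!r} is not a legal TDD move from {state!r}")
    # Refactoring must keep the tests green; the other moves flip the state.
    return "green" if move in ("make_test_pass", "refactor") else "red"

state = "green"
for move in ("write_failing_test", "make_test_pass", "refactor"):
    state = next_state(state, move)
print(state)  # back to "green" after one full red/green/refactor cycle
```

Note what the machine forbids: from red, you can’t write another test or refactor; from green, you can’t change behavior. That’s the constraint on motion between program states that a plain text editor doesn’t impose.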
This precludes us from writing the whole program back to front, unless we write one gigantic test first, then type the code in our weird back-to-front way, & then the single giant test passes. That would technically be TDD, but in a really weird form.
For the original analogy to hold, we would need to get the program half-way written & then not be able to move forward, either:
We can’t write the next failing test given the code & tests we already have or,
We write a failing test but we can’t get it to pass without breaking some other test.
Using the hill climbing analogy, we have 4 options in such a stuck spot:
Just have 2 broken tests for a while (or more than 2). This is uncomfortable but hardly fatal. It’s still TDD-ish. We’ve gone further downhill than usual but that doesn’t mean we can’t climb back.
Move the code & tests to an earlier state & tackle a different sequence for the tests and/or refactorings. Perhaps a different path would enable stepwise progress.
One alternative path would be to try smaller tests.
Another alternative path would be a larger next test, one that might be red for a while.
Each of these strategies involves going downhill for a bit before resuming the climb. That’s okay. Nobody said anything about monotonically increasing.
Fitness
The OP claims that TDD can’t be used to write the optimal program for many problems. I’m genuinely interested in characterizing the kinds of problems or programs that are difficult to solve or write one test case at a time. Ignoring that for the moment, though, the OP never addresses what is meant by “optimal”.
The goal of TDD (or any programming) isn’t perfection or provably optimal development. Development needs to be good enough. It’s also nice if it’s an improvement on yesterday’s development. Ignoring the “optimality” red herring, what are we trying to achieve?
Using the hill climbing analogy, the fitness function for code is complicated & multi-dimensional. Those dimensions change as a system matures (see also 3X: Explore/Expand/Extract). Here are some:
The set of inputs for which the program works as desired.
The defect rate.
The cost to develop initially.
The cost to extend.
The cost of delay.
The opportunity cost.
The audience that can read & maintain the code.
The cost of execution.
The cost of errors.
Scalability.
Coupling with other systems.
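One way to picture that multi-dimensionality is a weighted score over a few of these dimensions. The weights below are entirely invented for illustration, and the real function shifts as a system moves through Explore/Expand/Extract:

```python
# Illustrative multi-dimensional fitness for code. Weights are made up;
# cost-like dimensions get negative weights because lower is better.
WEIGHTS = {
    "inputs_handled": 5.0,   # fraction of desired inputs the program handles
    "defect_rate": -4.0,
    "cost_to_extend": -2.0,
    "cost_of_delay": -1.0,
}

def fitness(metrics):
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)

before = {"inputs_handled": 0.6, "defect_rate": 0.2,
          "cost_to_extend": 0.5, "cost_of_delay": 0.3}
after = {"inputs_handled": 0.8, "defect_rate": 0.1,
         "cost_to_extend": 0.4, "cost_of_delay": 0.3}
print(fitness(after) > fitness(before))  # the later state scores higher
```

The point of the sketch isn’t the numbers; it’s that “uphill” depends on which dimensions you weight, and two teams can disagree about the gradient while looking at the same code.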
Which of these criteria, exactly, is unachievable with TDD & why? That’s the question that fascinates me.
Sidebar: Analogies
Thinking effectively with analogies is an art. The OP appears to have gained a surface understanding of hill climbing, seen how to apply it to TDD, then written the referenced post.
Here’s the thing with analogies. You’d better be prepared to dig deeper than your readers. I do my homework before I present an analogy. In a case like this, I would want to make sure I understood much more about hill climbing (and TDD too) before I published.
Shallow analogies invite easy debunking. In this case the OP ignored the fitness function & made the mistake of assuming “optimal” was the goal.
So why am I writing such an extensive response? Because I think the OP is onto something, even if it’s not what they think they are onto. I want to encourage careful thought.
Question
Why does the OP care so much that TDD is bad/wrong/invalid? And not just them: for the last 6 months I’ve been responding to critiques of TDD that struck me as shallow, emotional (in the sense of clouding thought), or both. Why the hate?
I’m not saying TDD is the best way to program. I’m saying it tends to be an effective way to address the properties listed above. I’d like to understand its prerequisites, limitations, alternatives, refinements, & pedagogy better. But “it can’t possibly work because of this shallow, specious analogy” doesn’t get us any closer to understanding.
Coda: My Hill Climbing Analogy
The OP missed a hill climbing analogy for TDD—each test is a step uphill. The gradient is measured by “the sets of inputs for which the program operates as desired”. Each test is a click of the ratchet, representing a new set of inputs that will work as desired.
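That ratchet can be made concrete with a sketch (example mine, not the OP’s): each new passing test contributes a set of inputs, and the covered set only ever grows.

```python
# Sketch of "each test is a click of the ratchet": every new passing
# test enlarges the set of inputs known to work as desired.
covered = set()

def ratchet(test_inputs):
    """Record a passing test's inputs; coverage grows, never shrinks."""
    covered.update(test_inputs)
    return len(covered)

print(ratchet({0, 1}))     # first test covers 2 inputs
print(ratchet({1, 2, 3}))  # overlap is fine; coverage still only grows
```

A set union is monotone, which is exactly the ratchet property: a later test can overlap earlier ones but can never un-cover an input.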
I’ve certainly gotten stuck, in the sense that I had a set of tests that I wanted to pass, I got some of them working, & then I couldn’t figure out how to get the next one working without breaking some existing tests. I’ve had to back up & try a different sequence of tests, or a different design, or a different implementation. This would constitute a failure of “TDD as hill climbing”.
Another common theme in these sorts of discussions is the question: what is the difference between writing the tests first versus writing them afterwards?
Working with a code base that has had the tests written afterwards can feel a bit like working with something you’re supposed to be able to take apart, but you can’t because whoever built it used glue to put it together.
I feel just a little less intelligent for having read the referenced blog post - it's neither coherent nor, as you pointed out, logically sound. I'm not even a practitioner of TDD, but I do consider myself a practitioner of logic and reason and the blog post triggers that part of me.
Having said that, I suggest one answer to your "Why the hate?" Perhaps because the hate-spewers have had TDD forced down their throats by some zealot (aka, Big-A Architect, Director of Something-Or-Other, etc), and this is a natural reaction. I can certainly picture _myself_ pushing back and desperately grasping for supposedly rational responses to something that I have been forced to do by someone who may or may not actually have a clue about doing my job. I can even imagine myself writing or head-nodding to a logically flawed response such as the blog post, given that situation.