TDD offers all these benefits and more. How we practice TDD affects which of the outcomes we get. Which outcomes we value and believe are offered affects how we choose to practice TDD.
One person, thinking that TDD is about delivering tests along with implementation because that protects against future regression, will be happy writing tests after. They will also see value in measuring code coverage and driving it up.
Another person, thinking that TDD is about design, will roll their eyes at code coverage: "my CC is usually high without trying, occasionally low and that's OK, and driving it higher won't help."
Another, who values TDD for the tiny pulse of satisfaction that comes with every new passing test, will focus on the tiniest red/green increments (possibly missing refactoring).
Yet another, who cares about refactoring most of all, wants comprehensive tests and minimal pinned implementation details.
I want all of the benefits and none of the drawbacks. So that shapes how I strive to practice TDD.
That sounds great, but how do you actually achieve it? Do you work on one outcome at a time or a little bit of each? And how do you assess what needs improvement and what is mastered sufficiently? It feels like it requires a separate retro for reviewing TDD mastery, especially at the team level. :)
How do you improve any of your skills? Pair with lots of people. Do katas alone and with your team. Hire a technical coach. Hold a session at an Open Space.
Great post.
I would like to share one more outcome that I personally enjoy a lot and find extremely important: the FOCUS. TDD helps me focus and reduces the possibility of "distraction" from what matters. Not only from the technology perspective, but also from the business/user-needs perspective.
Thanks a lot for sharing all these reflections!! I will definitely use some in the future :-)
100%. It's also a "flow" technique. When you focus on design (both test and production code design), you disconnect your neocortex from the implementation problem, because you're thinking about those other problems. This causes the neocortex to then NOT block the hippocampus from giving you "aha" moments. So, your software development/design starts to flow in ways that were not previously possible.
https://www.researchgate.net/publication/308937854_An_External_Replication_on_the_Effects_of_Test-driven_Development_Using_a_Multi-site_Blind_Analysis_Approach
How do you decide to act differently based on these studies?
Thank you for this post, and the most recent book, a great read. For a while I questioned whether I really got TDD, as people around me kept saying that it was about design, not testing. Yet the property I utilized the most was testing and the regression suite - I try to write tests in a way that gives me almost total freedom in how I later change the structure of the code. In my case, the most valuable and insightful (also joyful and fearless) design sessions usually happen towards the end of the feature development cycle, as the likelihood of one use case that invalidates the previous design choices is much lower. So I'm not sure if I can say I test-drive the design, or I just test-drive the development so that I can design, or refactor, with a great feedback loop. I guess one could call it keeping-design-choices-open-driven-tests(-driven-development), but as I still follow the test / production code / refactor cycle with different weights over time, I hope I am still allowed to call it TDD.
It’s the workflow that is TDD, not the purpose. Purposes.
I have started to practice TDD. I became addicted, and now I just "can't back" (or Kent Beck?). This is the real problem.
I always start applications with a 100% test coverage requirement. The few teams I have been on where we had this, things went dramatically better. Of course people can have bad testing habits, but I've never seen coverage requirements make that worse. The idea of just allowing lines of code to exist in the code base where you haven't proven they can even execute without error is just bananas to me.
I have. People wasting time writing tests with no assertions, just to get the code coverage up. It's demoralizing to do this kind of make work. The writer knows what's happening. The tests are costs with no benefits.
Wow. The only times I've seen that, it was unintentional. Every time, pointing it out was a facepalm moment for the dev responsible. If I were on a team that wrote assertion-free tests to hit some arbitrary marker, probably set by some non-engineer, I would leave as fast as I could.
Exactly!
Here is a real-life example: some frameworks like Angular will (by default) generate a test each time you create a component. The default test renders the component and checks that the instance is "truthy".
If teams don't flesh out or remove that test, and I have seen that happen, they can end up with "invalid" code coverage.
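For concreteness, this is roughly what that generated spec looks like (a sketch; the exact shape varies by Angular version, and `FooComponent` is a made-up name):

```typescript
// Sketch of the spec the Angular CLI generates for a new component.
// The only assertion is that the instance exists ("is truthy"), yet running it
// marks every line the constructor and template touch as "covered".
import { ComponentFixture, TestBed } from '@angular/core/testing';
import { FooComponent } from './foo.component';

describe('FooComponent', () => {
  let component: FooComponent;
  let fixture: ComponentFixture<FooComponent>;

  beforeEach(async () => {
    await TestBed.configureTestingModule({
      declarations: [FooComponent],
    }).compileComponents();

    fixture = TestBed.createComponent(FooComponent);
    component = fixture.componentInstance;
    fixture.detectChanges();
  });

  it('should create', () => {
    expect(component).toBeTruthy();
  });
});
```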
Also, code coverage brings some form of gamification which leads to cheating. 😊
I'll never understand the argument that people can write poor tests, so rather than solve that let's reduce the number of tests they write
What I meant is that code coverage should not be seen as a goal but as an indicator.
Agreed. I think of it more as the bare minimum. I want exhaustive test coverage and I want the tests to be good, but there's only one of those that I can programmatically enforce, so I programmatically enforce that one, and I still have to leave humans to enforce the part about making the tests good.
Writing bad tests is an orthogonal problem. I don't want to let PRs be merged with missing line coverage just because I'm worried someone might write a test that's only 70% optimal or whatever. I think the benefits here vastly outweigh the costs. So many times I've caught myself having spiked on something, thinking I would come back, comment it out, and TDD it back into existence. But then I forgot, and the test coverage metrics reminded me. It's just a whole different world when you force test coverage. It's bliss.
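For what it's worth, here is a minimal sketch of what that enforcement can look like with Jest (assuming a Jest-based project; the thresholds are just an example):

```typescript
// jest.config.ts - fail the test run when coverage drops below 100%.
import type { Config } from 'jest';

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      branches: 100,
      functions: 100,
      lines: 100,
      statements: 100,
    },
  },
};

export default config;
```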
Fascinating! It feels like misuse/abuse of the tool, just like those TDD critics. Because if we apply TDD more strictly there shouldn’t be any gaps in coverage for the code we implement by hand, because the only reason we add something is the test, right?
There was one line of JUnit not covered, in an exception handler. Do most programs need that level of coverage to be reliable? No. Does strict TDD give you that "for free" (as a side effect)? Yes. Do most programs need strict TDD for every line to be reliable? No. How much does it cost if you do it anyway, compared to taking time to decide whether or not to test this particular change first? Probably close to even.
Developer discretion never works, because company incentives at scale inevitably lead to corners being cut. I just had to enable a TypeScript linter at Cruise that would enforce using types correctly instead of just doing things like "any" or "Function", because some developers would repeatedly use those. A couple of the more diligent people on the team would call that out in code review, but many other people wouldn't, because the company is breathing down your neck to ship features, and technical debt is a negative externality from the developer's point of view. It's a prisoner's dilemma. The only pattern I've ever seen work well is to enforce these practices with robots.
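A sketch of that kind of robot using typescript-eslint (rule names and config format vary across plugin versions; this is illustrative, not the exact setup described above):

```js
// .eslintrc.cjs - ban the loose-typing escape hatches project-wide.
module.exports = {
  parser: '@typescript-eslint/parser',
  plugins: ['@typescript-eslint'],
  rules: {
    // forbid explicit `any`
    '@typescript-eslint/no-explicit-any': 'error',
    // forbids overly broad built-in types such as `Function` and `Object`
    '@typescript-eslint/ban-types': 'error',
  },
};
```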
A consultant I once worked with had the notion (which I think I like) that the tests should align exactly with a work item's acceptance criteria, and that lower-than-high test coverage was a smell that the ACs were insufficiently complete.
On the design benefit, TDD also applies pressure (in a good way) to think about coupling. When you want testable code, you really need to think about separating concerns, using techniques like dependency injection, etc. If not, it becomes really hard to test.
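A minimal sketch of that pressure in practice (all names are made up): injecting the collaborator lets the test control it directly, without touching the real dependency.

```typescript
// Hypothetical example: the clock is injected, so tests can pin the time.
interface Clock {
  now(): Date;
}

class GreetingService {
  constructor(private readonly clock: Clock) {}

  greet(name: string): string {
    const hour = this.clock.now().getHours();
    return hour < 12 ? `Good morning, ${name}` : `Good afternoon, ${name}`;
  }
}

// In a test, a hand-rolled stub replaces the real clock - no mocking framework needed.
const fixedClock: Clock = { now: () => new Date('2024-01-01T09:00:00') };
const service = new GreetingService(fixedClock);
console.assert(service.greet('Ada') === 'Good morning, Ada');
```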
A great article, as always. As an industry we seem so inclined to have these debates about exactly how many angels can dance on the head of a pin. I first used TDD seriously late in my career, and it was an epiphany in how I thought about software development.
This is perhaps a bit narrow, but I'm very curious about your thoughts in this area: one approach that seems to have worked for me to avoid structure-dependent tests is to focus on "behaviours" or "mini use-cases" for the class.
A lot of people seem to want to focus on testing methods (perhaps because of the very simple introductory examples of TDD?), and my experience is that for any class with internal state that evolves over time, this approach leads to fairly complex setup logic for each individual method test, which very often involves adjusting internal state directly, because the test is trying to fake the missing method calls that would have produced that state.
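To make the contrast concrete, here is a small made-up sketch (Jest-style test): the behaviour-focused test drives the object through its public API as a mini use-case, instead of reaching into internal state to set up a single method.

```typescript
// Hypothetical class with internal state that evolves over time.
class Cart {
  private items = new Map<string, number>();

  add(sku: string, quantity = 1): void {
    this.items.set(sku, (this.items.get(sku) ?? 0) + quantity);
  }

  remove(sku: string): void {
    this.items.delete(sku);
  }

  count(): number {
    return [...this.items.values()].reduce((sum, n) => sum + n, 0);
  }
}

// Behaviour-focused test: no poking at `items` to fake the state that
// earlier method calls would have created.
it('removing an item undoes an earlier add', () => {
  const cart = new Cart();
  cart.add('book', 2);
  cart.add('pen');
  cart.remove('book');
  expect(cart.count()).toBe(1);
});
```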
Totally agreed about code coverage! Useful as an indicator, not a goal in itself.
Also, it doesn't tell us much about other properties like structure insensitivity; neither does mutation testing.
We have to figure out something else.
What about this:
Each test starts with a score of 0 (when the team writes a test, they add a comment on top: "// test score: 0").
While implementing the feature, the score doesn't change, but once it's done, we start updating the score according to the following rules:
- "-1" on false positives: the test fails while it shouldn't (e.g. due to a structural change)
- "-10" on flakiness: the test fails "randomly"
- "+5" on true positives: the test fails due to a behavior change that you didn't mean to introduce
(this is an example of rules that one can customize)
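To picture the mechanics, the score could live right in the test file as a comment the team edits whenever the test produces evidence (a sketch of this proposal; the names and annotation format are just illustrative):

```typescript
// test score: +4  (-1: failed on a rename refactor; +5: caught an unintended behavior change)
function placeOrder(items: string[]): string {
  if (items.length === 0) throw new Error('empty order');
  return `order with ${items.length} item(s)`;
}

it('rejects orders with no line items', () => {
  expect(() => placeOrder([])).toThrow('empty order');
});
```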
From time to time, the team goes through the tests with the highest and lowest scores to learn from them and maybe rewrite those with the lowest scores.
I am really curious about what you all think about this.
I love this post! Thanks Kent!
Good point concerning the feeling of slower development.
I like to call this the "Development Time Perception Bias" (which I represent like this https://twitter.com/yjaaidi/status/1338503836634320896)
The time spent switching from the IDE to the browser (or other interface) to test manually, then switching back to the IDE to debug, is so "exciting" (I want to see if it works! / I have to understand why it doesn't work) that developers don't see time passing, and the 1-hour session feels like 5 minutes.
Also, that whole loop seems incompressible while TDD seems compressible (without realizing that it took 10 minutes to implement the feature because they took 5 minutes to write the test first, and that without the test, it would have taken 30 minutes plus a higher maintenance cost which they don't see yet).
I wonder what is your take on this?
I’m more interested in the macro effects, which are clearly positive.
Agreed, what I meant is that most people claiming the "slower development" negative outcome didn't really measure it. It's just a cognitive bias.
That said, I say "most" because in some cases, it's caused by a bad testing strategy.
You might like what Mike (GeePaw) Hill has to say in his "Lump of Coding Fallacy" video. It's meant to help overcome at least the initial skepticism about TDD that so many have. His blogs and videos are very good, in my opinion.
https://www.geepawhill.org/2018/04/14/tdd-the-lump-of-coding-fallacy/
I wonder, is there a way to measure the quality of your unit tests, something like assertion coverage?
Mutation testing will tell you whether your tests are behavior sensitive
I guess, but is there no way to automate it?
There are tools out there. It’s a little hard because it’s a large search space of source code changes.
This one would not be automatic, but: delete a random line of code that seems to do something; did any test fail after that?
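A rough sketch of automating exactly that idea (a throwaway Node/TypeScript script, not a real tool; the file path and test command are assumptions):

```typescript
// mutate.ts - comment out one random line of a source file, run the tests,
// and report whether anything noticed. Restore the file afterwards.
import { execSync } from 'node:child_process';
import { readFileSync, writeFileSync } from 'node:fs';

const target = 'src/cart.ts';            // assumption: file to mutate
const testCommand = 'npm test --silent'; // assumption: how the suite is run

const original = readFileSync(target, 'utf8');
const lines = original.split('\n');

// Pick a random line that looks like it does something.
const candidates = lines
  .map((text, index) => ({ text, index }))
  .filter(({ text }) => text.trim() !== '' && !text.trim().startsWith('//'));
const victim = candidates[Math.floor(Math.random() * candidates.length)];
if (!victim) throw new Error('no candidate lines to mutate');

lines[victim.index] = '// MUTATED: ' + victim.text;
writeFileSync(target, lines.join('\n'));

try {
  execSync(testCommand, { stdio: 'ignore' });
  console.log(`Mutant survived: no test failed after removing line ${victim.index + 1}`);
} catch {
  console.log(`Mutant killed: the suite failed after removing line ${victim.index + 1}`);
} finally {
  writeFileSync(target, original); // always restore the original file
}
```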
“My brain just doesn’t work that way.”
If they are being honest with this assertion, they can use TCR instead...
Or they can write code & test it afterward. TCR requires/encourages particular succession & design skills.