I’m not talking about zero defects, whatever that means. Just needed to get that out of the way. Can we live without software defects interfering with daily development life? That’s the question.
I posted recently on LinkedIn about teams living without the daily experience of production defects:
As you can see, it stirred up a bunch of reaction, only some of which was because of the brevity of the original post. Here I want to:
Explain myself more thoroughly &
Reflect a bit on the negative half of the reaction.
No Bugs is not Zero Bugs
Here’s the contrast I’m making:
Bugs are a daily fact of life. We track an ever-growing list. We’ll never fix them all. The arrival of high-priority defects interferes with feature development.
Versus:
Bugs are a rare exception. When one is reported we drop everything to fix it. We also investigate how it happened & fix that. We make sure that this whole class of bugs can never again reach production. We make amends to everyone affected.
Bugs in The Forest are anti-holidays—coming up at random intervals & stopping development for their duration.
Desert Bugs
Defects are a daily fact of life in The Desert. Bless your heart if this is where you live. You have backlogs of bugs growing monotonically. Most of those bugs you’ll never fix. More bugs come in daily. Some of those bugs are severe enough to interrupt ongoing development. The bug overhang creates tension between parts of the organization—product, sales, and marketing want features but customer service wants fewer bugs.
In The Desert defects feel like an inevitable part of the landscape, like gravity or weather. Disheartening at times, yes, but whaddayagonnado?
Again, if this is you, good on you for being able to make progress while being tugged hard in multiple directions. You are responding well to a truly difficult set of constraints.
Those constraints are not inevitable.
Forest Bugs
One observation about Extreme Programming teams that shocked me is that many of them have almost no defects reported from production. It shocked me because “no reported defects” was not on my list of goals for XP. If I’d wanted “no reported defects”, I would have done something like Cleanroom.
Once teams had systems in production for a year or two, though, many of them reflected that they’d had few or no reports of production defects. Teams embedded in organizations where projects typically had thousands of open defects observed that they received one a month, or one a quarter, or one a year. Or none.
Whu?
This set off a round of industry skepticism. After all, “Use a Bug Tracker” was a widely accepted “best” practice. XP teams weren’t using a bug tracker? Well then they must not be serious.
XP teams responded, “Why do you choose to have bugs?” Not perhaps the most diplomatic answer, but one with more than a hint of truth in it.
Okay, next explanation (since the obvious explanation of “these teams don’t put bugs into production” just can’t be true)—they must be lying. Some executive sent down a zero defect edict so they just stopped reporting.
Nope. With the customer right there with the team, dissatisfaction at that scale would be impossible to hide.
Maybe the teams just aren’t paying attention? Nope. Same reason.
Maybe software was just simpler back then. Some ways yes, some ways no. Forest teams are still reporting “incredibly” low defect counts.
How?
If “no” bugs (I’m just going to start saying “no bugs” & trust you to understand what I mean by “no”) wasn’t a goal of XP, it bears thinking about how it could “just happen”. Especially when everyone in The Desert expects bugs to be like weather.
For example, one team tracked down the six defects reported from production in a year & noticed that all of them had been introduced by someone programming alone. After that they were consistent & persistent about pairing: “Hey, whatcha working on there? I noticed you over in the corner.”
Pairing alone won’t do it, though. Defects are too complex a problem to be addressed by any one practice. There’s something about the mixture of:
Continuous planning, so there’s enough time for implementation, for defect remediation, & for reflection.
Continuous design, so you can prevent recurrences & mistake-proof the code.
Continuous integration, so you step on each other’s toes less frequently.
Continuous deployment, so defects that do get to production are more quickly identified & isolated.
Continuous collaboration (pairing, ensemble), so a) bugs are more likely to be caught immediately & b) understanding of bugs (& safety) quickly percolates through the whole team.
Continuous testing, so you have an immediate double check on implementation decisions.
Customer on the team, so you can just ask instead of feeling pressure to move forward on shaky assumptions. Also, everyone on the team can see the actual face of the person affected by bugs.
Practically all the roots of The Forest work together to create the “bug as anti-holiday” development experience.
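To make “fix the bug, then fix the whole class of bug” a little more concrete, here is a minimal Python sketch. The scenario is hypothetical, not one reported by the teams above: suppose the one production report of the year was a discount rate of 150 slipping through & producing a negative price. The names `Percent` and `discounted_price_cents` are invented for illustration. Patching the one caller would fix the bug; a type that refuses to hold an out-of-range rate, plus tests that keep it that way, is one way to go after the class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percent:
    """A discount rate stored as a whole-number percentage (0 to 100).

    Hypothetical type, introduced only for this illustration.
    """

    value: int

    def __post_init__(self) -> None:
        # Reject the mistake at construction time instead of in production.
        if not 0 <= self.value <= 100:
            raise ValueError(f"percent out of range: {self.value}")


def discounted_price_cents(price_cents: int, discount: Percent) -> int:
    """Apply a discount using integer arithmetic on cents."""
    return price_cents * (100 - discount.value) // 100


def test_reported_defect_stays_fixed() -> None:
    # Regression test for the (hypothetical) report: 15% off 2000 cents.
    assert discounted_price_cents(2000, Percent(15)) == 1700


def test_whole_class_is_rejected() -> None:
    # Any out-of-range rate now fails loudly before a price is computed.
    for bad in (-1, 101, 150):
        try:
            Percent(bad)
        except ValueError:
            continue
        raise AssertionError(f"Percent({bad}) should have been rejected")
```

The two test functions can be run with a test runner such as pytest or simply called directly. The design choice worth noticing is that the second test guards the category of mistake, not just the single report that triggered the investigation.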
Reaction
Why is the possibility of teams living with no bugs so threatening to some folks?
First, this is just what it’s like trying to communicate from The Forest to The Desert. Assumptions in one realm seem absurd in the other.
Second, it’s clear that folks in The Desert are attached to the notion of production defects as just part of development. That feels like a coping mechanism to me—folks feel bad about production bugs so they soothe themselves by assuming they are inevitable.
Last, getting from a Desert perspective on production defects to a Forest perspective is going to be mighty hard work. You’re changing the plane’s engine in the air. But no bugs is not rocket surgery. And it carries benefits for everyone involved.
I once developed a library that ran in production under very high and diverse load in big tech. The library had only one minor bug, found after a full year in production. Did it take me a long time to write? No, it was actually pretty fast (maybe 2 months of actual coding). What’s the secret? Religious application of test-driven development from start to finish.
In other words, I had lots of bugs, but they all showed up upfront, before production. The strong test coverage also allowed me to do all kinds of refactoring with confidence, which let me move faster when I discovered structural problems or did performance optimizations based on profiles from prod. All of this to say that TDD is great for reducing defects before they hit production…
Do you know if those XP teams used TDD in addition to pairing?
One other thing I've observed about bugs in the forest is that they often generate a lot of curiosity. When the rare report of a Genuine Production Bug appears, it's common for lots of people to voluntarily down tools and huddle round to see what this interesting, exotic specimen is.