First published July 2017
I was twelve the first time I got paid to work. I was helping renovate a house, out in the burning sun, a wooden-handled scraper/wire brush heavy in my hands.
My first target was an outdoor bannister. I gave it a few swipes, got hot, and went inside to find a cooler job. Dick called me back out. “You need to scrape and brush until all the loose paint chips are gone. Otherwise, we’ll paint it, it’ll look good, but next spring it’ll start flaking and we’ll have to do it again.”
I got back to work, finding, despite the heat and the sweat, satisfaction in doing a job that would last.
Test Coverage
The push for better engineering is based on a collective realization that we (sometimes, more on that later) need to shift our development efforts toward the long term. Quality matters. But how, in a measurement-oriented culture, do you measure quality?
The consequences of code quality are either hard to measure or lag far behind changes in quality:
How easy is it to add features?
How many SEVs did it cause?
How quickly can people learn it?
(Ultimately) How much community does it build?
Test coverage, on the other hand, is easy to measure and measurable today. In the absence of changes to design practices, however, test coverage is at best worthless and at worst a distraction from more important issues.
Production Predictor
The ideal test suite is an instant production predictor. If the tests pass, a push to production will be fine. If the tests fail, a production push would fail. The test suite will never be a perfect production predictor, but you can continually make it better.
The problem is that by the time we get to the test suite, it’s already too late because of the Universal Law of Testing:
The number of defects leaving a feedback loop is proportional to the number of defects entering.
The code is composed of large elements (functions, classes, files, modules), unnecessarily shared mutable state, complex control structures, vestigial code, duplicated code, and implicit coupling. You don’t have bugs because you don’t have tests. You have bugs because you write code that makes it easy to have bugs. You have a design problem, not a testing problem.
Prescription
Improve cohesion by reducing the size of elements
Unify duplicated logic
Replace duplicated conditionals with polymorphism
Use immutable data structures where feasible to eliminate aliasing errors
Eliminate dead code
Rationalize naming so the right names are likelier to be guessed
Simplify control flows where possible
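The "replace duplicated conditionals with polymorphism" item is the most mechanical of these. A minimal sketch in Python (the shape classes here are hypothetical, purely for illustration):

```python
# Before: the same type dispatch is duplicated across functions.
# Adding a new kind of shape means finding and editing every conditional.
def area(shape):
    if shape["kind"] == "circle":
        return 3.14159 * shape["r"] ** 2
    elif shape["kind"] == "rect":
        return shape["w"] * shape["h"]

def perimeter(shape):
    if shape["kind"] == "circle":
        return 2 * 3.14159 * shape["r"]
    elif shape["kind"] == "rect":
        return 2 * (shape["w"] + shape["h"])

# After: each variant owns its own behavior. Adding a shape is one
# new class, and no existing code changes.
class Circle:
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.14159 * self.r ** 2
    def perimeter(self):
        return 2 * 3.14159 * self.r

class Rect:
    def __init__(self, w, h):
        self.w, self.h = w, h
    def area(self):
        return self.w * self.h
    def perimeter(self):
        return 2 * (self.w + self.h)
```

The payoff for testing is that each class can now be covered in isolation, instead of every test having to thread through the same branching.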
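The aliasing point deserves a concrete picture: with shared mutable state, a "local" change through one reference silently shows up at every other reference; immutable values rule that class of bug out. A small Python sketch (the `Config` type is hypothetical):

```python
from dataclasses import dataclass, replace

# Mutable: two parts of the code share one object, so a change made
# through one name leaks into the other.
config = {"retries": 3}
handler_config = config          # aliased, not copied
handler_config["retries"] = 0    # silently changes `config` too

# Immutable: "changing" a value produces a new value; the original
# can never be modified behind anyone's back.
@dataclass(frozen=True)
class Config:
    retries: int = 3

base = Config()
handler = replace(base, retries=0)  # new value; `base` is untouched
```

Immutable data is also easier to test: a value either is what you expect or it isn't, with no hidden history of mutations to reproduce.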
Once the system is closer to being composed out of cohesive, loosely coupled elements, then and only then does test coverage begin to correlate with the predictive power of the test suite. As long as changing a line has a significant chance of unexpected consequences, whether that line is covered or not doesn’t really matter.
Priorities
Another way of interpreting efforts to improve engineering is that they recognize that parts of the code base have made this shift to longer half-lives. The cultural challenge is that de-coupling is hard work, provides no immediate payoff, can disconnect from actual progress, and is hard to measure. Test coverage, on the other hand, is easy to measure, easy to make progress on, and easy to justify short-term.
I’m 100% in favor of slowing down to speed up, where we know code will be valuable long-term. I’m in favor of better test coverage. But you have some hard, sweaty, unglamorous work to do first. Otherwise, you risk testing the hell out of error-prone code, only to have to test the hell out of tomorrow’s error-prone code. And the next day’s. And the next.
P.S. One Exception
One scenario where test coverage provides value today is answering the question, “You know this line I just changed? Is it being exercised?” If it isn’t, it’s time to ask, “Am I comfortable with this change executing for the first time in production, outside of my direct control?” Maybe yes, maybe no, but test coverage eliminates a class of errors where you accidentally didn’t even check.
If a code base is just a tangled ball of string, where every change is dangerous, my favorite testing strategy is to pick out 1 to 3 essential workflows and write a very high-level UI test for each. This may be hard to set up, but done right, it gives you confidence that the core use cases work. For a web app, this might be "create an account, go to the starter page, add a new ______, view the ______, do something else to the ______, and make sure we see what the user expects."
These kinds of tests treat the system as a black box, and only rely on things users see. So they tend to be at least somewhat stable.
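The shape of such a workflow test looks roughly like this. In a real web app you would drive an actual browser (with a tool like Selenium or Playwright); the tiny in-memory `App` here is a hypothetical stand-in so the sketch is self-contained — the important part is that the test touches only what a user can see:

```python
class App:
    """Hypothetical stand-in for a real app driven through its UI."""
    def __init__(self):
        self.users, self.items = {}, []
    def create_account(self, name):
        self.users[name] = True
        return "starter page"          # what the user lands on
    def add_item(self, title):
        self.items.append(title)
    def visible_items(self):
        return list(self.items)        # what the user would see listed

def test_core_workflow():
    app = App()
    # Create an account, land on the starter page...
    assert app.create_account("alice") == "starter page"
    # ...add a new item...
    app.add_item("first item")
    # ...and assert only on user-visible output, never on internals.
    assert "first item" in app.visible_items()

test_core_workflow()
```

Because the assertions never reach into the implementation, the test survives internal refactoring — which is exactly what makes it useful while you disentangle the ball of string.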
From there, it's possible to start disentangling one module at a time, as needed to add features. And that module can get some unit tests.
One can argue that writing good tests will itself push the production code toward better structure and architecture: the chicken-and-egg problem.