First published August 2014. For context, JUnit Max was a commercial product I produced that is still the best test runner I’ve ever used. It gave sub-second feedback every time a file was edited & reported test failures in a way that looked like syntax errors. No more switching back & forth between your coding context & your testing context. Sigh. RIP.
Turns out the eternal verities of software development are neither eternal nor verities. I’m speaking in this case of the role of tests.
Once upon a time tests were seen as someone else’s job (speaking from a programmer’s perspective). Along came XP and said no, tests are everybody’s job, continuously. Then a cult of dogmatism sprang up around testing—if you can conceivably write a test you must.
By insisting that I always write tests I learned that I can test pretty much anything given enough time. I learned that tests can be incredibly valuable technically, psychologically, socially, and economically. However, until recently there was an underlying assumption to my strategy that I wasn’t really clear about.
Software development is often a long game. My favorite software business of all time is the MVS PL/1 compiler. I heard a rumor that at one point it was earning IBM $300M annually with a total staff of 3 developers. To get to such a business you have to be patient, to invest in extending the lifetime of your software for decades if necessary.
It’s that “often” that hid my assumption about testing. Just as golf has a long game and short game requiring related but not identical skills, so software has a long game and a short game [ed: I was edging up to 3X: Explore/Expand/Extract here]. With JUnit Max I am living the short game of software. It’s teaching me the meaning of “related but not identical skills” when applied to software development.
Two Projects
JUnit is a long game: lots of users, stable revenue ($0, alas), bounded scope. We know what JUnit is. We know what attracts and retains users. We just need to stay a bit ahead of slowly evolving needs.
Working on JUnit, the whole bag of XP practices makes sense. We always test-drive development. We refactor whenever we can, sometimes trying 3-4 approaches before hitting one we are willing to live with.
Success in JUnit is defined by keeping the support cost asymptotically close to zero. We have a huge, active user base and no budget for support. The means to success is clear—slow evolution, comprehensive testing, and infrequent releases.
When I started JUnit Max it slowly dawned on me that the rules had changed [ed: see? 3X]. The killer question was (is), “What features will attract paying customers?” By definition this is an unanswered question. If JUnit (or any other free-as-in-beer package) implements a feature, no one will pay for it in Max.
Success in JUnit Max is defined by bootstrap revenue: more paying users, more revenue per user, and/or a higher viral coefficient. Since, by definition, the means to achieve success are unknown, what maximizes the chance for success is trying lots of experiments and incorporating feedback from actual use and adoption.
To Test. Or Not.
One form of feedback I put in place is that all internal errors in Max are reported to a central server. Unlike long game projects, runtime errors in short game projects are not necessarily A Bad Thing (that’s a topic for another post). Errors I don’t know about, however, are definitely A Bad Thing.
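The post doesn’t show how that reporting was wired up. As a rough sketch only, assuming a hypothetical ErrorReporter class and a made-up collection endpoint (none of the real JUnit Max code appears here), the idea is: catch the internal error, ship the stack trace to a server you control, and make sure that a failure in the reporting itself never bothers the user.

    import java.io.OutputStream;
    import java.io.PrintWriter;
    import java.io.StringWriter;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // Hypothetical illustration, not the actual JUnit Max code.
    public class ErrorReporter {
        // Assumed endpoint; the real collection server isn't named in the post.
        private static final String REPORT_URL = "https://errors.example.com/report";

        /** Send an internal error's stack trace to the central server. */
        public static void report(Throwable internalError) {
            try {
                // Capture the stack trace as plain text.
                StringWriter trace = new StringWriter();
                internalError.printStackTrace(new PrintWriter(trace, true));
                byte[] body = trace.toString().getBytes(StandardCharsets.UTF_8);

                // POST it to the collection server.
                HttpURLConnection connection =
                        (HttpURLConnection) new URL(REPORT_URL).openConnection();
                connection.setRequestMethod("POST");
                connection.setDoOutput(true);
                connection.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
                try (OutputStream out = connection.getOutputStream()) {
                    out.write(body);
                }
                connection.getResponseCode(); // force the request to be sent; ignore the response
            } catch (Exception reportingFailure) {
                // Deliberately swallowed: a failure while reporting an error
                // must never become a new error visible to the user.
            }
        }
    }

In an Eclipse plug-in this might hang off whatever catches unexpected exceptions at the top level; the property that matters for the argument is simply that unknown errors become known ones.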
Looking through the error log I saw two errors I knew how to fix. I didn’t have any experiments that would fit into the available time, so I set out to fix them both.
The first defect was clear—projects that were closed caused an exception. Writing the test was easy—clone an existing test but close the project before running Max. Sure enough, red bar. A two-line fix later, green bar.
The second defect posed a dilemma. I could see how to fix the problem, but I estimated it would take me several hours to learn what was necessary to write an automated test. My solution: fix it and ship it. No test.
I stand behind both decisions. In both cases I maximized the number of validated experiments I could perform. The test for the first defect prevented regressions, added to my confidence, and supported future development. Not writing the test for the second defect gave me time to try a new feature.
No Easy Answer
When I started Max I didn’t have any automated tests for the first month. I did all of my testing manually. After I got the first few subscribers I went back and wrote tests for the existing functionality. Again, I think this sequence maximized the number of validated experiments I could perform per unit time. With little or no code, skipping tests let me start faster (the first test I wrote took me almost a week). Once the first bit of code proved valuable (in the sense that a few of my friends would pay for it), tests let me experiment with that code quickly and confidently.
Whether or not to write automated tests requires balancing a range of factors. Even in Max I write a fair number of tests. If I can think of a cheap way to write a test, I develop every feature acceptance-test-first. Especially if I am not sure how to implement the feature, writing a test gives me good ideas. When working on Max, the question of whether or not to write a test boils down to whether a test helps me validate more experiments per unit time. If it does, I write it. If not, damn the torpedoes. I am trying to maximize the chance that I’ll achieve wheels-up revenue for Max. The reasoning around design investment is similarly complicated, but again that’s the topic for a future post.
Some day Max will be a long game project, with a clear scope and sustainable revenue. Maintaining flexibility while simultaneously reducing costs will take over as goals. Days invested in one test will pay off. Until then, I need to remember to play the short game.