Composable Tests
It's not what I thought
The Test Desiderata names 12 desirable properties of tests, two of which are:
Isolation—the result of running one test should be completely independent of the results of other tests.
Composition—if tests are isolated, then I should be able to run any number of them together and get the same results. Isn’t that the same thing as isolation?
No, and here’s why. (I finally got an example—examples are always the hardest part.)
Isolation
If a test runs by first setting up its own test fixture, creating from scratch all the data it will be using as input, then that test is guaranteed to be isolated. It doesn’t matter what order you run the tests, the results will be exactly the same. (This is the same property as referential transparency in functional programming.)
Isolation is encouraged in the xUnit testing frameworks (at least most of them) by creating a new instance of a test object for every test & running the setUp() function before running the test. (Some frameworks, notably NUnit, reuse test instances, opening the door to breaking isolation.)
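To make the mechanism concrete, here is a minimal JUnit 5 sketch (the Counter class is invented for illustration). Each test method gets a brand-new test-class instance and a freshly built fixture, so neither test can observe the other’s mutations:

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical class under test, defined here so the sketch is self-contained.
class Counter {
    private int count = 0;
    void increment() { count++; }
    int count() { return count; }
}

class CounterTest {
    private Counter counter;

    @BeforeEach
    void setUp() {
        // JUnit 5 creates a new CounterTest instance for every test method
        // and runs setUp() before it, so each test starts from scratch.
        counter = new Counter();
    }

    @Test
    void startsAtZero() {
        assertEquals(0, counter.count());
    }

    @Test
    void incrementAddsOne() {
        counter.increment();              // mutating the fixture here...
        assertEquals(1, counter.count()); // ...cannot affect startsAtZero()
    }
}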
Composition
Say we have a suite of isolated tests & we run them all together. The suite’s success should give us confidence (be predictive in Desiderata terms), even though each individual test on its own isn’t comprehensive.
Example—say we have a test:
test1()
object := new Whatever()
actual := object.doSomething()
assertEquals(expected, actual)
We get that working, so we want to implement the next bit of functionality. We copy, paste, & extend:
test2()
object := new Whatever()
actual := object.doSomething()
assertEquals(expected, actual)
actual2 := object.nowSomethingElse()
assertEquals(expected2, actual2)
I have seen tests like this that have been copied, pasted, & extended 6 or 7 times. That last test is pretty hard to read.
Notice that test2 can’t pass if test1 fails. All non-compliant programs caught by test1 will also be caught by test2. We have at least 3 options that preserve the same coverage, the same predictability:
Leave both tests.
Delete test1.
Simplify test2.
Pruning
From a purely aesthetic standpoint (& don’t discount aesthetics), leaving both tests as is offends my sensibilities. They are redundant! Something must be wrong.
Deleting test1 loses us another property from the Test Desiderata—tests should be specific. That’s the property of tests where, when one fails, you know exactly where the problem is.
Which leads to my preferred solution—composition. I trim test2 to avoid the purely redundant parts:
test2()
object := new Whatever()
object.doSomething()
actual := object.nowSomethingElse()
assertEquals(expected, actual)
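In runnable form, the composed pair might look like the following JUnit 5 sketch, with a stub standing in for the hypothetical Whatever:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Stub standing in for the post's hypothetical Whatever.
class Whatever {
    String doSomething() { return "something"; }
    String nowSomethingElse() { return "something else"; }
}

class WhateverTest {
    @Test
    void test1() {
        Whatever object = new Whatever();
        assertEquals("something", object.doSomething());
    }

    @Test
    void test2() {
        Whatever object = new Whatever();
        object.doSomething();             // exercised, but asserted only in test1
        assertEquals("something else", object.nowSomethingElse());
    }
}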
The composition of test1 + test2 hasn’t lost any of the predictive property. It hasn’t lost any of the specific property. In fact the composition may be more specific as it is possible for test1 to fail & test2 to pass (although they may both fail for a common reason).
N x M
Let’s say we have 4 ways of computing interest & 5 ways of reporting that interest. The brute force approach to testing this is 20 tests. Using composition, though, we can achieve the same confidence in our system with 10 tests. If the variants of computing interest are separated, in a functional programming sense, from the variants of reporting, then we need:
4 tests for computation
5 tests for reporting
1 test that combines computing & reporting, to demonstrate that they are wired together (see the sketch below)
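A sketch of that suite’s shape, with hypothetical InterestCalculator & InterestReporter types and one representative test shown per dimension:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical types: computation is separated from reporting so that
// each dimension can be tested on its own.
interface InterestCalculator { double interest(double principal); }
interface InterestReporter { String report(double interest); }

class SimpleInterest implements InterestCalculator {
    public double interest(double principal) { return principal * 0.05; }
}

class PlainReporter implements InterestReporter {
    public String report(double interest) { return "Interest: " + interest; }
}

class InterestTest {
    @Test // one of the 4 computation tests: no reporter involved
    void simpleInterestIsFivePercent() {
        assertEquals(5.0, new SimpleInterest().interest(100.0), 0.0001);
    }

    @Test // one of the 5 reporting tests: fed a known value, no calculator involved
    void plainReporterFormatsValue() {
        assertEquals("Interest: 5.0", new PlainReporter().report(5.0));
    }

    @Test // the single test demonstrating the two dimensions are wired together
    void calculatorFeedsReporter() {
        InterestCalculator calculator = new SimpleInterest();
        InterestReporter reporter = new PlainReporter();
        assertEquals("Interest: 5.0", reporter.report(calculator.interest(100.0)));
    }
}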
Gaining confidence from composed tests requires some thought, some inference, some design (to make the orthogonal dimensions demonstrably orthogonal), but the investment in writing them pays off in making tests:
Faster
More readable
Easier to change
More specific
Less sensitive to structure changes
Critique
When I’ve explained what I mean by composable tests, I often receive shocked reactions from developers experienced in testing. “I would never reduce the assertions in a test.” This seems to me to be a reaction based in fear, not in principle. We worked so hard to get to write tests at all. We can’t make them worse.
Composition isn’t making tests worse. Composition is looking at the tests as a whole, trying to make the whole better as judged by several valuable properties of tests.

I really like this post, in particular the thoughts on N x M.
If a little bit of self-promotion is allowed, I wrote a blog post about a year ago that came to a similar conclusion. I think of it as a way to build robust test suites. In my ideal test suite, only a single test would fail whenever an error occurred. I also think it could be linked to Separation of Concerns, Don’t Repeat Yourself, or the Single Responsibility Principle.
Feel free to read my thoughts: https://henko.net/blog/verify-only-what-you-need/
Hi Kent,
my worry with the composition above is that if there is a bug in doSomething(), then the state in which we execute doSomethingElse() is unknown. Continuing might generate errors in doSomethingElse() that are not actually bugs in doSomethingElse() but the result of executing it in a broken state.
One alternative approach could be to make it possible in the test to explicitly establish the state that doSomething() is expected to bring about. E.g.,
object := new Whatever(readyForDoSomethingElse)
result := object.doSomethingElse()
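A sketch of that alternative (names are hypothetical): the constructor establishes the state directly, so testing doSomethingElse() no longer executes doSomething() at all:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical Whatever whose constructor can establish the state that
// doSomething() is expected to bring about.
class Whatever {
    private final boolean readyForDoSomethingElse;

    Whatever(boolean readyForDoSomethingElse) {
        this.readyForDoSomethingElse = readyForDoSomethingElse;
    }

    String doSomethingElse() {
        return readyForDoSomethingElse ? "result" : null;
    }
}

class WhateverTest {
    @Test
    void doSomethingElseWorksFromAKnownGoodState() {
        Whatever object = new Whatever(true); // state established directly
        assertEquals("result", object.doSomethingElse());
    }
}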
Yet another alternative would be to explicitly assert on the state that we expect object to be in after doSomething(), e.g.:
test2()
object := new Whatever()
object.doSomething()
assumeTrue(object.isReadyToDoSomethingElse())
actual := object.nowSomethingElse()
assertEquals(expected, actual)
Some test frameworks assign a special state to tests that stop because of a failed assumption, so we would know that test2() did not fail because nowSomethingElse() is broken, but because the assumptions we made do not hold.
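JUnit 5, for example, works this way: Assumptions.assumeTrue() aborts the test, and runners report it as aborted/skipped rather than failed. A minimal sketch, again with a stub Whatever:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assumptions.assumeTrue;

// Stub standing in for the hypothetical Whatever.
class Whatever {
    private boolean ready = false;
    void doSomething() { ready = true; }
    boolean isReadyToDoSomethingElse() { return ready; }
    String nowSomethingElse() { return ready ? "something else" : null; }
}

class WhateverTest {
    @Test
    void test2() {
        Whatever object = new Whatever();
        object.doSomething();
        // If this assumption fails, JUnit 5 aborts the test and reports it
        // as skipped, not failed, so a red test2 can only mean that
        // nowSomethingElse() itself is broken.
        assumeTrue(object.isReadyToDoSomethingElse());
        assertEquals("something else", object.nowSomethingElse());
    }
}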