What are your thoughts on "property based testing"? That, I think, is the approach that avoids the problem described. Instead of "assert(factorial(3), ..." "assert(factorial(4), ..." and lightly refactoring, we can assert the more fundamental property:
"n=random(); assert(factorial(n), n * factorial(n-1))".
Oops, did I just hard-code the implementation in the test? But that test is an assertion of _what factorial actually is_. And it can't be "gamed" by repeatedly hard-coding cases.
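Treating that recurrence as the test, a minimal sketch in Python (plain stdlib `random` rather than a property-testing library like Hypothesis; `factorial` here just delegates to `math.factorial` as a stand-in for the implementation under test):

```python
import math
import random

def factorial(n):
    # Stand-in for the implementation under test.
    return math.factorial(n)

# The property: factorial(n) == n * factorial(n - 1) for all n >= 1.
for _ in range(100):
    n = random.randint(1, 20)
    assert factorial(n) == n * factorial(n - 1)

# Pin the base case explicitly; otherwise a degenerate implementation
# like `return 0` satisfies the recurrence for every n.
assert factorial(0) == 1
```

A real QuickCheck-style library would also generate the inputs and shrink failing cases for you; the loop above is only the bare idea.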
It also quickly leads us to important improvements and considerations. n bigger than the twenties means we need to rethink the return type. Is n signed? Do we get the negatives right? Maybe there's a better implementation? (like "round(Gamma(n))", but now do we have a correct implementation of Gamma??)
I'm still generally bad at testing, forgive my Rubyist mindset. And the property approach is probably at its worst with small mathematical functions as the example.
The “full implementation test” you propose is more fallible than the code being tested, since it relies on having written the algorithm correctly in the first place but also throws in random, which might behave unpredictably (for example being seeded and always choosing the same number, maybe even always 1).
The test must be simpler than the code it is testing; otherwise the test is at least as likely to be wrong as the code.
You even outlined all the ways you would like random to help you test: negatives, bigger numbers. Those are your tests! Write those down explicitly! It's great to inform your tests by thinking about how you would explode your code.
Tests need to test explicit values, not dynamic ones; otherwise we run the risk of fooling ourselves that our test is doing something it is not.
I think I suggest property based testing more for the "start from zero" case. Factorial is interesting, because it is very sensitive to bounds, so it's ripe for testing.
For example, you might choose `fac(n: u16): u128` for types, but I think even that might have bounds problems? Or, do we really want it to always "work as advertised", and choose `fac(n: u64): BigInt`? Do we want to support signed types? Do we even have static types? What is the factorial for negative numbers??
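Those bounds questions can be checked numerically; a quick sketch using Python's arbitrary-precision `math.factorial` as the reference:

```python
import math

# `fac(n: u16): u128` overflows earlier than you might hope:
# 34! fits in 128 bits, 35! does not.
assert math.factorial(34) < 2**128
assert math.factorial(35) >= 2**128

# A u64 return type caps out at n = 20, which is the
# "bigger than the twenties" worry made concrete.
assert math.factorial(20) < 2**64
assert math.factorial(21) >= 2**64
```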
I think the "fuzzing" approach finds those bounds faster, kind of like QuickCheck? (I haven't used QuickCheck, but I've heard its praises sung.) "Given a random input in the domain of the function, assert the behavior..." I don't think, though, that we'd want "any old random u64" (unless that's what we actually want to support calculating).
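One way to sketch that fuzzing idea, assuming a hypothetical wrap-at-64-bits implementation standing in for what `fac(n: u64)` would do in a systems language:

```python
import random

def fac_wrapped(n):
    # Hypothetical implementation that wraps at 64 bits,
    # mimicking unsigned integer overflow in a systems language.
    result = 1
    for i in range(2, n + 1):
        result = (result * i) % 2**64
    return result

def fac_exact(n):
    # Arbitrary-precision reference implementation.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# Fuzz the domain and collect inputs where the two diverge.
samples = [random.randint(0, 100) for _ in range(500)]
failures = sorted(n for n in samples if fac_wrapped(n) != fac_exact(n))

# Overflow first bites at 21!, so every divergence is at n >= 21.
assert all(n >= 21 for n in failures)
```

The fuzzer doesn't know where the boundary is in advance; it discovers that the wrapped version diverges from the reference starting at n = 21.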
Indeed, fuzzing is a legitimate testing strategy for finding edge cases. Once behavior is established, algorithms that test the bounds of that behavior are very interesting. I think it's somewhat costly to implement, so it should focus on the important parts. Factorial is theoretically unbounded, so you'd really want to be intentional about your limitations and test the upper bounds. A complete factorial function exists only in theory, since memory and time are not infinite.
This criticism of TDD seems like a variant of "TDD doesn't force me to produce correct code; I still have to think. Therefore, TDD is useless." I think this criticism comes from misunderstanding what TDD is supposed to achieve.
When I was at Pivotal, I interviewed a candidate who was super excited about the opportunity to use TDD at work. "By writing tests first, we can prove that our code is correct!" I had to gently explain to them that tests can't actually do that, and then we had a conversation about why we write tests anyway.
Maybe part of the reason people struggle with this idea is that "programming is irreducibly hard" is an uncomfortable truth to confront. Once you start thinking about the epistemology of testing, you realize how little you can actually know.
Sadly, the majority of those who "don't want to do TDD" have little idea of what is involved. There is, for example, a very vocal anti-TDD developer who concludes from the constant imperative to tidy code that it is never finished (true) and therefore you can always put off improving it (false) so that you never actually produce good code.
But the biggest issue seems to be the general reluctance to change, seeing it as an admission that they just weren't good programmers.
I'm familiar with this argument and I don't find it compelling. I work best from concrete to abstract. I also find it a fun intellectual puzzle to find the minimum covering set of examples.
So, for me, working from examples:
* Reduces anxiety
* Encourages incremental progress (related to reducing anxiety)
* Gives me solid code & tests at the end
Coming up with the right properties for property-based testing is just as difficult as coming up with examples.
Though what would you say the minimal covering set of examples for factorial would be (especially without looking at the implementation under test, though I suppose mutation-based testing would work)? It would seem that the process of deciding whether a set of examples is covering is rather similar to the process of coming up with properties (one difference being that the output of coming up with properties is the test itself, vs. a "we're done").
It is similar in complexity. After all, in both cases we want a filter that will pass acceptable programs & catch unacceptable programs. From an information theory perspective they must contain similar information. An advantage of the examples is that they can be introduced piecemeal, so later examples can incorporate learning from earlier examples.
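For factorial specifically, one candidate covering set might look like the following (my own choice of cases, with `math.factorial` standing in for the implementation under test):

```python
import math

# Base case, the degenerate n = 1, the first real product, a mid-size
# value, and one near the 64-bit boundary to catch off-by-one and
# overflow mistakes.
cases = {0: 1, 1: 1, 2: 2, 5: 120, 20: 2432902008176640000}
for n, expected in cases.items():
    assert math.factorial(n) == expected
```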
I might summarize this as "Naive developers will write naive code, TDD or not."
I agree. Same goes for properties.
It also reminds me of one of my favorite theatrical quotes:
"A man talking sense to himself is no more insane than a man talking nonsense not to himself.
Or just as insane."
I have this in a quick note. It has the "3 laws of TDD" plus this below.
If I remember correctly, this comes from some Uncle Bob lecture:
---
"As tests get more specific, the code becomes more generic."
TDD Transformation List (High to Low Priority)
1 - Null
2 - Null to Constant
3 - Constant to Variable
4 - Add Computation
5 - Split Flow (not yet a switch)
6 - Variable to Array
7 - Array to Container (lists, trees, queues...)
8 - If to While
9 - Recurse (recursion, but careful with tail call optimization)
10 - Iterate (for when recursion isn't a solution)
11 - Assign (changing value of variables)
12 - Add Case (switch, else ifs...)
---
Even in this simple factorial example, this shows up. And I keep it because I've found it helps when I need to figure out where/how to generalize the code.
So, while there's a lot of practice involved, there should be other "cheat sheets" one can use to "speed run TDD".
Yep, that's the https://en.wikipedia.org/wiki/Transformation_Priority_Premise
I totally agree.
It took the (UK) Royal Navy 42 years to adopt citrus to prevent scurvy, so I guess you've got about another 20 years to keep beating this drum :)
Kent, you must get tired of people wanting to ‘prove’ TDD is bad.
They don’t seem satisfied with just saying they don’t want to do TDD.
TDD code is only as good as the tests doing the driving. If anything, this is a counterexample arguing against example-based tests.
If the test were property based ("assert that for arbitrary n > 1, factorial(n) is factorial(n - 1) times n"), it would actually define the idea here.
If that works for you, great.
TDD is broken because factorial(0) ;)