Unit testing is all well and good, but what about delivering these tests into production?
Is it a good idea? Or is it “nice for toy applications, but you can’t do that with Enterprise Applications, because there is too much money/security/personal data at stake”?
And what kind of tests are we talking about anyway?
Basically, this post is meant to debunk some beliefs that I encountered during a couple of Enterprise Application projects. Let’s look at them:
- We write throw-away code for unit tests that we wouldn’t need if we didn’t write unit tests
- [The] tests are only useful during development – they /must/ be removed from production code
- We don’t need to simulate the connected systems once we get the real systems connected
- Throw-away code is effort that’s useless and could/should be avoided
While these claims seem very convincing at first sight, and might be very appealing to managers who try to get a development team to be “more productive”, their physical-world counterparts tell the story quite differently. And my own observation, counterintuitive as it may seem, is that these beliefs are quite wrong.
Now – if you’re still with me (or against me – as long as you care about the matter) – let’s debunk them one at a time.
(Unit-) Tests require otherwise useless code – I don’t think so!
Of course you need to write some things to automate the tests that would not be necessary if you did something else instead, like testing the application manually. But that doesn’t mean that you win anything by not writing “those things” in the first place. You’d just replace a checkable, repeatable way to test your program with one that depends on the alertness of the tester (in fact: your alertness!), her or his mood, interruptions, etc.
And let’s look at “those things” that you have to write. For me, three things come to mind: the tests themselves, some kind of test harness, and some special diagnostic extension point through which the tests can get access (read and write) to otherwise unexposed information. I’ll come back to the exposure of otherwise inaccessible information in the next section, but the other two can be dealt with right away.
The tests themselves? Well, of course I could just test the system manually. But even if I were willing to re-test all possibly affected portions of my system each time I change my code – which I am not – it would soon become economically unsound. Apart from the sheer amount of time spent doing that, for any system that has reached a substantial size I would have to keep records of the tests so that I can repeat them easily and know which parts have been retested since the last change and which still need some love. Just writing the tests is way less effort than the combined effort of writing specs, keeping track of the tests I just did manually, and the manual testing itself.
The test harness? I think this is addressed by the same considerations that hold true for the tests themselves. Plus, there is often no way to test without a test harness, since the system under development relies heavily on surrounding components.
Tests should be run only outside live systems – I don’t think so!
So let’s come back to those extra accessors we wrote for our tests that are “only there for the tests”. Well, sometimes they /are/ only there for the tests. Unfortunately. They could be so much more. Let’s look at the real – sorry, physical – world for some comparison:
Airplanes and automobiles.
Have you ever wondered what all those dials on a typical airplane dashboard of the last century indicate? Well, a lot of them show things that /should not be necessary to measure/, for example the exhaust gas temperature (EGT). It is also a function of the manifold pressure and the fuel intake, both of which are displayed by other gauges. But measuring each of the components separately – which really is redundant if everything works as designed – gives the pilot the option to manually lean or enrich the fuel mixture for optimal performance. And it serves as an indicator of possible related problems.
I have yet to see an Enterprise Application where there is, e.g., a counter for requests, a counter for inserts and a counter for updates that get monitored in production. Such counters usually are there – for the tests. And they are used – in the tests.
So why not take them to the production system?
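To make the idea concrete, here is a minimal Python sketch of counters that live in the production code and are read by tests and by monitoring alike. All names (`OperationCounters`, `handle_request`, `counters_look_consistent`) are made up for illustration, and the dictionary standing in for the database is an assumption, not anything from a real project.

```python
import threading
from collections import Counter


class OperationCounters:
    """Thread-safe counters that stay in the production build (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def increment(self, name, amount=1):
        with self._lock:
            self._counts[name] += amount

    def snapshot(self):
        # Return a plain copy so callers cannot modify the live counters.
        with self._lock:
            return dict(self._counts)


def handle_request(record, store, counters):
    """Hypothetical request handler; `store` is a dict standing in for a database."""
    counters.increment("requests")
    if record["id"] in store:
        counters.increment("updates")
    else:
        counters.increment("inserts")
    store[record["id"]] = record


def counters_look_consistent(snapshot):
    """The redundant gauge: every request must end up as an insert or an update."""
    return snapshot.get("requests", 0) == (
        snapshot.get("inserts", 0) + snapshot.get("updates", 0)
    )
```

The same `snapshot()` that a unit test asserts on could be polled by whatever monitoring the production system already has. Like the EGT gauge, the consistency check is redundant as long as everything works as designed, which is exactly why it is a useful indicator when something does not.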
Or take another element of pre-flight checks (not applicable to jets), the mag check. Usually aircraft piston engines use a magneto-based system for spark generation, with two separate circuits with separate magnetos and two spark plugs per cylinder. The “normal operation” is to have both spark plugs running. But – and this is where Enterprise Applications differ hugely – the pilot can control this. Even in mid-flight. And it’s part of the pre-flight check to test those unwanted conditions.
Like “Both” – Revs should be 1400, “Left” – Revs should drop (only one plug), “Both” – Revs should rise again, “Right” – … you get the picture.
But I’ve never seen something like that done to an Enterprise Application:
“Cache Available” => All pages should show in less than n milliseconds
“Cache not Available” => Pages x,y and z should show much slower
“Database available” => Data should be written to the database
“Database not available” => Data should be written to files
But wait – that’s not possible.
Because we removed all the rods and knobs that enable direct access to those internals from the production code.
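Here is a rough Python sketch of what such a knob could look like if it were left in the production code. Everything in it is an assumption for illustration: the names `DiagnosticSwitches`, `RecordWriter` and `database_mag_check` are invented, and plain lists stand in for the real database and the fallback file.

```python
import json


class DiagnosticSwitches:
    """Hypothetical knobs that stay in the production build."""

    def __init__(self):
        self.database_available = True


class RecordWriter:
    """Writes to the database, or to a fallback file when the database is switched off."""

    def __init__(self, switches, database, fallback_file):
        self.switches = switches
        self.database = database            # a list stands in for the real database
        self.fallback_file = fallback_file  # a list of lines stands in for a real file

    def write(self, record):
        if self.switches.database_available:
            self.database.append(record)
            return "database"
        self.fallback_file.append(json.dumps(record))
        return "file"


def database_mag_check(writer, switches):
    """Flip the switch on, off and on again; report where each probe record landed."""
    original = switches.database_available
    destinations = []
    try:
        for available in (True, False, True):
            switches.database_available = available
            destinations.append(writer.write({"probe": True}))
    finally:
        # Always leave the switch the way we found it.
        switches.database_available = original
    return destinations
```

With a writer wired up this way, the check is expected to report database, file, database, just like “Both”, “Left”, “Both” on the magneto switch. In a real system the probe records would have to be recognizable as such and cleaned up afterwards, and access to the switches would need to be restricted.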
And more often than not it’s not even possible to take the delivered application and debug it, because the debug information – and even the symbol tables where applicable – have been removed. So it’s necessary to rebuild the production version including debug information from source control and try to recreate the failing scenario.
If automobile engines were built like that, you would have neither the yellow “general engine failure” light on the dashboard nor the option to go to the garage and have them plug in the diagnostics computer and tell you exactly that there are air bubbles in the fuel pipe every time the engine revs over 5521. (That’s an example, of course…)
Instead the garage would have to rebuild an engine exactly like the one you’ve got in your car and analyze the error from that one. Or at least completely disassemble your engine, insert probes, reassemble it and run it in an environment simulating your car.
Although there are numerous implementations where you can probe a running system – especially on Apple’s OS X – they are mostly aimed at system-level functions or the core frameworks. Actually, I think it’s very sad that it is so uncommon for programs, and especially Enterprise Applications, to reveal relevant information about their internals.
It’s preferable to use The Real (connected) System for development – I don’t think so!
Do you really think that, in the car supply industry, they fit each wiper motor into the bodies of all targeted cars before they ship it? Last time I checked, they used test rigs that have nothing in common with cars except for the interface to the wiper motor – but they might have some additional features, like an instrument that measures the torque of the motor and another one that measures the current drain. And by mounting the wiper motor in that rig and measuring those values they can verify that it actually conforms to the contract to which both the car manufacturer and the supplier designed their parts. (Which opens the road to Design by Contract – but that would be another story.)
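Translated into code, such a rig might look like the following Python sketch. It is only an illustration under my own assumptions: `InventoryRig` stands in for some connected inventory system, `place_order` is the part under development, and the `reserve` interface is invented for the example.

```python
class InventoryRig:
    """Stands in for the real inventory system.

    It shares nothing with the real system except the agreed interface
    (`reserve`), but it adds an instrument the real system does not have:
    a log of every call made through that interface.
    """

    def __init__(self, stock):
        self._stock = dict(stock)
        self.calls = []  # the instrument: what did the component actually ask for?

    def reserve(self, item, quantity):
        self.calls.append(("reserve", item, quantity))
        if quantity <= 0:
            # The rig enforces the contract instead of silently tolerating misuse.
            raise ValueError("contract violation: quantity must be positive")
        available = self._stock.get(item, 0)
        if available < quantity:
            return False
        self._stock[item] = available - quantity
        return True


def place_order(inventory, item, quantity):
    """Hypothetical component under development; it only knows the interface."""
    if inventory.reserve(item, quantity):
        return "confirmed"
    return "backordered"
```

Because `place_order` only sees the interface, it cannot tell the rig from the real system. The rig, however, can tell me things the real system never would, such as exactly which calls were made and whether any of them violated the contract.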
If you write code that gets deleted later, then it’s waste – I don’t think so!
There are lots of examples from the real world that tell a different story. The simple one, of course, is the paper cup you get with your coffee to go. It does get ‘deleted’ later on, when you throw it away, but I don’t think it’s feasible to do completely without it. I would either burn my fingers or at least ruin my shirt if I tried to take “only the coffee and nothing but the coffee” from my local takeaway. (Replacing the throw-away cup with a reusable mug might be a good idea from an environmental point of view, but replacing it is not the same as “just don’t do it”.)
Another example – one that is perhaps more suitable from a production point of view – is almost anything that has to do with casting metal. A wide range of casting techniques requires a mold that gets destroyed in the process. These molds are made from a multitude of materials and usually require serious work before the casting process can begin. And after the casting and cooling and hardening, the mold often gets destroyed by brute force – with hammers and pickaxes if needed. I really don’t think you could build those metal things without the mold. So if I write code that, for example, simulates a calculation that will be built much later, and that code gets thrown away once the more complex “real” function is implemented, then that code – for me – is the mold that allows me to form the rest of my code around it.
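A tiny Python sketch of such a mold, with invented names and an invented flat rate: `shipping_cost_placeholder` is the throw-away part, and `invoice_total` is the code that gets formed around it.

```python
from decimal import Decimal


def shipping_cost_placeholder(weight_kg):
    """Throw-away stand-in: a flat rate until the real tariff calculation exists.

    The rate of 4.90 is an arbitrary value chosen for this example.
    """
    return Decimal("4.90")


def invoice_total(item_prices, weight_kg, shipping_cost=shipping_cost_placeholder):
    """The code shaped around the mold: it only relies on the function's signature."""
    subtotal = sum(item_prices, Decimal("0"))
    return subtotal + shipping_cost(weight_kg)
```

Once the real calculation exists, it is passed in (or becomes the default) and the placeholder is deleted. The rest of the code, and the tests written against it, keep their shape.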
There’s a ton of related material out there, but “PPP” by Uncle Bob and the original edition of “Extreme Programming Explained” by Kent Beck do a pretty good job of conveying the fundamental concepts. And throwing in some TDD by Kent Beck, BDD by Dan North plus Refactoring by Martin Fowler et al. for good measure wouldn’t do much harm either – but that’s only my 2¢ and of course YMMV.
PPP = Agile Software Development, Principles, Patterns, and Practices by Robert C. Martin
TDD = Test-Driven Development: By Example by Kent Beck
BDD = http://dannorth.net/introducing-bdd/
Refactoring = Refactoring: Improving the Design of Existing Code by Martin Fowler, Kent Beck, John Brant and William Opdyke