Wednesday, September 14, 2011

2 Rules for Investigating Production Defects

Production defect investigations can be golden opportunities for testers. Yet I have observed a tendency for these investigations to be less than productive, swinging quickly from blaming the test team to excusing them, and, in turn, excusing practically everyone else as well. I appreciate the underlying empathy, but making excuses is habit-forming, and it distracts us from the tasks at hand: learning and improving.

I hate wasting these opportunities, and I don’t want to waste our time. So I have adopted two rules, intended to short-circuit the unproductive blaming and excusing exercise, allowing us to move into the learning...

Rule 1: All production defects could have been caught by a test.

Sometimes the best way to end finger-pointing is for someone to take (at least part of) the (initial) blame. For best effect, someone in a test-leadership capacity should state this rule loudly as soon as the investigation starts. This statement may be met with stunned silence or instant agreement. Either way, we have just opened up some time and space to imagine a better test, a test that would find the problem, and possibly to generalize and explore for tests that may find adjacent problems along multiple dimensions. Of course, these tests may not be practical or economically feasible, but I frequently find that they are both very practical and very affordable. Too many investigations fail to challenge us to find these new tests. These new tests can take the quality of our testing, and the quality of future releases, to a “whole ‘nother level.” If we can delay the investigative team’s empathic response, and if, as test leaders, we can broaden our shoulders a bit, we can take advantage of this opportunity almost every time.

Now that we have done that, it’s time to remind our colleagues that the missing tests were merely a superficial aspect of the underlying problem...

Rule 2: All production defects are caused by problems with the requirements, design, code, deployment, operation and/or usage of the system.

In other words, the testing problem is not THE problem. To sing an old refrain, testers neither cause nor fix defects. While Rule 1 gave us a pause in which to learn the testing lesson, Rule 2 reminds the investigators that there’s something more important that needs addressing. These investigations have multiple responsibilities. (Perhaps the term ‘root cause’ has reinforced a bad model in our thinking.) Maybe your stakeholders are satisfied with simply being able to detect the problem should it happen again, but I don’t know many who like leaving money on the table.

Challenge yourself to find a better test, then challenge your teammates to prevent it from finding another problem. The next time you are involved in a production defect investigation, give these 2 rules a try, and let me know how it goes.
