Monday, October 8, 2007

"Once we're off the map, we don't know where we're going next"

By Michael Bolton.

A reply to a testing engineer who asked for advice on how to persuade developers to fix "small defects":

>I have the following question: what should I tell the developer(s) when an error occurs with a low severity, occurring in specific situations with a low probability of happening in the real world, when he asks me where or how this error can affect the safety of the system? Every time the developer says that, I can't seem to find a logical explanation on how the defect can affect the safety.

The answer that I'd give to this question is that any defect for which we don't have a good, clear explanation is a potential security threat. One way to deal with the problem is to ask the developer, "Can you be sure that this problem /doesn't/ represent a threat to the safety of the system? What makes you /sure/? And if you're sure, why don't we handle this condition in a way that makes our certainty evident?"
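To make "handle this condition in a way that makes our certainty evident" concrete, here is a minimal sketch (the function and its names are invented for illustration; none of this code comes from Bolton's reply): if the developer is certain a state can never occur, that certainty can be written into the code as a guard that fails loudly if the state ever occurs anyway.

[code]
import logging

logger = logging.getLogger(__name__)

def apply_discount(price: float, rate: float) -> float:
    """Apply a percentage discount to a price.

    A rate outside [0.0, 1.0] is a state the design never predicted,
    so we refuse to continue rather than quietly produce a number.
    """
    if not 0.0 <= rate <= 1.0:
        # Make the "can't happen" case loud: record it and fail fast,
        # so a surprise in the field becomes evidence, not silence.
        logger.error("discount rate %r is outside [0, 1]", rate)
        raise ValueError(f"discount rate out of range: {rate}")
    return price * (1.0 - rate)
[/code]

If the guard ever fires, the argument settles itself: the system really can get off the map, and now there is a record of where.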

One thing that tends to make us "sure" that a small symptom isn't a big problem is something called "representativeness bias". We have an apparently natural tendency to associate the significance of the symptom with the significance of the underlying problem. In complex systems, the first sign of a problem may not be a big sign.

One prominent and memorable example of this is the Challenger disaster. O-ring seals, intended to keep hot gases from escaping, degraded. Instead of remaining intact, they burned partially through. NASA managers used the fact that they didn't burn ALL the way through as evidence of safety, but (as is now obvious) the opposite was true. As Richard Feynman said in his Appendix to the Rogers Commission Report on the Space Shuttle Challenger Accident (I've added emphasis below),

[quote]
There are several references to flights that had gone before. The acceptance and success of these flights is taken as evidence of safety. /But erosion and blow-by are not what the design expected. They are warnings that something is wrong. The equipment is not operating as expected, and therefore there is a danger that it can operate with even wider deviations in this unexpected and not thoroughly understood way./ The fact that this danger did not lead to a catastrophe before is no guarantee that it will not the next time, unless it is completely understood. When playing Russian roulette the fact that the first shot got off safely is little comfort for the next. The origin and consequences of the erosion and blow-by were not understood. They did not occur equally on all flights and all joints; sometimes more, and sometimes less. Why not sometime, when whatever conditions determined it were right, still more leading to catastrophe?
[/quote]

When the system is in an unpredicted state, it's in an unpredictable state. Once we're off the map, we don't know where we're going next.

---Michael B.

Cem Kaner's reply:

perhaps take a look at the video at http://www.testingeducation.org/BBST/Bugs1.html

and my contribution:

Sometimes it is not easy to give up on a defect after spending several hours trying to figure it out. But sometimes it is the wiser thing to do.

The story usually goes like this:
-We find something that behaves in an unexpected way.
-We try to find the causes with further testing, perhaps by reading documentation, searching the web, or talking to the developer.
-Eventually we find out that it is not noticeable to the user, that it demonstrably does not compromise safety, that the cost of fixing it clearly outweighs the benefit of having it fixed, or that it is just a matter of logical/mathematical consistency.

Sometimes we like to keep everything on the right shelves, even if we’re never going to need it. It kind of gives us “peace of mind”. If we recognize that the reason for wanting the defects fixed is only psychological, it will be easier to leave these defects alone.

Anyway, all these defects should be reported formally, and it is not our job or responsibility to discard them. But it is our responsibility not to allow them to be discarded when we have some reason to believe that the consequences might not have been completely studied (or might be more serious than what is acceptable).

Joao Pedro
