Tuesday, September 21, 2010

The Black Swan


One of my favorite things about working with testers is that they read a wide variety of interesting books - this is one of them. “The Black Swan,” by Nassim Taleb, is one of those books that have generated considerable buzz within several communities of thought, including testing. On first pass I could not resist reading it as an investor, so I may need to re-read it as a software tester (perhaps another blog post to follow). Here is my summary of, and reaction to, The Black Swan...

A Black Swan event has three characteristics: it is an outlier, it has extreme impact, and it is later thought to have been predictable or even predicted. "...Rarity, extreme impact, and retrospective (though not prospective) predictability." September 11th, several market crashes, and WWI are examples of large-scale Black Swan events. Black Swans may also be smaller in scale, or even personal, such as the beginning or ending of a romantic relationship; and they can also have positive impacts, such as an unexpected inheritance. At the time of this writing, I am 43 years old, and I would agree there have been several Black Swan events during my lifetime, though, unfortunately, no long-lost rich uncles.

Taleb’s point is not just that Black Swans are real; it’s that they drive the course of history (and the courses of our lives) far more than ‘normal’ events do. Furthermore, many of our current methods of forecasting the future and managing risk are not only ineffective, they actually incubate Black Swans and exacerbate our exposure to them.

Really? How can this be? Is this guy just saying something sensational to sell books?

Taleb discusses at length institutionalized misunderstandings of the nature of uncertainty, decrying the Gaussian bell curve as the “Great Intellectual Fraud.” One of the problems with the bell curve is the nature of outliers. The bell curve suggests that their rarity practically eliminates the significance of their effect, allowing us to predict with false confidence, while Taleb holds that outliers in some important fields, like finance and history, profoundly affect the nature of all subsequent events.
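
To make the outlier point concrete, here is a little sketch of my own (an illustration, not something from the book) showing just how dismissive the bell curve is of large deviations; this is exactly the false confidence Taleb is attacking.

```python
# Illustration (mine, not Taleb's): how dismissive the Gaussian model is of large deviations.
import math

def gaussian_tail(sigmas):
    """One-sided probability of a deviation of at least `sigmas` standard deviations."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2))

for k in (3, 5, 10, 20):
    p = gaussian_tail(k)
    # Rough "once every N years" if we drew one observation per trading day (252/year).
    years = 1 / (p * 252)
    print(f"{k:>2} sigma: probability {p:.3e}, roughly once every {years:.3g} years of daily data")
```

Under the bell curve a 10-sigma day is essentially impossible on any human timescale, yet markets keep producing moves of that magnitude.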

Another problem with the bell curve is one of regress: we need more data to better define the shape of the curve, but we have to assume the shape of the curve before we can interpret the data. Try explaining this to someone who is not a statistician; they will be asleep before the second wave of your hand.

Silent evidence is a more general problem with our modeling tools. The data that we observe most easily is likely to be produced by winners or survivors of some process. The losers are usually harder to see, but they may be much more numerous, giving us an overly optimistic view.
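
Here is a toy simulation of silent evidence, again my own illustration with invented numbers: if we can only observe the ventures that survived, the picture looks far rosier than the average over everyone who started.

```python
# Toy simulation (my own invented parameters) of "silent evidence":
# averaging only the survivors paints a much rosier picture than averaging everyone.
import random

random.seed(1)
N_VENTURES = 10_000
results = []
for _ in range(N_VENTURES):
    capital = 100.0
    for _ in range(10):                       # ten risky periods
        capital *= random.choice([0.5, 1.6])  # each period: lose 50% or gain 60%
        if capital < 20:                      # busts drop out and are never observed
            capital = 0.0
            break
    results.append(capital)

survivors = [c for c in results if c > 0]
print(f"survival rate:              {len(survivors) / N_VENTURES:.1%}")
print(f"average among survivors:    {sum(survivors) / len(survivors):,.1f}")
print(f"average among all starters: {sum(results) / len(results):,.1f}")
```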

Taleb does not eschew all mathematical tools, however. He praises the concept of scalable randomness and the related work of Mandelbrot in the field of fractals. He claims that markets, for example, are better modeled as fractals because such models allow for ‘blow ups,’ but that their scaling exponents are, alas, still not knowable with any useful level of precision, so prediction remains out of reach.
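
To get a feel for what ‘scalable’ means, here is one more sketch of my own (an illustration, not Mandelbrot's actual model): in a Pareto (power-law) sample a handful of observations dominate the total, which never happens with a bell curve.

```python
# Illustration (mine): "scalable" randomness versus the bell curve.
# In a power-law (Pareto) sample a few extreme observations dominate the total;
# in a Gaussian sample no single observation matters much.
import random

random.seed(7)
N = 100_000
gaussian = [abs(random.gauss(0, 1)) for _ in range(N)]
pareto = [random.paretovariate(1.5) for _ in range(N)]  # tail exponent alpha = 1.5

def share_of_top(sample, top_fraction=0.001):
    """Fraction of the total contributed by the largest 0.1% of observations."""
    k = max(1, int(len(sample) * top_fraction))
    top = sorted(sample, reverse=True)[:k]
    return sum(top) / sum(sample)

print(f"Gaussian sample: top 0.1% of observations contribute {share_of_top(gaussian):.1%} of the total")
print(f"Pareto sample:   top 0.1% of observations contribute {share_of_top(pareto):.1%} of the total")
```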

“...scalable randomness is unusually counter-intuitive.”

“There is no such thing as a ‘long run’ in practice; what matters is what happens before the long run.”

Taleb also discusses some psychological factors that expose us to Black Swans. Confirmation bias, for example, is our tendency to accept evidence that confirms our beliefs and to ignore evidence that contradicts them. The narrative fallacy is another such factor: our tendency to organize data into stories, to imagine causal links between events, to ‘fill in the blanks.’ This makes it easier for us to remember more information, but the imagined links may be phony.

Information itself is not as valuable as we might think. Given the aforementioned problems with our understanding of uncertainty, and our psychological tendencies, the addition of more information to our situation may only serve to solidify our grip on dangerously flawed models.

Let’s just pretend for a second that I buy all this. What does it mean to me? How should it affect my behavior?

Taleb does offer one piece of semi-concrete investment advice, and that is to use a “Barbell Strategy.” Put ninety percent of your money in very stable investments, and the remaining ten percent in highly speculative vehicles. You gain exposure to positive Black Swans without risking the substantial impact of negative ones. (As of this writing, I am considering, but do not feel compelled by, this suggestion.)
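
To see the asymmetry, here is some back-of-the-envelope arithmetic with numbers I made up (purely illustrative, and certainly not investment advice): the downside of the barbell is capped near the size of the speculative slice, while the upside is open-ended.

```python
# Back-of-the-envelope barbell arithmetic (my own illustrative numbers, not investment
# advice): the downside is capped near the speculative slice, the upside is open-ended.
portfolio = 100_000
safe = portfolio * 0.90   # e.g. short-term government bonds, assume ~2% return
spec = portfolio * 0.10   # highly speculative bets

scenarios = {
    "speculative bets wiped out":                  safe * 1.02 + spec * 0.0,
    "speculative bets go nowhere":                 safe * 1.02 + spec * 1.0,
    "one bet catches a positive Black Swan (10x)": safe * 1.02 + spec * 10.0,
}

for name, end_value in scenarios.items():
    change = (end_value - portfolio) / portfolio
    print(f"{name:<45} end value {end_value:>10,.0f}  ({change:+.1%})")
```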

Otherwise, despite Taleb’s appreciation of the pragmatic, and his distaste for the theoretical and academic, practical advice was admittedly sparse in this book. However, I think it is safe to say that if you try to change your predictive models to account for Black Swans, you’ve missed the point. I imagine Taleb is simply telling us: DON’T BE THE TURKEY. STOP PREDICTING. Or, if you must predict, please be aware that you are likely to do so horribly. STOP RELYING ON THE PREDICTIONS OF OTHERS. Or, if you must do so, please protect yourself from the fallout. And finally, BE ROBUST AGAINST DISASTER, AND OPEN TO OPPORTUNITY (whatever that means to you).


Monday, September 13, 2010

CAST 2010

Now that I've started the blog, I'm going to reach back into the events of recent history for a few posts. One of those events was CAST 2010, the Conference of the Association for Software Testing.


Overview

CAST is a highly collaborative conference of testing professionals, smaller than the STAR conferences, but densely populated with smart, passionate, vocal people. Among those I met for the first time were Cem Kaner, Harry Robinson, Doug Hoffman, Scott Barber, Matt Heusser, Becky Fiedler, Ben Simo, Tim Coulter, Selena Delesie, Michael Hunter, Cristina Lalley, and Joe Harter. It was also great to renew connections with Michael Bolton, Rob Sabourin, Eric Proegler, Paul Holland, and Michael Bonnar. I know I am leaving somebody out - please feel free to yell at me.

(For you quants, that was 17 people. Conference attendance was about 105. That means that I personally networked with at least 16 percent of the conference attendees. Though the number would be much higher if I had recorded observations more carefully, this is still not bad for an extreme introvert - and a tribute to the nature and quality of this conference.)

CAST is conducted by prominent practitioners, not a corporation or group of vendors. (There are vendors sponsoring the conference, but they do not seem to be the focal point as they are at some other conferences. Actually, I felt sorry for them, at times, due to the lack of attention they received at their booths.) Many of the attendees pay their own way. As such, the level of 'engagement' is very high. The material presented is based on real-world testing. The viewpoints discussed are based on real-world experience. This is not a 'vacation' conference. I came away both energized by new ideas and exhausted from the constant mental stimulation.

My favorite take-away was a set of techniques for large-scale testing that I believe will be directly applicable to improving testing in my current context. These techniques were outlined mostly in Harry Robinson's 'Exploratory Test Automation' tutorial and in the session 'Testing Large Scale Scientific Computations: The Short Circuit Method,' given by Gaston Gonnet and Monica Wodzislawski, and they were built upon brilliantly in subsequent discussions with several other attendees.


My Presentation – Testability and Technical Skill

Overall, I think my presentation went okay. I rushed a bit, and people were a bit tired since it was the afternoon of the last day. I still have significant room for improvement with my presentation skills, but I walk away from this encouraged to continue improving.

Interestingly, this did not seem to be a controversial topic at all for the audience at this conference. They seem to accept and assume that testers benefit from technical skill. I was hoping to stir up at least a little challenge, but nada. I wonder why it is so controversial in my shop. Is this a localized phenomenon? Does it correlate to the aforementioned level of energy and commitment among the attendees?


Some Highlights
(There were many more, possibly excellent, sessions that I did not attend; these are just some notable points from the sessions I did attend.)

Exploratory Test Automation – Harry Robinson
  • Shared some creative ideas for generating large-scale random inputs for systems.
  • Described two specific approaches
    • Production grammar (a toy sketch follows this list)
    • State modeling
  • Shared more creative ideas for creating lightweight dynamic test oracles.
  • Put your machines to work while you are away from the office.
  • You can have crisp handoffs or quality code, but probably not both.
  • We need testers who can design.
  • I was able to spend a significant amount of time talking to Harry after the tutorial and he helped brainstorm ideas about how we can use these techniques to test our product.
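
To show what I mean by a production grammar, here is a toy sketch of my own (not Harry's code): recursive rules expand into an endless stream of well-formed, random inputs that you can feed to the system under test.

```python
# A toy production grammar (my own sketch, not Harry Robinson's code): rules expand
# recursively into an endless stream of random, well-formed inputs for the system under test.
import random

GRAMMAR = {
    "expr": [["num"], ["expr", "op", "expr"], ["(", "expr", ")"]],
    "op":   [["+"], ["-"], ["*"], ["/"]],
    "num":  [[str(n)] for n in range(10)],
}

def generate(symbol="expr", depth=0, max_depth=6):
    """Expand a grammar symbol into a string, bounding recursion depth."""
    if symbol not in GRAMMAR:
        return symbol                        # terminal symbol, emit as-is
    if depth >= max_depth:
        production = GRAMMAR[symbol][0]      # force the simplest rule so expansion terminates
    else:
        production = random.choice(GRAMMAR[symbol])
    return "".join(generate(s, depth + 1, max_depth) for s in production)

random.seed(42)
for _ in range(5):
    print(generate())   # feed these to the system under test, paired with a lightweight oracle
```

Pair a generator like this with a cheap oracle and the machines really can keep testing while you are away from the office.
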
Keynote on Estimating – Tim Lister
  • Covered some common issues with estimating
  • Presented a method for measuring estimates – EQF (Estimating Quality Factor).
  • Interesting analogy between estimating and hurricane forecasting.
    • I spent some time afterward discussing this analogy with Tim. The hurricane does not provide estimates and the forecasters don’t live in the hurricane. Does this mean we should try to have external parties estimate our projects? Hmmm.
Technical vs. Non-technical Skills in Test Automation – Dorothy Graham

  • Covered some of the basics of test automation skills.
  • Interesting discussion around whether tool independence is a worthy goal. It depends, of course. This is likely to be an issue we will be discussing in my shop in the near future.
  • Others generally agreed with my observation that they have seen programmers learn to test effectively more often than they have seen testers learn to program automation effectively.

Investment Modeling as Exemplar for Exploratory Test Automation – Cem Kaner
  • As an avid amateur investor I found this talk interesting, but I never clearly made the connection to exploratory test automation. There was a lot of material in the slides; I need to review it again.
  • A controversial point: “GUI level regression testing is thought to be one of the industry’s worst practices.”
  • There was an interesting point raised by an audience member – we testers need to go to the conferences that our customers are going to, not just constantly talk amongst ourselves.

Testing Large Scale Scientific Computations: The Short Circuit Method – Gaston Gonnet and Monica Wodzislawski
  • This presentation was on a higher technical plane than any of the other talks I attended.
  • How to test complex, long-running programs with simple inputs and outputs, e.g., weather modeling programs.
  • Testability suggests where faults can hide from testing, and testability does not need an oracle. (That's deep, man.)
  • They enumerated four techniques for creating dynamic oracles (a generic example of the idea follows this list).
  • This was a fantastic complement to Harry Robinson’s talk.
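
I won't try to reconstruct their four techniques from memory, but here is a generic example of the kind of dynamic oracle I mean (my own sketch, not the Short Circuit Method itself): instead of comparing against a precomputed expected answer, check a property the computation must satisfy for randomly generated inputs.

```python
# A generic dynamic-oracle sketch (my own example, not the Short Circuit Method itself):
# rather than a precomputed expected answer, check a property the computation must satisfy.
import math
import random

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule numerical integration: the 'system under test' in this toy."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def f(x):
    return math.sin(x) * math.exp(-x * x / 10)

random.seed(3)
for _ in range(20):
    a = random.uniform(-5, 5)
    b = a + random.uniform(0.1, 5)
    c = random.uniform(a, b)

    whole = integrate(f, a, b)
    parts = integrate(f, a, c) + integrate(f, c, b)

    # Dynamic oracle: integration must be additive over subintervals (within tolerance),
    # with no precomputed "expected value" required.
    assert math.isclose(whole, parts, rel_tol=1e-6, abs_tol=1e-6), (a, b, c)

print("20 random integration checks passed the additivity oracle")
```
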
So that's a quick tour of CAST 2010 from my perspective. I thought it was a very positive experience, and will most likely try to attend CAST 2011, which will be chaired by Jonathan Bach in Seattle, Washington, sometime in July. Maybe I will see you there.
