Redundancy and functional failure
Posted June 6th, 2009 by Mike
Had a lot of fun in the data center this morning. The trouble was electrical. And to make a long story short, it was the best case of a worst-case scenario.
Anyone who spends any time in I.T. (or, I guess, Mars landers, aviation, operating rooms, etc.: anything technical, or anywhere the failure of a system is ‘bad’) is familiar with redundancy. Redundancy basically means running multiple things so that if one fails, the others keep working.
You can achieve high levels of redundancy these days, and for lower and lower prices. But it’s no magic bullet. Even in the most redundant environment, failure is always an option, and much more of an option than people think.
There are several problems.
The first is recognition of a failure. When has a failure actually occurred? This is a remarkably difficult problem. It’s easy to know when the power is out. But how do you know when a web server is down? An error message? A long response time? It’s not as easy as you’d think.
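To make the fuzziness concrete, here is a rough sketch in Python of how a health check might turn one probe result into a verdict. Everything in it is an assumption for illustration: the `classify_probe` name, the 2-second slowness threshold, and the idea that a healthy page contains the text "OK". Pick different numbers and you get a different answer to "is it down?", which is exactly the problem.

```python
from typing import Optional


def classify_probe(
    status_code: Optional[int],
    latency_s: Optional[float],
    body: str = "",
    slow_after_s: float = 2.0,   # assumed threshold, not a standard
    expected_text: str = "OK",   # assumed marker of a healthy page
) -> str:
    """Classify a single probe result as 'up', 'degraded', or 'down'."""
    # No response at all (timeout, connection refused): clearly down.
    if status_code is None or latency_s is None:
        return "down"
    # The server answered, but with a server-side error.
    if status_code >= 500:
        return "down"
    # The server answered, but the page does not look healthy.
    if expected_text not in body:
        return "degraded"
    # Right answer, but too slow to be useful.
    if latency_s > slow_after_s:
        return "degraded"
    return "up"
```

Note that a server returning a fast 200 with an error page in the body only counts as "degraded" here. Whether that should really be "down" is a judgment call, and somebody has to make it before the failover logic can.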
Another problem is functional failure. How do you know that the same condition that whacked the first system won’t turn around and whack the second system? You can put two servers on two separate sides of the globe, and one bug can still bring them both down.
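A toy Python sketch of that scenario, under made-up assumptions: the `Replica` class, the `serve_with_failover` helper, and the divide-by-zero bug are all hypothetical. Two replicas run identical code, so the request that kills the first one gets retried on the second and kills it too.

```python
class Replica:
    """A stand-in for one server. Every replica runs the same code."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def handle(self, request: str) -> str:
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        try:
            # Hypothetical shared bug: an empty request divides by zero.
            share = 100 // len(request)
        except ZeroDivisionError:
            self.alive = False  # the crash takes the whole replica down
            raise RuntimeError(f"{self.name} crashed")
        return f"{self.name} ok ({share})"


def serve_with_failover(replicas, request: str):
    """Try each replica in turn; return None if none can answer."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except RuntimeError:
            continue  # fail over to the next replica
    return None


replicas = [Replica("us-east"), Replica("ap-south")]
first = serve_with_failover(replicas, "hello")   # served by us-east
poison = serve_with_failover(replicas, "")       # crashes both replicas
after = serve_with_failover(replicas, "hello")   # nobody left to answer
```

The failover logic does its job perfectly here. It faithfully delivers the bad request to the next victim.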
Anyway, just a thought after this morning’s festivities. Redundancy is good, but it must be understood in proper context.
