Talk:System accident/Archive 1


Avianca 052

"Flight 052’s crash near New York in 1990 which killed 73 people, important information about the aircraft’s fuel status was passed by the crew to controllers in one facility but this information was lost at the point of hand-off to another. The terminal area controllers then treated the flight like any other when they could have expedited the aircraft’s approach (Roske-Hofstrand and Murphy, 1998)." *[1]

"During the third period of holding, the flightcrew reported that the airplane could not hold longer than 5 minutes, that it was running out of fuel, and that it could not reach its alternate airport, Boston-Logan International. Subsequently, the flightcrew executed a missed approach to John F. Kennedy International Airport." *[2]

"language difficulties . . . the importance of this issue has been well documented in a number of accidents, including the January 1990 Avianca Airlines' accident in Cove Neck, New York" *[3]

"The copilot should have told controllers that there was a fuel emergency. But, he used the wrong words. . . . The NTSB also cited “the lack of standardized, understandable terminology for pilots and controllers for minimum and emergency fuel states” as being contributory to this tragedy." *[4]

The following quote puts the accident in human, personal terms, which is often precisely what we need (and, no, the grammar is not perfect, all the better!). Cool Nerd 20:35, 3 June 2007 (UTC) " . . . that they didn't use the word EMERGENCY so the controllers were in not position to know the severity of the situation. The following is the actual verdict of the invesatigation:- “The failure of the flight crew to adequately manage the airplane's fuel load, and their failure to communicate an emergency fuel situation to air traffic control before fuel exhaustion occurred.” Never had I heard of a plane fell from the sky because of no fuel. Never had I seen such incompetence on the part of the controllers who when changing shifts, never bothered to tell the next guy in charge of the plane being low on fuel . . . when someone say the following, tell me, doesn't that constitute an emergency? "We're running out of fuel", "and ah, we're running out of fuel, sir", "We just running out of fuel.", "Avianca zero five two, we just, ah, lost two engines and, ah, we need priority, please". " [5]

Avianca seems to be a subcategory in which the accident is caused by insistence on formal language. Cool Nerd (talk) 16:25, 16 February 2009 (UTC)

Timeline of Three Mile Island

"00:00:00 Pumps feeding water to the secondary loop shut down.
This was the first of two independent system failures that led to the near meltdown of the Three Mile Island Nuclear Reactor.

00:00:01 Alarm sounds within the TMI control room.
At the time this alarm is disregarded by the operators.

00:00:02 Water pressure and temperature in the reactor core rise.
The failure of the secondary loop pump has stopped the transfer of heat from the Primary Loop to the secondary loop. The rise in temperature and pressure is considered to be part of the normal plant operations, and hence ignored.

00:00:03 Pressure Relief valve (PORV) automatically opens.
When the pressure of steam in the reactor core rises above safe limits, the pressure relief valve is designed to automatically open, releasing the excess steam to a containment tank.

00:00:04 Backup pumps for the secondary loop water system automatically turn on.
Four seconds into the accident the secondary loop water pumps are automatically turned on. This is indicated to the operators by the presence of lights on the control panel. The operators are not aware that the pumps have been disconnected and are not functioning.

00:00:09 Boron and Silver control rods are lowered into the reactor. PORV light goes out, indicating valve is closed.
Lowering of the control rods into the reactor core slows down the rate of the reaction. The effect of which is also a reduction in the heat produced by the reactor. When the PORV light goes out the operators incorrectly assume that the valve is closed. In reality the valve is not only open but is also releasing steam and water from the core. This is now a LOCA (Loss of Coolant Accident) . . . "[6]

Additional Notes on Three Mile Island

" . . . In order to prevent that pressure from becoming excessive, the pilot-operated relief valve (a valve located at the top of the pressurizer) opened. The valve should have closed when the pressure decreased by a certain amount, but it did not . . . " [7]

" . . . The operators believed the relief valve had shut because instruments showed them that a 'close' signal was sent to the valve. However, they did not have an instrument indicating the valve's actual position . . . " [8]

“In the Three Mile Island accident, latent errors were traced back two years. Latent errors can be difficult for the people working in the system to notice since the errors may be hidden in the design of routine processes in computer programs or in the structure or management of the organization. People also become accustomed to design defects and learn to work around them, so they are often not recognized . . . “, To Err is Human: Building a Safer Health System, page 55 [9]

other links on nuclear power

"*Secret Fallout – book by Ernest Sternglass; chapter 18 discusses Three Mile Island"

I am moving this link here from the main page because the system accident approach is actually kind of an anti-conspiracy theory! We're saying, this is just the way institutions work. There is no grand conspiracy. For example, see "Organizations shave uncomfortable truths" several sections down. No one told the PR guy to buff and shave. He did that all on his own. And not just PR employees, employees in general do not want to rock the boat. Standing up to the organization--even for the professed values of the organization--is a risky endeavor. And it's risky even with the best of skills and even with good, solid working relationships already in place. Presumably, it can be pulled off, but it is less than a hundred-percent deal, and that needs to be understood going in. Cool Nerd (talk) 17:33, 1 November 2010 (UTC)

Discussion and Argument about formal vs. everyday language

Hi, thanks for your interest. Let's please work together and create a really good article. I have strong views about the ValuJet accident, perhaps you could leave that to me. Perhaps you also have strong views. Thanks

If you have "strong views" about anything, they do not belong on Wikipedia. Dudeman1st 11:38, 3 October 2006 (UTC)

Or, it can make for better writing--as long as I make a sincere effort to be intellectually honest, and I have made such an effort.

Someone please, this is an open offer, write the part on transparency. True, transparency can become as much of a buzz word as anything else, although it's a pretty good buzzword. But, if you pick good examples and you write them in good forward narrative and let it stay in your good human voice, then it can be something more than a buzzword.

What are you talking about? "Your good human voice" and "picking examples" sound POV. Dudeman1st 11:38, 3 October 2006 (UTC)

"Safety" gets a pass.

We allow poor design for "safety" when we probably would not allow it for other aspects of the enterprise.

For example, in the ValuJet crash, engineers might not have allowed canisters that generate oxygen through a high-temperature chemical reaction for anything other than something labelled "safety." Cool Nerd (talk) 15:23, 15 April 2013 (UTC)

The analogy of customer service

Managers tell employees exactly what to say, and then don't understand why things don't work out. Cool Nerd (talk) 15:26, 15 April 2013 (UTC)

The analogy of poker

Someone buys a book on poker and plays in a clumsy, cumbersome manner, and then wonders what he or she has done wrong. Cool Nerd (talk) 15:32, 15 April 2013 (UTC)

Organizations shave uncomfortable truths

Answer this question quickly: Who did the first human heart transplant? Christiaan Barnard of South Africa, well of course! Yes, the first human-to-human, but you may not know that Dr. James Hardy, in the very early morning hours of January 24th, 1964, transplanted a chimpanzee heart into a comatose and dying man. The heart beat for one hour and then the man died.

Dr. Hardy was roundly criticized, not so much for what he did, but for how it was communicated.

The hospital's PR person did not mention that it was a chimpanzee's heart!

Instead we had the following officialese: "The dimensions of the only available donor heart at the time of the patient's collapse proved too small for the requirements of the considerably larger recipient." Huh?

And the interesting thing, no one told the PR man to put it this way.

Dr. James Hardy would have much, much, MUCH rather the whole story be told from the beginning. His was one of four major medical centers racing to do the first transplant, and when the story was splashed across national papers in very misleading form, he lost a considerable degree of respect, including from his colleagues.

Presumably, the PR guy thought he was doing a good job. People readily pick up on what is expected of them in a job, conform to it, and in fact, over-conform to it. And "neutral" language is what institutions generally expect.

The heart recipient was sixty-eight-year-old Boyd Rush, and much more seriously than mere PR matters, his sister Mrs. J. H. Thompson, the relative making decisions on his behalf, was not fully informed. She was asked to sign a piece of paper giving permission for: "suitable heart transplant." Wow. This is really not the way things should be.

She deserved better, and her brother deserved better. If the chimpanzee's heart was in fact her brother's best chance of survival, she may well have jumped at the opportunity. Or, it may have taken her a while to get used to the idea. Or, she may have known that this was against her brother's deeply held beliefs. You've got to tell her the truth and give her the chance. That is, a chance to make a good decision. Yes, you have got to have a real conversation with a real human being.

And our institutions seem to have a hard time doing this, including the very real guys and gals working for institutions, when they tend to put on the institutional hat.

And it's probably better to have the conversation(s) early---even an incomplete, non-perfect conversation. Give her a chance to get used to the idea if necessary. And also, importantly, get her involved in the process, don't merely present her with a finished product. Cool Nerd (talk) 14:55, 28 March 2009 (UTC) Cool Nerd (talk) 19:41, 9 August 2010 (UTC) Cool Nerd (talk) 21:17, 9 April 2013 (UTC) (and previous dates, this is a topic I've worked on)

--Every Second Counts: The Race To Transplant The First Human Heart, Donald McRae, Putnam, 2006, pages 123-127.
http://books.google.com/books?ei=qVlgTO2SGoP-8Aaj34y6DQ&ct=result&id=x3s1GmeUr4AC&dq=Donald+McRae+RAce&q=Boyd

see also . . .

Hardy(,) a surgeon who pioneered transplants, Milwaukee Journal Sentinel, Associated Press, 7B (mid page), Feb. 21, 2003.


And so with ValuJet, most probably no one in management told the engineers writing the spec manuals to shave the uncomfortable truth that the oxygen generators burn at high temperatures. Rather, they probably picked up on this on their own.
And of course, looking back on it, we would have much MUCH rather the ValuJet [AirTran] engineers had used blunt, rugged, impolite, even and especially non-"neutral" language. Just lay it out there. Don't be 'politic.' The generators burn in excess of 500 degrees Fahrenheit. Cool Nerd (talk) 21:17, 9 April 2013 (UTC)

With ValuJet no one saw the whole

". . . Step 2. The unmarked cardboard boxes, stored for weeks on a parts rack, were taken over to SabreTech's shipping and receiving department and left on the floor in an area assigned to ValuJet property.

Step 3. Continental Airlines, a potential SabreTech customer, was planning an inspection of the facility, so a SabreTech shipping clerk was instructed to clean up the work place. He decided to send the oxygen generators to ValuJet's headquarters in Atlanta and labelled the boxes "aircraft parts". He had shipped ValuJet material to Atlanta before without formal approval. Furthermore, he misunderstood the green tags to indicate "unserviceable" or "out of service" and jumped to the conclusion that the generators were empty. . . ."

—The preceding unsigned comment was added by 204.62.68.23 (talk) 01:45, 12 December 2006 (UTC).

William Langewiesche on ValuJet

“ . . . In this case the organization includes not only ValuJet [now AirTran], the archetype of new-style airlines, but also the contractors that serve it and the government entities that, despite economic deregulation, are expected to oversee it. Taken as a whole, the airline system is complex indeed. . . ” [page 1, roughly a third to half of the way down]


“ . . . Beyond the questions of blame, it requires us to consider that our solutions, by adding to the complexity and obscurity of the airline business, may actually increase the risk of accidents. System-accident thinking does not demand that we accept our fate without a struggle, but it serves as an important caution. . . ” [two paragraphs later]

The Lessons of Valujet 592, William Langewiesche (from The Atlantic, March 1998).

posted by Cool Nerd (talk) 01:28, 20 October 2010 (UTC)

Incomprehensibility

From a review by Richard D. Piccard, Adjunct Associate Professor, for his class Tier III 415 A, "Entropy and Human Activity," offered by Ohio University (Athens, Ohio), Dept. of Physics and Astronomy. Class seemingly offered around Spring Semester 2005 (his site for class last revised March 13, 2005).
http://oak.cats.ohiou.edu/~piccard/entropy/perrow.html

"Incomprehensibility . . . Provided that the first few steps' results are consistent, the fact that the mental model was tentative is likely to be forgotten, even if later results contradict it. They become "mysterious" or "incomprehensible" rather than functioning as clues to the falsity of the earlier tentative choice. This is simply the way the human mind works, and systems designed with contrary expectations of their operators are especially vulnerable to system accidents.”

I think a pretty good clear summary by one professor. Cool Nerd (talk) 19:57, 31 October 2010 (UTC)

Categorizing this article

I invite your help to appropriately categorize this article, perhaps in a sub-category of safety or engineering.

Thank you Edison for your good work--Cool Nerd



ICAO

[ICAO stands for International Civil Aviation Organization.]

http://www.aviationwatch.co.uk/

"In sociotechnical system, remedial action based on safety findings goes beyond those who had the last opportunity to prevent the accident, ie the operational personnel, to include the influence of the designer and managers, as well as the structure or architecture of the system. In this approach, the objective is to find what, rather than who, is wrong."

"Another possibility is that the original dual mandate has its echoes further downsptream. The divide between managers and pilots (commerce and safety) is demonstrating a 'split'. Psychologically a split is a process by which a mental structure "loses its integrity and becomes replaced by two or more part-structures." This can be seen in many airlines, where flight crew and cabin crew report to different parts of the organization yet are encouraged to work as a unit inside an aircraft in flight. In 'splitting' "the emotional attitude towards the two part-structures is typically antithetical, one object being experienced as 'Good' …. the other as 'Bad'" 10 If this attitude becomes part of corporate culture it could lead to aircrew being viewed as 'bad' by managers and being scapegoated, probably for reasons that they themselves may not understand."

"Only then can pilots feel free to raise safety concerns and admit where there may be 'failures' and the system can be upgraded accordingly. With the old system of blaming pilots none would speak up for fear of reprisal and latent errors remained in the system until the inevitable constellation of events - fatigue, inclement weather, pushing the limits, would bring the inevitable disaster."

What is a system accident?

I can tell that the contributors to this article must care a lot about these issues and that they have put a lot of hard work into this article. My question is, what exactly is a system accident? I can find no reference to this term aside from this article. Although the article gives several examples of a system accident, I am unable to determine the essential meaning of this term or why it should have a wikipedia article. Is it possible that this article is defining something that had no previous definition? It seems to me that this article constitutes original research. How can this article be made more encyclopedic? Moreover, can it be written to be more NPOV?

I am also wondering if the various examples of a system accident are necessary. Each incident already has an article in wikipedia. Perhaps the qualifying (NPOV, no original research) information from each example should be merged into their respective articles if it is not already there? shotwell 15:47, 1 October 2006 (UTC)

Well, I meant to say that most of the examples already have a wikipedia article. shotwell 15:55, 1 October 2006 (UTC)
I added Perrow's 1984 definition of "system accident" which is used extensively in the nuclear field, in electrical engineering, and in NASA. Complex systems fail in ways which are not apparent in advance, but in retrospect, the accident was bound to happen. There are countless examples. The concept of system accident gets us beyond blaming the last person to touch the system with "operator error" for failures of foresight by those who came before. Individual articles completely miss the point of the problems of tightly coupled systems as analyzed by Perrow.Edison 19:12, 4 October 2006 (UTC)
OK I agree now that this term is defined http://oak.cats.ohiou.edu/~piccard/entropy/perrow.html, but this article needs to be seriously trimmed of orig research. Dudeman1st 12:17, 9 October 2006 (UTC)

complexity --> accident. And that is basically it. Charles Perrow also talks about 'tightly-coupled' and 'cascading.' That is, a breakdown in one area can cause another to break down, and once things start to happen, they can happen fast. And other things as well, 'incomprehensibility,' etc. Cool Nerd (talk) 19:27, 19 February 2009 (UTC)

Examples here, or in their own article?

Shotwell, thank you for your comments. I also have wondered about that very point. There is an article on ValuJet. I'm sure there's a good article on Three Mile Island. I'm kind of thinking that I want to look at these accidents through the lens of system accident.

You might also want to look up "Normal Accident," which is a wikipedia entry. System accident has been fairly widely discussed, which is why I'm thinking this is a worthwhile article.

Open offer, especially Three Mile Island, lay out the reasons why it is a system accident, and the reasons why it isn't.

Examples should be constrained to one paragraph

The examples don't need a dossier for each of them. One or two sentences for each of them will suffice. Especially the ValuJet accident fits Perrow's definition of a "final accident." If these examples are debatable, they shouldn't be included. Dudeman1st 12:15, 9 October 2006 (UTC)

Chernobyl, and short definition of system accident

http://rpd.oxfordjournals.org/cgi/content/abstract/68/3-4/197 "The Chernobyl accident consisted of a chain of events that were both extremely improbable and difficult to predict. It is not reasonable to put the blame for the disaster on the operators."

My background is business management. I understand systems and feel confident speaking on how oxygen generators came to be so poorly labelled as in the ValuJet crash. In fact, I like my set of skills perhaps more than those of an engineer.

On nuclear power, I do not have the same confidence and would like someone to help on this part.

Article does support claims

I would like more of course, but the factual claims made are supported by the external links and the references.

This article should be deleted

The term "system accident" has no precedent. It isn't in any dictionary or doesn't appear as a unique term on any internet searches. The definition paragraph (A sense of touch is lost) sounds parapsychological and is quite confusing. I feel like the author(s) is using this page for a conspiracy theory website, which belongs somewhere else besides Wikipedia. If this article is intended as a parasychology or metaphysical theory, it should be explicitly stated. If this article doesn't give a more concrete example of the usage of "system accident" or there isn't some consensus about keeping this artcle around, I am going to nominate AfD.Dudeman1st 11:38, 3 October 2006 (UTC)

I agree. shotwell 12:17, 3 October 2006 (UTC)
I removed the sentence which offended two editors. The term of art "system accident" has been used in systems engineering and after-the-fact analysis for over 20 years. It was defined in 1984 by Perrow in the book cited. I added the explicit definition. Edison 19:15, 4 October 2006 (UTC)

This article should not be deleted

System accident has been in currency for some time. Do a google search for ["System accident" ValuJet] or ["System accident" "Space Shuttle"] or ["system accident" "Three Mile Island"]. You will find some hits where the words just happen to run together "system, accident" but you will find many direct hits.

As far as the definition of system accident, I give it in the very first paragraph: An accident that results primarily from complexity. And that can be either organizational complexity or technological complexity.

Charles Perrow used the term 'Normal Accident,' but that's more a term you need to argue towards: the system is so complicated that these types of unforeseen accidents are 'normal.' The more recent term, and the clearer term, is system accident. I was AMAZED there was not already a Wiki article on it.

As far as the issue of conspiracy theory, please give me a specific example. To my thinking, I have not claimed a single conspiracy. Because most institutions have groupthink (which they probably need to some extent for group cohesion), that's not a conspiracy. That's just the way institutions normally operate, and something we should be realistic about.

As far as more examples, I invite your help specifically on Three Mile Island.

I'm sorry, but I still believe it should be deleted. I've nominated it for deletion because it appears that the term, "System Accident" is the product of original research. You can offer your opinion at Wikipedia:Articles_for_deletion/System_Accident. You may want to read about the deletion process at Wikipedia:Guide_to_deletion. shotwell 01:13, 4 October 2006 (UTC)
It was defined in a scholarly book in 1984 and is used in industry. It is clearly not original research. Edison 19:16, 4 October 2006 (UTC)

As a matter of interest, the term 'System Accident' is defined in the Handbook of Performability Engineering by Misra, Krishna B., 2008, ISBN: 978-1-84800-130-5. Page 686: Definition of system accident: A system accident or system danger must be defined to represent the abnormal system state explicitly. ... the system accident state generally appears as an abnormal state at the lowest level of system operation or the operating process. --Dionliddell (talk) 08:22, 17 July 2009 (UTC)

Article has no delete consensus, I now tagged it again

OK This article needs a lot of work. It needs editing, cites, encyclopedifying and wikifying. I have tagged this article for cleanup {context} {essay-entry} and {expert verify}. You can read the tags to figure out what needs to be done. IMHO if we have system accident, then normal accident and final accident also need to be created. Cool Nerd are you up to it? Please keep it encyclopedic. I also suggest that the examples be left to a very brief format: "The ValuJet disaster fits the definition of system accident because X, Y and Z." I also think that someone should get input from Charles Perrow and/or William Langewiesche, if possible. Dudeman1st 03:20, 10 October 2006 (UTC)

Normal Accident and System Accident are largely interchangeable terms, with System Accident being somewhat the newer term. As far as "Final Accident," that term, to the best of my knowledge, does not have near the same currency. I myself am just not familiar with that term at all. --Cool Nerd

I'm going to remove the sections about smoke alarms, hairdryers, and grooved highways. They don't currently make any sense and don't appear to be related to "system accidents" in a verifiable way. shotwell 20:35, 11 October 2006 (UTC)


Three Mile island chapter, discussion of terminology

Charles Perrow's book "Normal Accidents" calls Three Mile Island a normal accident [11], which may or may not mean it is a system accident. I really think normal accident needs a page. Dudeman1st 03:20, 10 October 2006 (UTC)

They mean just about the same thing. Perrow seems to like the term Normal Accident, which is really a term he has to argue for, he has to make the case. 'These accidents you think are out of the blue are, given the operations of these technologies, normal . . .'<--my summary, not his. He spends his entire book making the case. System Accident, to my thinking, is more self-explanatory--an accident that results from the system being so complex. --Cool Nerd

Other References

"The multiple causation theory is an outgrowth of the domino theory, but it postulatesthat for a single accident there may be many contributory factors, causes and sub-causes,and that certain combinations of these give rise to accidents."*[12]

Industrial safety books authored by Trevor A. Kletz; plus High Reliability Organizations (HRO), etc. ". . . Encouraging Minority Opinions: The [US] Naval Reactor Program encourages minority opinions and 'bad news.' Leaders continually emphasize that when no minority opinions are present, the responsibility for a thorough and critical examination falls to management." [13]

". . . have primarily been attributed to `operator error.' However, further investigation has revealed that a good majority of these incidents are caused by a combination of many factors whose roots can be found in the lack of human factors (micro- and macroergonomics) considerations." [14]

More familiar examples of design issues

Smoke detectors and false alarms

"The resident had removed the battery." True, but what is not ackowledged as often is that the crummy little device gave a number of false alarms and the human reacted in a thoroughly predictable way, and the whole thing could have worked out better if the safety device had simply been designed better. The smoke detector not only went off when food was actually burnt, but in very marginal cases such as when a bagel in a toaster is slightly scorched!

The human being is being asked to change how he or she does an activity as normal as cooking. The resident is being asked to disengage the battery at the beginning of cooking and then to re-engage it at the end of cooking. This is two additional steps.

Note from a contributor: My former New England community passed a municipal ordinance requiring rental properties to have wired-in smoke detectors with battery back-up. Fortunately, these smoke detectors will also have a "snooze button." Now, this situation can go either way. A lot depends on whether the snooze feature trusts the humans or attempts to second-guess them. Fifteen minutes strikes this contributor as approximately correct. If the snooze button only silences the alarm for, say, three minutes, you will have the predictable consequence of frustrating the human with another loud alarm when there is in fact no fire, perhaps as the human is sitting down to eat, and another nuisance alarm three minutes later, and another . . . with a predictable likelihood that the human will choose to disengage the device from the wiring. People do not like being passive recipients of a system obviously not working. And since it's now no longer a case of just putting the battery back in, it's even less likely the device will be reconnected.

The theoretical lesson is that a safety system that fights people is probably not the way to go.

In addition, smoke detectors are the only household item I know of that contain radioactive material, usually about one microcurie of Americium. This is not a danger while the detector stays on the wall or ceiling, but it might be at disposal, or to the employees during manufacture. And it brings up the interesting conceptual issue, "Safety gets a pass." Meaning, something that's not a good idea, that we would not do in other circumstances, when we are told we should do it for "safety," we are stymied in arguing against it.[citation needed]

Hairdryers with test and reset buttons

In a typical hotel or motel, not only does the plug have a test and reset button, but the wall receptacle also has its own test and reset buttons. The hair dryer does not seem to be working. How will the human predictably react? He or she will probably begin pushing buttons in a trial-and-error fashion. And with four total buttons, the system has 16 possible configurations. Why all this complexity for just a hairdryer! [citation needed]
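Reading the count above literally, as a sketch (assumption: each of the four buttons is treated as an independent pressed/not-pressed toggle, which is only a rough model of how real GFCI test and reset buttons interact):

 # Illustrative enumeration: four buttons, each pressed or not pressed.
 from itertools import product

 buttons = ["plug_test", "plug_reset", "outlet_test", "outlet_reset"]
 configurations = list(product([False, True], repeat=len(buttons)))
 print(len(configurations))  # 16, i.e. 2**4 combinations to stumble through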

Polarized plugs as good safety

When these first came out in the early 1980s, often a little pamphlet accompanied the appliance explaining that with the polarized plug, when the appliance is off, it's really off, it's off at the wall. The explanation, although informative, is not necessary for the correct operation. Polarized plugs obviously and transparently work only one way. And when you get it right, you know you've got it right.[citation needed]

Grooved shoulders on highways as good safety

If someone starts to doze off and drifts onto the shoulder of the highway, there is a change in both feel and sound. The sound is similar to driving over the grate of a toll bridge.

We are accepting people as they are, and not insisting that they be perfect. And this is also a transparent, obvious safety feature.[citation needed]

Changes after Apollo 1 Fire

“ . . . It took just 18 months and $410 million for 150,000 people working around the clock to incorporate 1,341 approved engineering changes into the spacecraft for the upcoming lunar missions. . . ”

Hamish Lindsay, TRACKING APOLLO TO THE MOON, London, Singapore: Springer-Verlag, 2001, page 159.

Reference cited: “APOLLO EXPEDITIONS TO THE MOON,” Edited by Edgar M. Cortright, 1975, NASA SP-350, NASA, Washington, DC.

Apollo 13

For All Mankind, Harry Hurt III, interviews by Al Reinert, 1988


For All Mankind, Harry Hurt III, interviews by Al Reinert, Atlantic Monthly Press: New York, 1988:

[pages 204-205]
"the Apollo 13 launch was already a month behind schedule . . .

"Following the final prelaunch 'countdown demonstration' at the Cape, the pad techs were supposed to drain the liquid oxygen tanks supplying breathing air to the command module with a series of high-pressure oxygen gas injections. But instead of flushing out LOX, one of the tanks merely recirculated the gas injections back out its drainage pipes.

"Rather than postponing the Apollo 13 launch, the space agency’s top brass assigned a special team of engineers and technicians to correct the mysterious LOX tank malfunction in less than seventy-two hours so the astronauts could lift off on schedule. According to Lowell, 'They went into the history of the tank just to see what the story was, and they found out that it was originally scheduled for Apollo 10, but that it had been dropped at the factory, so it had been recycled, refurbished, and set up for Apollo 13. They looked at the schematics and saw that there was a tube that guides this gaseous oxygen in, and that if the tube was broken or moved away somehow, it would not guide the gases down to force the liquid out, but it would bypass the liquid and just let the gas go out the vent line.

“ 'Well, the engineers all sat around to philosophize on what to do. They could order a new tank, or take one out of another vehicle down the line. But by the time they did all that, several weeks would go by, and they’d have to slip the launch . . . The tank worked perfectly for all the flight aspects--it fit all the systems, it pressurized the spacecraft, it fit the fuel cells, it was good for breathing . . . The only thing that didn’t work was the fact that we couldn’t get the doggone oxygen out of it, which in a normal flight we would never do. In other words, that was something that was strictly a ground test device.'

"Under the pressure of their hasty and ill-conceived official mandate to get Apollo 13 launched on schedule, the engineers then proposed what seemed like an ingenious ad hoc solution to the LOX tank’s drainage problem. 'There was a heater system,' Lovell explains, 'a long tubelike affair with regular wires in it submerged in the liquid oxygen. And they said, "Why don’t we turn on the heater system and boil the oxygen out?" They took a poll, and everyone said, "Gee, that’s a good idea, didn’t think of that." So they turned on the switch for about eight hours, and by gosh, they were absolutely right. All the oxygen boiled out. The tank was absolutely dry. Everything was in good shape. The tank was loaded again a day or so before the launch . . . then we took off.'

[page 206]
". . . an explosion . . . venting . . .


[page 208]
" . . . the LEM as their 'lifeboat' . . .


[pages 216-17]
"A two-month-long official inquiry led by Edgar M. Cortright, head of NASA’s Langley Research Center in Hampton, Virginia. . .

"The heating system made by Beech Aircraft was designed to carry only twenty-eight volts of electrical power rather than the officially specified sixty-five volts. . .

“ 'The thermostatic switch discrepancy was not detected by NASA, North American Rockwell, or Beech . . .


"These changes, which cost an estimate $10-15 million, included the addition of a third oxygen supply tank and the redesign of the electrical wiring in the heater systems. But the net delay incurred for the Apollo program amounted to less than four months."

(Interviewer Al Reinert also produced and directed the film entitled For All Mankind.)

Report of Apollo 13 Review Board ("Cortright Report")

The NASA website has the entire report available.

live link: http://nssdc.gsfc.nasa.gov/planetary/lunar/apollo_13_review_board.txt

dead link: http://history.nasa.gov/ap13rb/ap13index.htm

CHAPTER 5
FINDINGS, DETERMINATIONS, AND RECOMMENDATIONS

.
" . . . It was found that the accident was not the result of a chance malfunction in a statistical sense, but rather resulted from an unusual combination of mistakes, coupled with a somewhat deficient and unforgiving design. [Emphasis added]

.
.
c. "In addition, it is probable that the tank contained a loosely fitting fill tube assembly. [Emphasis added] This assembly was probably displaced during subsequent handling, which included an incident at the prime contractor's plant in which the tank was jarred.
.
.
f. "The special detanking procedures [Emphasis added] at KSC subjected the tank to an extended period of heater operation [Emphasis added] and pressure cycling. These procedures had not been used before, and the tank had not been qualified by test for the conditions experienced. However, the procedures did not violate the specifications which governed the operation of the heaters at KSC.
.
h. "A number of factors contributed to the presence of inadequate thermostatic switches in the heater assembly. The original 1962 specifications from NR to Beech Aircraft Corporation for the tank and heater assembly specified the use of 28 V [Emphasis added] dc power, which is used in the spacecraft. In 1965, NR issued a revised specification which stated that the heaters should use a 65 V [Emphasis added] dc power supply for tank pressurization; this was the power supply used at KSC to reduce pressurization time. Beech ordered switches for the Block II tanks but did not change the switch specifications to be compatible with 65 V dc."
.
.
.
.
.
.

U.S. Congressional Hearings

Senate, April 24, 1970

Committee on Aeronautical and Space Sciences [15].

House, June 16, 1970

Committee on Science and Astronautics [16].

Senate, June 30, 1970

Committee on Aeronautical and Space Sciences [17].

Pages 38 and 39: [Stirring the tank didn't always cause arc and fire.]

"[page 46] . . . a decision was made to try to 'boil off' the remaining oxygen in tank #2 by use of the tank heaters. The heaters were energized with the 66 volt DC GSE power supply and, about 11/2 hours later, the fans were turned on to add more heat and mixing. After 6 hours of heater operation, the quantity had only decreased to 35 percent, and it was decided to attempt a pressure cycling technique. With the heaters and fans still energized, the tank was pressurized to about 300 psi, held for a few minutes, and then vented through the fill line. The first cycle produced a 7 percent quantity decrease, and the process was continued, with the tank emptied after five pressure/vent cycles. The fans and heaters were turned off after 8 hours of heater operation . . . "

WOW!---a whole lot of procedure to drain and refill the stubborn tank preflight. Cool Nerd (talk) 01:16, 12 April 2013 (UTC)

Lost Moon, Jim Lovell & Jeffrey Kluger, 1994.

Lost Moon: The Perilous Voyage of Apollo 13, Jim Lovell & Jeffrey Kluger, Houghton Mifflin Company: Boston, New York, 1994:

"[page 344] . . . The Cortright Commission quickly fell to work, and while none of the men on the panel knew what they would find when they began to look for the cause of the Apollo 13 explosion, they pretty much knew what they wouldn’t find: a single smoking gun. As aviators and test pilots had discovered since the days of cloth and wood biplanes, cataclysmic accidents in any kind of craft are almost never caused by one catastrophic equipment failure; rather, they are inevitably the result of a series of separate, far smaller failures, none of which could do any real harm by themselves, but all of which, taken together, can be more than enough to slap even the most experienced pilot out of the sky. Apollo 13, the panel members guessed, was almost certainly the victim of a such a string of mini-breakdowns. . . "

"[Page 346] . . . Although 28-volt switches in a 65-volt tank would not necessarily be enough to cause damage to a tank—-any more than, say, bad wiring in a house would necessarily cause a fire the very first time a light switch was thrown—-the mistake was still considerable. What was necessary to turn it into a catastrophe were other, equally mundane oversights. . . "

"[page 347] . . . One of the most important milestones in the weeks leading up to an Apollo launch was the exercise known as the countdown demonstration. It was during this hours-long drill that the men in the spacecraft and the men on the ground would first rehearse all of the steps leading up to the actual ignition of the booster on launch day. To make the dress rehearsal as complete as possible, the cryogenic tanks would be fully pressurized, the astronauts would be fully suited, and the cabin would be filled with circulating air at the same pressure used at liftoff. . . "
http://books.google.com/books?id=WJOYlUz6TG0C&printsec=frontcover&dq=Lost+Moon&sig=YbOm9LAeMvZPIA8p9C64y6tVKTc#PPA350,M1

Andrew Chaikin's A Man on the Moon (1994), generally excellent, but . . .

A Man on the Moon: The Voyages of the Apollo Astronauts, Andrew Chaikin, Penguin Books: New York, London, Victoria, 1994.

"[page 292] . . . But first mission control wanted Jack Swigert to stir up the service module's tanks of cryogenic liquid hydrogen and oxygen. In zero g, the super-cold fluids tended to become stratified, making it difficult for the astronauts and controllers in Houston to get accurate quantity readings. To remedy the problem, each tank contained a fan that acted like an egg beater to stir the contents. Strapped into the left-hand couch, Swigert flipped the switches marked H2 FANS and 02 FANS and waited several seconds, then turned the fans off. A moment later there was a loud, dull bang. . . . "

This is incorrect, and the big Hollywood movie with Tom Hanks appears to also be incorrect in this regard. From the June 30, 1970, Senate report, p. 39: "Swigert acknowledged the fan cycle request and data indicate that current was applied to the oxygen tank #2 fan motors at 55:53:20 . . . at 55:54:53.5, telemetry from the spacecraft was lost almost totally for 1.8 seconds." So, it was 93.5 seconds between Jack hitting the switch and the crew hearing the bang. And even this report gets it slightly wrong, labeling the delay as two-and-a-half minutes! No way. Do the math. It is 93.5 seconds. (See also the timeline on the bottom of page 40.) http://www.gpoaccess.gov/challenger/47_476.pdf
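A quick check of the arithmetic, using the two mission-elapsed times quoted above (55:53:20 for fan current applied, 55:54:53.5 for loss of telemetry):

 # Convert mission elapsed time (hours:minutes:seconds) to seconds and subtract.
 def met_to_seconds(hours, minutes, seconds):
     return hours * 3600 + minutes * 60 + seconds

 delay = met_to_seconds(55, 54, 53.5) - met_to_seconds(55, 53, 20)
 print(delay)  # 93.5 seconds -- about a minute and a half, not two and a half minutes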
And otherwise, this is really an excellent book on the entire Apollo program. And it's easy for me to sit back and criticize. I'm basically just cherry-picking. I'm not trying to do the entire accident report in real time.
To me, this again brings up the point that it is very difficult for us as human beings to concentrate on both big picture and details at the same time. Very difficult. Cool Nerd (talk) 22:06, 12 April 2013 (UTC)

low-pegged thermometer

Lost Moon: The Perilous Voyage of Apollo 13, Jim Lovell & Jeffrey Kluger, Houghton Mifflin Company: Boston, New York, 1994, pages 349-50.

" . . . Unfortunately, the readout on the instrument panel wasn’t able to climb above 80 degrees. With so little chance that the temperature inside the tank would ever rise that far, and with 80 degrees representing the bottom of the danger zone, the men who designed the instrument panel saw no reason to peg the gauge any higher, designating 80 as its upper limit. What the engineer on duty that night didn’t know—-couldn’t know—-was that with the thermostat fused shut, the temperature inside this particular tank was climbing indeed, up to a kiln-like 1,000 degrees. . . "


Space Exploration for Dummies, Cynthia Phillips, Shana Priwer.

“ . . . The temperature readout thermometer only displayed temperatures up to 100 degrees Fahrenheit (38 degrees Celsius); higher temperatures were mistakenly read at 100 degrees Fahrenheit. . . ”


Lunar Exploration: Human Pioneers and Robotic Surveyors, Paolo Ulivi with David Harland, Springer-Verlag, 2004, page 149.

“An external thermometer, which could have alerted pad engineers, had a scale extending to only 28 C.”


Inviting Disaster: Lessons From The Edge Of Technology, An Inside Look At Catastrophes And Why They Happen (As seen on The History Channel), James Chiles, HarperCollins, 2001, page 188.

"A technician at Kennedy was watching over the improvised detanking setup, and he had a gauge to show the temperature inside the tank. But his thermometer only read up to 80 F because that was as high as the temperature was ever supposed to go."



Apollo 13 Review Board ("Cortright Report")
CHAPTER 5, FINDINGS, DETERMINATIONS, AND RECOMMENDATIONS
http://history.nasa.gov/ap13rb/ch5.pdf
page 5-3

“k. Failure of the thermostatic switches to open could have been detected at KSC if switch operation had been checked by observing heater current readings on the oxygen tank heater control panel. Although it was not recognized at that time, the tank temperature readings indicated that the heaters had reached their temperature limit and switch opening should have been expected.”

Hi, please notice that Space Exploration for Dummies looks to be incorrect. Only by 20 degrees, but so be it. Perhaps the result of simplifying a little too much. But this book does bring up the intriguing point, Is the temperature recorded merely the maximum temperature of which the thermometer is capable?
And notice in the Cortright Report, "by observing heater current readings." Either this is formalese, or things are not quite as simple as just looking at a thermometer. Cool Nerd (talk) 01:38, 12 April 2013 (UTC)
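A minimal sketch of the pegged-gauge point raised above (illustrative only; the 80 degree F full-scale figure comes from the accounts quoted above): a readout clamped at its full-scale value reports the same number whether the tank is at the limit or far beyond it.

 # Illustrative only: a temperature readout pegged at its full-scale value.
 FULL_SCALE_F = 80.0  # upper limit of the panel readout, per the accounts above

 def gauge_reading(true_temperature_f):
     return min(true_temperature_f, FULL_SCALE_F)

 print(gauge_reading(80.0))    # 80.0
 print(gauge_reading(1000.0))  # 80.0 -- indistinguishable from a tank at the limit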

other links

Henry Spencer (bio): " . . . All the emergency-planning emphasis had been on dealing with *foreseeable* problems; very little attention had been given to building versatility into the system so that *unforeseen* difficulties could be handled. One might speculate that this is a 'characteristic error' of organizations that try hard to plan for all possible failures." [18]

ieee spectrum, “Houston, We Have a Solution,” Stephen Cass, April 2005
http://spectrum.ieee.org/aerospace/space-flight/apollo-13-we-have-a-solution
http://spectrum.ieee.org/aerospace/space-flight/apollo-13-we-have-a-solution-part-2
http://spectrum.ieee.org/aerospace/space-flight/apollo-13-we-have-a-solution-part-3

Near system accident during Apollo 15?

Nothing happened, but something could have. Al Worden stayed up in the Command Module. Dave Scott and Jim Irwin went down in the LM to the lunar surface. Apollo 15 landed on the Moon on Friday, July 30, 1971, and stayed on the surface for 2 days, 18 hours, and 54 minutes. The astronauts had three lunar walks. The following is from the third and final walk.

A MAN ON THE MOON: THE VOYAGES OF THE APOLLO ASTRONAUTS, Andrew Chaikin, Penguin Books: New York, London, Victoria, Toronto, Auckland, 1994, pages 438 to 441:

“They had been nearly two hours behind schedule getting to sleep last night and waking up this morning, and that had cost them. The liftoff time, slated for later this day, could not be changed . . .

“It was waiting for them when they reached the ALSEP, sticking out of the ground just as Scott had left it the day before, when he had tried unsuccessfully to extract the deep core sample. Scott had already begun to wonder whether the core was worth the time and effort it was costing . . .

“But Irwin wasn’t ready to give up. He suggested that each of them hook an arm under one of the handles. That helped; they managed to pull the core about one-third of the way out . . .

“The entire core sample was 10 feet long; before they could bring the core home they would have to dismantle it into sections. And for some unknown reason the vise from the Rover’s tool kit refused to work properly . . .

“Scott’s patience was dwindling. Every minute spent on this one sample was time lost for the explorations to come. The trip to the North Complex hung in the balance. 'How many hours do you want to spend on this drill, Joe?' . . .

“Twenty-eight minutes after they began, [Joe] Allen told them to move on. They’d pick it up on the way back to the LM at the end of the traverse . . .

“They had expected a short, easy drive, but instead found themselves pitching over dunelike ridges and troughs . . .

“When Scott and Irwin finally stood at the edge of Hadley Rille, they were rewarded with a sight no one had expected. On the far wall, which was in full sunlight, distinct layers of rock poked through a mantle of dust, like the levels of some ancient civilization. They were surely lava flows. Over many millions of years, perhaps, a succession of outpourings had piled atop one another to build up the valley floor. This was the first--and only--time that Apollo astronauts would find records of the moon's volcanic life, not as fragments scattered around the rim of an impact crater, but in place, preserved from the day they were formed. This was true lunar bedrock. And there was more of it on this side of the rille. Scott and Irwin gathered their tools and went to work.

“There was no sharp dropoff at the rim of Hadley Rille; it was more like the gentle shoulder of a hill, and thankfully, the ground was firm. Effortlessly, Scott continued past the rim and loped several yards down the slope. Even now he could not see the bottom; it was hidden from view beyond the curved flank. He turned and ascended once more. All around him and Irwin were big slabs of tan-colored basalt, shot through with holes from long-vacant gas bubbles; some of the boulders were scored by layers in miniature. But no sooner had Scott hammered off a chip than Joe Allen passed up word that he and Irwin would have to return to the Rover to collect a rake sample, and then it would be time to leave. On another day, Scott might have put up a fight; today he was too tired.

“But in the back room, the geologists decided the rille was worth more time. Urgently, Scott called Irwin to join him a little further down the slope, where masses of darker rock waited. As they set to work, the men heard Joe Allen’s voice once more.

“'Out of sheer curiosity, how far back . . . ' ”


165:43:48 Allen: " . . . from what you would call the edge of the rille are the two of you standing now?"

165:44:02 Scott: "Oh, I don't know...Well, about 50 meters from where I guess we'd say we see real outcrop."

165:44:12 Allen: "Roger, Dave. How far back from the lip of the rille do you think you're probably standing?"

165:44:19 Scott: "Can't tell, I can't see the lip of the rille."

165:44:22 Allen: "Okay. It looks like you are standing on the edge of a precipice on TV; that's why we're asking."

165:44:29 Scott: "Oh, oh, oh, gosh, no, Joe. It slopes right on down here. The same slope. It's just a little inflection here."

Transcript from a NASA website, APOLLO 15, LUNAR SURFACE JOURNAL, HADLEY RILLE. [19]


[Commentary: And this kind of series of events is the type of thing that can cause a person to potentially lose his or her rhythm, even for people as highly skilled, highly trained, and highly motivated as these two astronauts. The key insight of system accident analysis is that it's not about people "not trying," it's about a clunky and cluttered system. Cool Nerd 23:04, 28 June 2007 (UTC)]


There is controversy over how dangerous Hadley Rille actually was, as well as disagreement over how much time was spent there. [20] However, the key question remains, Why are we hurrying down a slope of unknown dimensions in order to stay to a rigid "safe" schedule? Cool Nerd (talk) 18:43, 23 February 2008 (UTC)

Space Shuttle Challenger (rewrite)

First off, Scorpion451, thank you very much for the major rewrite. Even if we disagree on specifics, I'm glad to see someone else taking a substantial interest in this topic. The whole theory and concept of "System Accident," I believe, has a whole lot of explanatory power to offer our technological world.

Here's how I look at Space Shuttle Challenger. The burden of proof got shifted. Instead of the burden of proof being pro-safety, it was pro-status quo. One of the Morton Thiokol guys was famously (infamously!) told to take off his engineer hat and put on his managerial hat. That is, we're going to keep on doing things the way we've been doing them unless there is absolute categorical proof to the contrary. And of course, we're going to only very seldom have proof to that degree. In addition, I suspect there was an attitude, I'll be damned if we're going to be the ones to delay launch! That is, Morton Thiokol as institution did not want to be blamed. I do not know that for a fact, but it would surprise me if that were not the case.

When I talk about a social norm that one cannot question "safety," or that "safety" gets a pass, what I mean is that safety is done clunkily, often in an overboard fashion, and that people resent it. And sometimes it makes other aspects of the overall project less safe. And it certainly interrupts flow, which is where we as human beings are most effective. And we're miles away from an atmosphere of continuous quality improvement, where we can easily and naturally raise safety concerns and deal with them in a proportional way. We instead have this either-or situation: either we raise a safety issue and take it hugely seriously in a clunky, rule-bound fashion, or we completely ignore it.

With Challenger, some of the engineers believed blow-by certainly was an issue and they recommended, as an interim measure, that the Shuttle not be launched if the outside temperature was below the previous lowest temperature, which I believe was (it's been a while, but I did read Richard Feynman's account closely) 59 degrees. That would have been good interim advice. Morton Thiokol should have been less afraid of getting in trouble and should have been more confident in forthrightly addressing an issue. Cool Nerd 00:52, 25 July 2007 (UTC)

I believe we agree more than it may appear, as I agree with your points. On the Columbia, the engineers at NASA considered it a problem; however, the officials ignored repeated requests for damage assessment, spacewalks to examine the impact site, etc. Hence, this was an instance of, as you said, a status quo problem. There had never been a problem with the falling foam before, so officials ignored the problem. From an engineer's point of view, this is not experience being balked by logic, but logic ("heavy things falling break things") being balked by experience ("It's never hurt anything before!"). So, while I agree that the logical way of thinking can be more tedious and less efficient, it is also safer and more reliable. That's at least the way that I see it, being logically inclined.
The massive cropping, etc., I did on many of the incidents was mainly to highlight the chains of interconnected events that caused each of the specific tragedies, with a "less is more" idea in mind. As the focus of the article should be on the idea of System Accidents, the analysis should be on how each accident is an example of the butterfly effect (possible "see also:" link candidate?) causing many small problems to become a catastrophe. The articles on each of these accidents do an excellent job on the criticisms of the causes, so I think it is sufficient to link to them for those who are interested in the further details and analysis, allowing the article to focus on said chains of events.
I believe that as we both have the same goal, that being improving the article, we can reach a compromise on what to include in the analysis. I find as an engineering student that a more formal tone lends itself to clearer presentation, hence the removal of many of the dialectic prose sentences. The ideas presented, however, were generally sound, so I made an effort in many cases to preserve the intent in a more formal manner. For instance, while "clunky" is an apt description of the illogical nature of many protocols and the red tape necessary to maintain safety regulations, when conducting deeper analysis it is much more descriptive, especially to someone who may not be as familiar with the subject, to say that the system is "excessively bureaucratic" or "slow to react to unfamiliar situations". I found this page while working on the cybernetics page, and there are many topics under that heading which can also be included here for comparison and expansion.
I look forward to continuing to help improve this page, as I find this topic fascinating as well.--scorpion 451 rant 16:05, 25 July 2007 (UTC)

Two types of dangerous attitudes

I have been thinking about it, and I thought of a good way to sum up the two types of problem attitudes that form. The words which came to mind are "complacency" and "rationalization". Complacency refers to the attitude of "Of course the valves were opened after the test. I always open them." (referring to the Three Mile Island disaster). Rationalization is the type of attitude in which one logically convinces oneself that one is right, regardless of the facts or without knowing all of the facts, such as in the case of the plane crash where everyone assumed that because the tanks were not labeled as hazardous, they were safe to load into the cargo bay of the airplane. This needs some more research and a citation, but it seems to go a long way in reconciling differing opinions. As a physics professor once told me: "If a lumberjack or a mathematician tells you a tree is going to fall, get out of the way."--Scorpion451 rant 18:55, 5 August 2007 (UTC)

Space Shuttle Challenger demoted from article

It certainly is a human tragedy, but there's a case to be made that it wasn't a system accident. In particular, it did not have the bolt-from-the-blue aspect. Following is the previous paragraph on Challenger from our article:

"What is called safety can be pursued in a rigid way, with the feeling that more than enough time and effort has already been spent on it, and the consequence that suggestions, of any sort, are likely to be poorly received. Arguably, this was one of the major contributing causes to the Space Shuttle Challenger disaster on January 28, 1986. In that situation, a group of engineers again became worried about the O-rings, specifically that the solid rocket booster O-rings would fail to expand normally and fully seal the joint in the relatively cold temperatures forecast for launch (26 degrees Fahrenheit)[1]. They were ignored in part because NASA officials did not want to further delay the launch, which was already six days behind schedule, and did not want yet another complication (a very human motive), especially on a matter that had seemingly already been resolved. The managers seemed to take the view that the manager's job is to "know the answer," rather than how to find out. In addition, the fact that something comes up yet again is itself information. Ideally, this would have been part of a real conversation. Instead, Challenger was launched and the lives of all seven crew members were lost. The proximate cause was indeed the O-rings. And perhaps the biggest tragedy was that interim solutions were not seriously considered until it was too late and the after-the-fact analysis was being conducted.[2]" [references at end of page]

moved from article to here by Cool Nerd (talk) 00:27, 20 October 2010 (UTC)

No bolt-from-the-blue aspect

The entire sequence of tragic events was foreseen. The key factor seemed to be human frustration about having to revisit an issue that had seemingly already been resolved. A number of factors fed into this, and a number of consequences flowed out of it (resulting at the very end in the fatal accident). Yes, it's an important case study, and it's important to get a handle on some of the issues involved, but it's probably not a system accident. And some of the remedies are largely the same as if it were a system accident: let's try to have transparency, let's try to do it for real rather than just mindlessly following the 'rules,' let's not have this fiction that it's perfectly safe, and let's not engage in this either-or thinking where it's either very dangerous or it's 'as safe as humanly possible,' with it being very difficult for the organization to move from one to the other. Instead, let's develop the skills for dealing with medium risks and medium remedies, with the ability to check things as we go along. Yeah, some of the same remedies, but still probably not a system accident. Cool Nerd (talk) 00:27, 20 October 2010 (UTC)


Do NASA managers have a hundred and twenty things to worry about each flight? That is, perhaps the only "routine" flights are those looked at from outside. If so, that would go a long way toward answering the question: How could such smart people make such a stupid mistake? And this question is precisely why we need "system accident" and similar concepts. Cool Nerd 00:37, 2 November 2007 (UTC) & Cool Nerd (talk) 01:12, 5 February 2009 (UTC) (always trying to make it better!) and other dates as well.


And why didn't employees push the point more? Because you have so much invested emotionally and personally in the job, and being on the outs feels like too big a risk to take.

The motive to avoid being viewed as ‘not good enough’ is indeed a very powerful human motive, and probably goes back all the way to the first time a person walks into a school building. People are highly motivated to be ‘good enough’ and to avoid blame, and thus mistakes are compounded, unlike a good poker player who might consider folding a hand when the cards and players’ reactions suggest that he or she is beaten. Well, you can’t fold the hand in nuclear safety or airline safety, but you can call a pause, you can reconsider the mental thread that you thought was describing the situation, and perhaps kind of feel your way to another mental thread, and you can definitely call in a colleague and make sure that he or she looks specifically at the points that are bothering you, and at the whole thing. But you just can’t do this too often. Almost every organization has so much volume of work that if you ask for help just a little too often, you will be negatively labeled.

So, there is a shifting of the burden of proof. You can only stand up against the system when you're sure. And that is not the way safety is supposed to work. Cool Nerd (talk) 01:12, 5 February 2009 (UTC)

Space Shuttle Columbia, 2003, probably not either

The entire sequence of events was foreseen. It truly was a case of “familiarity breeds contempt.” The foam felt like a thoroughly routine thing . . . and it just wasn't.

Now, William Langewiesche directly says that it was a system accident. Charles Perrow more generally says “ . . . To be a system failure, in my definition, requires that even if everyone tries as hard as they can to operate safely, it is in the nature of complex, tightly coupled systems . . . ” (See below section “Perrow says ‘Deepwater Horizon’ not system accident.”)

previously in our main article:

'In a Sept. 2003 interview, William Langewiesche stated:

'“ . . . Yes, I think it would be classified as a system accident. And the sign of that was the unusual breakdown in communication at the end, where Linda Ham never heard the request for visual imagery, and the engineers heard the denial of the request that was not intended for them, and assumed that it was for them... That part—the missing of the two ships in the fog—that's a real sign of a system accident. I mean, with the complexities of the communication paths within NASA, no one could have anticipated that particular failure route. And that magic ability of a failure to bypass safeguards and to find new routes to inflict catastrophe is one of the key characteristics of a complex system accident.
'“Of course you guard against that partly by making your system as simple as possible. And certainly NASA's bureaucracy is not at all a simple system. It's an enormously complicated one. Part of its complexity is that so much of it is not even written, it's just understood. As Hal Gehman (the leader of the CAIB) said to me many times, "These are the unwritten rules." And unwritten rules tend to get really, really, complicated. . . ”[3]
'---
'And of course there were other factors in play such as the normal human frustration at having to revisit an issue already seemingly resolved. And to add to the tragedy, Linda Ham is married to an astronaut (although he was not on this flight) and is very aware of issues of crew safety. There may not be any 'bad guys' involved. System accidents often do not have villains.'

and previously before that:

'The Columbia disaster began long before Columbia even left the ground. The bridge between the shuttle and the dock is constructed of a resin foam, which is extremely strong and dense, yet simple to manipulate and form. A chunk of this foam fell and hit the shuttle's left wing, damaging the heat shield. A damage assessment program, nicknamed 'Crater', predicted damage to the wing, but was dismissed as an overestimation by NASA management. In a risk-management scenario similar to the Challenger disaster, NASA management failed to recognize the relevance of engineering concerns for safety. Two examples of this were failure to honor engineer requests for imaging to inspect possible damage, and failure to respond to engineer requests about status of astronaut inspection of the left wing.[4] NASA's chief thermal protection system (TPS) engineer was concerned about left wing TPS damage and asked NASA management whether an astronaut would visually inspect it. NASA managers never responded.
'NASA managers felt a rescue or repair was impossible, so there was no point in trying to inspect the vehicle for damage while on orbit. However, the Columbia Accident Investigation Board determined either a rescue mission or on-orbit repair, though risky, might have been possible had NASA verified severe damage within five days into the mission.[5][6]'


Atlantic Unbound, Interviews, "The Structure of an Accident"
William Langewiesche interviewed over the phone by Steve Grove, Sept. 26, 2003
http://www.theatlantic.com/past/docs/unbound/interviews/int2003-10-22.htm

Langewiesche's longer piece: http://www.theatlantic.com/magazine/archive/2003/11/columbia-apos-s-last-flight/4204/

http://anon.nasa-global.speedera.net/anon.nasa-global/CAIB/CAIB_lowres_chapter6.pdf

http://www.nasa.gov/columbia/caib/PDFS/VOL2/D13.PDF

see also:
http://www.space.com/news/080201-columbia-legacy.html
http://www.todaysengineer.org/2003/Oct/backscatter.asp

ValuJet 592: Loading of Forward Cargo Hold

From Langewiesche's article, Atlantic Monthly, March 1998: "The cargo stood for another day or two, until May 11, when the SabreTech driver had time to deliver the boxes across the airport to Flight 592. There the ValuJet ramp agent accepted the material, though federal regulations forbade him to, even if the generators were empty, since canisters that have been discharged contain a toxic residue, and ValuJet was not licensed to carry any such officially designated hazardous materials. He discussed the cargo's weight with the copilot, Richard Hazen, who also should have known better. Together they decided to place the load in the forward hold, where ValuJet workers laid one of the big main tires flat, placed the nose tire at the center of it, and stacked the five boxes on top of it around the outer edge, in a loose ring. They leaned the other main tire against a bulkhead. It was an unstable arrangement. No one knows exactly what happened then, but it seems likely that the first oxygen generator ignited during the loading or during taxiing or on takeoff, as the airplane climbed skyward." [21]

Would it have helped just to have tarped it down good? Cool Nerd (talk) 02:34, 14 March 2008 (UTC)

That is, something simple may have made a big difference. Cool Nerd (talk) 02:39, 20 January 2009 (UTC)

Narrative on ValuJet, previously in main article

'Mechanics removed oxygen canisters from three older aircraft and put in new ones. Most of the emphasis was on installing the new canisters correctly, rather than disposing of the old canisters properly. These were simply put into cardboard boxes and left on a warehouse floor for a number of weeks. The canisters were green-tagged to mean serviceable. A shipping clerk was later instructed to get the warehouse in shape for an inspection. (This very human motive to keep up appearances often plays a contributory role in accidents.) The clerk mistakenly took the green tags to mean non-serviceable and further concluded that the canisters were therefore empty. The safety manual was neither helpful for him nor for the mechanics, using only the technical jargon "expired" canisters and "expended" canisters. The five boxes of canisters were categorized as "company material," and along with two large tires and a smaller nose tire, were loaded into the plane's forward cargo hold for a flight on May 11 1996.[7][8] A fire broke out minutes after take-off, causing the plane to crash. All five crew members and 105 passengers were killed.

'If the oxygen generators had been better labeled – that they generate oxygen through a chemical reaction that produces heat, and thus were unsafe to handle roughly, not to mention fly in a cargo hold – the crash might have been averted.

'Many writers on safety have noted that the mechanics did not use plastic safety caps which protect the fragile nozzles on the canisters, and in fact, were not provided with the plastic caps. The mechanics, unaware of the hazardous nature of the canisters, simply cut the lanyards and taped them down. ValuJet was in a sense a Virtual Airline with maintenance contracted out, and the contracting firms often went another level in this process by contracting out themselves. The individual mechanics "inhabited a world of boss-men and sudden firings," obviously not an environment where people can confidently raise concerns.[9] The disaster could also have been prevented by proper disposal or labeling of the tanks.'<--I'm sure the generators had labels, just dripping with formal language that meant hardly anything at all. If, on the other hand, someone had taken a black magic marker and written on the labels, 'These jokers burn at 500 degrees Fahrenheit!!!' that would have gotten people's attention! Cool Nerd (talk) 01:43, 26 January 2009 (UTC)

For the article, let's go with a direct quote from Brian Stimpson, at least for the time being.

ValuJet, Description of Chemical Oxygen Generators

" . . . a lanyard, or slim white cord, connects each mask to a pin that restrains the spring-loaded initiation mechanism (retaining pin). The lanyard and retaining pin are designed such that a one- to four pound pull on the lanyard will remove the pin, which is held in place by a spring-loaded initiation mechanism.

"When the retaining pin is removed, the spring loaded initiation mechanism strikes a percussion cap containing a small explosive charge mounted in the end of the oxygen generator. The percussion cap provides the energy necessary to start a chemical reaction in the generator oxidizer core, which liberates oxygen gas. A protective shipping cap that prevents mechanical activation of the percussion cap is installed on new generators. The shipping cap is removed when the oxygen generator has been installed in the airplane and the final mask drop check has been completed.

"The oxidizer core is sodium chlorate which is mixed with less than five percent barium peroxide and less than one percent potassium perchlorate. The explosives in the percussion cap are a lead styphnate and tetracene mixture.

"The chemical reaction is exothermic, which means that it liberates heat as a byproduct of the reaction. This causes the exterior surface of the oxygen generator to become very hot. The maximum temperature of the exterior surface of the oxygen generator during operation is limited by McDonnell Douglas specification to 547 degrees F., when the generator is operated at an ambient temperature of 70 to 80 degrees F. Manufacturing test data indicate that when operated during tests, maximum shell temperatures typically reach 450 to 500 degrees F. [emphasis added]"---Close-Up: ValuJet Flight 592, from AVweb, originally appeared NTSB REPORTER, Dec. '97[22].

long section on Three Mile Island, previously in main article

Unit 2 at Three Mile Island failed on March 28, 1979, due to a combination of mechanical and human errors.

A malfunction of the main feedwater pumps in the secondary, non-nuclear cooling circuit led to their shutdown. As a result, water did not flow through the circuit and the reactor was no longer being cooled. The turbine and the nuclear reactor shut down automatically, as they should have. The loss of cooling from the secondary cooling circuit led to a temperature and pressure rise in the primary coolant, which was normal and expected in a shutdown. A pilot-operated relief valve (PORV) opened automatically to relieve the excess pressure on the primary coolant side, and it should have closed automatically when it was finished, because the maintenance of high pressure was important to keep the fluid inside from boiling. However, a faulty sensor told computers and the engineers watching the controls that the PORV was closed, when in fact the PORV was still open.[10] Cooling water began to pour out of the open valve, which quickly caused the reactor core to overheat.

There was no direct way to measure the amount of water in the core. Instead, operators watched the level of core water by looking at the level in the pressurizer. In the closed-circuit loop, the pressurizer is mounted higher than the core, meaning that if there is water in the pressurizer, there is water in the core. Since the PORV was stuck open, the water in the pressurizer was turbulent and the water level indication was not accurate. Control room operators did not realize an accident was imminent because they did not have a direct indication of core water level and because they did not interpret other indications correctly.

Meanwhile, another problem surfaced in the emergency feedwater system. Three backup pumps started immediately after shutdown of the main feedwater pumps, but unfortunately the water lines had been closed for testing 42 hours earlier and had not been reopened. This was either an administrative or human error, but in either case no water was able to enter the circuit. There was also no sensor for the water line flow in the control room, but the pump sensors reported, incorrectly, that they were functioning properly. The human control room operators were provided no information on water level in the reactor, but rather judged by pressure, which was thought to have been stabilized by the malfunctioning valve, which they thought was closed.

The closed water lines were discovered about eight minutes after the backup pumps came on, and once reopened, water flowed normally into the primary cooling circuit. However, steam voids, or areas where no water was present, had formed in the primary circuit, which prevented the transfer of heat from the nuclear reactor to the secondary cooling circuit. The pressure inside the primary coolant continued to decrease, and voids began to form in other parts of the system, including in the reactor. These voids caused redistribution of water in the system, and the pressurizer water level rose while overall water inventory decreased. Because of the voids, the pressurizer water indicator, which displayed the amount of coolant available for heat removal, indicated the water level was rising, not falling. The human operators then turned off the backup water pumps for fear the system was being overfilled. The control room personnel did not know that this indicator could read falsely, so they did not know that the water level was indeed dropping and the core was becoming exposed.

The water that was gushing out of the still-open PORV was collecting in a quench tank. The tank overfilled, and the containment building sump filled and sounded an alarm at 4:11 am. Operators ignored these indicators and the alarm initially. At 4:15 am, the relief diaphragm on the quench tank ruptured, and radioactive coolant began flowing into the containment building.

Three of these four failures had occurred previously with no harm, and the fourth was an extra safety mechanism inserted as the result of a previous incident. However, the coincidental combination of human error and mechanical failure created a loss of coolant accident and a near-disaster.[11]

Possible future improvements in nuclear plants

[23] " . . . The objectives of the Generation IV reactors are passive safety, good economics, proliferation resistance and improved environmental characteristics including reduced waste and better fuel utilisation than current models. Some concepts offer much more efficient use of uranium, including the use of reprocessed fuel, which also reduces the waste disposal task. Some designs involve underground construction for greater security . . . " Cool Nerd (talk) 19:07, 26 February 2009 (UTC)

Bhopal, India, Dec. 2-3, 1984, preventable tragedy, probably not system accident

Charles Perrow

"A couple of years after I published Normal Accidents, we had the Bhopal disaster in India, and I was struck by the fact that with some 40 years of hundreds of chemical plants with the toxic potential of Bhopal, this was the first major disaster. This allowed me to kind of invert Normal Accident Theory and note that it takes just the right combination of a number of circumstances to turn an accident into a catastrophe. As I had noted in my book, one of the most commonplace comments after accidents is 'we are lucky it wasn’t worse.' " (Charles Perrow, 2000, p.5).[11]

Greenpeace

" . . . water (that was being used for washing the lines) entered the tank containing MIC through leaking valves. The refrigeration unit, which should have kept the MIC close to zero degrees centigrade, had been shut off by the company officials to save on electricity bills. The entrance of water in the tank, full of MIC at ambient temperature triggered off an exothermic runaway reaction an consequently the release of the lethal gas mixture. The safety systems, which in any case were not designed for such a runaway situation, were non-functioning and under repair. Lest the neighbourhood community be 'unduly alarmed', the siren in the factory had been switched off. Poison clouds from the Union Carbide factory enveloped an arc of over 20 square kilometres before the residents could run away from its deadly hold."[24]

Deena Murphy-Medley

"1. Temperature and pressure gauges are unreliable.
2. MIC storage tank 610 is exceeding the recommended capacity.
3. The reserve storage tank for excess MIC already contains MIC.
4. The warning system for the community has been shut down.
5. The refrigeration unit that keeps MIC at low temperatures has been shut down.
6. The gas scrubber -- which neutralizes any escaping MIC -- has been shut down.
7. . .
8. . . "[25]

However, at this point, it becomes so blatant that it hardly seems like a system accident anymore. This is the mirror image of the usual system accident: the usual question is how many systems, layers, and multiple goals we can add before something untoward happens; here the question is how many we can take away before something untoward happens. An absolute tragedy. And yeah, a multinational corporation sure did give short shrift to people in a less developed country. Cool Nerd (talk) 02:38, 27 February 2009 (UTC)

UMass, International Dimensions of Ethics Education in Science and Engineering

Bhopal Plant Disaster, Appendix B: Stakeholders and Level of Responsibility, MJ Peterson, revised March 6, 2008, page 3 of 9 (26 of 86 in PDF file):
" . . . No investigation of what kept water from flowing out drain valve when water flushing was begun on 2 Dec."
http://www.umass.edu/sts/pdfs/Bhopal_Complete.pdf

This is classic for a dysfunctional organization. And sadly, the employees may have perceived the situation all too clearly. That is, if they had said something, even about something as obvious as this, they most likely would have been criticized, belittled, lambasted, and because of this distraction, it would have taken even longer for the "managers" (if we can use that term) to get around to noticing this themselves and taking it seriously.
Union Carbide had been cutting staff for several years before the disaster. Many organizations seem to believe that if you run lean, you have to also run mean. And that ain't necessarily the case!
You've got to be a builder. You have got to build your people up. Teach them how to have quick meaningful communication, your style. It's okay to teach them your style, just don't completely freeze them out. Your people are your eyes and ears and, at times, if you give them half a chance, your brains. Cool Nerd (talk) 21:56, 27 February 2009 (UTC), and Cool Nerd (talk) 15:50, 11 April 2009 (UTC)

Bhopal Plant Disaster – Situation Summary, MJ Peterson, revised March 20, 2009, Page 5 of 8 (5 of 86 in PDF file):
" . . . one or two of bleeder valves at the bottom of the pipes where wash water should have come out were blocked. The worker doing the washing noticed this, and suspended washing to report the problem. His immediate superior, an operations supervisor rather than a maintenance supervisor, told him to continue."
http://www.umass.edu/sts/pdfs/Bhopal_Complete.pdf

Wow!

Collapse of fourth- and second-floor walkways in Hyatt Regency hotel, Kansas City, July 1981

“This original design, however, was highly impractical because it called for a nut 6.1 meters up the hanger rod and did not use sleeve nuts. The contractor modified this detail to use 2 hanger rods instead of one (as shown in fig-2) and the engineer approved the design change without checking it. This design change doubled the stress exerted on the nut under the fourth floor beam. Now this nut supported the weight of 2 walkways instead of just one (Roddis, 1993).”
Roddis, W.M. (1993). "Structural Failures and Engineering Ethics." Journal of Structural Engineering, May 1993.
http://www.eng.uab.edu/cee/faculty/ndelatte/case_studies_project/Hyatt%20Regency/hyatt.htm#Causes
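To make the "doubled the stress" point concrete, here is a minimal sketch of the load arithmetic, assuming each walkway transmits roughly the same load <math>P</math> to each of its hanger-rod connections:

<math>F_\text{original} = P \qquad \text{versus} \qquad F_\text{as built} = P_\text{4th floor} + P_\text{2nd floor} \approx 2P</math>

With the continuous rod, the nut under the fourth-floor box beam carried only that walkway's share; with the two offset rods, the second-floor walkway hangs from the fourth-floor box beam, so the same nut-and-beam connection must carry both walkways while its capacity stays unchanged.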

"January–February 1979: Events and communications between G.C.E. and Havens determine design change from a single to a double hanger rod box beam connection for use at the fourth floor walkways. Telephone calls disputed; however, because of alledged communications between engineer and fabricator, Shop Drawing 30 and Erection Drawing E3 are changed.
"February 1979: G.C.E. receives 42 shop drawings (including Shop Drawing 30 and Erection Drawing E-3) on February 16, and returns them to Havens stamped with engineering review stamp approval on February 26."
http://www.engineering.com/Library/ArticlesPage/tabid/85/articleType/ArticleView/articleId/175/Walkway-Collapse.aspx

"Someone once wrote an excellent paradigm which compared the original design of the Hyatt Regency atrium walkways with the design which was implemented. Suppose a long rope is hanging from a tree, and two people are holding onto the rope -- one near the top and one near the bottom. Under the conditions that (1) each person can hold their own body weight, and (2) that the tree and rope can hold both people, the structure would be stable. However, if one person was to hold onto the rope, and the other person was hanging onto the legs of the first, then the first person’s hands must hold both people’s body weights, and, thus, the grip of the top person would be more likely to fail."
http://www.rose-hulman.edu/Class/ce/HTML/publications/momentold/winter96-97/hyatt.html —Preceding unsigned comment added by Cool Nerd (talkcontribs) 21:57, 29 March 2009 (UTC)

Henry Petroski, To Engineer Is Human

Henry Petroski, To Engineer Is Human: The Role of Failure in Successful Design, New York: Vintage Books (division of Random House), 1982, 1983, 1984, 1985, 1992. See esp. pages 85-93 in his chapter "Accidents Waiting To Happen." (This book was originally published by St. Martin's Press in 1985 in somewhat different form. In addition, some of the material appeared previously in Technology and Culture, Technology Review, and The Washington Post.)

" . . . within four days of the structural failure, the front page of The Kansas City Star carried in lieu of headlines technical drawings of design details that pinpointed the cause. Investigative reporting that would win the Pulitzer Prize indentified the skywalks' weak link and explained how a suspension rod tore through a box beam to initiate the progressive collapse of the weakened structure . . . ", page 86.

" . . . The supports would have been only sixty percent as strong as they should have been according to the Kansas City Building Code. However, since the writers of building codes want designs to be on the safe side and to provide plenty for margin of error not only in assumptions and calculations but also in steelmaking and construction, the codes require much more than minimum strength. Thus the walkways as originally designed, although not as strong as they should have been, would probably not have fallen, and their nonconformity with the code might never have been discovered.
"Unfortunately, someone looking at the original details of the connection must have said he had a better idea or an easier way to hang one skywalk beneath the other and both from the sixty-foot-high ceiling of the atrium, for the connection on the original architectural drawing was difficult, if not impossible, to install. The original concept of a single rod extending from each ceiling connection, passing through holes in a beam supporting the fourth-floor walkway about fifteen feet below and continuing for another thirty feet before passing through the beam of the second-floor walkway, would indeed have been an unwieldly construction task, and someone's suggestion that two shorter rods be used in place of each long one must have had an immediate appeal to anyone involved with the erection of the walkways. . . ", pages 86-87.

" . . . The lobby roof collapsed when the hotel was under construction and checks of many structural details including those of the skywalks were ordered, but apparently the rod and box beam connections were either not checked or not found wanting. After the walkways were up there were reports that construction workers found the elevated shortcuts over the atrium unsteady under heavy wheelbarrows, but the construction traffic was simply rerouted and the designs were apparently still not checked or found wanting . . . ", page 89.

" . . . Another reader suggested a possible way in which the same steel channel sections that constituted the box beam could have been welded in a different configuration to provide a stronger bearing surface, and one can only wonder how many other unpublished suggestions the mailbag of Engineering News-Record held. When I discussed the Hyatt Regency failure in an article in Technology Review, the editor of that magazine also received an unusually large amount of mail on the walkway detail. Sleeve nuts, split dies, and a host of other solutions to the puzzle were proposed by readers who apparently did not read Engineering News-Record and who seemed incredulous that such a detail caused so much trouble.
"But explaining what went wrong with the Hyatt Regency walkways and pointing out changes that would have worked is a lot easier than catching a mistake in a design yet to be realized. After the fact there is a well-defined "puzzle" to solve to show how clever one is. Before the fact one must not only define the design "puzzle" but also verify one's "solution" by checking all possible ways in which it can fail . . . ", page 90.

“ . . . had the structure not been so marginally designed, the other rods might have redistributed the unsupported weight among them, and the walkway might only have sagged a bit at the broken connection. This would have alerted the hotel management to the problem and, had this warning been taken more seriously than the signs of the walkway’s flimsiness during construction, a tragedy might never have occurred. Thus designers often try to build into their structures what are known as alternate load paths to accommodate the rerouted traffic of stress and strain when any one load path becomes unavailable for whatever reason. . . ”, page 92.

“ . . . Within days of the skywalk tragedy a third, remaining elevated walkway that had been suspended alongside the two that collapsed was dismantled and removed from the lobby in the middle of the night despite protests from the mayor of Kansas City. The owners of the hotel argued that the remaining walkway presented a hazard to workers and others in the building, but attorneys for some of the victims objected, claiming that evidence would be destroyed, and engineers interested in studying the cause of the accident lamented the removal of the only extant structure anywhere near equivalent to that which had collapsed. Had the third walkway not been removed so precipitately, its behavior under the feet of dancers might or might not have confirmed the theory that the walkways responded to the tempo of the dance the way a wine glass can to a soprano’s high note . . . ”, pages 92-93.

“ . . . The post-accident analysis of the skywalk connections by the National Bureau of Standards determined that the originally designed connections could only support on the average a load of 18,600 pounds, which was very nearly the portion of the dead weight of the structure itself that would have to be supported by each connection. Hence the factor of safety was essentially 1—which leaves no margin for error and no excess capacity for people walking, running, jumping, or dancing on the walkways . . . ”, page 102.
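Spelled out as arithmetic, using the NBS figure quoted just above and taking the dead load per connection to be of essentially the same magnitude (which is what "very nearly the portion of the dead weight" says):

<math>\text{factor of safety} = \frac{\text{capacity}}{\text{demand}} \approx \frac{18{,}600\ \text{lb}}{18{,}600\ \text{lb (dead load alone)}} \approx 1</math>

In other words, the structure's own weight consumed essentially all of each connection's capacity, leaving no reserve for occupants, let alone a crowd moving in rhythm.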

" . . . Had the hanger rod-box beam suspension detail of the Hyatt Regency walkways not been changed from the original concept, they would not doubt be standing today, the site of many a party and probably unsuspected of being in violation of the building code . . . ", page 204.

I read Petroski as saying that the original design, although only 60% of the strength the code required, probably would have been good enough, since the code builds in considerable safety margin. But there was a change during construction in how the walkways were attached to the hanger rods.
I think we can summarize the key features as follows:
(1) The walkways were underdesigned.
(2) A design change made during construction made this situation worse.
(3) There was an unusual use, or an unanticipated use, of the dancing.
Three lines crossed and there was a catastrophe. Maybe if only two had, there would not have been.
And remember, the walkways were up for about a year before they collapsed. So, there may have been a little safety margin, but not much. And for all our clunky, "logical," obsessive focus on communication, when it counted we did not have real communication. Perfectionism is part of the problem. Cool Nerd (talk) 01:35, 9 July 2009 (UTC)

Re-read and modified. Cool Nerd (talk) 23:20, 15 January 2019 (UTC)



Henry Petroski, "When Cracks Become Breakthroughs," Technology Review, 85 (August/September 1982), pp. 18-30. (Probably discusses collapse of walkways.)

Marshall, R. D., et al. Investigation of the Kansas City Hyatt Regency Walkways Collapse. (NBS Building Science Series 143.) Washington D. C.: U. S. Department of Commerce, National Bureau of Standards, 1982.

DC-10 rear cargo door, problem only solved after fatalities?

That seems to be the way organizations generally work. It’s not even anything remotely like a conspiracy. It’s just a general attitude, one that permeates very widely, of not 'causing' problems. And yes, a person can probably get more done working inside a system rather than being a critic on the outside, but still, things shouldn’t be compromised away till there’s hardly anything left. There needs to be a healthier interchange between theory and practice. And not merely a focus on the internal communication of the organization, of who's on the ins and who's on the outs, who's popular, how things are going to be perceived, how the organization is going to look, and shaving and cutting, etc. And, so . . .

June 1972, outside Windsor, Canada, rear cargo door blows, emergency landing. Some corrective action taken, but not enough.

March 1974, outside Paris, France, rear cargo door blows, rapid depressurization, damage to hydraulics, crash, 346 people die. And then, substantial safety changes are made that should have been made in the first place. Wow. What a loss.

BROKEN LINK: http://www.time.com/time/magazine/article/0,9171,908559,00.html (takes forever to load) 

http://books.google.com/books?id=q4IDRMtRHVsC&pg=PT51&dq=Windsor+%22DC-10%22+door&hl=en&ei=BvaRTvPGCvGDsgK_zY27AQ&sa=X&oi=book_result&ct=result&resnum=3&sqi=2&ved=0CFQQ6AEwAg#v=onepage&q=Windsor%20%22DC-10%22%20door&f=false

posted by Cool Nerd (talk) 19:52, 1 October 2011 (UTC)

System Accidents in general

Dr. Michael A. Greenfield http://www.hq.nasa.gov/office/codeq/safety/archive/syssafe.pdf —Preceding unsigned comment added by Cool Nerd (talkcontribs) 22:36, 3 April 2009 (UTC)


Oregon OSHA Online Course 102 Effective Accident Investigation MODULE 5: DETERMINING SURFACE AND ROOT CAUSES http://gvsafety.com/Documents/SAFETY%20HANDOUTS/Accident%20Investigation/Accident%20Investigation%20Programs%20&%20Presentations/Effective%20Accident%20Investigation%20Module-ORE%20OSHA.pdf

' . . . Systems analysis. At this level we're analyzing the root causes contributing to the accident. We can usually trace surface causes to inadequate safety policies, programs, plans, processes, or procedures. Root causes always preexist surface causes and may function through poor component design to allow, promote, encourage, or even require systems that result in hazardous conditions and unsafe behaviors. This level of investigation is also called "common cause" analysis because we point to a system component that may contribute to common conditions and behaviors throughout the company. . . '

posted by Cool Nerd (talk) 01:55, 30 August 2013 (UTC)

References, primarily for sections previously in main article

  1. ^ "Engineering Ethics: The Space Shuttle Challenger Disaster ". Department of Philosophy and Department of Mechanical Engineering, Texas A&M University, retrieved June 24, 2010.
  2. ^ Rogers Commission (1986-06-06). "Chapter V: The Contributing Cause of The Accident". Report of the Presidential Commission on the Space Shuttle Challenger Accident. Retrieved 2008-03-06. {{cite web}}: Italic or bold markup not allowed in: |work= (help)
  3. ^ Atlantic Unbound, "The Structure of an Accident", telephone interview of William Langewiesche by Steve Grove, Sept. 26, 2003. See also Langewiesche’s longer article “Columbia’s Last Flight”, the Atlantic, Nov. 2003.
  4. ^ Langewiesche, William (2003). "Columbia's Last Flight". The Atlantic Monthly. Retrieved 2008-03-06. {{cite web}}: Unknown parameter |month= ignored (help)
  5. ^ Decision Making at NASA (PDF). Columbia Accident Investigation Board. 2003-08-26. p. 173. Retrieved 2008-03-06. {{cite book}}: |work= ignored (help)
  6. ^ STS-107 In-Flight Options Assessment (PDF). Vol. II. Columbia Accident Investigation Board. 2003-10-28. Retrieved 2008-03-06. {{cite book}}: |work= ignored (help)
  7. ^ "Test shows oxygen canisters sparking intense fire". CNN.com. 1996-11-19. Retrieved 2008-03-06.
  8. ^ Stimpson, Brian (1998). "Operating Highly Complex and Hazardous Technological Systems Without Mistakes: The Wrong Lessons from ValuJet 592" (reprint). Manitoba Professional Engineer. Retrieved 2008-03-06. {{cite web}}: Unknown parameter |month= ignored (help)
  9. ^ Langewiesche, William (1998). "The Lessons of ValuJet 592". The Atlantic Monthly. Retrieved 2008-03-06. {{cite web}}: Unknown parameter |month= ignored (help)
  10. ^ No longer exists--> http://www.uic.com.au/nip48.htm <--Uranium Information Centre no longer exists, but gives you links to the Australian Uranium Association and the World Nuclear Association http://www.world-nuclear.org , and this second one does have information about PORV (pilot-operated relief valve), which seems similar but maybe not quite the same. Checked on Feb. 26, 2009.
  11. ^ a b Perrow, Charles (May 29 2000). "Organizationally Induced Catastrophes" (PDF). Institute for the Study of Society and Environment. University Corporation for Atmospheric Research. Retrieved Feb. 6, 2009. {{cite journal}}: Check date values in: |accessdate= and |date= (help); Cite journal requires |journal= (help)

Perrow says ‘Deepwater Horizon’ not system accident

Is Deepwater Oil Too Risky?
July 19, 2010, David Levy
http://theenergycollective.com/davidlevy/40008/deepwater-oil-too-risky

[“This post is adapted from the preface to the forthcoming paperback edition of Perrow’s 2007 book The Next Catastrophe: Reducing Our Vulnerabilities to Natural, Industrial, and Terrorist Disasters, (Princeton, 2011).”]

“ . . . I do not think that the failure on April 20, 2010 of the rig built by Transocean and run by BP had a system accident (or “normal accident”). While such rigs are very complex and very tightly coupled, it is more likely that faulty executive decisions resulted in knowingly running unnecessary and dangerous risks. To be a system failure, in my definition, requires that even if everyone tries as hard as they can to operate safely, it is in the nature of complex, tightly coupled systems to inevitably (though rarely) have the unforeseeable interaction of failures, usually small ones individually, that can cascade through the system. This was not the case with the Transocean rig; BP management frequently overrode the objections and warnings of its own operators and engineers, and those of its subcontractor, Transocean, and independent consultants. Nothing that transpired was unexpected.

“BP has had a history of ignoring warnings by its own staff in order to cut costs. A refinery explosion in 2005 and a massive oil spill in Prudhoe Bay, Alaska in 2006, resulted in (small) criminal penalties for executive malfeasance; the pipeline had a smaller spill last year, and there are currently strident warnings about the dangers of a massive spill on the pipeline in Alaska. The firm had a close call in 2005 with its deepwater drilling Thunder Horse rig. . . ”

posted by Cool Nerd (talk) 20:26, 5 November 2010 (UTC)

The 'perfect storm' analogy

COMBATING SYSTEM-LEVEL QUALITY PROBLEMS IN COMPLEX PRODUCT DEVELOPMENT, DRAFT FOR DISCUSSION PURPOSES ONLY, Daniel E. Whitney, Massachusetts Institute of Technology, February 2007, page 5:

" . . . The “swiss cheese” theory, which says that a “perfect storm” of things combine in just the wrong way, causing the accident (the holes in different cheese slices line up just right). People often hope that nothing like it will ever happen again. . . "

posted by Cool Nerd (talk) 21:27, 11 November 2011 (UTC)
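A minimal numerical sketch of the "holes lining up" idea, in Python. The layer names and probabilities below are purely illustrative assumptions, not data from any real system; the point is only that independent defenses multiply, and that rare alignments still accumulate over many opportunities.

<syntaxhighlight lang="python">
# Toy illustration of the "swiss cheese" / "perfect storm" idea:
# an accident gets through only if every defensive layer fails at once.
from math import prod

# Probability that each (assumed independent) layer fails to catch the problem.
# These numbers are made up for illustration.
layer_failure_probs = {
    "design review": 0.05,
    "maintenance procedure": 0.02,
    "operator check": 0.10,
    "alarm / indication": 0.03,
}

p_all_holes_align = prod(layer_failure_probs.values())
print(f"P(all layers fail together) = {p_all_holes_align:.1e}")  # ~3e-06

# Why "it will never happen again" is shaky: with many opportunities
# (flights, shifts, batches), a tiny per-opportunity probability adds up.
n_opportunities = 100_000
p_at_least_once = 1 - (1 - p_all_holes_align) ** n_opportunities
print(f"P(at least one alignment in {n_opportunities:,} opportunities) = "
      f"{p_at_least_once:.0%}")  # ~26%
</syntaxhighlight>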

Astronaut Michael Collins on complexity

Michael Collins, Mission To Mars: An Astronaut's Vision of Our Future in Space, New York: Grove Weidenfeld, 1990, page 291:

" . . . In my experience making things unnecessarily complicated usually detracts from safety. . . "

Collins was the Command Module Pilot for Apollo 11. Cool Nerd (talk) 19:13, 29 December 2011 (UTC)

perhaps Langewiesche's main argument

" . . . part of a larger deception—-the creation of an entire pretend reality that includes unworkable chains of command, unlearnable training programs, unreadable manuals, and the fiction of regulations, checks, and controls. Such pretend realities extend even into the most self-consciously progressive large organizations, with their attempts to formalize informality, to deregulate the workplace, to share profits and responsibilities, to respect the integrity and initiative of the individual. The systems work in principle, and usually in practice as well, but the two may have little to do with each other. Paperwork floats free of the ground and obscures the murky workplaces where, in the confusion of real life, system accidents are born. . . "

William Langewiesche, "The Lessons of ValuJet 592," Atlantic Magazine, March 1998 (second to last paragraph of this the last section). http://www.theatlantic.com/magazine/archive/1998/03/the-lessons-of-valujet-592/6534/4/

posted by Cool Nerd (talk) 19:33, 29 December 2011 (UTC)

issue of whether our article contains unpublished synthesis of previously published material

Charles Perrow directly states that both Apollo 13 and Three Mile Island were 'normal accidents' (this seems to be his preferred phrase, and I just don't remember how often he also uses the phrase 'system accident').

William Langewiesche directly states that the crash of ValuJet 592 was a system accident.

It's true that the excellent quote from the REPORT OF APOLLO 13 REVIEW BOARD ("Cortright Report") does not use the phrase 'system accident,' but they come about as close as they can: "not the result of a chance malfunction in a statistical sense, but rather resulted from an unusual combination of mistakes, coupled with a somewhat deficient and unforgiving design." (And maybe the phrase 'system accident' was not in currency when they issued their report in 1970.) Cool Nerd (talk) 19:33, 17 April 2012 (UTC)

National Geographic on ValuJet (AirTran) crash.

http://www.youtube.com/watch?v=wxTiJReEwfg

See especially starting at 34:20: after ruling out other things the crash could have been but was not, investigators begin focusing on the forward cargo hold and then specifically on the high-temperature oxygen generators. Cool Nerd (talk) 22:46, 11 May 2012 (UTC)

'cascading' probably should be included along with 'interactive complexity' and 'tight coupling'

GETTING TO CATASTROPHE: CONCENTRATIONS, COMPLEXITY AND COUPLING, Charles Perrow, The Montréal Review, December 2012:

http://www.themontrealreview.com/2009/Normal-Accidents-Living-with-High-Risk-Technologies.php

"A normal accident is where everyone tries very hard to play safe, but unexpected interaction of two or more failures (because of interactive complexity), causes a cascade of failures (because of tight coupling)."

This is a very good brief article by Perrow, but it touches on all kinds of subjects. Ideally, I'd like to have at least one additional source before we include 'cascading' in our article. Cool Nerd (talk) 18:50, 25 March 2015 (UTC)
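As a purely illustrative sketch of the "interactive complexity plus tight coupling" definition above, here is a toy Python model. The node names loosely echo the Three Mile Island narrative earlier on this page, but the failure graph and the propagation rule are assumptions made up for illustration, not anything taken from Perrow.

<syntaxhighlight lang="python">
# Toy sketch: two small, independent failures interact, and whether the
# disturbance cascades depends on how tightly coupled the system is.

def failed_components(initial_failures, downstream_links, tightly_coupled):
    """Return the set of components that end up failed.

    downstream_links maps a component to the components it disturbs.
    With tight coupling there is no slack, buffer, or time for operators
    to intervene, so every disturbance is passed straight on.
    """
    failed = set(initial_failures)
    frontier = list(initial_failures)
    while frontier:
        component = frontier.pop()
        for neighbor in downstream_links.get(component, []):
            if tightly_coupled and neighbor not in failed:
                failed.add(neighbor)
                frontier.append(neighbor)
    return failed

links = {
    "relief valve stuck open": ["coolant loss"],
    "level indicator reads wrong": ["operators misjudge water level"],
    "operators misjudge water level": ["backup pumps shut off"],
    "backup pumps shut off": ["coolant loss"],
    "coolant loss": ["core overheats"],
}

two_small_failures = ["relief valve stuck open", "level indicator reads wrong"]
print(len(failed_components(two_small_failures, links, tightly_coupled=True)))   # 6
print(len(failed_components(two_small_failures, links, tightly_coupled=False)))  # 2
</syntaxhighlight>

The loose-coupling case is deliberately crude (nothing propagates at all); the contrast is only meant to show what "cascade of failures (because of tight coupling)" means in Perrow's sentence.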

External links modified

Hello fellow Wikipedians,

I have just added archive links to one external link on System accident. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—cyberbot II (Talk to my owner: Online) 03:15, 19 February 2016 (UTC)

good summary of system accident / normal accident

This seems to be posted by an adjunct professor at Ohio State:

http://www.ohio.edu/people/piccard/entropy/perrow.html

and seems like a pretty solid summary.

posted by Cool Nerd (talk) 16:16, 20 July 2016 (UTC)

other articles potentially to add

Life and Death at Cirque du Soleil, Vanity Fair, Michael Joseph Gross, May 29, 2015.

' . . . A system accident is one that requires many things to go wrong in a cascade. Change any element of the cascade and the accident may well not occur, but every element shares the blame. . . '

This article, about a fatal accident at Cirque du Soleil, directly gives a definition of system accident. Cool Nerd (talk) 22:49, 4 October 2017 (UTC)

ValuJet(AirTran) 592

This subsection simply makes no sense. It starts at 'Step 2'. And refers to 'unmarked cardboard boxes' without any explanation as to what was in them. The section comes to an abrupt halt without actually telling readers about any accident. Either provide a proper account from start to finish, or omit it entirely. 86.191.147.56 (talk) 02:56, 27 November 2017 (UTC)