
Many company managers, and even some experienced safety professionals, still maintain the view that some workplace accidents “just happen”: that not all workplace incidents are preventable, and that we therefore shouldn’t always expect to find root causes for why they occurred. The reasons behind this attitude vary, with some individuals claiming that ambitious safety initiatives are overblown, and others maintaining that viewing all accidents as preventable actually keeps us from having realistic attitudes toward workplace risks.

With the National Safety Council (NSC) observing June as “National Safety Month,” it’s a perfect time to take a deep dive into the philosophy behind root cause analysis, and why we should always approach workplace accidents with the expectation of finding the root causes beneath them. In doing so, I also hope to show why root cause analysis plays such an essential and central role in improving safety performance and strengthening our workplace safety culture.

Example of Using Root Cause Analysis Internally: The Hallway Fall Investigation

Let’s start by reviewing an example of a workplace accident from my own experience when I was a Global EHS Coordinator. It happens to be an example that seemed very cut-and-dried at first – one that seemed to have “just happened,” without need for further explanation.

In this incident, an employee with known mobility challenges fell down in an office area, in a hallway leading to the parking lot. She received medical treatment and missed several days of work, resulting in an OSHA recordable injury. We hadn’t received reports of any other employees falling in that area, so it seemed this all simply happened because of the employee’s mobility issue, not because of any inherent risks in the workplace. Accidents just happen, right?

Not so fast…

I conducted the investigation of this incident, and must admit, I initially had doubts that I’d uncover anything I didn’t already know — but I did. First, I learned that other employees had fallen in the same area, but just hadn’t reported their falls because they’d been able to get up and walk away. I also noticed that late in the afternoon, there was a slippery, waxy coating on the hallway floor that had not been present earlier in the day.

The Root Cause Isn’t Always Obvious

I then investigated the timeline of all of the falls, and found that they began not long after a new facilities management worker began cleaning that section of hallway. The facility maintenance supervisor had not provided this worker with sufficient training on hall washing protocols, and because this hallway opened to the parking lot, company management allowed him to wash this hallway last and then go home directly after. Because of this combination of factors, this particular stretch of hallway did not dry as well as others in the facility, which caused the slippery texture I’d noticed late in the day.

When I continued the investigation, I also discovered that office employees had not received the same training in incident and near-miss reporting that we’d provided to shop floor employees. We’d incorrectly assumed that there were few or no safety risks in their work environment, and so had not equipped them to recognize the need to report their own slips in the hallway. If we had, we likely could have prevented the recordable injury from happening.

So, what initially looked like an accident that “just happened” turned out to be the result of system-level failures in our policies and training procedures. Next, let’s look at some examples of accidents involving machinery and equipment, starting with one of the worst accidents in the history of the American space program.

Historical Example of Using Root Cause Analysis: The Space Shuttle Challenger

On January 28, 1986, a tragedy occurred. The Space Shuttle Challenger (STS-51-L) suddenly exploded 73 seconds after lift-off, killing all seven crew members onboard. Most people old enough to understand what was happening at the time will always remember where they were when they heard the news. I was not quite 14, and I was in the middle of class when a teacher came in to tell us the tragic news. More than 34 years later, I remember the look on her face and the sadness in her voice as if it happened yesterday.

It seemed horrible. Inexplicable. The shuttle had lifted off to the cheers of those in attendance, and continued on at first just as other Shuttle missions had, with broadcasters calmly describing the stages of its ascent. Suddenly, the shuttle exploded in a violent fireball, with the center external tank and, likely, the shuttle itself instantly destroyed in the blast. The two solid rocket boosters, now detached from the shuttle and no longer under any human control, twisted like plastic grocery bags in a hurricane. For a number of horrible moments in the broadcast, correspondents struggled to process what they had just seen and what it meant. It seemed to “just happen.”

How could anyone have known that this “major catastrophe in America’s space program,” as the announcer eventually called it, would happen? How could anyone have prevented it?

Fortunately, government officials and NASA would not be satisfied without fully understanding the root causes of this horrific accident. Ronald Reagan, the President of the United States at the time, formed the Rogers Commission to investigate the incident and identify why it happened. One of the members of this commission was a man who would become a personal hero to me later in my life: physicist Richard Feynman. Feynman was seriously ill with cancer at the time, but he still took his place on the commission to help identify the root causes of the accident.

Feynman’s vast knowledge of theoretical physics soon proved useful, especially as the subsequent investigation focused on the “O-ring,” a rubber gasket designed to form a seal in the shuttle’s solid rocket boosters, preventing the rockets’ hot gas from escaping and damaging other parts of the shuttle. Feynman had already noted that the O-rings seemed to deteriorate rapidly once a small hole was burned in them, which indicated a lack of resiliency. Other commission members, including pioneering astronaut Sally Ride, then informed Feynman that the rings had not been tested at temperatures lower than 50 degrees F, a relevant factor because the temperature on the day of the launch was about 36 degrees F. He began to suspect a connection between these facts.

In a dramatic moment at a televised commission hearing, Feynman showed a sample of material from an O-ring that he’d submerged in a glass of ice water, and publicly demonstrated that when exposed to cold temperatures, the material lost its resiliency. With characteristic understatement, Feynman reported, “I believe that has some significance for our problem.”

The investigation continued, and the problems proved to go even deeper, into NASA’s safety culture at that time. The Commission learned that individuals at both NASA and at the company that manufactured the O-rings had been aware of reliability issues even during the 1970s, but had not escalated their concerns. Additionally, NASA personnel at the time had significantly higher estimates of overall safety for the shuttle than did working engineers, sometimes by as much as a thousandfold. Basically, the safety culture at the time underestimated the risk, and then viewed that unrealistically low estimate as being “acceptable.”
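To make the size of that “thousandfold” gap concrete, here is a minimal arithmetic sketch in Python. The two figures are assumptions that do not appear in this article: they are the round numbers Feynman cites in his appendix to the Rogers Commission report, where failure estimates ranged from roughly 1 in 100 (working engineers) to roughly 1 in 100,000 (management). The function and variable names are purely illustrative.

```python
from fractions import Fraction

# Assumed round figures from Feynman's appendix to the Rogers Commission
# report (not stated in this article): chance of losing a vehicle per flight.
engineer_estimate = Fraction(1, 100)         # roughly 1 in 100
management_estimate = Fraction(1, 100_000)   # roughly 1 in 100,000


def risk_gap(higher, lower):
    """Return how many times larger the higher failure estimate is."""
    return higher / lower


gap = risk_gap(engineer_estimate, management_estimate)
print(f"The engineers' estimate is {gap} times the management estimate")
```

Using exact fractions rather than floating-point numbers keeps the ratio at exactly 1000, which is the gap described above.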

The Rogers Commission’s work provides one of the most historically significant examples of a root cause analysis that, from its outset, assumed there was something important to find and kept going until it identified system-related reasons for the Challenger disaster. The issues identified in the investigation are, in my experience, common to many organizations, although the consequences were far more catastrophic and impactful to the public than your average workplace safety incident. It’s never pleasant to receive those kinds of findings, but bad news is sometimes necessary to identify and control risks that have been lurking in our workplaces all along.

The Role of Safety Inspections and Preventive Maintenance

It’s not always a design flaw that’s responsible for failures of equipment and machinery. Sometimes the equipment comes to us designed perfectly well, but gets damaged over time, whether by improper usage or everyday wear and tear. That’s why inspections and preventive maintenance (PM) are so important, and why so many accident investigations identify problems with inspection and PM processes as root causes.

Suppose someone is operating a forklift and, while moving a load, the forks suddenly fail, dropping the load. Luckily no one is hurt, but obviously there was a high potential for injury and property damage. A person watching might look at this scenario and, struck by how suddenly it occurred, wonder how we could have prevented it. But an experienced safety professional looking at the same incident would hopefully recognize immediately the need to do a full investigation, paying special attention to the inspection and PM schedule for the forklift. Doing so, we may find, as one possible example, that the top clip retaining pin for the forks had been defective, which caused the forks to fail under the weight of the load.

It’s very important to verify that forklifts are in working order prior to use, which is why OSHA’s Powered Industrial Trucks Standard in 29 CFR 1910.178 requires that “industrial trucks shall be examined before being placed in service, and shall not be placed in service if the examination shows any condition adversely affecting the safety of the vehicle.” The Standard requires that employers conduct these examinations at least daily. OSHA’s Powered Industrial Trucks E-Tool Page highlights key elements to include in the inspection, one of which is the condition of the forks, including the top retaining pin and heel.
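The logic of that requirement is essentially a gate: no examination, or any defect found, means the truck stays out of service. Here is a minimal sketch of that gate in Python, under stated assumptions. The `PreUseInspection` class, the `may_enter_service` function, the truck ID and the checklist items are all hypothetical illustrations, not an official OSHA checklist, and “examined today” is a simplification of the “at least daily” rule.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PreUseInspection:
    """Hypothetical record of one pre-use forklift examination."""
    truck_id: str
    inspected_on: date
    # Maps a checklist item to True (acceptable) or False (defect found).
    results: dict = field(default_factory=dict)

    def defects(self):
        """List every checklist item that failed the examination."""
        return [item for item, ok in self.results.items() if not ok]


def may_enter_service(inspection, today):
    """Allow use only if the truck was examined today and no defect was found."""
    examined_today = inspection.inspected_on == today
    return examined_today and not inspection.defects()


today = date(2020, 6, 1)
inspection = PreUseInspection(
    truck_id="FL-07",  # illustrative ID
    inspected_on=today,
    results={
        "forks": True,
        "top clip retaining pin": False,  # the defect from the example above
        "heel": True,
        "horn": True,
    },
)

print(may_enter_service(inspection, today))  # False: a defect was found
print(inspection.defects())                  # ['top clip retaining pin']
```

In this sketch a missing or stale examination blocks use just as a defect does, so skipping the inspection never counts as passing it.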

PM activities include inspections, as well as specific maintenance schedules and safe methods for carrying out maintenance on different types of equipment. PM is not only a widely known concept familiar to anyone with a manufacturing or facilities management background (among others), but is also incorporated into many regulations and standards, including OSHA’s Process Safety Management (PSM) rule for facilities storing bulk quantities of flammable and highly hazardous chemicals.

PM is a first line of defense against many types of accidents involving workplace equipment. When accidents occur involving equipment, we should always look carefully at all inspection and PM procedures and documents to identify ways of preventing future accidents.

A Matter of Principle

A common denominator in all of the examples above is that the root causes were not immediately obvious. We had to dig for them, but we first had to assume there was something to dig up. And I think they make it pretty obvious why that needs to be our default assumption, and our standardized approach.

Are there workplace accidents that truly could NOT have been prevented? Certainly. Randomness is part of the fabric of the universe – another lesson I learned in part from my hero, Richard Feynman. Someone may be holding a part, for example, and suddenly drop it, for no discernible reason except that their muscles didn’t work quite as they should.

But there are two points I’d make. The first is that these kinds of incidents don’t happen as frequently as company leadership would often like to think, and in my experience, they like to think that pretty often. To put it another way, I don’t really see much risk in management and safety professionals going overboard in their search for root causes, because the “accidents will happen” mindset is still pretty much the default approach.

The catch is that we’ll only be able to determine that an event wasn’t the result of some systemic problem in the workplace if we perform a full investigation, and proceed from the assumption that there is something substantial to find. So, whether an accident is truly “random” or the result of preventable workplace hazards, we still need to do a thorough investigation before we can claim to know what happened with any accuracy or integrity.

My second point is that preventing accidents from happening is only part of the strategy for controlling risks. We must also take actions to mitigate the severity of an outcome when an accident does occur. While eliminating the hazards that give rise to accidents is the gold standard, it’s not always possible.

For example, construction and maintenance workers sometimes fall through skylights while working on roofs. We can try to reduce the likelihood of a fall, but probably can’t reduce it all the way to zero. What we can do is put proper guards and fall protection equipment on those skylights so that if workers do fall into them, we can reduce the likelihood of fatalities and serious injuries. This is an example of incorporating risk reduction directly into facility and equipment design, which is a guiding principle of NIOSH’s “Prevention Through Design” national safety initiative.

Accidents Don’t “Just Happen”

Some safety professionals may actually maintain an “accidents just happen” attitude with the best intentions. Let me try to put their reasoning in the best possible light. It goes something like this: It’s unreasonable to expect that we can eliminate all risks and all workplace accidents, because when we don’t achieve “zero incidents,” we demoralize our workers and we may very well be discouraging the reporting of injuries and accidents. Looking for root causes every single time an accident happens, rather than recognizing that some accidents “just happen,” drives the harmful focus on “zero incidents” or “zero harm” that discourages reporting and deflates morale, and anyone still insisting on thorough root cause analysis is actually contributing to the problem.

I don’t agree with this view, because there is no inherent logical connection between the premises. If a workplace accident happens and I argue that we have to do a full investigation that includes root cause analysis, it doesn’t follow that I expect to completely eliminate all risks and accidents. It just means I’m doing my job.

Taking risk seriously is hard work, but it’s a necessary part of a wider strategy to empower our workers to be safe, rather than simply telling them to be safe and hoping for the best. It’s not possible to build a meaningful safety culture in which every employee feels included and engaged with our policies and programs while not taking our obligation to those workers seriously enough to actively investigate, identify and control workplace risks.

This is a matter of principle, and as clichéd as it may sound, it’s a matter of character. In safety, you have to care a lot, and you have to understand that your intentions aren’t magic. We need to guide our approach based on sound strategies and proven best practices, rejecting both the status quo of believing “accidents just happen” and the temptation to be “original” at the expense of accuracy and, ultimately, our workers’ safety.

If anyone is interested in learning more about Root Cause Analysis, I invite you to join my upcoming webinar: “Addicted to Blame? How to Fix Your Root Cause Analysis and Your EHS Culture” on July 22 at 11 am ET. Also, please check out my “EHS Unplugged” video on this topic, and while there, watch some of the other videos in the series as well.
