Incident investigation: Practical recommendations

S. Palumbo, A. Tomás

1. Accident vs Incident

Incident investigation is a powerful method for identifying the causes and the sequence of events that lead to an unsafe condition. If the unsafe condition has developed into a scenario with an impact on people, the environment, the equipment or the company’s reputation, it is called an “accident”. If the scenario has only generated the potential to cause harm, it is called an “incident”.

An “incident” or “near-miss” may conceal consequences greater than those that actually occurred, which is why the preferred term is incident investigation rather than accident investigation.

Fig.1: “Iceberg Theory”, Bird and Germain, 1985.

According to this model, behind each fatality there are ten (10) severe incidents, thirty (30) minor incidents and six hundred (600) near-misses (“almost accidents”). At the base of the pyramid are the so-called unsafe acts and unsafe conditions that are, in fact, the origin of incident scenarios, often in the form of “hidden causes”.
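The proportions quoted above can be turned into simple arithmetic. The following is a minimal sketch, assuming the 1 : 10 : 30 : 600 ratios exactly as stated in the text; the function name `scale_pyramid` and the level labels are illustrative, not part of any standard.

```python
# Ratios as stated in the text: 1 fatality : 10 severe : 30 minor : 600 near-misses.
PYRAMID_RATIOS = {
    "fatality": 1,
    "severe_incident": 10,
    "minor_incident": 30,
    "near_miss": 600,
}


def scale_pyramid(observed_level: str, observed_count: float) -> dict:
    """Scale the whole pyramid so that `observed_level` equals `observed_count`."""
    if observed_level not in PYRAMID_RATIOS:
        raise ValueError(f"unknown level: {observed_level}")
    factor = observed_count / PYRAMID_RATIOS[observed_level]
    return {level: ratio * factor for level, ratio in PYRAMID_RATIOS.items()}


# Hypothetical input: 1,200 near-misses recorded over some period.
estimate = scale_pyramid("near_miss", 1200)
for level, count in estimate.items():
    print(f"{level}: {count:g}")
```

The result is a statistical rule of thumb rather than a prediction. Its use here is only to show why a large base of near-misses deserves attention.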

The best way to avoid more severe events is to prevent and mitigate these minor cases.

2. Prior organization of the incident investigation

An incident investigation is effective when the evidence, testimonies, and data collected faithfully describe what happened. To reduce the risk of losing information, the process must start as early as possible.

In a real accident, immediate action is taken to restore the safety of personnel and of the installation. Many roles are involved in this phase (emergency management lead, rescue team, process supervisors, first aid personnel, etc.), and the incident investigation is usually not triggered immediately.

Fig.2: “Explosion and fire shut Irving Oil refinery in Canada” by tonynetone is licensed under CC BY 2.0

The only way to achieve both objectives (an immediate response to re-establish safe conditions, and the accident investigation) is to have a structure and procedures defined in advance.

The recipe for success involves pre-defining roles, availability of resources, and detailing the steps to be followed by everyone involved in the investigation.

The response measures taken in an accident are fundamental elements of a Process Safety Management System (PSMS), which establishes the guidelines that ensure the identification of industrial risks, their management, and continuous improvement through lessons learned from experience.

3. When to conduct an incident investigation

If an incident occurs, the first step is to evaluate whether an investigation will be necessary.

Historically, only cases with material damage (to people or facilities) were investigated, and the decision was based solely on the severity of the episode. Today, the criteria that justify an investigation are broader and include:

  • Near-misses
  • Potential risk situations
  • Legal requirements
  • Similar incidents in other facilities

Each company should establish its incident categories in advance. Incidents are commonly grouped into levels identified by the term “Tier” (e.g. “Tier 1”, “Tier 2”, etc.), ordered numerically by severity, each describing the relevance of the event (for example, Tier 1 = a loss of containment of a mass of toxic substances greater than xx grams).

The concept of Tier was first introduced in 2011 in IOGP Report 456, “Process Safety – Recommended Practice on Key Performance Indicators”, based on previous studies by CCPS (Center for Chemical Process Safety) and API (American Petroleum Institute).

It is important to highlight that the level of detail of the investigation will vary with the incident’s category. The resources involved should match the complexity of the scenario and the final objectives pursued.
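The tier-based categorization described above can be sketched as a simple threshold lookup. This is only an illustration: the class `TierThreshold`, the function `classify_tier`, and the numeric thresholds are hypothetical placeholders, and it assumes that Tier 1 is the most severe level. Real thresholds must come from the company's own criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TierThreshold:
    tier: int                  # assumed convention: 1 = most severe
    min_release_grams: float   # hypothetical lower bound on released toxic mass


def classify_tier(release_grams: float,
                  thresholds: List[TierThreshold]) -> Optional[int]:
    """Return the most severe tier whose threshold the release meets, else None."""
    # Check the most severe tier first so a large release is not under-classified.
    for threshold in sorted(thresholds, key=lambda t: t.tier):
        if release_grams >= threshold.min_release_grams:
            return threshold.tier
    return None


# Placeholder values for illustration only; they are not taken from any standard.
EXAMPLE_THRESHOLDS = [
    TierThreshold(tier=1, min_release_grams=1000.0),
    TierThreshold(tier=2, min_release_grams=100.0),
]

print(classify_tier(250.0, EXAMPLE_THRESHOLDS))
```

A release below every threshold returns `None`, which a company might still record as a near-miss or lower-tier event.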

To ensure the investigation’s success, it is important to bear in mind that the sooner the potential hazards are identified, the less exposed to risk the company is likely to be. However, a shorter analysis time must never be achieved at the expense of the quality of the results.

Fig 3: Incident report example.

4. Questions to answer in an investigation

Certain elements are indispensable in an incident investigation because they allow us to answer the five fundamental questions (“5W”):

  1. When?
  2. Where?
  3. Who?
  4. What?
  5. Why?

These questions should guide the investigation, but they should be answered only once all the evidence has been collected, so that the analysis is not steered towards incorrect or partial results.

The answers to questions 1 to 4 define the incident’s “sequence of events”, while answering question 5 is the investigation itself. Different methods are used for this, such as “Root Cause Analysis” (RCA), one of the most complete.

To simplify both the “sequence of events” and the “Root Cause Analysis”, there are digital solutions, such as Incident XP from CGE Risk Management Solutions, that support building the database, organizing the elements of the investigation logically, and visualizing the results clearly.

5. Required resources

According to PSMS guidelines, the company must instruct and train its employees and keep them informed of the results of incident investigations.

According to the PSMS, the Process Safety Committee, an independent team that must include technical staff as well as management, is responsible for authorizing the incident investigation, appointing its leader and, in addition, approving the recommendations derived from it. The leader then chooses the members of the investigation team.

“File:Professor Holding a Magnifying Glass Cartoon.svg” by VideoPlasty is licensed under CC BY-SA 4.0

The investigation team members can be:

  • Operators and Operation managers
  • Safety officers
  • External experts
  • Local authorities (when legally required)
  • Personnel from areas affected by the event (e.g. electrical, controls, chemical, instrumentation specialists, etc.)
  • Investigation leader (preferably an expert in the selected investigation technique)

The incident investigation team aims to collect evidence and testimony to identify the event’s causes.

Examples of evidence are:

  • Photographs
  • Documents
  • Operating procedures
  • Operating logs
  • Computer data from the monitoring and control system

6. Errors in data collection

To preserve the evidence, the area must be cordoned off and access restricted as soon as possible.

When an accident causes harm to personnel or damage to facilities, preserving the area is a legal requirement. Any alteration of the incident area may lead to wrong conclusions or to the loss of fundamental evidence.

Another source of error is the testimony of the staff present at the incident, if they are not properly interviewed. Witnesses are one of the most vulnerable sources of information, both during the investigation and with the passage of time. There is a risk that staff influence one another or feel intimidated. It is therefore necessary to explain carefully that the purpose of the questions is to understand the causes of the incident and prevent its recurrence, not to find someone to blame or sanction. Interviews must be conducted by staff who empathize with the interviewees and take their emotional state into account. It can be very helpful to walk through the sequence of events again in the incident area.

7. Data analysis

Once the preliminary investigation data is available, it must be organized for analysis. It is not uncommon for the analysis to reveal events that require further specific investigation to complete the picture.

The first step is to identify the sequence of events by building a “timeline” on which the evidence found is placed in chronological order. Both classic methods, such as Post-it notes, and more modern digital systems can be used, so that the evidence collected can be attached at each point on the line.
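A digital timeline of this kind can be sketched in a few lines. The example below is a minimal illustration, not the data model of any particular tool: the class `TimelineEntry`, its field names (which mirror the When / Where / Who / What questions), and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TimelineEntry:
    when: datetime                 # When did it happen?
    where: str                     # Where?
    who: str                       # Who was involved?
    what: str                      # What happened?
    evidence: List[str] = field(default_factory=list)  # items supporting this point


def build_timeline(entries: List[TimelineEntry]) -> List[TimelineEntry]:
    """Return a new list with the entries in chronological order."""
    return sorted(entries, key=lambda entry: entry.when)


# Hypothetical entries, recorded in the order they were collected.
entries = [
    TimelineEntry(datetime(2024, 1, 1, 10, 15), "Pump room", "Operator A",
                  "Alarm acknowledged", ["control system log"]),
    TimelineEntry(datetime(2024, 1, 1, 10, 5), "Pump room", "Operator B",
                  "Leak observed", ["photograph"]),
]

for entry in build_timeline(entries):
    print(entry.when.isoformat(), entry.where, entry.who, entry.what, entry.evidence)
```

Because `build_timeline` returns a new list, the order in which the evidence was collected is preserved in `entries` and new items can be added at any time before re-sorting.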

Methodologies for defining the sequence of events include:

  • STEP – Sequential Timed Events Plotting
  • ECFC – Events and Causal Factors Charting
  • MTO Analysis – Man-Technology-Organization Analysis

Fig 4: Example of the STEP methodology’s application.

Once the sequence of events is established, the causes are evaluated.

The methods most widely used in industry for cause analysis are:

  • RCA – Root Cause Analysis:
    • 5-Whys analysis
    • Bowtie analysis
    • Fishbone diagrams (Ishikawa diagrams)
  • Fault Tree Analysis
  • Event Tree Analysis
  • SCAT – Systematic Cause Analysis Technique
  • Barrier Failure Analysis
  • Tripod Beta by Shell
  • Change Analysis
  • AEB – Accident Evolution and Barrier Function
  • MORT – Management Oversight and Risk Tree
  • ECFA – Events and Causal Factors Analysis
  • AcciMap

Since the investigation is an iterative process in which the data is frequently reviewed, details are added and the structure is reorganized, it is recommended to use commercial software, which simplifies the work and makes the cause tree easier to understand.

Fig 5: Example of Barrier Failure Analysis (Incident XP, CGE).

Fig 6: Example of Tripod Beta (Incident XP, CGE).

8. Errors in data analysis

Data analysis can succeed only if the selected method matches the complexity of the situation and the evidence available. An unnecessarily complex method risks losing sight of the central point of the analysis. Similarly, an overly simple method may fail to consider key factors in the causes of the events.

The causes identified by the analysis can be of three types:

  1. Objective evidence
  2. Speculations
  3. Assumptions

The difference between them is that in the first case there is direct evidence for what is claimed; in the second there are indications that make it probable that a certain event occurred; and in the last there is no confirmation of the facts and hypotheses are put forward. Naturally, the more objective evidence the study has, the closer its conclusions will be to reality. In all cases, it is good practice to specify which category each cause belongs to.
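Tagging each cause with its category, as recommended above, is easy to make explicit in a data structure. The sketch below is illustrative only: the names `Support`, `Cause`, `support_summary`, and `objective_share`, as well as the sample causes, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Support(Enum):
    OBJECTIVE_EVIDENCE = "objective evidence"  # direct evidence exists
    SPECULATION = "speculation"                # indications make it probable
    ASSUMPTION = "assumption"                  # unconfirmed hypothesis


@dataclass
class Cause:
    description: str
    support: Support


def support_summary(causes: List[Cause]) -> Dict[Support, int]:
    """Count how many causes fall into each category (zero if none)."""
    counts = Counter(cause.support for cause in causes)
    return {category: counts.get(category, 0) for category in Support}


def objective_share(causes: List[Cause]) -> float:
    """Fraction of causes backed by objective evidence (0.0 for an empty list)."""
    if not causes:
        return 0.0
    backed = sum(1 for c in causes if c.support is Support.OBJECTIVE_EVIDENCE)
    return backed / len(causes)


# Hypothetical causes for illustration.
causes = [
    Cause("Gasket found degraded on inspection", Support.OBJECTIVE_EVIDENCE),
    Cause("Alarm setpoint recorded in control system data", Support.OBJECTIVE_EVIDENCE),
    Cause("Valve probably left partially open", Support.SPECULATION),
    Cause("Procedure may not have been read", Support.ASSUMPTION),
]

print(support_summary(causes))
print(objective_share(causes))
```

A low share of objectively supported causes is a signal that further evidence collection may be needed before conclusions are drawn.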

9. Investigation report

The Incident Investigation Report presents what happened, both the sequence of events and the causes. It must also define the barriers to be implemented to prevent the main event and the measures to be taken to ensure their effectiveness.

The information that must be included in an Incident Investigation Report is:

  • Incident title
  • Incident identification code
  • Location
  • Incident date / time
  • Date / Time the incident is discovered
  • Investigation start date
  • Investigation team
  • Incident type (Health / Environment / Property)
  • Description of the incident
  • Timeline of events
  • Collected information and testimonies
  • Root cause analysis
  • Preventive and corrective actions
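The list above maps naturally onto a record with one field per item, which makes it possible to check a draft report for completeness. The following is a minimal sketch under that assumption; the class `IncidentReport`, its field names, and the helper `missing_fields` are hypothetical and not taken from any specific tool.

```python
from dataclasses import dataclass, field, fields
from datetime import date, datetime
from typing import List, Optional


@dataclass
class IncidentReport:
    title: str = ""
    identification_code: str = ""
    location: str = ""
    incident_datetime: Optional[datetime] = None
    discovery_datetime: Optional[datetime] = None
    investigation_start: Optional[date] = None
    investigation_team: List[str] = field(default_factory=list)
    incident_type: str = ""        # Health / Environment / Property
    description: str = ""
    timeline: List[str] = field(default_factory=list)
    collected_information: List[str] = field(default_factory=list)
    root_cause_analysis: str = ""
    actions: List[str] = field(default_factory=list)  # preventive and corrective


def missing_fields(report: IncidentReport) -> List[str]:
    """Return the names of fields that are still empty (None, '' or [])."""
    missing = []
    for f in fields(report):
        value = getattr(report, f.name)
        if value is None or value == "" or value == []:
            missing.append(f.name)
    return missing


# Hypothetical draft with only two items filled in.
draft = IncidentReport(title="Seal leak on transfer pump", location="Unit 3")
print(missing_fields(draft))
```

Such a check only confirms that every section has been filled in. It says nothing about the quality of the content, which still requires review by the investigation team.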

The recommendations issued in the report must meet the “SMART” criteria:

  • Specific – (it refers to the reason for which it has been proposed)
  • Measurable – (it can be measured and evaluated for effectiveness)
  • Achievable – (it can be done with the available resources)
  • Relevant – (it has a significant priority in relation to other options)
  • Time-bound – (it can be done in a defined time)
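A simple completeness check can flag recommendations that are missing the information needed to judge them against these criteria. The sketch below is one possible mapping, chosen for illustration: the class `Recommendation`, its fields, and the function `smart_gaps` are hypothetical, and whether a recommendation is truly specific or relevant remains a matter of human judgment.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Recommendation:
    description: str
    linked_cause: str = ""             # Specific: the reason it was proposed
    success_metric: str = ""           # Measurable: how effectiveness is evaluated
    resources_confirmed: bool = False  # Achievable: resources are available
    priority: Optional[int] = None     # Relevant: rank relative to other options
    due_date: Optional[date] = None    # Time-bound: defined completion date


def smart_gaps(rec: Recommendation) -> List[str]:
    """Return the SMART criteria for which the recommendation lacks information."""
    gaps = []
    if not rec.linked_cause:
        gaps.append("Specific")
    if not rec.success_metric:
        gaps.append("Measurable")
    if not rec.resources_confirmed:
        gaps.append("Achievable")
    if rec.priority is None:
        gaps.append("Relevant")
    if rec.due_date is None:
        gaps.append("Time-bound")
    return gaps


# Hypothetical examples: one generic action and one fully described action.
generic = Recommendation(description="Improve training")
complete = Recommendation(
    description="Replace pump seal with upgraded model",
    linked_cause="Seal degradation",
    success_metric="No seal leaks recorded in follow-up inspections",
    resources_confirmed=True,
    priority=1,
    due_date=date(2024, 6, 30),
)

print(smart_gaps(generic))
print(smart_gaps(complete))
```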

10. Errors in the report

The Incident Investigation report is a formal document directed at a multidisciplinary group, not always composed of technical profiles.

A common mistake is to write the report in an excessively complex way. Clear wording is essential, supported wherever possible by diagrams and graphic representations.

Another frequent mistake is to make recommendations that have not been endorsed by the parties involved and therefore do not meet the “SMART” criteria indicated above.

It is also not uncommon to find actions that are too generic, have no deadlines, or do not make clear who is responsible for their implementation.

All these gray areas generate conflict and mistrust and, ultimately, delay the implementation of measures to protect the facility and prevent the same incident from recurring.

11. Conclusions

Incident investigation is the method of analysis that allows us to determine the causes of an unwanted event. To guarantee the success of this task, prior planning of the investigation process is essential, as is the assignment of roles to those responsible for carrying it out.

It is equally important to start the investigation immediately and to select interviewers who empathize with the witnesses and take their emotional state into account, in order to ensure that information is collected reliably.

During the investigation, information may arise that alters the postulated sequence of events. Therefore, to manage all the information derived from this process, it is recommended to use specific software developed for this purpose, which incorporates the causes, the existing barriers and the sequence of failures.

A relevant point that should not be neglected once an investigation is finished is the communication of the results to the facility’s personnel at all levels. Knowledge of the causes, and particularly of the “hidden” causes, creates a virtuous circle of prevention that extends beyond the safety of the facilities.