ICMA

The fault tree analysis (FTA)

Introduction

To design systems that work correctly we often need to understand and correct how they can go wrong. Fault trees provide a good framework for both qualitative and quantitative analysis because they have both a logical (Boolean algebra) and probabilistic basis. The fault tree analysis (FTA) is a professional-level hazard ID tool based on the negative type logic diagram. The FTA adds several dimensions to the basic logic tree. The most important of these additions are the use of symbols to add information to the trees and the possibility of adding quantitative risk data to the diagrams. With these additions, the FTA adds substantial hazard ID value to the basic logic diagram previously discussed.

FTA is a graphical technique that provides a systematic description of the combinations of possible occurrences in a system that can result in an undesirable outcome. The method can combine hardware failures and human failures. The most serious outcome, such as an explosion or a toxic release, is selected as the Top Event. A fault tree is then constructed by relating the sequences of events that, individually or in combination, could lead to the Top Event. This may be illustrated by considering the probability of a crash at a road junction and constructing a tree with AND and OR logic gates. The tree is constructed by deducing in turn the preconditions for the Top Event and then successively for the next levels of events, until the basic causes are identified.
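As a minimal sketch of the road-junction illustration, the Python snippet below encodes a small tree built from AND and OR gates and checks whether the Top Event occurs for a given set of basic events. The event names and the shape of the tree are hypothetical, chosen only for illustration; they do not come from a real analysis.

```python
# Hypothetical road-junction fault tree (event names are illustrative assumptions).
# A node is either a basic-event name (str) or a tuple: (gate, [children]).
CRASH_TREE = (
    "AND",
    [
        "car_on_main_road_at_junction",
        (
            "OR",
            [
                "side_road_driver_did_not_stop",
                ("AND", ["side_road_driver_braked_late", "road_surface_slippery"]),
            ],
        ),
    ],
)


def occurs(node, happened):
    """Return True if the event represented by `node` occurs.

    `happened` is the set of basic events assumed to have occurred.
    """
    if isinstance(node, str):  # basic event
        return node in happened
    gate, children = node
    results = [occurs(child, happened) for child in children]
    if gate == "AND":  # all inputs must occur
        return all(results)
    if gate == "OR":  # one or more inputs must occur
        return any(results)
    raise ValueError(f"unknown gate type: {gate}")


if __name__ == "__main__":
    scenario = {"car_on_main_road_at_junction", "side_road_driver_braked_late"}
    # In this hypothetical tree, braking late alone does not cause the crash.
    print(occurs(CRASH_TREE, scenario))
```

Walking the tree from the Top Event down to the basic events mirrors the deductive construction order described above.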

FTA History

The Beginning Years (1961 – 1970)

H. Watson of Bell Labs, along with A. Mearns, developed the technique for the Air Force for evaluation of the Minuteman Launch Control System, circa 1961
Recognized by Dave Haasl of Boeing as a significant system safety analysis tool (1963)
First major use when applied by Boeing on the entire Minuteman system for safety evaluation (1964 – 1967, 1968-1999)
The first technical papers on FTA were presented at the first System Safety Conference, held in Seattle, June 1965
Boeing began using FTA on the design and evaluation of commercial aircraft, circa 1966
Boeing developed a 12-phase fault tree simulation program, and a fault tree plotting program on a Calcomp roll plotter

The Early Years (1971 – 1980)

Adopted by the Aerospace industry (aircraft and weapons)
Adopted by the Nuclear Power industry
Power industry enhanced codes and algorithms
Some of the more recognized software codes include: Prepp/Kitt, SETS, FTAP, Importance and COMCAN

The Mid Years (1981 – 1990)

Usage started becoming international, primarily via the Nuclear Power industry
More evaluation algorithms and codes were developed
A large number of technical papers were written on the subject (codes & algorithms)
Usage of FTA in the software (safety) community
Adopted by the Chemical industry

The Present (1991 – 1999)

Continued use on many systems in many countries
High-quality commercial fault tree codes developed that operate on PCs
Adopted by the Robotics and Software industry

Applicability

Because of its relative complexity and detail, it is normally not cost effective to use the FTA against risks assessed below the level of extremely high or high. The method is used extensively in the acquisition of new weapons systems and other complex systems where, due to the complexity and criticality of the system, the tool is a must. A fault tree model provides a logical framework for analyzing the failure behavior of a system. A fault tree model precisely documents which failure scenarios have been considered and which have not. Fault tree analysis can be used to support engineering and management decisions, trade-off analysis and risk assessment. The fault tree model has a well-defined Boolean algebraic and probabilistic basis, which relates probability calculations to Boolean logic functions. FTA applications include:

Root Cause Analysis

Identify all relevant events and conditions leading to Undesired Event
Determine parallel and sequential event combinations
Model diverse/complex event interrelationships involved

Risk Assessment

Calculate the probability of an Undesired Event (level of risk)
Identify safety critical components/functions/phases
Measure effect of design changes

Design Safety Assessment

Demonstrate compliance with requirements
Show where safety requirements are needed
Identify and evaluate potential design defects/weak links
Determine Common Mode failures

Fault Tree Analysis Symbols

Basic Events

Basic Event: Corresponds to a basic failure event (usually a component failure) in the system. Characterized by failure rate or failure probability
Undeveloped Basic Event: A basic event that is not completely developed, usually because of unavailable information. Characterized by failure rate or failure probability
Replicated Basic Event: Represents k statistically identical copies of a component. Characterized by failure rate or failure probability

Static fault tree gates

AND gate - output event occurs only if ALL input events occur. AND gates can be protected by disallowing one of the inputs, for example by:
exhaustive testing or formal proof to show the module cannot fail
testing for the failure condition and providing a recovery routine
OR gate - output event occurs if one or more input events occur. OR gate can be protected by disallowing all inputs or by providing detection and recovery point. (The detection and recovery routines must be simple enough to be certifiably correct.)
m/n gate - output event occurs if m or more of the n inputs occur
Transfer symbols - These symbols transfer the user to another part of the diagram. They are used to eliminate the need to repeat identical analyses that have been completed in connection with another part of the fault tree.
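The static gates above translate directly into probability calculations. The following Python sketch assumes that all input events are statistically independent, which is an assumption made here for illustration and not a property of every fault tree; the function names are hypothetical.

```python
from itertools import combinations
from math import prod


def p_and(probs):
    """AND gate: all independent inputs occur."""
    return prod(probs)


def p_or(probs):
    """OR gate: at least one independent input occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def p_m_of_n(m, probs):
    """m/n gate: m or more of the n independent inputs occur.

    Enumerates every subset of inputs of size >= m, so it is only
    practical for a small number of inputs.
    """
    n = len(probs)
    total = 0.0
    for k in range(m, n + 1):
        for chosen in combinations(range(n), k):
            chosen = set(chosen)
            total += prod(
                probs[i] if i in chosen else 1.0 - probs[i] for i in range(n)
            )
    return total
```

Under these assumptions the m/n gate reduces to the OR gate when m = 1 and to the AND gate when m = n, which is a useful sanity check.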

Sequence dependency gates

Several special purpose gates have been added to the traditional fault tree gates. These special dynamic gates capture sequence dependencies which frequently arise when modeling fault tolerant computer systems. If a dynamic gate is part of a fault tree then it is solved via a Markov chain, rather than by using traditional methods. The special dynamic gates include:

Functional dependency gate for modeling situations where one component’s correct operation is dependent upon the correct operation of some other component
Spare gate for modeling cold, warm and hot pooled spares
Priority-AND gate for modeling ordered ANDing of events. Note that many traditional fault trees include the Priority-AND gate, but most analyses simply approximate it with an AND gate
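To show how much an AND gate can overstate a Priority-AND gate, the Python sketch below compares the two for a two-input case. It assumes two independent events A and B with constant occurrence rates (exponentially distributed times), which is an illustrative assumption and not something the gate definition requires; the closed form is the standard result for that special case, and the function names are hypothetical.

```python
from math import exp


def p_and(rate_a, rate_b, t):
    """Probability that both A and B have occurred by time t, in any order."""
    return (1.0 - exp(-rate_a * t)) * (1.0 - exp(-rate_b * t))


def p_priority_and(rate_a, rate_b, t):
    """Probability that A occurs first and B occurs afterwards, both by time t."""
    total = rate_a + rate_b
    a_first = rate_a / total * (1.0 - exp(-total * t))
    return a_first - exp(-rate_b * t) * (1.0 - exp(-rate_a * t))


if __name__ == "__main__":
    # Hypothetical rates per hour and mission time in hours.
    print(p_and(1e-3, 2e-3, 100.0), p_priority_and(1e-3, 2e-3, 100.0))
```

With equal rates the ordered probability is exactly half of the unordered one, so in this special case the AND approximation overstates the result by a factor of two.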

Cut Set Terms

Cut Set (CS): A set of events that together cause the top undesired event (UE) to occur
Min CS (MCS): A CS with the minimum number of events that can still cause the top event
Super Set: A CS that contains an MCS plus additional events to cause the top UE
Critical Path: The highest probability CS that drives the top UE probability
Cut Set Order: The number of elements in a cut set
Cut Set Truncation: Removing cut sets from consideration during the fault tree (FT) evaluation process. CSs are truncated when they exceed a specified order and/or fall below a specified probability
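The cut set terms above can be made concrete with a short Python sketch. It reduces a list of cut sets to the minimal ones, truncates by order, and estimates the top event probability with the rare-event approximation (the sum of the cut set probabilities). The approximation, the assumption of independent basic events, the function names and the sample data are all choices made for this illustration.

```python
from math import prod


def minimal_cut_sets(cut_sets):
    """Drop every cut set that is a super set of another cut set."""
    unique = {frozenset(cs) for cs in cut_sets}
    return [cs for cs in unique if not any(other < cs for other in unique)]


def truncate(cut_sets, max_order):
    """Truncate by order: keep cut sets with at most `max_order` events."""
    return [cs for cs in cut_sets if len(cs) <= max_order]


def top_probability_rare_event(cut_sets, p):
    """Rare-event approximation: sum of the cut set probabilities.

    Assumes independent basic events with small probabilities; `p` maps
    each basic event name to its probability.
    """
    return sum(prod(p[e] for e in cs) for cs in cut_sets)


if __name__ == "__main__":
    # Hypothetical cut sets and probabilities.
    cut_sets = [{"A"}, {"A", "B"}, {"B", "C"}, {"B", "C", "D"}]
    p = {"A": 0.01, "B": 0.1, "C": 0.2, "D": 0.5}
    mcs = minimal_cut_sets(cut_sets)  # {"A"} and {"B", "C"} remain
    print(sorted(sorted(cs) for cs in mcs))
    print(top_probability_rare_event(mcs, p))
```

In this sample, {"A", "B"} and {"B", "C", "D"} are super sets and are discarded; the remaining cut sets have orders 1 and 2.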

Node Construction Steps

Construction at each gate node involves a three-step process:

Step 1 - Immediate, Necessary and Sufficient (INS)

Read the IG event wording
Identify all Immediate, Necessary and Sufficient events to cause the IG event
Structure the INS causal events with appropriate logic:

Immediate – do not skip past events
Necessary – include only what is actually necessary
Sufficient – do not include more than the minimum necessary

Mentally test the events and logic until satisfied

Step 2 - Primary, Secondary and Command (PSC)

Read the IG event wording (Step 1)
Word Gate events in terms of Input or Output
Consider the type of fault path for each Enabling Event

identify each causing event as one of the following path types: Primary Fault, Secondary Fault, Command Fault (Induced Fault, Sequential Fault)
structure the sub events and gate logic from the path type
any event that is not a BE (component) event is another Enabling Event (Command Path)

Step 3 - State of the System or Component

Read the IG event wording
Ask “Is the IG a State of the System or State of the Component event?”

State of the Component is identified by being at the component level
State of the System is identified by being composed of more IG events
If it is not a State of the Component then it must be a State of the System

Comments

The FTA is one of the few hazard ID procedures that will support quantification when the necessary data resources are available. Traditional fault trees cannot model sequence-dependent failures, in which the order in which events occur is important. We define special purpose gates for modeling sequence dependencies, and solve the resulting fault tree as a Markov chain. The development of a correct Markov model for a complex system can be difficult. Our approach is to use the fault tree for model development and automatically convert the fault tree to the equivalent Markov chain. The dynamic fault tree model is considerably simpler than the equivalent Markov chain. Coverage models are automatically added to the resulting Markov chain, which is solved via a numerical differential equation solver.
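To give a feel for what solving a Markov chain numerically involves, the Python sketch below hand-builds the chain for a very small case: one primary unit backed by one cold spare. It is a simplified illustration under stated assumptions, not the automatic conversion described above: both units share a constant failure rate, the spare cannot fail while dormant, switchover is perfect (no coverage model), and a hand-written fourth-order Runge-Kutta integrator stands in for a production differential equation solver. All names are hypothetical.

```python
from math import exp


def derivatives(p, rate):
    """State equations for a primary unit with one cold spare.

    States: 0 = primary running, 1 = spare running, 2 = system failed.
    """
    p0, p1, _ = p
    return (-rate * p0, rate * p0 - rate * p1, rate * p1)


def solve_rk4(rate, t_end, steps=1000):
    """Integrate the state probabilities from t = 0 to t_end with classic RK4."""
    h = t_end / steps
    p = (1.0, 0.0, 0.0)  # the system starts in state 0 with probability 1
    for _ in range(steps):
        k1 = derivatives(p, rate)
        k2 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(p, k1)), rate)
        k3 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(p, k2)), rate)
        k4 = derivatives(tuple(x + h * k for x, k in zip(p, k3)), rate)
        p = tuple(
            x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)
        )
    return p


if __name__ == "__main__":
    rate, t_end = 1e-3, 1000.0  # hypothetical failure rate per hour, mission hours
    numeric = solve_rk4(rate, t_end)[2]
    exact = 1.0 - exp(-rate * t_end) * (1.0 + rate * t_end)
    print(numeric, exact)  # the two failure probabilities should agree closely
```

For this three-state chain a closed-form answer exists, which makes it a convenient check on the integrator; larger chains generated from realistic dynamic fault trees generally have no such shortcut.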

The DFT (dynamic fault tree) methodology is ideally suited for the analysis of computer-based systems. DFT uses a modular approach to FTA, detecting modules using a fast and efficient algorithm. Modules are classified as static or dynamic, depending on the types of gates included.

Static modules are solved using the BDD approach; dynamic modules are solved using Markov chain methods. Coverage models can assess the effect of complex recovery mechanisms. Dynamic gates can allow modeling of sequence dependencies that arise from complex redundancy management.