The fault tree analysis (FTA)
Introduction
To design systems that work correctly we often need to understand and correct how they can go wrong. Fault trees provide a good framework for both qualitative and quantitative analysis because they have both a logical (Boolean algebra) and probabilistic basis. The fault tree analysis (FTA) is a professional-level hazard ID tool based on the negative type logic diagram. The FTA adds several dimensions to the basic logic tree. The most important of these additions are the use of symbols to add information to the trees and the possibility of adding quantitative risk data to the diagrams. With these additions, the FTA adds substantial hazard ID value to the basic logic diagram previously discussed.
FTA is a graphical technique that provides a systematic description of the combinations of possible occurrences in a system that can result in an undesirable outcome. The method can combine hardware failures and human failures. The most serious outcome, such as an explosion or toxic release, is selected as the Top Event. A fault tree is then constructed by relating the sequences of events that, individually or in combination, could lead to the Top Event. This may be illustrated by considering the probability of a crash at a road junction and constructing a tree with AND and OR logic gates. The tree is constructed by deducing in turn the preconditions for the Top Event and then successively for the next levels of events, until the basic causes are identified.
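The road junction illustration can be sketched in a few lines of Python. This is a minimal sketch only: the text does not list the actual events for the junction, so every event name below is a hypothetical placeholder, and the tuple representation of gates is an assumption made for illustration.

```python
# Hypothetical road-junction fault tree. Event names are illustrative
# assumptions, not taken from the source text.

def AND(*children):
    return ("AND", children)

def OR(*children):
    return ("OR", children)

def occurs(node, state):
    """Return True if the event at `node` occurs for the given basic-event state."""
    if isinstance(node, str):  # basic event: look up whether it has occurred
        return state[node]
    gate, children = node
    results = [occurs(child, state) for child in children]
    return all(results) if gate == "AND" else any(results)

# Deduce preconditions level by level, starting from the Top Event.
side_car_fails_to_stop = OR("driver_too_fast", "brakes_fail", "road_slippery")
crash = AND("car_on_main_road", side_car_fails_to_stop)

state = {
    "car_on_main_road": True,
    "driver_too_fast": False,
    "brakes_fail": True,
    "road_slippery": False,
}
print(occurs(crash, state))  # True
```

The AND gate at the top means that removing either precondition (no car on the main road, or the side-road car always stopping) prevents the Top Event.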
FTA History
The Beginning Years (1961 – 1970)
- H. Watson of Bell Labs, along with A. Mearns, developed the technique for the Air Force for evaluation of the Minuteman Launch Control System, circa 1961
- Recognized by Dave Haasl of Boeing as a significant system safety analysis tool (1963)
- First major use when applied by Boeing on the entire Minuteman system for safety evaluation (1964 – 1967, 1968-1999)
- The first technical papers on FTA were presented at the first System Safety Conference, held in Seattle, June 1965
- Boeing began using FTA on the design and evaluation of commercial aircraft, circa 1966
- Boeing developed a 12-phase fault tree simulation program, and a fault tree plotting program on a Calcomp roll plotter
The Early Years (1971 – 1980)
- Adopted by the Aerospace industry (aircraft and weapons)
- Adopted by the Nuclear Power industry
- Power industry enhanced codes and algorithms
- Some of the more recognized software codes include: Prepp/Kitt, SETS, FTAP, Importance and COMCAN
The Mid Years (1981 – 1990)
- Usage started becoming international, primarily via the Nuclear Power industry
- More evaluation algorithms and codes were developed
- A large number of technical papers were written on the subject (codes & algorithms)
- Usage of FTA in the software (safety) community
- Adopted by the Chemical industry
The Present (1991 – 1999)
- Continued use on many systems in many countries
- High-quality commercial fault tree codes developed that operate on PCs
- Adopted by the Robotics and Software industry
Applicability
Because of its relative complexity and detail, it is normally not cost-effective to use the FTA against risks assessed below the level of extremely high or high. The method is used extensively in the acquisition of new weapons systems and other complex systems where, due to the complexity and criticality of the system, the tool is a must. A fault tree model provides a logical framework for analyzing the failure behavior of a system. A fault tree model precisely documents which failure scenarios have been considered and which have not. Fault tree analysis can be used to support engineering and management decisions, trade-off analysis and risk assessment. The fault tree model has a well-defined Boolean algebraic and probabilistic basis, which relates probability calculations to Boolean logic functions. FTA applications include:
Root Cause Analysis
- Identify all relevant events and conditions leading to the Undesired Event
- Determine parallel and sequential event combinations
- Model diverse/complex event interrelationships involved
Risk Assessment
- Calculate the probability of an Undesired Event (level of risk)
- Identify safety critical components/functions/phases
- Measure effect of design changes
Design Safety Assessment
- Demonstrate compliance with requirements
- Show where safety requirements are needed
- Identify and evaluate potential design defects/weak links
- Determine Common Mode failures
Fault Tree Analysis Symbols
Basic Events
- Basic Event: corresponds to a basic failure event (usually a component failure) in the system. Characterized by failure rate or failure probability.
- Undeveloped Basic Event: a basic event that is not completely developed, usually because of unavailable information. Characterized by failure rate or failure probability.
- Replicated Basic Event: represents k statistically identical copies of a component. Characterized by failure rate or failure probability.
Static fault tree gates
- AND gate - output event occurs only if ALL input events occur. AND gates can be protected by disallowing one of the inputs.
- exhaustive testing or formal proof to show module cannot fail
- test for failure condition and provide recovery routine
- OR gate - output event occurs if one or more input events occur. OR gate can be protected by disallowing all inputs or by providing detection and recovery point. (The detection and recovery routines must be simple enough to be certifiably correct.)
- m/n gate - output event occurs if m or more of the n inputs occur
- Transfer symbols - These symbols transfer the user to another part of the diagram. They are used to eliminate the need to repeat identical analyses that have been completed in connection with another part of the fault tree.
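When each input carries a failure probability, the static gates above have simple closed forms, provided the inputs are statistically independent. The following is a minimal sketch under that independence assumption (which the text does not state); the function names are illustrative.

```python
from math import prod

def p_and(ps):
    """AND gate: all independent inputs occur."""
    return prod(ps)

def p_or(ps):
    """OR gate: at least one independent input occurs."""
    return 1.0 - prod(1.0 - p for p in ps)

def p_m_of_n(m, ps):
    """m/n gate: at least m of the n independent inputs occur."""
    dist = [1.0]  # dist[k] = P(exactly k of the inputs seen so far occurred)
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)   # this input does not occur
            new[k + 1] += q * p       # this input occurs
        dist = new
    return sum(dist[m:])

print(round(p_and([0.1, 0.2]), 4))       # 0.02
print(round(p_or([0.1, 0.2]), 4))        # 0.28
print(round(p_m_of_n(2, [0.1] * 3), 4))  # 0.028
```

The OR and AND gates are the two extremes of the m/n gate: `p_m_of_n(1, ps)` matches `p_or(ps)` and `p_m_of_n(len(ps), ps)` matches `p_and(ps)`.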
Sequence dependency gates
Several special purpose gates have been added to the traditional fault tree gates. These special dynamic gates capture sequence dependencies which frequently arise when modeling fault tolerant computer systems. If a dynamic gate is part of a fault tree then it is solved via a Markov chain, rather than by using traditional methods. The special dynamic gates include:
- Functional dependency gate for modeling situations where one component’s correct operation is dependent upon the correct operation of some other component
- Spare gate for modeling cold, warm and hot pooled spares
- Priority-AND gate for modeling ordered ANDing of events. Note that many traditional fault trees include the Priority-AND gate, but most simply approximate it with an AND gate
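To see what approximating a Priority-AND gate with a plain AND gate costs, consider two inputs A and B with independent, exponentially distributed failure times. That distribution is an assumption made here for illustration, as are the rate parameters. Under it, the probability that A fails before B and both have failed by time t is (1 - e^(-bt)) - b/(a+b) * (1 - e^(-(a+b)t)), where a and b are the failure rates.

```python
from math import exp

def p_and_by(t, rate_a, rate_b):
    """AND gate: both A and B have failed by time t, in any order."""
    return (1 - exp(-rate_a * t)) * (1 - exp(-rate_b * t))

def p_pand_by(t, rate_a, rate_b):
    """Priority-AND gate: A fails before B, and both have failed by time t."""
    total = rate_a + rate_b
    return (1 - exp(-rate_b * t)) - rate_b / total * (1 - exp(-total * t))

t, a, b = 1000.0, 1e-3, 1e-3  # hypothetical mission time and failure rates
print(p_and_by(t, a, b), p_pand_by(t, a, b))
```

With equal rates, the ordered gate gives exactly half the AND gate probability, so the AND approximation is conservative: it overstates the probability of the ordered event.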
Cut Set Terms
- Cut Set (CS): A set of events that together cause the top Undesired Event (UE) to occur
- Min CS (MCS): A CS with the minimum number of events that can still cause the top event
- Super Set: A CS that contains a MCS plus additional events to cause the top UE
- Critical Path: The highest probability CS that drives the top UE probability
- Cut Set Order: The number of elements in a cut set
- Cut Set Truncation: Removing cut sets from consideration during the FT evaluation process. CSs are truncated when they exceed a specified order and/or probability
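The cut set terms above can be made concrete with a small sketch. The code below generates cut sets bottom-up for a tree of AND and OR gates, drops super sets to leave the minimal cut sets, and truncates by order. The tuple representation of gates and the example tree are assumptions made for illustration. This is not one of the codes named in the history section.

```python
from itertools import product

def cut_sets(node):
    """Return the cut sets of `node` as a set of frozensets of basic events."""
    if isinstance(node, str):  # basic event
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":  # a cut set of any one child is enough
        return set().union(*child_sets)
    # AND: take one cut set from every child and merge them
    return {frozenset().union(*combo) for combo in product(*child_sets)}

def minimal(sets):
    """Drop every super set, i.e. any cut set that strictly contains another."""
    return {s for s in sets if not any(other < s for other in sets)}

def truncate(sets, max_order):
    """Keep only cut sets whose order (number of elements) is at most max_order."""
    return {s for s in sets if len(s) <= max_order}

# Hypothetical tree: TOP = A OR (B AND C) OR (A AND B)
top = ("OR", ("A", ("AND", ("B", "C")), ("AND", ("A", "B"))))

mcs = minimal(cut_sets(top))
print(sorted(sorted(s) for s in mcs))               # [['A'], ['B', 'C']]
print(sorted(sorted(s) for s in truncate(mcs, 1)))  # [['A']]
```

Here {A, B} is a super set of the minimal cut set {A}, so it is removed. Truncating at order 1 then leaves only the single-point failure A.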
Node Construction Steps
Construction at each gate node involves a three-step process:
Step 1 - Immediate, Necessary and Sufficient (INS)
- Read the IG event wording
- Identify all Immediate, Necessary and Sufficient events to cause the IG event
- Structure the INS causal events with appropriate logic:
- Immediate – do not skip past events
- Necessary – include only what is actually necessary
- Sufficient – do not include more than the minimum necessary
- Mentally test the events and logic until satisfied
Step 2 - Primary, Secondary and Command (PSC)
- Read the IG event wording (Step 1)
- Word Gate events in terms of Input or Output
- Consider the type of fault path for each Enabling Event
- identify each causing event as one of the following path types: Primary Fault, Secondary Fault, Command Fault (Induced Fault, Sequential Fault)
- structure the sub events and gate logic from the path type
- any event that is not a BE (component) event is another Enabling Event (Command Path)
Step 3 - State of the System or Component
- Read the IG event wording
- Ask: “Is the IG a State of the System or a State of the Component event?”
- State of the Component is identified by being at the component level
- State of the System is identified by being composed of more IG events
- If it is not a State of the Component event, then it must be a State of the System event
Comments
The FTA is one of the few hazard ID procedures that will support quantification when the necessary data resources are available. Traditional fault trees cannot model sequence-dependent failures, in which the order in which events occur is important. We define special purpose gates for modeling sequence dependencies, and solve the resulting fault tree as a Markov chain. The development of a correct Markov model for a complex system can be difficult. Our approach is to use the fault tree for model development and automatically convert the fault tree to the equivalent Markov chain. The dynamic fault tree model is considerably simpler than the equivalent Markov chain. Coverage models are automatically added to the resulting Markov chain, which is solved via a numerical differential equation solver.
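As a rough illustration of solving a Markov chain numerically, the sketch below models a single cold spare: one primary unit and one spare that cannot fail while dormant. The three states, the shared failure rate, perfect coverage (switching to the spare always succeeds), and the hand-written Runge-Kutta integrator are all assumptions made for this example. The text does not describe the actual solver or the coverage model.

```python
from math import exp

def derivative(p, rate):
    """Kolmogorov equations for: 0 primary up -> 1 spare up -> 2 system failed."""
    p0, p1, p2 = p
    return (-rate * p0, rate * p0 - rate * p1, rate * p1)

def rk4_step(p, rate, h):
    """Advance the state probabilities by one classical Runge-Kutta step."""
    def shifted(base, k, scale):
        return tuple(b + scale * ki for b, ki in zip(base, k))
    k1 = derivative(p, rate)
    k2 = derivative(shifted(p, k1, h / 2), rate)
    k3 = derivative(shifted(p, k2, h / 2), rate)
    k4 = derivative(shifted(p, k3, h), rate)
    return tuple(
        pi + h / 6 * (a + 2 * b + 2 * c + d)
        for pi, a, b, c, d in zip(p, k1, k2, k3, k4)
    )

def solve(rate, t_end, steps=1000):
    """Return (P0, P1, P2) at time t_end, starting with the primary unit up."""
    p = (1.0, 0.0, 0.0)
    h = t_end / steps
    for _ in range(steps):
        p = rk4_step(p, rate, h)
    return p

rate, t_end = 1e-3, 1000.0  # hypothetical failure rate and mission time
p_failed = solve(rate, t_end)[2]
exact = 1 - exp(-rate * t_end) * (1 + rate * t_end)  # closed form for this chain
print(p_failed, exact)
```

For this small chain a closed form exists and serves as a check on the integrator. For the larger chains generated from a dynamic fault tree, numerical solution is generally the practical route.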
The DFT (dynamic fault tree) methodology is ideally suited for the analysis of computer-based systems. DFT uses a modular approach to FTA, detecting modules using a fast and efficient algorithm. Modules are classified as static or dynamic, depending on the types of gates included.
Static modules are solved using the BDD approach; dynamic modules are solved using Markov chain methods. Coverage models can assess the effect of complex recovery mechanisms. Dynamic gates can allow modeling of sequence dependencies that arise from complex redundancy management.
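The BDD approach rests on Shannon expansion: condition on one basic event at a time, and combine the two resulting sub-problems weighted by that event's probability. The sketch below applies the expansion directly to a static tree. It omits the node sharing and variable ordering that make a real BDD compact, so treat it as an illustration of the principle only. The tuple representation and the example tree are assumptions made for illustration.

```python
def restrict(node, name, value):
    """Fix basic event `name` to `value` (True/False) and simplify the tree."""
    if isinstance(node, bool):
        return node
    if isinstance(node, str):
        return value if node == name else node
    gate, children = node
    kids = [restrict(child, name, value) for child in children]
    decisive = gate == "OR"  # True decides an OR gate, False decides an AND gate
    if any(k is decisive for k in kids):
        return decisive
    kids = [k for k in kids if not isinstance(k, bool)]
    if not kids:  # every input was the non-decisive constant
        return not decisive
    return kids[0] if len(kids) == 1 else (gate, tuple(kids))

def first_event(node):
    """Pick the leftmost basic event as the next expansion variable."""
    return node if isinstance(node, str) else first_event(node[1][0])

def probability(node, probs):
    """Exact top event probability by Shannon expansion (independent events)."""
    if isinstance(node, bool):
        return 1.0 if node else 0.0
    x = first_event(node)
    p = probs[x]
    return (p * probability(restrict(node, x, True), probs)
            + (1 - p) * probability(restrict(node, x, False), probs))

# Hypothetical tree with a repeated event: TOP = (A AND B) OR (A AND C)
top = ("OR", (("AND", ("A", "B")), ("AND", ("A", "C"))))
probs = {"A": 0.1, "B": 0.1, "C": 0.1}
print(round(probability(top, probs), 6))  # 0.019
```

Because event A appears under both AND gates, multiplying gate probabilities as if the two branches were independent would give 0.0199. The expansion returns the exact value, 0.019.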