Safety engineering is used to assure that a life-critical system behaves as needed even when pieces fail.

Fault modeling techniques

The two most common fault modeling techniques are called "failure modes and effects analysis" and "fault tree analysis." UML activity diagrams can be used as graphical components in a fault tree analysis. These techniques are just ways of finding problems and of making plans to cope with failures.

Failure modes and effects analysis

In the technique known as "failure modes and effects analysis", an engineer starts with a block diagram of a system. The engineer then considers what happens if each block of the diagram fails. The engineer then draws up a table in which failures are paired with their effects and an evaluation of the effects. The design of the system is then corrected, and the table adjusted, until the system is not known to have unacceptable problems. Of course, the engineers may make mistakes, so it is very helpful to have several engineers review the failure modes and effects analysis.
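As an illustration only, the table described above can be sketched as a small data structure. The field names, the helper unacceptable_rows, and the pump-controller entries are all hypothetical and are not taken from any standard; real analyses usually use richer rating schemes.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    block: str          # block of the diagram that is assumed to fail
    failure_mode: str   # how that block fails
    effect: str         # what happens to the system as a result
    acceptable: bool    # the engineer's evaluation of that effect


def unacceptable_rows(table):
    """Return the rows whose effects still call for a design change."""
    return [row for row in table if not row.acceptable]


# Hypothetical entries for an imaginary pump controller.
table = [
    FmeaRow("power supply", "output drops to 0 V",
            "pump stops and an alarm sounds", True),
    FmeaRow("flow sensor", "reads zero while flow continues",
            "pump is driven to full speed", False),
]

for row in unacceptable_rows(table):
    print(f"redesign needed: {row.block} ({row.failure_mode}) -> {row.effect}")
```

In this sketch the design would be revised, and the table updated, until unacceptable_rows returns an empty list.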

Fault tree analysis

In the technique known as "fault tree analysis", an undesired effect is taken as the root of a tree of logic. Then, each situation that could cause that effect is added to the tree as a series of logic expressions. When the events in a fault tree are labeled with actual failure probabilities (which are often unavailable because of the expense of testing), computer programs can calculate the probability of the undesired effect from the tree. The classic computer program is the Idaho National Engineering and Environmental Laboratory's SAPHIRE, which is used by the U.S. government to evaluate the safety and reliability of nuclear reactors, the space shuttle, and the International Space Station.
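The calculation can be sketched for a very small tree as follows. This is a minimal illustration that assumes all basic events are statistically independent; it is not a description of how SAPHIRE works, and the event names and probabilities are invented for the example.

```python
from math import prod


def and_gate(*inputs):
    """The output event occurs only if every input event occurs."""
    return prod(inputs)


def or_gate(*inputs):
    """The output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in inputs)


# Hypothetical basic-event probabilities (assumed independent).
pump_a_fails = 1e-3
pump_b_fails = 1e-3
power_bus_fails = 1e-5

# Root of the tree: cooling is lost if both pumps fail, or if the power bus fails.
loss_of_cooling = or_gate(and_gate(pump_a_fails, pump_b_fails), power_bus_fails)

print(f"P(loss of cooling) = {loss_of_cooling:.3e}")
```

The independence assumption is the weak point of such a sketch: a single cause that disables both pumps at once would make the real probability higher than the computed one.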

Safety certification

Usually a failure in a safety-certified system is considered acceptable if fewer than one life per 30 years of operation (roughly 10^9 seconds) is lost to mechanical failure. Most Western nuclear reactors, medical equipment, and commercial aircraft are certified to this level.
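The conversion between 30 years and roughly 10^9 seconds, and the loss rate implied by that threshold, can be checked with a few lines of arithmetic. The year length of 365.25 days is an assumption, and the variable names are illustrative only.

```python
# Assumed year length: 365.25 days of 24 hours each.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

seconds_in_30_years = 30 * SECONDS_PER_YEAR

# One life per 30 years of operation, expressed per operating hour.
implied_rate_per_hour = 3600 / seconds_in_30_years

print(f"30 years is about {seconds_in_30_years:.2e} seconds")
print(f"implied limit is about {implied_rate_per_hour:.1e} lives per operating hour")
```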

Preventing failure

Adding equipment and systems

Once a failure mode is identified, it can usually be corrected by adding equipment to the system. For example, nuclear reactors emit dangerous radiation and contain nasty poisons, and nuclear reactions can cause such high heat that no substance can contain them. Therefore reactors have emergency core cooling systems to keep the heat down, shielding to contain the radiation, and containments (usually several, nested) to prevent leakage.

Redundancy

For any given failure, a fail-over or redundant element can almost always be designed and incorporated into a system.
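The benefit of redundancy can be estimated with a simple calculation, sketched here under the assumption that the redundant units fail independently and with equal probability. The function name and the figures are hypothetical.

```python
def all_units_fail(unit_failure_probability, units):
    """Probability that every one of the redundant units fails,
    assuming the units fail independently of one another."""
    return unit_failure_probability ** units


# Hypothetical unit that fails with probability 0.01 over some mission.
for units in (1, 2, 3):
    print(units, "unit(s):", all_units_fail(0.01, units))
```

In practice the improvement is smaller than this suggests, because failures with a shared cause (for example a common power supply) violate the independence assumption.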

Fail-safe design

When adding equipment is impractical (usually because of expense), then the design has to be made inherently safe, or "fail safe". The typical approach is to arrange the system so that ordinary single failures cause the mechanism to shut down in a safe way. For example, in an elevator the cable supporting the car pulls spring-loaded brakes open. If the cable breaks, the brakes grab rails, and the car does not fall. Another common fail-safe system is the pilot-light sensor in most gas furnaces. If the pilot light is cold, a mechanical arrangement disengages the gas valve, so that the house cannot fill with unburned gas. Fail-safes are common in medical equipment, traffic and railway signals, communications equipment, and safety equipment.
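The same principle can be carried over to software. The sketch below is a loose software analogue of the pilot-light arrangement, not a description of a real furnace control: the function name, the temperature threshold, and the idea of a sensor that may return no reading are all assumptions made for illustration. The point is that the valve opens only on positive evidence of a hot pilot, so a missing or invalid reading leads to the safe state.

```python
from typing import Optional


def gas_valve_open(pilot_temperature_c: Optional[float],
                   minimum_hot_c: float = 200.0) -> bool:
    """Return True only when the pilot is positively confirmed to be hot.

    A missing reading (None) is treated like a cold pilot, so a broken
    sensor closes the valve instead of leaving it open.
    """
    if pilot_temperature_c is None:
        return False
    # A NaN reading compares as False here, which also closes the valve.
    return pilot_temperature_c >= minimum_hot_c


print(gas_valve_open(350.0))  # hot pilot
print(gas_valve_open(20.0))   # cold pilot
print(gas_valve_open(None))   # sensor failure
```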

The safety engineer

Personality and role

Oddly enough, personality issues can be paramount in a safety engineer. They must be personally pleasant, intelligent, and ruthless with themselves and their organization. In particular, they have to be able to "sell" the failures that they discover, as well as the attendant expense and time needed to correct them.

Safety engineers have to be ruthless about getting facts from other engineers. It is common for a safety engineer to consider software, chemical, electronic, electrical, mechanical, procedural, and training problems in the same day. Often the facts can be very uncomfortable.

Teamwork

It is important to make the safety engineers part of a team, so that safety problems cannot be discounted as due to the safety engineers' personality problems or ignored by firing a single engineer.

It is a severe safety problem if an engineering team or management discredits a safety engineer: either the manager appointed a poor engineer to the position, indicating that there may be numerous undiscovered safety issues, or the team has inverted development priorities and considers safety to be less important than upper management or government does.

See also: