Chapter 10: Safety – Designing Safe & Reliable Architectures

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Software safety requires architects to design mechanisms that detect and recover from unsafe states. Unsafe states typically arise from specific kinds of failure: omissions (an expected event does not occur), commissions (a spurious event occurs), timing errors (an event occurs too early or too late), incorrect data values, and sequence failures. Designing for safety begins with hazard analysis techniques such as Failure Mode and Effects Analysis (FMEA) or Fault Tree Analysis (FTA), which systematically identify safety-critical functions and potential hazards. The general safety scenario defines the architectural responses to an unsafe state, which include placing the system into a safe mode, shutting down (fail safe), or switching to manual operation.

To achieve these goals, architects draw on three broad categories of safety tactics: unsafe state avoidance, unsafe state detection, and unsafe state remediation, which covers both containment and recovery. Avoidance tactics include Substitution (using protection mechanisms, often hardware-based, in place of potentially dangerous software features) and Predictive Models (for early warning). Detection tactics include Timeout (for timing constraints), Condition Monitoring, and Sanity Checking (for input validity). Containment relies on Redundancy (such as Functional Redundancy to counter common-mode failures, or Analytic Redundancy to tolerate specification errors), Limit Consequences (such as Abort, or Degradation to maintain critical functions during a failure), and Barrier tactics (Firewalls to limit access and Interlocks to guard against incorrect sequencing). Recovery tactics, such as Rollback to a known good state or Reconfiguration of resources, then restore operation.

Practical safety patterns build on these tactics. Monitor-Actuator separates the calculation of a command value from the check that the value is reasonable, and Separated Safety partitions the system into safety-critical and non-safety-critical parts to reduce costly certification effort. Both patterns improve system safety.
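
The Monitor-Actuator idea can be illustrated with a small sketch. This is not the chapter's own code: the class names, the heater example, and the numeric limits are all hypothetical, chosen only to show a controller that computes a command while a separate monitor decides whether that command is reasonable, with a fall back to a safe value when it is not.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Limits:
    """Hypothetical reasonableness bounds for an actuator command."""
    low: float
    high: float
    max_step: float  # largest allowed change between consecutive commands


class Monitor:
    """Checks a proposed command; it never computes commands itself."""

    def __init__(self, limits: Limits) -> None:
        self.limits = limits

    def is_reasonable(self, proposed: float, previous: float) -> bool:
        # A NaN command fails the range comparison and is rejected.
        in_range = self.limits.low <= proposed <= self.limits.high
        small_step = abs(proposed - previous) <= self.limits.max_step
        return in_range and small_step


class MonitoredActuator:
    """Sends the controller's command only if the monitor accepts it."""

    def __init__(self, controller: Callable[[float], float], monitor: Monitor,
                 safe_value: float, initial: float) -> None:
        self.controller = controller
        self.monitor = monitor
        self.safe_value = safe_value
        self.last_command = initial
        self.in_safe_mode = False

    def step(self, sensor_reading: float) -> float:
        if self.in_safe_mode:
            # Stay in safe mode until an operator resets the system.
            return self.safe_value
        proposed = self.controller(sensor_reading)
        if self.monitor.is_reasonable(proposed, self.last_command):
            self.last_command = proposed
        else:
            self.in_safe_mode = True
            self.last_command = self.safe_value
        return self.last_command


def heater_controller(temperature: float) -> float:
    """Hypothetical proportional controller: heater power in percent."""
    return (20.0 - temperature) * 10.0


heater = MonitoredActuator(
    controller=heater_controller,
    monitor=Monitor(Limits(low=0.0, high=100.0, max_step=30.0)),
    safe_value=0.0,
    initial=0.0,
)
print(heater.step(18.0))    # 20.0: within range and step limit
print(heater.step(-400.0))  # 0.0: glitchy reading yields 4200.0, rejected
print(heater.step(18.0))    # 0.0: system remains in safe mode
```

Keeping the monitor free of any control logic is the point of the separation: it is small enough to review and test on its own, whatever the controller does.
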
Industry standards often govern these architectural efforts. They define levels such as Design Assurance Levels (DALs) in avionics or Safety Integrity Levels (SILs), which classify the criticality of failures and guide how much verification and testing effort each part of the system receives.
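
As a rough illustration of how such levels can steer verification effort, the sketch below assigns a DAL to each component from the severity of its worst failure effect and then orders components by criticality. The A to E mapping follows the commonly cited DO-178C scheme, which this summary does not spell out; the component names and helper functions are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List


class FailureEffect(IntEnum):
    """Severity of the worst failure a component can cause."""
    NO_EFFECT = 0
    MINOR = 1
    MAJOR = 2
    HAZARDOUS = 3
    CATASTROPHIC = 4


# Assumed mapping, based on the DO-178C failure condition categories.
DAL_BY_EFFECT: Dict[FailureEffect, str] = {
    FailureEffect.CATASTROPHIC: "A",
    FailureEffect.HAZARDOUS: "B",
    FailureEffect.MAJOR: "C",
    FailureEffect.MINOR: "D",
    FailureEffect.NO_EFFECT: "E",
}


def assign_dal(effect: FailureEffect) -> str:
    """Return the assurance level for a given failure effect."""
    return DAL_BY_EFFECT[effect]


def verification_order(components: Dict[str, FailureEffect]) -> List[str]:
    """List components from most to least critical (ties by name)."""
    return sorted(components, key=lambda name: (-components[name], name))


# Hypothetical components of an aircraft system.
components = {
    "flight_control": FailureEffect.CATASTROPHIC,
    "cabin_lighting": FailureEffect.MINOR,
    "engine_monitoring": FailureEffect.HAZARDOUS,
    "seat_entertainment": FailureEffect.NO_EFFECT,
}

for name in verification_order(components):
    print(name, assign_dal(components[name]))
```

A classification like this pairs naturally with the Separated Safety pattern: the fewer components that land in the highest levels, the smaller the part of the system that needs the most expensive certification work.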