Principles of High Availability Embedded Systems Design
High availability systems tolerate both expected and unexpected faults. Their design combines redundant hardware components with software that manages fault detection and correction. To achieve “five-nines” (99.999%) or greater availability, equivalent to less than 1 second of downtime per day, detection and correction must happen without human intervention. After a brief review of definitions relevant to high availability and fault management, basic hardware N-plexing and voting issues are discussed. This is followed by an in-depth discussion of software fault tolerance techniques appropriate for embedded systems, starting with the static method of N-version programming. Several dynamic software fault tolerance techniques are then surveyed, including Checkpoint-Rollback, Process Pairs, and Recovery Blocks. The discussion ends with a forward error recovery technique called Alternative Processing. Many real-world examples are presented.
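The "five-nines" figure can be checked with a short calculation. The sketch below is illustrative only: the function name allowed_downtime and the 365-day year are assumptions made for this example, not part of the whitepaper. It converts an availability fraction into the downtime budget for a given period.

```python
# Illustrative sketch: names and the 365-day year are assumptions.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def allowed_downtime(availability, period_seconds):
    """Return the maximum downtime, in seconds, permitted within a period."""
    return (1.0 - availability) * period_seconds


per_day = allowed_downtime(0.99999, SECONDS_PER_DAY)
per_year = allowed_downtime(0.99999, SECONDS_PER_YEAR)
print(f"{per_day:.3f} s/day, {per_year / 60:.2f} min/year")
```

At 99.999% availability the budget works out to roughly 0.86 seconds per day, or a little over five minutes per year, which is consistent with the "less than 1 second of downtime per day" figure above.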