Book chapter
Fault Tolerant Architectures
Handbook of Computer Architecture, pp 277-320
2025
Abstract
Fault-tolerant computing has been the cornerstone of reliable computing using electronic systems. Traditionally, fault-tolerant system design has been primarily driven by the system’s operating environment and the resulting fault scenarios due to external disturbances. However, with the growing unreliability of modern semiconductor technologies, reliable computing requires designing fault tolerance across multiple layers of the system stack. Further, the growing impact of intrinsic fault mechanisms such as aging requires designing fault-tolerant architectures across application domains. To this end, emerging technologies, both semiconductor devices and applications, have brought about novel challenges to designing fault-tolerant architectures.
This chapter provides a brief overview of the landscape of fault-tolerant architectures, from the fundamentals to the state-of-the-art and open research areas. The chapter begins with the background of faults, errors, and reliability estimations. Fault-tolerant architectures for computation, memory/storage, and communication are briefly covered. Related state-of-the-art topics such as cross-layer reliability and fault tolerance for emerging devices (NVMs) and emerging applications (AI/ML) are also covered in the chapter.
Details
- Title
- Fault Tolerant Architectures
- Creators
- Siva Satyendra Sahoo; Anup Das; Akash Kumar
- Contributors
- Anupam Chattopadhyay (Editor)
- Publication Details
- Handbook of Computer Architecture, pp 277-320
- Publisher
- Springer Nature Singapore; Singapore
- Resource Type
- Book chapter
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Scopus ID
- 2-s2.0-105004248157
- Other Identifier
- 991022005651904721