Conference proceeding
Fault-tolerance using cache-coherent distributed shared memory systems
Proceedings Fourth International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN'99)
1999
Abstract
Describes new protocols that augment traditional cache-coherency mechanisms to implement fault tolerance based on recovery blocks and checkpointing. Concurrency complicates rollback recovery: rolling back one process can force the processes it has communicated with to roll back as well, potentially cascading into a "domino effect" in which processes are rolled back to their beginning. Several approaches have been proposed to limit the domino effect. One class of techniques requires communicating processes to synchronize periodically so that they checkpoint a globally consistent state. These schemes can be implemented more naturally on distributed shared-memory systems, where the synchronization is performed through shared memory. We have developed extensions to well-known cache-coherency methods (e.g., directory-based protocols) to implement checkpointing of consistent states.
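The idea of limiting the domino effect through periodic synchronization can be illustrated with a small sketch of generic coordinated (synchronous) checkpointing. This is an assumption-laden illustration, not the authors' protocol: it uses Python threads and a software barrier in place of processors and shared-memory synchronization, and it does not model any cache-coherency or directory mechanism. The names `shared`, `checkpoints`, `worker`, `rollback`, and `run_demo` are hypothetical. Because the snapshot is taken only when every worker is waiting at the barrier, each checkpoint is globally consistent, and recovery restores a single recovery line instead of cascading rollbacks.

```python
import copy
import threading

NUM_WORKERS = 3  # hypothetical number of concurrent processes
ROUNDS = 4       # hypothetical number of compute-then-checkpoint rounds

shared = {"counter": 0}  # hypothetical shared-memory region
lock = threading.Lock()  # protects updates to `shared`
checkpoints = []         # globally consistent snapshots, one per round


def take_checkpoint():
    # Barrier action: run by exactly one thread after all workers have
    # reached the barrier, so no worker is mid-update and the snapshot
    # reflects a globally consistent state.
    checkpoints.append(copy.deepcopy(shared))


barrier = threading.Barrier(NUM_WORKERS, action=take_checkpoint)


def worker():
    for _ in range(ROUNDS):
        with lock:
            shared["counter"] += 1  # one unit of "work" on shared state
        barrier.wait()              # synchronize, then checkpoint


def rollback():
    # Restore the most recent consistent checkpoint. All workers resume
    # from the same recovery line, so no domino effect can occur.
    # (Raises IndexError if no checkpoint has been taken yet.)
    shared.clear()
    shared.update(copy.deepcopy(checkpoints[-1]))


def run_demo():
    # Reset state, run all workers to completion, and report the value
    # captured at each coordinated checkpoint.
    checkpoints.clear()
    shared["counter"] = 0
    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [c["counter"] for c in checkpoints]
```

For example, `run_demo()` returns `[3, 6, 9, 12]`; if `shared["counter"]` is then corrupted by a simulated fault, calling `rollback()` restores it to `12`. The price of this design is that every process must pause at each checkpoint, which is the synchronization cost such coordinated schemes trade for a bounded rollback distance.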
Details
- Title
- Fault-tolerance using cache-coherent distributed shared memory systems
- Creators
- D.L. Hecht (University of Alabama in Huntsville); K.M. Kavi; R.K. Gaede; C. Katsinis
- Publication Details
- Proceedings Fourth International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN'99)
- Conference
- 4th International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN'99)
- Publisher
- IEEE
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Computer Science
- Web of Science ID
- WOS:000081657300016
- Other Identifier
- 991019182771704721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Web of Science research areas
- Computer Science, Hardware & Architecture
- Computer Science, Information Systems
- Computer Science, Theory & Methods