Fault-tolerance using cache-coherent distributed shared memory systems
Conference proceeding

D.L. Hecht, K.M. Kavi, R.K. Gaede, and C. Katsinis
Proceedings Fourth International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN'99)
1999

Abstract

Decision support systems; Fault tolerant systems
This paper describes new protocols that augment traditional cache-coherency mechanisms to implement fault tolerance based on recovery blocks and checkpointing. Concurrent processes complicate rollback recovery, because one rollback can trigger a "domino effect" in which processes are rolled back all the way to their starting point. Several approaches have been proposed to limit the domino effect. One set of techniques requires communicating processes to synchronize periodically in order to checkpoint a globally consistent state. Such schemes can be implemented more naturally on distributed shared memory systems by synchronizing on shared memory. We have developed extensions to well-known cache-coherency methods (e.g., directory-based protocols) to checkpoint consistent states.
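
The idea of synchronizing on shared memory to checkpoint a globally consistent state can be illustrated with a small software analogy. The sketch below is not the authors' protocol, which extends hardware cache-coherency mechanisms. It only assumes a lock-guarded dictionary as a stand-in for shared memory and a barrier as the synchronization point. All names (`SharedMemory`, `worker`, `run`) are hypothetical.

```python
import copy
import threading


class SharedMemory:
    """Hypothetical stand-in for a shared-memory region (assumption, not a real DSM)."""

    def __init__(self, n_workers):
        self.lock = threading.Lock()
        self.data = {"counter": 0}
        self.checkpoint = copy.deepcopy(self.data)
        # The barrier action runs in exactly one thread, after every worker
        # has arrived and before any of them is released.
        self.barrier = threading.Barrier(n_workers, action=self._take_checkpoint)

    def _take_checkpoint(self):
        # All workers are blocked at the barrier here, so the snapshot
        # reflects a globally consistent state.
        self.checkpoint = copy.deepcopy(self.data)

    def rollback(self):
        # Restore the last consistent checkpoint. Because every worker took
        # part in it, no cascading (domino) rollback is needed.
        with self.lock:
            self.data = copy.deepcopy(self.checkpoint)


def worker(mem, steps, checkpoint_every):
    for step in range(1, steps + 1):
        with mem.lock:
            mem.data["counter"] += 1
        if step % checkpoint_every == 0:
            mem.barrier.wait()  # periodic synchronization point


def run(n_workers=3, steps=7, checkpoint_every=3):
    mem = SharedMemory(n_workers)
    threads = [
        threading.Thread(target=worker, args=(mem, steps, checkpoint_every))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    before = mem.data["counter"]
    mem.rollback()  # simulate recovery after a detected fault
    return before, mem.data["counter"]
```

With the default arguments, each worker reaches the barrier after steps 3 and 6, so the last checkpoint captures the state in which every worker has completed exactly six updates. Rolling back discards only the work done after that point.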

Web of Science research areas
Computer Science, Hardware & Architecture
Computer Science, Information Systems
Computer Science, Theory & Methods