Conference proceeding
Performance of fault-tolerant distributed shared memory on broadcast- and switch-based architectures
19th IEEE International Parallel and Distributed Processing Symposium, vol. 2005, 7 pp.
2005
Abstract
This paper presents a set of distributed-shared-memory protocols that provide fault tolerance on broadcast-based and switch-based architectures with no decrease in performance. These augmented DSM protocols combine the data duplication required for fault tolerance with the data duplication that arises naturally in distributed-shared-memory implementations. The recovery memory at each backup node is kept continuously consistent and is accessible to all processes executing at the backup node. Simulation results show that the additional data duplication necessary to create fault-tolerant DSM causes no reduction in system performance during normal operation and eliminates most of the overhead at checkpoint creation. Data blocks that are duplicated to maintain the recovery memory are also used by the DSM protocol, which reduces network traffic and significantly increases processor utilization. We use simulation and multiprocessor address trace files to compare the performance of a broadcast architecture called the SOME-Bus with that of two representative switch architectures.
Details
- Title: Performance of fault-tolerant distributed shared memory on broadcast- and switch-based architectures
- Creators: C Katsinis - Drexel University
- Publication Details: 19th IEEE International Parallel and Distributed Processing Symposium, vol. 2005, 7 pp.
- Publisher: IEEE
- Resource Type: Conference proceeding
- Language: English
- Academic Unit: Computer Science
- Scopus ID: 2-s2.0-33746272934
- Other Identifier: 991019174148704721