Journal article
Fault-tolerant distributed shared memory on a broadcast-based architecture
IEEE transactions on parallel and distributed systems, v 15(12), pp 1082-1092
Dec 2004
Abstract
Due to advances in fiber-optics and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. Distributed-shared-memory implementations on such networks promise high performance even for applications with small granularity. This paper presents the architecture of one such implementation, called the simultaneous optical multiprocessor exchange bus, and examines the performance of augmented DSM protocols that exploit the natural duplication of data to maintain a recovery memory in each processing node and provide basic fault tolerance. Simulation results show that the additional data duplication necessary to create fault-tolerant DSM causes no reduction in system performance during normal operation and eliminates most of the overhead at checkpoint creation. Under certain conditions, data blocks that are duplicated to maintain the recovery memory are utilized by the underlying DSM protocol, reducing network traffic, and increasing the processor utilization significantly.
Details
- Title
- Fault-tolerant distributed shared memory on a broadcast-based architecture
- Creators
- C Katsinis - Dept. of Electr. & Comput. Eng., Drexel Univ., Philadelphia, PA, USA
- D Hecht
- Publication Details
- IEEE transactions on parallel and distributed systems, v 15(12), pp 1082-1092
- Publisher
- IEEE
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Computer Science
- Web of Science ID
- WOS:000224624100003
- Scopus ID
- 2-s2.0-11044233770
- Other Identifier
- 991014877819104721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Web of Science research areas
- Computer Science, Theory & Methods
- Engineering, Electrical & Electronic