Journal article
SynchroTrace: Synchronization-Aware Architecture-Agnostic Traces for Lightweight Multicore Simulation of CMP and HPC Workloads
ACM Transactions on Architecture and Code Optimization, v 15(1)
01 Apr 2018
Abstract
Trace-driven simulation of chip multiprocessor (CMP) systems offers many advantages over execution-driven simulation, such as reduced simulation time and complexity, portability, and scalability. However, trace-based simulation approaches have difficulty capturing and accurately replaying multithreaded traces because of the inherent nondeterminism in the execution of multithreaded programs. In this work, we present SynchroTrace, a scalable, flexible, and accurate trace-based multithreaded simulation methodology. By recording in the traces the synchronization events relevant to modern threading libraries (e.g., Pthreads and OpenMP), along with inter-thread dependencies, independent of the host architecture, the methodology can accurately model the nondeterminism of multithreaded programs across different hardware platforms and threading paradigms. By capturing high-level instruction categories, the SynchroTrace average-CPI trace replay timing model offers fast and accurate simulation of many-core in-order CMPs. We perform two case studies to validate the SynchroTrace simulation flow against the gem5 full-system simulator: (1) a constraint-based design space exploration with traditional CMP benchmarks and (2) a thread-scalability study with HPC-representative applications. The results of these case studies show that (1) our trace-based approach with trace filtering achieves a peak speedup of 18.7x over gem5 full-system simulation, with an average speedup of 9.6x, (2) SynchroTrace maintains the thread-scaling accuracy of gem5 and can efficiently scale up to 64 threads, and (3) SynchroTrace can trace on one platform and model any platform in the early stages of design.
Details
- Title
- SynchroTrace: Synchronization-Aware Architecture-Agnostic Traces for Lightweight Multicore Simulation of CMP and HPC Workloads
- Creators
- Karthik Sangaiah - Drexel University
- Michael Lui - Drexel University
- Radhika Jagtap - ARM
- Stephan Diestelhorst - ARM
- Siddharth Nilakantan - Nvidia
- Ankit More - Intel
- Baris Taskin - Drexel University
- Mark Hempstead - Tufts University
- Publication Details
- ACM Transactions on Architecture and Code Optimization, v 15(1)
- Publisher
- Association for Computing Machinery
- Number of pages
- 26
- Grant note
- NSF Graduate Research Fellowship (1002809); National Science Foundation (NSF) CCF-1350624; National Science Foundation (NSF) ECCS-1232164
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Web of Science ID
- WOS:000430876700002
- Scopus ID
- 2-s2.0-85045203838
- Other Identifier
- 991019168524704721
InCites Highlights
Data related to this publication, from the InCites Benchmarking & Analytics tool:
- Collaboration types
- Industry collaboration
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Hardware & Architecture
- Computer Science, Theory & Methods