Logo image
SynchroTrace: Synchronization-Aware Architecture-Agnostic Traces for Lightweight Multicore Simulation of CMP and HPC Workloads
Journal article   Open access   Peer reviewed

SynchroTrace: Synchronization-Aware Architecture-Agnostic Traces for Lightweight Multicore Simulation of CMP and HPC Workloads

Karthik Sangaiah, Michael Lui, Radhika Jagtap, Stephan Diestelhorst, Siddharth Nilakantan, Ankit More, Baris Taskin and Mark Hempstead
ACM transactions on architecture and code optimization, v 15(1)
01 Apr 2018
url
https://doi.org/10.1145/3158642View
Published, Version of Record (VoR)Maybe Open Access (Publisher Bronze) Open

Abstract

Computer Science Computer Science, Hardware & Architecture Computer Science, Theory & Methods Science & Technology Technology
Trace-driven simulation of chip multiprocessor (CMP) systems offers many advantages over execution-driven simulation, such as reducing simulation time and complexity, allowing portability, and scalability. However, trace-based simulation approaches have difficulty capturing and accurately replaying multithreaded traces due to the inherent nondeterminism in the execution of multithreaded programs. In this work, we present SynchroTrace, a scalable, flexible, and accurate trace-based multithreaded simulation methodology. By recording synchronization events relevant to modern threading libraries (e.g., Pthreads and OpenMP) and dependencies in the traces, independent of the host architecture, the methodology is able to accurately model the nondeterminism of multithreaded programs for different hardware platforms and threading paradigms. Through capturing high-level instruction categories, the SynchroTrace average CPI trace Replay timing model offers fast and accurate simulation of many-core in-order CMPs. We perform two case studies to validate the SynchroTrace simulation flow against the gem5 full-system simulator: (1) a constraint-based design space exploration with traditional CMP benchmarks and (2) a thread-scalability study with HPC-representative applications. The results from these case studies show that (1) our trace-based approach with trace filtering has a peak speedup of up to 18.7x over simulation in gem5 full-system with an average of 9.6x speedup, (2) SynchroTrace maintains the thread-scaling accuracy of gem5 and can efficiently scale up to 64 threads, and (3) SynchroTrace can trace in one platform and model any platform in early stages of design.

Metrics

4 Record Views
13 citations in Scopus

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Industry collaboration
Domestic collaboration
International collaboration
Web of Science research areas
Computer Science, Hardware & Architecture
Computer Science, Theory & Methods
Logo image