Dissertation
Efficient scaling of out-of-order processor resources
Doctor of Philosophy (Ph.D.), Drexel University
01 Jun 2015
DOI:
https://doi.org/10.17918/etd-6322
Abstract
Rather than improving single-threaded performance, with the dawn of the multi-core era, processor microarchitects have exploited Moore's law transistor scaling by increasing core density on a chip and increasing the number of thread contexts within a core. However, single-thread performance and efficiency is still very relevant in the power-constrained multi-core era, as increasing core counts do not yield corresponding performance improvements under real thermal and thread-level constraints. This dissertation provides a detailed study of register reference count structures and its application to both conventional and non-conventional, latency tolerant, out-of-order processors. Prior work has incorporated reference counting, but without a detailed implementation or energy model. This dissertation presents a working implementation of reference count structures and shows the overheads are low and can be recouped by the techniques enabled in high-performance out-of-order processors. A study of register allocation algorithms exploits register file occupancy to reduce power consumption by dynamically resizing the register file, which is especially important in the face of wider multi-threaded processors who require larger register files. Latency tolerance has been introduced as a technique to improve single threaded performance by removing cache-miss dependent instructions from the execution pipeline until the miss returns. This dissertation introduces a microarchitecture with a predictive approach to identify long-latency loads, and reduce the energy cost and overhead of scaling the instruction window inherent in latency tolerant microarchitectures. The key features include a front-end predictive slice-out mechanism and in-order queue structure along with mechanisms to reduce the energy cost and register-file usage of executing instructions. Cycle-level simulation shows improved performance and reduced energy delay for memory-bound workloads. Both techniques scale processor resources, addressing register file inefficiency and the allocation of processor resources to instructions during low ILP regions.
Metrics
55 File views/ downloads
34 Record Views
Details
- Title
- Efficient scaling of out-of-order processor resources
- Creators
- Steven James Battle - DU
- Contributors
- Mark D. Hempstead (Advisor) - Drexel University (1970-)
- Awarding Institution
- Drexel University
- Degree Awarded
- Doctor of Philosophy (Ph.D.)
- Publisher
- Drexel University; Philadelphia, Pennsylvania
- Resource Type
- Dissertation
- Language
- English
- Academic Unit
- College of Engineering (1970-2026); Electrical (and Computer) Engineering (1970-2026); Drexel University
- Other Identifier
- 6322; 991014632439804721