Efficient scaling of out-of-order processor resources

Steven James Battle

doi:10.17918/etd-6322

Dissertation

Open access

Doctor of Philosophy (Ph.D.), Drexel University

01 Jun 2015

DOI: https://doi.org/10.17918/etd-6322

Files and links (1)

Battle_Steven_2015 (PDF, 1.83 MB)

Open Access (License Unspecified)

Abstract

Low voltage integrated circuits

Computer Engineering

Computer Science

With the dawn of the multi-core era, processor microarchitects have exploited Moore's law transistor scaling not by improving single-threaded performance but by increasing core density on a chip and the number of thread contexts within a core. However, single-thread performance and efficiency are still very relevant in the power-constrained multi-core era, as increasing core counts do not yield corresponding performance improvements under real thermal and thread-level constraints. This dissertation provides a detailed study of register reference count structures and their application to both conventional and non-conventional, latency-tolerant, out-of-order processors. Prior work has incorporated reference counting, but without a detailed implementation or energy model. This dissertation presents a working implementation of reference count structures and shows that the overheads are low and can be recouped by the techniques they enable in high-performance out-of-order processors. A study of register allocation algorithms exploits register file occupancy to reduce power consumption by dynamically resizing the register file, which is especially important for wider multi-threaded processors that require larger register files. Latency tolerance has been introduced as a technique to improve single-threaded performance by removing cache-miss-dependent instructions from the execution pipeline until the miss returns. This dissertation introduces a microarchitecture that takes a predictive approach to identifying long-latency loads and reduces the energy cost and overhead of scaling the instruction window inherent in latency-tolerant microarchitectures. The key features include a front-end predictive slice-out mechanism and an in-order queue structure, along with mechanisms to reduce the energy cost and register-file usage of executing instructions. Cycle-level simulation shows improved performance and reduced energy-delay for memory-bound workloads.
Both techniques scale processor resources, addressing register file inefficiency and the allocation of processor resources to instructions during low-ILP regions.
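
As a rough illustration of the register reference counting idea mentioned in the abstract, the following is a minimal software sketch, not the dissertation's hardware implementation. The class name `RefCountedRegisterFile` and its methods are hypothetical, chosen only for this example. It assumes a physical register returns to the free list once its reference count drops to zero, and that occupancy is the number of registers currently in use.

```python
class RefCountedRegisterFile:
    """Toy model of a physical register file with per-register reference counts.

    Hypothetical illustration only; names and policy are assumptions,
    not taken from the dissertation.
    """

    def __init__(self, num_regs):
        self.ref_counts = [0] * num_regs        # one counter per physical register
        self.free_list = list(range(num_regs))  # registers with no references

    def allocate(self):
        """Take a free register and give it its first reference."""
        if not self.free_list:
            raise RuntimeError("no free physical registers")
        reg = self.free_list.pop(0)
        self.ref_counts[reg] = 1
        return reg

    def add_reference(self, reg):
        """Record one more user of an already-allocated register."""
        if self.ref_counts[reg] == 0:
            raise ValueError("register is not allocated")
        self.ref_counts[reg] += 1

    def release(self, reg):
        """Drop one reference; free the register when none remain."""
        if self.ref_counts[reg] == 0:
            raise ValueError("register is not allocated")
        self.ref_counts[reg] -= 1
        if self.ref_counts[reg] == 0:
            self.free_list.append(reg)

    def occupancy(self):
        """Number of registers currently holding at least one reference."""
        return len(self.ref_counts) - len(self.free_list)


rf = RefCountedRegisterFile(4)
p = rf.allocate()      # first mapping to the register
rf.add_reference(p)    # a second mapping shares the same register
rf.release(p)          # one mapping goes away; the register stays live
print(rf.occupancy())
rf.release(p)          # last mapping goes away; the register is freed
print(rf.occupancy())
```

In a sketch like this, the occupancy figure is the kind of signal a resizing policy could consult when deciding how much of the register file to keep powered.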

Details

Title: Efficient scaling of out-of-order processor resources
Creators: Steven James Battle - DU
Contributors: Mark D. Hempstead (Advisor) - Drexel University (1970-)
Awarding Institution: Drexel University
Degree Awarded: Doctor of Philosophy (Ph.D.)
Publisher: Drexel University; Philadelphia, Pennsylvania
Resource Type: Dissertation
Language: English
Academic Unit: College of Engineering (1970-2026); Electrical (and Computer) Engineering (1970-2026); Drexel University
Other Identifier: 6322; 991014632439804721
