Design methodologies for reliable and high-performance NVM systems

Shihao Song

doi:10.17918/00000679

Today's mainstream computing systems follow the von Neumann architecture, in which computing devices such as CPUs are separated from memory. Computing devices have relied on Dennard scaling and Moore's law to scale up performance. Memory devices, in contrast, have not scaled at the same pace, leaving a large performance gap between memory and computing devices. As a result, modern workloads with large memory footprints are bottlenecked by data movement. This motivates two directions: (1) finding new memory technologies with excellent scaling capabilities; and (2) rethinking the design of the conventional von Neumann architecture. Dynamic random access memory (DRAM) has long been the technology of choice for main memory, but it is experiencing significant scalability issues. Emerging non-volatile memory (NVM) technologies, which offer excellent scalability and near-DRAM access latency, have been proposed to supplant or supplement DRAM in high-performance, high-capacity main memory systems. In addition, in the neuromorphic computing paradigm, where memory must be tightly coupled with computing elements to overcome the memory bandwidth bottleneck, NVM technologies show promising potential for such integration. However, NVMs such as phase-change memory (PCM) require high voltages to operate, which has three consequences: (1) NVM-based hardware needs extra latency to charge to the required voltage levels, limiting overall hardware performance; (2) the elevated voltages accelerate the aging of CMOS devices inside the hardware, leading to hard or soft faults and reducing the lifetime reliability of NVM-based hardware; and (3) high voltages exacerbate read disturbance, in which the content of an NVM cell may be destroyed after a certain number of reads.
As such, this thesis proposes (1) performance-driven design methodologies for NVM-based systems, e.g., mitigating the latency overhead induced by charging; and (2) reliability-oriented design methodologies to mitigate the aging and read disturbance induced by high operating voltages. For the conventional computing paradigm, a tiered hybrid DRAM-NVM architecture is first proposed as part of a system-level optimization. One essential contribution is splitting an NVM into two segments so that one segment can operate at a lower voltage level, which reduces charging time. Lower voltages also mitigate aging and improve NVM cell endurance. To use the lower-voltage segment efficiently, a prediction-based data placement and migration policy is proposed and integrated into the operating system (OS) design, e.g., the memory management unit. By predicting access-intensive (hot) data regions through run-time characterization of memory access patterns, the policy places or migrates hot data into the lower-voltage segment, improving overall system performance and reliability. An aging-aware memory controller design is then proposed to mitigate performance degradation and aging in computing systems with hybrid DRAM-NVM main memory subsystems. Fundamental to this design is a novel, accurate analytical model that estimates NVM aging at run-time. The model predicts the aging of an NVM memory bank, incorporating its dependency on both the operating voltage and the number of requests the bank serves. The model is integrated into the request scheduler of a memory controller, which controls the de-stressing of an NVM memory bank, i.e., discharging the bank to mitigate aging, with little to no disruption to the bank's servicing of memory requests. The request scheduler also prioritizes memory requests to already charging banks, which eliminates pre-charging latency and improves performance.
For the neuromorphic computing paradigm, a dataflow-based synthesizer is first proposed for design-space exploration specific to NVM-enabled neuromorphic hardware. Synchronous data flow graphs (SDFGs) are used to model machine learning applications running on neuromorphic hardware. The SDFG representation allows performance to be analyzed in terms of key hardware constraints such as the number and computation capability of processing cores, storage, and inter-core communication bandwidth. The synthesizer is also integrated with a novel scheduling algorithm based on self-timed execution, which minimizes both the schedule storage overhead and the run-time schedule construction overhead. The synthesizer can be flexibly configured to simulate a rich collection of state-of-the-art neuromorphic hardware. In addition, it supports application-to-hardware mapping exploration to investigate and address the performance and reliability degradation found in NVM-enabled neuromorphic hardware. A reliability-oriented mapping technique is then proposed for NVM-enabled neuromorphic hardware to mitigate aging due to high operating voltages without compromising key performance metrics, such as the execution time of applications on the hardware. Fundamental to this approach is a novel formulation of the aging of CMOS-based circuits in neuromorphic hardware that considers different failure mechanisms. The mapping technique incorporates this formulation and uses an instance of Particle Swarm Optimization (PSO) to generate mappings that are Pareto-optimal in terms of performance and reliability. Finally, an architectural solution is proposed to extend the read endurance of NVM-based neuromorphic systems, specifically those built on oxide-based resistive RAM (RRAM). At the core of this solution is a novel formulation of the read endurance of an RRAM cell as a function of the programmed synaptic weight and its activation within a machine learning workload. An intelligent workload mapping strategy that incorporates the endurance formulation is then proposed to place the synapses of a machine learning model onto the RRAM cells of the hardware. The objective is to extend the inference lifetime, defined as the number of times the machine learning model can be used to generate output (inference) before the trained weights need to be reprogrammed on the RRAM cells of the system.

