The pursuit of human-like artificial intelligence has driven the proliferation of machine learning models of escalating size and complexity, deployed under stringent constraints on power consumption and memory bandwidth. While conventional GPU architectures remain indispensable for supporting large language models, the rising cost of modern GPUs has created demand for specialized, custom hardware accelerators that can support real-world applications. Neuromorphic computing, leveraging spiking neural networks (SNNs) and event-driven processing, presents a compelling alternative by enabling an energy-efficient computational paradigm. The inherent sparsity of input spike data facilitates highly scalable architectures, significantly reducing computational overhead and data movement costs. Hardware-software co-design is a methodology that emphasizes the simultaneous and integrated development of hardware and software components. Rather than treating hardware and software as separate stages in the development process, co-design optimizes their interaction from the outset, ensuring efficient system-level performance. This work explores SNN-based accelerators as a means to address power and performance challenges through hardware-software co-design techniques for energy-efficient machine learning hardware.

The design of many-core neuromorphic hardware has become increasingly complex due to the need to integrate multiple processing systems within each computing core. To address this complexity, we propose a Synchronous Data Flow Graph (SDFG)-based design flow for mapping SNNs onto hardware, with a focus on optimizing the trade-off between throughput and buffer size. The proposed design flow integrates the Kernighan-Lin graph partitioning heuristic with a cluster mapping algorithm to enhance core utilization.

Traditionally, neuromorphic cores are scaled up to support large-scale spike-based deep convolutional neural network (SDCNN) inference. However, this approach significantly increases static power consumption and chip area. To mitigate these issues, we propose a heterogeneous digital many-core architecture with varying core capacities, complemented by a system software framework that compiles and schedules sub-networks onto the many-core hardware, optimizing energy consumption and latency.

Additionally, we enhance spike-based convolutional neural networks (CNNs) by enabling on-chip learning in convolution layers, allowing each layer to dynamically learn feature representations by integrating features extracted in previous layers. To support this capability, we propose a generalized tile-based neuromorphic hardware design incorporating Spike Timing Dependent Plasticity (STDP) for on-chip learning. Our co-design approach iteratively integrates hardware architecture and software optimization, ensuring that the resulting neuromorphic platform meets both performance and resource constraints.
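To make the STDP rule concrete, the following is a minimal sketch of a trace-based STDP weight update of the kind commonly used for event-driven on-chip learning; the parameter values, array shapes, and function names are illustrative assumptions, not the implementation described in the thesis.

    import numpy as np

    # Illustrative STDP constants (assumed values, not from the thesis).
    A_PLUS, A_MINUS = 0.01, 0.012    # potentiation / depression amplitudes
    TAU_PRE, TAU_POST = 20.0, 20.0   # eligibility-trace decay constants (ms)

    def stdp_step(w, x_pre, x_post, pre_spikes, post_spikes, dt=1.0):
        """One timestep of trace-based STDP for an (n_pre, n_post) weight matrix."""
        # Decay the eligibility traces, then bump them where spikes occurred.
        x_pre = x_pre * np.exp(-dt / TAU_PRE) + pre_spikes
        x_post = x_post * np.exp(-dt / TAU_POST) + post_spikes
        # Potentiate on post spikes (scaled by the pre trace) and
        # depress on pre spikes (scaled by the post trace).
        w = w + (A_PLUS * np.outer(x_pre, post_spikes)
                 - A_MINUS * np.outer(pre_spikes, x_post))
        return np.clip(w, 0.0, 1.0), x_pre, x_post

    # Example: 4 pre-synaptic and 3 post-synaptic neurons, one timestep.
    w = 0.5 * np.ones((4, 3))
    x_pre, x_post = np.zeros(4), np.zeros(3)
    pre = np.array([1.0, 0.0, 0.0, 1.0])    # spikes this timestep
    post = np.array([0.0, 1.0, 0.0])
    w, x_pre, x_post = stdp_step(w, x_pre, x_post, pre, post)

Because updates fire only on spike events, the cost of such a rule scales with the sparsity of the spike streams, which is what makes it attractive for tile-based neuromorphic hardware.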
The second part of this thesis transitions from architectural design principles to the application of these concepts, focusing on use-case-defined building blocks for edge AI applications. Our multi-core platform of heterogeneous neuromorphic chips is used to deploy generative edge applications efficiently. A key component of this effort is rigorously benchmarking spiking generative encoder-decoder model architectures on the hardware accelerator. This benchmarking involves simultaneous optimization across two critical, interwoven dimensions: maximizing event-driven sparsity to minimize energy consumption, and achieving strong results on use-case-driven performance metrics such as latency, throughput, and accuracy.

To expand the functional scope of the platform, the underlying hardware accelerator is subsequently extended to natively support recurrent neurons with integrated on-the-fly Hebbian learning capabilities. Spiking recurrent networks are feedback-driven learning systems and are more biologically plausible. We introduce a novel hybrid co-design implementation that capitalizes on both conventional pre-training and on-chip adaptation, enhancing responsiveness and performance by exploiting the stored memory state of the recurrent layer. This work pushes the boundaries of neuromorphic computing by developing hardware-software co-design techniques that fully exploit the advantages of energy-efficient system architectures for next-generation SNN-based accelerators.
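As a hedged illustration of the on-the-fly Hebbian adaptation described above, the sketch below applies a trace-based Hebbian update to the weights of a spiking recurrent layer; the learning rate, trace constant, and function names are hypothetical placeholders rather than the platform's actual rule.

    import numpy as np

    ETA = 1e-3         # learning rate (assumed)
    TAU_TRACE = 30.0   # spike-trace decay constant in ms (assumed)

    def hebbian_step(w_rec, trace, spikes, dt=1.0):
        """One on-the-fly update of an (n, n) recurrent weight matrix:
        strengthen synapses between co-active neurons, using a decaying
        spike trace as the layer's short-term memory of recent activity."""
        trace = trace * np.exp(-dt / TAU_TRACE) + spikes
        w_rec = w_rec + ETA * np.outer(trace, spikes)  # pre-trace x post-spike
        np.fill_diagonal(w_rec, 0.0)                   # no self-connections
        return np.clip(w_rec, -1.0, 1.0), trace

In a hybrid scheme like the one described, the recurrent weights would be initialized from conventional pre-training and then refined online by updates of this form as the deployed network runs.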
Details
Title
Optimizing neuromorphic hardware with co-design for power and performance gains
Creators
Lakshmi Varshika Mirtinti
Contributors
Anup Das (Advisor)
Awarding Institution
Drexel University
Degree Awarded
Doctor of Philosophy (Ph.D.)
Publisher
Drexel University
Number of pages
xviii, 134 pages
Resource Type
Dissertation
Language
English
Academic Unit
Electrical and Computer Engineering; College of Engineering; Drexel University
Other Identifier
991022154473304721