Logo image
Design Flow for Scheduling Spiking Deep Convolutional Neural Networks on Heterogeneous Neuromorphic System-on-chip
Journal article   Open access   Peer reviewed

Design Flow for Scheduling Spiking Deep Convolutional Neural Networks on Heterogeneous Neuromorphic System-on-chip

Anup Das
ACM transactions on embedded computing systems, v 24(3), pp 1-30
15 May 2025
url
https://doi.org/10.1145/3635032View
Published, Version of Record (VoR)Open Access via Drexel Libraries Read and Publish Program 2025CC BY V4.0 Open

Abstract

Bio-inspired approaches Computer systems organization Computing methodologies Embedded hardware Hardware Memory and dense storage Neural networks Software and its engineering System on a chip Compilers
Neuromorphic systems-on-chip (NSoCs) integrate CPU cores and neuromorphic hardware accelerators on the same chip. These platforms can execute spiking deep convolutional neural networks (SDCNNs) with a low energy footprint. Modern NSoCs are heterogeneous in terms of their computing, communication, and storage resources. This makes scheduling SDCNN operations a combinatorial problem of exploring an exponentially large state space in determining mapping, ordering, and timing of operations to achieve a target hardware performance, e.g., throughput. We propose a systematic design flow to schedule SDCNNs on an NSoC. Our scheduler, called SMART (SDCNN MApping, OrdeRing, and Timing), branches the combinatorial optimization problem into computationally relaxed sub-problems that generate fast solutions without significantly compromising the solution quality. SMART improves performance by efficiently incorporating the heterogeneity in computing, communication, and storage resources. SMART operates in four steps. First, it creates a self-timed execution schedule to map operations to compute resources, maximizing throughput. Second, it uses an optimization strategy to distribute activation and synaptic weights to storage resources, minimizing data communication-related overhead. Third, it constructs an inter-processor communication (IPC) graph with a transaction order for its communication actors. This transaction order is created using a transaction partial order algorithm, which minimizes contention on the shared communication resources. Finally, it schedules this IPC graph to hardware by overlapping communication with the computation, and leveraging operation, pipeline, and batch parallelism. We evaluate SMART using 10 representative image, object, and language-based SDCNNs. Results show that SMART increases throughput by an average 23%, compared to a state-of-the-art scheduler. SMART is implemented entirely in software as a compiler extension. It does not require any change in a neuromorphic hardware or its interface to CPUs. It improves throughput with only a marginal increase in the compilation time. SMART is released under the open-source MIT licensing at https://github.com/drexel-DISCO/SMART to foster future research.

Metrics

5 Record Views
1 citations in Scopus

Details

Logo image