Dynamic inter-node ILP based multi-scratchpad management for deep learning accelerators
Thesis   Open access

Tajung Jang
Master of Science (M.S.), Drexel University
02 Aug 2023
DOI: https://doi.org/10.17918/00001796

Abstract

Keywords: Deep learning (Machine learning); Integer Linear Programming; Scratchpad; Scratchpad management; Integer Programming
The landscape of deep learning compiler frameworks has evolved rapidly with the development of various tools, such as TVM, deeptools, TensorFlow, DLVM, nGraph, and Glow. These frameworks offer unique optimizations to address computation and data movement challenges in deep learning accelerators (DLAs), including graph- or IR-level optimizations for intra-node memory access, operator fusion, and various tiling techniques. Despite their different approaches, these frameworks primarily concentrate on node-level optimizations, which improve the performance of executing a scheduled kernel operation in the graph, and overlook the potential for inter-node data reuse within on-chip memory resources. OnSRAM, a scratchpad management framework built to work with deep learning compilers, addresses this gap by focusing on inter-node scratchpad management in DLAs. OnSRAM exploits the static graph representations of deep learning models by identifying data structures that can be pinned to on-chip memory based on their reuse rate and their cost of transfer from main memory. OnSRAM has been implemented and evaluated on a single DLA that contains a monolithic scratchpad and is integrated as part of a custom deep learning compiler framework. In this work, we extend the capabilities of OnSRAM by introducing an optimal dynamic scratchpad allocation for static graph execution models that supports any number of scratchpads and uses Integer Linear Programming (ILP) to optimize an accurate cost model of data transfers. This enhancement allows for more holistic control over on-chip memory resources than the heuristic approach OnSRAM takes, providing increased flexibility and adaptability to better accommodate diverse deep learning accelerators and memory access patterns.
By optimizing inter-node data movement and storage across multiple scratchpads, our approach further reduces the energy consumption and latency associated with inter-node communication.
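
To make the idea of ILP-based multi-scratchpad allocation concrete, the following is a minimal sketch of a 0-1 formulation on a toy graph. It is not the thesis's model: the tensor names, sizes, live ranges, saved-cost values, and capacities are all hypothetical, each tensor is assumed to stay in one scratchpad for its whole live range (a simplification of the dynamic allocation described above), and exhaustive enumeration stands in for a real ILP solver so the example needs only the Python standard library.

```python
from itertools import product

# Hypothetical toy instance; every number below is made up for illustration.
# tensor name -> (size in KiB, first live step, last live step,
#                 transfer cost saved if the tensor is kept on-chip)
TENSORS = {
    "a": (64, 0, 2, 10),
    "b": (64, 1, 3, 8),
    "c": (32, 2, 4, 5),
    "d": (96, 3, 5, 12),
}
CAPACITY = [96, 64]  # KiB available in scratchpad 0 and scratchpad 1
NUM_STEPS = 6        # number of scheduled graph nodes (execution steps)


def feasible(assign):
    """Capacity constraint: at every step, the live tensors assigned to a
    scratchpad must fit within that scratchpad's capacity."""
    for pad, cap in enumerate(CAPACITY):
        for step in range(NUM_STEPS):
            used = sum(
                size
                for name, (size, first, last, _) in TENSORS.items()
                if assign[name] == pad and first <= step <= last
            )
            if used > cap:
                return False
    return True


def solve():
    """Enumerate every 0-1 assignment (tensor -> one scratchpad or None)
    and keep the feasible one that maximizes the saved transfer cost.
    A real ILP solver would search this space far more efficiently."""
    names = list(TENSORS)
    choices = [None] + list(range(len(CAPACITY)))  # None = leave off-chip
    best_value, best_assign = -1, None
    for combo in product(choices, repeat=len(names)):
        assign = dict(zip(names, combo))
        if not feasible(assign):
            continue
        value = sum(TENSORS[n][3] for n in names if assign[n] is not None)
        if value > best_value:
            best_value, best_assign = value, assign
    return best_value, best_assign


if __name__ == "__main__":
    value, assign = solve()
    print("saved cost:", value)
    print("placement:", assign)
```

On this toy instance the best placement keeps tensors a and d in scratchpad 0 and b in scratchpad 1, and leaves c in main memory because it fits in neither scratchpad during its live range. Enumeration grows exponentially with the number of tensors, which is why a formulation like this would normally be handed to an ILP solver.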
