Deploying neural networks in real-time, low-latency, and power-constrained environments, such as high-speed imaging systems, requires efficient hardware-aware machine learning workflows. This thesis presents a composable framework designed to automate and accelerate the design, evaluation, and deployment of optimized neural network models for FPGA-based inference. The architecture is implemented using a modular, containerized approach, where each stage of the machine learning lifecycle is encapsulated within a dedicated Docker container. These containers are orchestrated using Kubernetes, supporting horizontal scalability and efficient parallel experimentation across diverse environments. The pipeline integrates QKeras for model pruning and precision tuning, HLS4ML for high-level synthesis (HLS), and Vivado for hardware compilation and resource reporting. The system aims to minimize latency and power consumption while maintaining accuracy, making it well suited to real-time applications. A specific use-case container, referred to as the "Fire Over Many Overlaps (FOMO) Frame Grabber", demonstrates the pipeline's practical application. This use case employs automated Neural Architecture Search (NAS) for hyperparameter optimization, along with quantization-aware training, pruning, and HLS compilation. To address the challenge of tracking, monitoring, and reusing experimental results, the pipeline integrates DataFed for structured metadata logging and experiment management. It captures and stores training metrics, model artifacts, Vivado reports, and environment configurations. A Globus-enabled container provides secure data transfer to remote DataFed storage, promoting long-term reproducibility and enabling data-driven NAS that builds upon prior runs. This composable machine learning pipeline streamlines deployment, reduces manual intervention, and accelerates design space exploration for low-power, low-latency AI applications. Its scalable and reproducible design offers a solution for both academic research and industrial use cases requiring real-time, hardware-efficient inference.
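As a rough illustration of the composable idea described in the abstract, the sketch below chains independent stages that pass a shared context to one another and records what each stage produced. It is not the thesis implementation: the `Stage` and `Pipeline` classes, the stage functions, and the `bits` parameter are all hypothetical names, and the stage bodies are placeholders that only build strings where the real pipeline would run training, HLS conversion, and hardware compilation in separate containers.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, object]


@dataclass
class Stage:
    """One pipeline step (hypothetical stand-in for a containerized stage)."""
    name: str
    run: Callable[[Context], Context]


@dataclass
class Pipeline:
    """Runs stages in order and records which keys each stage added."""
    stages: List[Stage]
    history: List[Dict[str, object]] = field(default_factory=list)

    def execute(self, config: Context) -> Context:
        ctx = dict(config)  # copy so the caller's config is not mutated
        for stage in self.stages:
            before = set(ctx)
            ctx = stage.run(ctx)
            self.history.append(
                {"stage": stage.name, "produced": sorted(set(ctx) - before)}
            )
        return ctx


def train(ctx: Context) -> Context:
    # Placeholder for quantization-aware training and pruning.
    return {**ctx, "model": f"model-{ctx['bits']}bit"}


def synthesize(ctx: Context) -> Context:
    # Placeholder for HLS conversion and hardware compilation.
    return {**ctx, "hls_project": f"{ctx['model']}-hls"}


def report(ctx: Context) -> Context:
    # Placeholder for collecting a resource report from the build.
    return {**ctx, "report": f"{ctx['hls_project']}-report.txt"}


pipeline = Pipeline(
    [Stage("train", train), Stage("synthesize", synthesize), Stage("report", report)]
)
result = pipeline.execute({"bits": 8})

# A structured record of the run, of the kind a metadata store could keep.
record = json.dumps({"config": {"bits": 8}, "history": pipeline.history}, sort_keys=True)
print(record)
```

Because every stage only reads from and adds to the context, a stage can be replaced or rerun without touching the others, and the recorded history gives a later search something concrete to build on.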
Details
Title
Composable machine learning pipeline for fast deployment with neural architecture search
Creators
Veronica Obute
Contributors
Joshua Agar (Advisor)
Awarding Institution
Drexel University
Degree Awarded
Master of Science (M.S.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Number of pages
xi, 38 pages
Resource Type
Thesis
Language
English
Academic Unit
College of Engineering (1970-2026); Electrical (and Computer) Engineering [Historical]; Drexel University