Composable machine learning pipeline for fast deployment with neural architecture search

Veronica Obute

doi:10.17918/00011069

Thesis

Open access

Master of Science (M.S.), Drexel University

Jun 2025

Files and links (1)

Obute_Veronica_2025 (PDF, 1.38 MB), Open Access (License Unspecified)

Abstract

Deploying neural networks in real-time, low-latency, and power-constrained environments, such as high-speed imaging systems, requires efficient hardware-aware machine learning workflows. This thesis presents a composable framework that automates and accelerates the design, evaluation, and deployment of optimized neural network models for FPGA-based inference. The architecture is modular and containerized: each stage of the machine learning lifecycle is encapsulated in a dedicated Docker container, and the containers are orchestrated with Kubernetes, which supports horizontal scaling and parallel experimentation across diverse environments. The pipeline integrates QKeras for quantization-aware training and precision tuning, combined with model pruning, HLS4ML for high-level synthesis (HLS), and Vivado for hardware compilation and resource reporting. The system aims to minimize latency and power consumption while maintaining accuracy, which makes it well suited to real-time applications. A specific use-case container, the "Fire Over Many Overlaps (FOMO) Frame Grabber", demonstrates the pipeline in practice. This use case applies automated Neural Architecture Search (NAS) for hyperparameter optimization, together with quantization-aware training, pruning, and HLS compilation. To address the challenge of tracking, monitoring, and reusing experimental results, the pipeline integrates DataFed for structured metadata logging and experiment management, capturing and storing training metrics, model artifacts, Vivado reports, and environment configurations. A Globus-enabled container provides secure data transfer to remote DataFed storage, promoting long-term reproducibility and enabling data-driven NAS that builds on prior runs. This composable machine learning pipeline streamlines deployment, reduces manual intervention, and accelerates design-space exploration for low-power, low-latency AI applications.
Its scalable and reproducible design serves both academic research and industrial use cases that require real-time, hardware-efficient inference.
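The lifecycle described in the abstract (chained stages, a search loop, and metadata logging that later runs can build on) can be illustrated with a minimal, standard-library-only Python sketch. This is not the thesis implementation. The stage functions, the search space, and the accuracy and cost formulas are hypothetical placeholders standing in for QKeras training, HLS4ML/Vivado synthesis, and DataFed logging. An exhaustive grid search stands in for the NAS strategy, and an in-memory list of JSON strings stands in for the remote metadata store.

```python
import itertools
import json
from typing import Callable, Dict, List, Optional, Tuple

# A stage takes a context dict and returns it, so stages can be chained
# the way the containerized pipeline chains its steps.
Stage = Callable[[dict], dict]

# Hypothetical search space (illustrative values, not taken from the thesis).
SEARCH_SPACE: Dict[str, List[int]] = {"units": [8, 16, 32], "bits": [4, 6, 8]}


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right into a single callable pipeline."""
    def run(ctx: dict) -> dict:
        for stage in stages:
            ctx = stage(ctx)
        return ctx
    return run


def train_stage(ctx: dict) -> dict:
    # Hypothetical placeholder for quantization-aware training: a toy
    # accuracy proxy, NOT a QKeras call and NOT a real accuracy model.
    cfg = ctx["config"]
    ctx["accuracy"] = 1.0 - 1.0 / cfg["units"] - 1.0 / 2 ** cfg["bits"]
    return ctx


def synth_stage(ctx: dict) -> dict:
    # Hypothetical placeholder for an HLS/Vivado resource report: a toy
    # cost that grows with layer width times bit width.
    cfg = ctx["config"]
    ctx["resource_cost"] = cfg["units"] * cfg["bits"]
    return ctx


def log_stage(ctx: dict) -> dict:
    # Stand-in for uploading a metadata record (e.g. to DataFed): here the
    # record is only appended as JSON to an in-memory list.
    record = {k: ctx[k] for k in ("config", "accuracy", "resource_cost")}
    ctx["store"].append(json.dumps(record, sort_keys=True))
    return ctx


def config_key(cfg: Dict[str, int]) -> str:
    """Canonical string form of a configuration, used to detect repeats."""
    return json.dumps(cfg, sort_keys=True)


def search(pipeline: Stage, store: List[str], max_cost: int) -> Tuple[Optional[dict], int]:
    """Grid search that skips configurations already logged in `store`.

    Returns the most accurate logged record within `max_cost` (or None if
    none fits) and the number of configurations newly evaluated.
    """
    seen = {config_key(json.loads(r)["config"]) for r in store}
    names = sorted(SEARCH_SPACE)
    n_new = 0
    for values in itertools.product(*(SEARCH_SPACE[n] for n in names)):
        cfg = dict(zip(names, values))
        if config_key(cfg) in seen:
            continue  # reuse the logged result instead of re-running the pipeline
        pipeline({"config": cfg, "store": store})
        n_new += 1
    records = [json.loads(r) for r in store]
    feasible = [r for r in records if r["resource_cost"] <= max_cost]
    best = max(feasible, key=lambda r: r["accuracy"], default=None)
    return best, n_new
```

With `pipeline = compose(train_stage, synth_stage, log_stage)` and an empty `store`, a first call to `search(pipeline, store, max_cost=128)` evaluates all nine configurations and, under these toy proxies, selects `{"bits": 8, "units": 16}`. A second call with the same `store` evaluates nothing new, because every result is reused from the log. Skipping already-logged configurations is only one simple reading of "building upon prior runs"; a real search could instead use prior records to seed or prune a more sophisticated optimizer.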

Details

Title: Composable machine learning pipeline for fast deployment with neural architecture search
Creators: Veronica Obute
Contributors: Joshua Agar (Advisor)
Awarding Institution: Drexel University
Degree Awarded: Master of Science (M.S.)
Publisher: Drexel University; Philadelphia, Pennsylvania
Number of pages: xi, 38 pages
Resource Type: Thesis
Language: English
Academic Unit: College of Engineering (1970-2026); Electrical (and Computer) Engineering [Historical]; Drexel University
Other Identifier: 991022058732804721