Journal article
Intelligent Food Portioning System Using Vision-Language-Action (VLA) Models for Small-Scale Food Operations
Journal of Future Foods, Forthcoming
Jan 2026
Featured in Collection: Drexel's Newest Publications
Abstract
- Developed an intelligent food portioning system for small-scale operations
- π0 model achieved a 100% success rate with only 30–50 training demonstrations
- System adapts across diverse food types: shrimp, grapes, and garlic cloves
- Vision-language-action models integrated with real-time weight monitoring
- Efficient operation: 15.23 seconds to portion 30 g of shrimp with high accuracy
The food industry faces significant barriers to adopting automation, as most food service, retail, and processing operations are small businesses that lack the financial capacity to invest in conventional industrial automation systems. Food portioning is a fundamental operation across food industry sectors, yet it remains highly labor-intensive for small businesses. Existing automated portioning systems are generally designed for single-product, large-scale processing, rendering them financially prohibitive and operationally inflexible for small-scale operations with diverse product requirements. Advancements in artificial intelligence (AI) provide promising avenues for the development of cost-effective automation systems for small food businesses, offering adaptable solutions capable of handling multiple food types with flexibility and precision. This study proposes an AI-driven, low-cost food portioning framework as a proof-of-concept solution that integrates weight sensing with vision-language-action (VLA) control to enable adaptable handling of diverse food products. The system employs You-Only-Look-Once (YOLO)-based vision models to interpret digital scale readings while coordinating robotic picking mechanisms that transfer food items until the target weight is reached. Three models, namely Action Chunking with Transformers (ACT), OpenVLA with Optimized Fine-Tuning (OpenVLA-OFT), and π0, were evaluated on shrimp (30 g), grapes (50 g), and garlic (20 g), demonstrating adaptability across diverse food types. The π0 model achieved a 100% success rate using only 30–50 demonstrations per food type and demonstrated efficient operational performance (e.g., 15.23 seconds to portion 30 g of shrimp). This framework demonstrates the potential for adaptive automation in small-scale food businesses, providing a preliminary foundation that addresses single-product automation limitations in food packaging, distribution, and service operations.
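The closed-loop cycle the abstract describes — a vision model reads the digital scale, and the robot keeps transferring items until the target weight is reached — can be sketched as below. This is a minimal illustrative simulation, not the authors' implementation: the function names, tolerance value, per-item weights, and the stubbed-out scale reader and pick action are all assumptions.

```python
# Hypothetical sketch of the vision-guided portioning loop: read weight,
# compare against the target, pick one more item if still under target.
import random

def read_scale_via_vision(true_weight_g: float) -> float:
    """Stand-in for the YOLO-based reading of the scale display.
    A real system would detect digits in a camera frame; here we
    simply pass the simulated weight through."""
    return true_weight_g

def pick_and_place_one_item() -> float:
    """Stand-in for one VLA-driven pick-and-place action.
    Returns the weight of the transferred item (simulated:
    one shrimp assumed to weigh roughly 4-8 g)."""
    return random.uniform(4.0, 8.0)

def portion(target_g: float, tolerance_g: float, max_picks: int = 20) -> float:
    """Transfer items until the scale reading reaches the target
    weight (within tolerance) or the pick budget is exhausted."""
    weight_g = 0.0
    for _ in range(max_picks):
        if read_scale_via_vision(weight_g) >= target_g - tolerance_g:
            break
        weight_g += pick_and_place_one_item()
    return weight_g

if __name__ == "__main__":
    random.seed(0)
    final = portion(target_g=30.0, tolerance_g=2.0)  # e.g., 30 g of shrimp
    print(f"portioned {final:.1f} g (target 30 g)")
```

A per-item check against the scale after every pick is what lets the same loop serve shrimp, grapes, or garlic: only the target weight and the learned picking policy change, not the control logic.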
Details
- Title
- Intelligent Food Portioning System Using Vision-Language-Action (VLA) Models for Small-Scale Food Operations
- Creators
- Ran Yang - Virginia Sea Grant
- Siwei Cai - Drexel University
- Lifeng Zhou - Drexel University
- Yiming Feng (Corresponding Author) - Virginia Tech
- Publication Details
- Journal of Future Foods, Forthcoming
- Publisher
- Elsevier
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Other Identifier
- 991022154811804721