Journal article
Intelligent Food Portioning System Using Vision-Language-Action (VLA) Models for Small-Scale Food Operations
Journal of Future Foods, Forthcoming
Jan 2026
Featured in Collection: Drexel's Newest Publications
Abstract
- Developed an intelligent food portioning system for small-scale operations
- π0 model achieved a 100% success rate with only 30–50 training demonstrations
- System adapts across diverse food types: shrimp, grapes, and garlic cloves
- Vision-language-action models integrated with real-time weight monitoring
- Efficient operation: 15.23 seconds to portion 30 g of shrimp with high accuracy
The food industry faces significant barriers to adopting automation, as most food service, retail, and processing operations are small businesses that lack the financial capacity to invest in conventional industrial automation systems. Food portioning is a fundamental operation across food industry sectors, yet it remains highly labor-intensive for small businesses. Existing automated portioning systems are generally designed for single-product, large-scale processing, rendering them financially prohibitive and operationally inflexible for small-scale operations with diverse product requirements. Advancements in artificial intelligence (AI) provide promising avenues for the development of cost-effective automation systems for small food businesses, offering adaptable solutions capable of handling multiple food types with flexibility and precision. This study proposes an AI-driven, low-cost food portioning framework as a proof-of-concept solution that integrates weight sensing with vision-language-action (VLA) control to enable adaptable handling of diverse food products. The system employs You-Only-Look-Once (YOLO)-based vision models to interpret digital scale readings while coordinating robotic picking mechanisms that transfer food items until the target weight is reached. Three models, namely Action Chunking with Transformers (ACT), OpenVLA with Optimized Fine-Tuning (OpenVLA-OFT), and π0, were evaluated on shrimp (30 g), grapes (50 g), and garlic (20 g), demonstrating adaptability across diverse food types. The π0 model achieved a 100% success rate using only 30–50 demonstrations per food type and demonstrated efficient operational performance (e.g., 15.23 seconds to portion 30 g of shrimp). This framework demonstrates the potential for adaptive automation in small-scale food businesses, providing a preliminary foundation that addresses single-product automation limitations in food packaging, distribution, and service operations.
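The closed-loop cycle the abstract describes — a vision model reads the digital scale, and the robot keeps transferring items until the target weight is reached — can be sketched as below. This is a minimal illustrative simulation, not the authors' implementation: the function names, tolerance value, per-item weights, and the stubbed-out scale reader and pick action are all assumptions.

```python
# Hypothetical sketch of the vision-guided portioning loop: read weight,
# compare against the target, pick one more item if still under target.
import random

def read_scale_via_vision(true_weight_g: float) -> float:
    """Stand-in for the YOLO-based reading of the scale display.
    A real system would detect digits in a camera frame; here we
    simply pass the simulated weight through."""
    return true_weight_g

def pick_and_place_one_item() -> float:
    """Stand-in for one VLA-driven pick-and-place action.
    Returns the weight of the transferred item (simulated:
    one shrimp assumed to weigh roughly 4-8 g)."""
    return random.uniform(4.0, 8.0)

def portion(target_g: float, tolerance_g: float, max_picks: int = 20) -> float:
    """Transfer items until the scale reading reaches the target
    weight (within tolerance) or the pick budget is exhausted."""
    weight_g = 0.0
    for _ in range(max_picks):
        if read_scale_via_vision(weight_g) >= target_g - tolerance_g:
            break
        weight_g += pick_and_place_one_item()
    return weight_g

if __name__ == "__main__":
    random.seed(0)
    final = portion(target_g=30.0, tolerance_g=2.0)  # e.g., 30 g of shrimp
    print(f"portioned {final:.1f} g (target 30 g)")
```

A per-item check against the scale after every pick is what lets the same loop serve shrimp, grapes, or garlic: only the target weight and the learned picking policy change, not the control logic.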
Details
- Title
- Intelligent Food Portioning System Using Vision-Language-Action (VLA) Models for Small-Scale Food Operations
- Creators
- Ran Yang - Virginia Sea Grant
- Siwei Cai - Drexel University
- Lifeng Zhou - Drexel University
- Yiming Feng (Corresponding Author) - Virginia Tech
- Publication Details
- Journal of Future Foods, Forthcoming
- Publisher
- Elsevier
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Electrical and Computer Engineering
- Other Identifier
- 991022154811804721