VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

Zijian An; Hadi Khezam; Bill Cai; Ran Yang; Shijie Geng; Yiming Feng; Yue (Luna) Zhang; Lifeng Zhou

doi:10.48550/arxiv.2605.02037

Preprint

Open access

ArXiv.org

03 May 2026

DOI: https://doi.org/10.48550/arxiv.2605.02037

Preprint (Author's original) Open CC BY V4.0

Abstract

Computer Science - Artificial Intelligence

Computer Science - Robotics

We present VILAS, a low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without explicit force sensing, we design a kirigami-based soft compliant gripper extension that deforms predictably under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: π0, π0.5, and GR00T N1.6. All models are fine-tuned from publicly released pretrained checkpoints on an identical demonstration dataset collected via our teleoperation pipeline. Experiments on a grape grasping task validate the proposed system, confirming that capable manipulation policies can be trained and deployed on low-cost modular hardware. Our results also provide practical insights into the deployment characteristics of current VLA models in real-world settings.

Details

Title: VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation
Creators: Zijian An - Drexel University
          Hadi Khezam - Drexel University
          Bill Cai - Drexel University
          Ran Yang - Virginia Tech
          Shijie Geng - Amazon (United States)
          Yiming Feng - Virginia Tech
          Yue (Luna) Zhang - Drexel University
          Lifeng Zhou - Drexel University
Publication Details: ArXiv.org
Resource Type: Preprint
Language: English
Academic Unit: Electrical and Computer Engineering
Other Identifier: 991022179573104721
