arXiv:2405.06039

Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations

Published on May 9, 2024

Abstract

This research introduces Bi-VLA (Vision-Language-Action), a novel system for bimanual robotic dexterous manipulation that seamlessly integrates vision, language understanding, and physical action. The system's functionality was evaluated on a set of household tasks, including the preparation of a desired salad upon human request. Bi-VLA demonstrates the ability to interpret complex human instructions, perceive and understand the visual context of ingredients, and execute precise bimanual actions to assemble the requested salad. Through a series of experiments, we evaluate the system's performance in terms of accuracy, efficiency, and adaptability to various salad recipes and human preferences. Our results show that the Language Module generated correct executable code for 100% of user-requested tasks. The Vision Module achieved a 96.06% success rate in detecting a single specified ingredient and an 83.4% success rate in detecting a list of multiple ingredients.
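
The abstract outlines a three-part decomposition: a Language Module that turns a user request into executable code, a Vision Module that grounds the named ingredients in the camera view, and a bimanual execution layer. The sketch below is a rough illustration of that flow only; every class, method, and skill name in it (LanguageModule, VisionModule, BimanualRobot, generate_plan, pick/slice/place) is a hypothetical stand-in, not the paper's actual interface.

```python
# Hypothetical sketch of the Bi-VLA pipeline described in the abstract.
# All names and the skill vocabulary are illustrative assumptions,
# not the paper's API.

from dataclasses import dataclass


@dataclass
class Detection:
    """A detected ingredient with an image-frame bounding box."""
    name: str
    bbox: tuple  # (x_min, y_min, x_max, y_max); format is an assumption


class LanguageModule:
    """Stand-in for the paper's Language Module, which the abstract
    says turns a user request into executable code. Here that is
    mocked as a fixed sequence of (skill, arguments) calls."""

    def generate_plan(self, instruction):
        return [
            ("pick",  {"ingredient": "tomato", "arm": "left"}),
            ("slice", {"ingredient": "tomato", "arm": "right"}),
            ("place", {"ingredient": "tomato", "target": "bowl", "arm": "left"}),
        ]


class VisionModule:
    """Stand-in for the paper's Vision Module (ingredient detection)."""

    def detect(self, image, ingredient_names):
        # A real system would run a detector on `image`; this stub
        # pretends every requested ingredient was found.
        return [Detection(name=n, bbox=(0, 0, 1, 1)) for n in ingredient_names]


class BimanualRobot:
    """Stand-in for the dual-arm execution layer."""

    def execute(self, skill, **kwargs):
        print(f"executing {skill} with {kwargs}")


def run_bi_vla(instruction, image):
    """End-to-end flow: instruction -> plan -> visual grounding -> execution."""
    language, vision, robot = LanguageModule(), VisionModule(), BimanualRobot()

    plan = language.generate_plan(instruction)
    needed = {args["ingredient"] for _, args in plan if "ingredient" in args}
    detections = vision.detect(image, sorted(needed))  # ground the plan in the scene

    for skill, args in plan:
        robot.execute(skill, **args)
    return detections


if __name__ == "__main__":
    run_bi_vla("Make me a salad with tomato", image=None)
```

Treating the language model's output as a plan over a small, verifiable skill vocabulary (rather than raw motor commands) is one plausible reading of how a 100% code-generation success rate could be measured; the paper itself should be consulted for the actual interface.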
