Robotics scientists have developed a novel way to improve how robots interact with their environment.
Researchers from UC Berkeley, Stanford University and the University of Warsaw have developed a method that enables robots to enhance their decision-making processes by incorporating reasoning.
The method, called Embodied Chain-of-Thought Reasoning (ECoT), enables robots to think through tasks step by step and consider their surroundings before taking action.
As detailed in a newly published paper, ECoT is designed to boost a robot’s ability to handle new tasks and environments effectively. It also provides human operators with a way to correct behaviors by modifying a robot’s reasoning through natural language feedback.
Vision-language-action models (VLAs) have emerged as a powerful way to train robots to perform actions.
They are designed to give a robot the ability to better understand the task it has been asked to perform. Google DeepMind researchers highlighted the potential of VLAs in a study published in June 2023.
However, according to the researchers, VLAs typically learn from observations of actions without any intermediate reasoning, meaning they’re limited in their ability to handle complex, novel situations that require more thoughtful planning and adaptation.
The researchers sought to improve robotic reasoning by bringing foundation models into the training process. They developed a scalable pipeline for generating synthetic training data for ECoT, leveraging various foundation models to extract features from robot demonstrations in the Bridge V2 dataset.
The suite included object detectors and vision-language models, which produced descriptions of the robot’s environment and annotated details such as the objects in the scene.
They then used Google’s Gemini model to generate plans, subtasks and movement labels, combining the previously gathered object data with details of the robot’s gripper position.
Dividing the process into submodules allowed a staggered, more methodical approach that enabled the robot to perform its task after thoroughly thinking it through.
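To make the idea concrete, here is a minimal Python sketch of how the annotations described above (plan, subtask, movement label, gripper position and detected objects) might be assembled into a single step-by-step reasoning string. The class name, field names, tag labels and the ordering of the steps are all assumptions made for illustration; they are not taken from the researchers' code or paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StepAnnotation:
    """Hypothetical container for the synthetic annotations of one timestep."""
    task: str
    plan: List[str]
    subtask: str
    movement: str
    gripper_xy: Tuple[int, int]
    # Object name -> bounding box (x1, y1, x2, y2) in pixels (assumed format).
    objects: Dict[str, Tuple[int, int, int, int]] = field(default_factory=dict)


def build_reasoning_text(step: StepAnnotation) -> str:
    """Serialize the annotations into one reasoning string, in a fixed order."""
    plan_text = "; ".join(f"{i + 1}. {s}" for i, s in enumerate(step.plan))
    objects_text = ", ".join(
        f"{name} {list(box)}" for name, box in sorted(step.objects.items())
    )
    lines = [
        f"TASK: {step.task}",
        f"PLAN: {plan_text}",
        f"SUBTASK: {step.subtask}",
        f"MOVE: {step.movement}",
        f"GRIPPER: {list(step.gripper_xy)}",
        f"OBJECTS: {objects_text}",
    ]
    return "\n".join(lines)


# Made-up example values, not data from the Bridge V2 dataset.
example = StepAnnotation(
    task="put the carrot on the plate",
    plan=["grasp the carrot", "move to the plate", "release the carrot"],
    subtask="grasp the carrot",
    movement="move left",
    gripper_xy=(112, 84),
    objects={"plate": (150, 90, 210, 140), "carrot": (60, 70, 100, 95)},
)
reasoning = build_reasoning_text(example)
print(reasoning)
```

Because the reasoning is ordinary text, a correction from a human operator could in principle amount to editing one of these lines before the robot acts, though how the actual system applies such feedback is not detailed here.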
The researchers also found that ECoT reasoning can transfer to other robot embodiments, allowing the policy to generalize its reasoning capabilities even to robots not seen during training.
The researchers demonstrated that ECoT increased the absolute success rate of OpenVLA, an open-source VLA, by 28% across challenging generalization tasks, without requiring additional robot training data.
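An absolute increase in success rate is measured in percentage points, which is not the same as a relative increase. The short Python sketch below shows the difference. The baseline of 40% is a made-up number chosen only to illustrate the arithmetic, not a result reported by the researchers, and the function names are hypothetical.

```python
def absolute_gain_points(baseline: float, improved: float) -> float:
    """Difference between two success rates, in percentage points."""
    return (improved - baseline) * 100


def relative_gain_percent(baseline: float, improved: float) -> float:
    """Improvement expressed as a percentage of the baseline rate."""
    if baseline <= 0:
        raise ValueError("baseline success rate must be positive")
    return (improved - baseline) / baseline * 100


# Hypothetical rates: a policy that goes from 40% to 68% task success.
baseline_rate = 0.40
improved_rate = 0.68

abs_gain = round(absolute_gain_points(baseline_rate, improved_rate), 2)
rel_gain = round(relative_gain_percent(baseline_rate, improved_rate), 2)
print(f"absolute gain: {abs_gain} points, relative gain: {rel_gain}%")
```

With these assumed numbers, a 28-point absolute gain corresponds to a much larger relative gain, which is why the distinction matters when comparing reported results.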
The method isn’t without its flaws, however.
All reasoning steps are performed in the fixed order the researchers chose, which can limit the robot’s adaptability and flexibility in dynamically changing environments.
The researchers noted that the small-scale project could be improved by using a larger dataset, which would enable ECoT to be applied to more robots.
In addition, the researchers described execution speed as a limitation and want to explore ways to optimize control frequency for faster operation.
Foundation models are a growing area of interest for robotics researchers, potentially enabling robots to perform general-purpose tasks.
A startup called Skild AI is hoping to turn this area of research into a way to bring down the cost of robotics training. Skild recently raised $300 million to fund its efforts, with its foundation model already being applied to automation solutions for visual inspection and patrolling tasks.