MIT researchers have developed a framework that uses large language models to provide robots with common sense.
Engineers from MIT's Computer Science & AI Laboratory (CSAIL) developed the Grounding Language in DEmonstrations (Glide) framework, which enables information generated by large language models to serve as a foundation for robotic manipulation tasks, giving robots a basic understanding of a task and how to interact with their surroundings.
Traditionally, a human programmer would manually label and assign the actions a robot performs, or skills would be taught via human demonstrations.
The Glide framework automates this process. A human prompts a large language model to turn high-level instructions “into a step-by-step abstract plan in language.”
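As an illustration only, not the researchers' actual code, this kind of plan decomposition can be sketched in a few lines of Python. The model name, prompt wording and use of the official OpenAI client are assumptions made for the example:

```python
# Illustrative sketch only, not the Glide authors' code. Assumes the
# official `openai` Python client; the model name and prompt are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def decompose(instruction: str) -> list[str]:
    """Ask an LLM to turn a high-level instruction into an abstract plan."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's instruction as a numbered, "
                        "step-by-step abstract plan in plain language."},
            {"role": "user", "content": instruction},
        ],
    )
    text = response.choices[0].message.content
    # One plan step per non-empty line, e.g. "1. Move the gripper above the block"
    return [line.strip() for line in text.splitlines() if line.strip()]

print(decompose("Pick up the red block and place it in the bin"))
```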
The model's basic blueprint outlines key details about what's happening in a scene, such as specific locations or notable features. This helps the robot make sense of its surroundings.
The model then links these understandings to the robot's abilities, outlining both successful ways to do tasks and potential mistakes to avoid.
This gives the robot the context it needs to understand the limitations and requirements involved in completing a task.
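A hedged sketch of what linking plan steps to robot abilities could look like in code follows. The skill names, precondition checks and failure hints below are invented for illustration and do not come from the paper:

```python
# Illustrative grounding sketch; skill names and preconditions are
# hypothetical, not taken from the Glide paper.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    run: Callable[[], bool]           # executes the motion, True on success
    precondition: Callable[[], bool]  # must hold before the skill is run
    failure_hint: str                 # known mistake mode to avoid

def gripper_empty() -> bool:
    return True  # placeholder for a real sensor check

def reach_and_grasp() -> bool:
    return True  # placeholder for a real motion primitive

SKILLS = {
    "grasp the object": Skill(
        run=reach_and_grasp,
        precondition=gripper_empty,
        failure_hint="gripper already holding something",
    ),
}

def execute_plan(plan: list[str]) -> None:
    """Run each abstract plan step through its grounded skill, if any."""
    for step in plan:
        skill = SKILLS.get(step)
        if skill is None:
            raise ValueError(f"no grounded skill for step: {step!r}")
        if not skill.precondition():
            raise RuntimeError(f"precondition failed ({skill.failure_hint})")
        if not skill.run():
            raise RuntimeError(f"step failed: {step}")

execute_plan(["grasp the object"])
```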
The researchers tested the framework on a robotic arm system attempting to pick up materials off a table.
Glide is not without its limitations. According to a paper detailing the framework, it requires many trial-and-error attempts in a resettable environment to gather information on successfully completing a task's steps.
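To make that requirement concrete, a trial-and-error collection loop in a resettable environment might look like the sketch below. The environment interface is a generic stand-in invented for this example, not the paper's actual setup:

```python
# Illustrative trial-and-error loop; the environment is a toy stand-in,
# not the Glide paper's actual experimental setup.
import random

class ResettableEnv:
    """Toy simulator that can be reset to the same state between attempts."""
    def reset(self) -> None:
        pass  # restore the initial scene

    def attempt(self, action: float) -> bool:
        # Random outcome stands in for real task dynamics.
        return random.random() < 0.3

def collect_attempts(env: ResettableEnv, num_trials: int) -> list[tuple[float, bool]]:
    """Gather (action, success) labels over many reset-and-retry episodes."""
    data = []
    for _ in range(num_trials):
        env.reset()                     # a resettable environment is required
        action = random.uniform(-1, 1)  # perturbed attempt at a plan step
        data.append((action, env.attempt(action)))
    return data

labels = collect_attempts(ResettableEnv(), num_trials=100)
print(sum(ok for _, ok in labels), "successes out of", len(labels))
```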
This article first appeared in IoT World Today's sister publication AI Business.