![Robot hands typing on a keyboard](https://eu-images.contentstack.com/v3/assets/blt31d6b0704ba96e9d/bltb92e547c83863192/66c867c000c942c8dcafe64e/Getty_robot_hand_keyboard.png?width=1280&auto=webp&quality=95&format=jpg&disable=upscale)
Open-source AI platform Hugging Face has launched a foundation model for robots that translates natural language commands into physical actions.
Called Pi0, the model was developed by AI-robotics startup Physical Intelligence and ported to Hugging Face’s LeRobot platform.
Remi Cadene, principal research scientist at Hugging Face, said on X that the model is the most advanced vision-language-action model available.
“It takes natural language commands as input and directly outputs autonomous behavior,” he said.
Pi0 can control a variety of robots and can either be prompted to perform a specific task or trained to specialize in more challenging scenarios. It can also be fine-tuned on a person's or company's own data set.
Physical Intelligence trained the model on data from seven robotic platforms and on 68 unique tasks previously considered too complex for robots, including folding laundry, waiting on tables and packing groceries.
In a blog post, the company said Pi0, which was developed over eight months, is a first step toward artificial physical intelligence that could enable users to simply ask robots to perform any task they want, as they currently do with large language models (LLMs) and chatbot assistants.
“Folding a shirt or cleaning up a table requires solving some of the most difficult engineering problems ever conceived,” it said.
“Like LLMs, our model is trained on broad and diverse data and can follow various text instructions. Unlike LLMs, it spans images, text, and actions and acquires physical intelligence by training on embodied experience from robots, learning to directly output low-level motor commands via a novel architecture.”
In a separate blog post, Hugging Face engineers said the new model brings generalist robotic intelligence to the Hugging Face ecosystem and marks the first time a robotics foundation model has been made widely available through an open-source platform.
Shawn DuBravac, CEO and president of Avrio Institute, which helps companies understand technological shifts, said Pi0 has the potential to lower the barriers to robotics adoption by reducing the time and costs involved and by enabling non-programmers to direct robots with natural language instead of coded commands.
“It could also change the type of robots organizations deploy,” DuBravac said. “Rather than designing robots for a single, specific use case, organizations could build and deploy generalist robots that can be given new tasks for different settings with minimal reprogramming.”
Building on the foundation model, Physical Intelligence has also launched Pi0-FAST, an enhanced version that incorporates a tokenization scheme called frequency-space action sequence tokenization. The company said it trains five times faster and shows improved generalization across different environments and robot types.
This article first appeared in IoT World Today's sister publication AI Business.