Researchers Use GPT-4 to Control Humanoid Robots with Natural Language

Japanese researchers showcase an OpenAI model converting text prompts into robot actions in a bid to streamline training

Ben Wodecki, Junior Editor - AI Business

July 5, 2024

Alter3 playing the air guitar after being instructed to perform metal music. Credit: University of Tokyo, Alternative Machine

Japanese robotics researchers have showcased OpenAI’s GPT-4 model turning natural language inputs into commands for humanoid robots.

Newly published research from the University of Tokyo and Alternative Machine shows the foundation model applied to Alter3, a humanoid robot.

The model converted text prompts into actions for the robot to perform, such as “take a selfie with your phone.”

The model uses the initial prompt to create a series of movements the robot must execute to perform the task.

The list of movements is then translated into code, which is input into Alter3 so that it can complete the task.
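The paper's exact prompts and the robot-side interface are not public, but a minimal sketch of that two-stage flow, using the OpenAI Python SDK and a hypothetical send_to_alter3 hand-off, might look like this:

```python
from openai import OpenAI

client = OpenAI()

def plan_movements(task: str) -> str:
    """Stage 1: ask GPT-4 to break a task into a numbered movement list."""
    prompt = (
        "You control a humanoid robot. Break the following task into a "
        f"numbered list of body movements, one per line.\n\nTask: {task}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def movements_to_code(plan: str) -> str:
    """Stage 2: translate the movement list into robot control code."""
    prompt = (
        "Translate each movement below into one robot control command "
        f"per line:\n\n{plan}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

plan = plan_movements("take a selfie with your phone")
code = movements_to_code(plan)
# send_to_alter3(code)  # hypothetical: feed the generated code to the robot
```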

Robotics researchers have increasingly turned to language models to improve robotics training.

MIT researchers recently developed a framework using language models to provide robots with “common sense.” Another MIT paper suggests a language-based system could help robots better navigate their surrounding environments.

The Japanese researchers sought to simplify robotics training, an often laborious process that takes many hours and vast quantities of data before a robot understands the task it has to perform.

This new foundation model-led approach, however, could enable robotics developers to train units faster.

The researchers wrote that before using the foundation model, they had to control all 43 of the robot’s axes in a certain order to mimic a person’s pose or to simulate a behavior such as serving tea or playing chess.
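For illustration only, hand-scripting a single pose under that older workflow might resemble the sketch below; the set_axis command, axis numbers and values are invented placeholders, not Alter3's real control interface:

```python
def set_axis(axis: int, value: int) -> None:
    """Stub for a hypothetical low-level axis command on the robot."""
    print(f"set_axis({axis}, {value})")

# Hand-tuned pose: every one of the 43 axes had to be set individually,
# in order. Axis numbers and values here are illustrative placeholders.
serve_tea_pose = [
    (1, 128),   # e.g. head pitch
    (2, 90),    # e.g. head yaw
    (17, 230),  # e.g. right shoulder
    # ... remaining axes, each tuned by hand ...
    (43, 200),  # e.g. right wrist
]

for axis, value in serve_tea_pose:
    set_axis(axis, value)
```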


OpenAI’s model is not natively tuned for robot control. Instead, the researchers employed in-context learning to adapt it so that linguistic expressions of actions could be translated into code.
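The article does not reproduce the researchers' prompts, but in-context learning of this kind typically means placing worked examples in the prompt itself. A minimal sketch, again using an invented set_axis command format:

```python
from openai import OpenAI

client = OpenAI()

# Few-shot prompt: the worked examples teach GPT-4 the output format
# without any fine-tuning. The set_axis format and values are invented
# for illustration, not Alter3's real command set.
FEW_SHOT_PROMPT = """\
Convert movement descriptions into commands of the form
set_axis(axis_number, value), where value ranges from 0 to 255.

Description: nod the head
Commands:
set_axis(1, 200)
set_axis(1, 60)

Description: raise the right arm
Commands:
set_axis(17, 230)

Description: {description}
Commands:
"""

def description_to_commands(description: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": FEW_SHOT_PROMPT.format(description=description),
        }],
    )
    return resp.choices[0].message.content

print(description_to_commands("wave with the left hand"))
```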

The model can generate a list of general actions for the robot to perform, rather than an individual list for each of the bot’s body parts. 

Users can adjust the actions they want the robot to perform using natural language, such as asking it to raise its arm more when taking a selfie.
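One plausible way to implement that kind of correction, sketched below, is to append the user's feedback to the conversation history and ask GPT-4 to revise its earlier commands; the mechanism shown is an assumption, as the article does not detail it:

```python
from openai import OpenAI

client = OpenAI()

def refine(history: list[dict], feedback: str) -> str:
    """Append user feedback to the conversation and request revised commands."""
    history.append({"role": "user", "content": feedback})
    resp = client.chat.completions.create(model="gpt-4", messages=history)
    revised = resp.choices[0].message.content
    history.append({"role": "assistant", "content": revised})
    return revised

# Usage: start from the original request and its generated commands,
# then nudge the motion in plain language. The commands are placeholders.
history = [
    {"role": "user", "content": "Generate commands: take a selfie with your phone."},
    {"role": "assistant", "content": "set_axis(17, 180)  # raise right arm"},
]
print(refine(history, "Raise your arm a bit more when taking the selfie."))
```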

The researchers found that motion instructions generated by GPT-4 were of a higher quality than those created using traditional robotic training techniques. 

The model enabled Alter3 to perform non-human actions, such as pretending to be a ghost or a snake, by leveraging GPT-4’s extensive knowledge base to interpret those actions as a human might.

The researchers said the results showed that OpenAI’s foundation model can generate a wide range of movements, from everyday actions to imitations of non-human movement.

The model could even enable humanoid robots to better express emotions. The researchers said that even when emotional expressions were not explicitly stated in the prompt text, the foundation model could infer adequate emotions and reflect them in Alter3’s physical responses.


“This integration of verbal and non-verbal communication can enhance the potential for more nuanced and empathetic interactions with human[s],” they wrote.

