June 12, 2024
OpenAI's chief architect, Colin Jarvis, predicted significant advancements in large language models during his keynote address at AI Summit London on Wednesday.
Jarvis highlighted four key areas where he expects major progress: smarter and cheaper models, increased model customization, more multimodality such as audio and video, and market-leading chatbots performing at similarly high levels.
"Don't build for what's available today because things are changing so fast," Jarvis told attendees, saying the speed of advancement means current capabilities will be outmoded by the time new applications ship.
He urged companies to differentiate by using language AI APIs and creating unique user experiences, data approaches and model customizations.
Jarvis said the key differentiator for businesses building language model-powered services is leveraging their own proprietary data.
“The user experience you create, the data you bring to the model, how you customize it and the service that you expose on top of the model, that is actually where you folks are going to differentiate and build something genuinely unique,” Jarvis said. “If you just build a wrapper around one of these very useful models, then you're no different than your competitors.”
Jarvis said that use cases and user experiences previously cast aside by businesses due to cost or complexity can now be put into action due to reduced operating costs and smarter models.
For example, he highlighted OpenAI’s model embedding costs, describing them as “basically free” – adding that use cases previously out of bounds because of cost or latency can now be deployed.
Photo Credit: Ben Wodecki
“With GPT-4o coming out, which is twice as fast as GPT-4, we saw a lot of use cases that were painfully slow for users actually just drop under that threshold where you're happy to ship at that stage,” he said.
“What we've seen in the last year confirms that first, models get smarter, then they get cheaper and faster. We've got smarter models, but now we can also serve them cheaply.”
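The “basically free” claim about embeddings can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative only: the per-token price is an assumed placeholder, not official OpenAI pricing, and the corpus size is hypothetical.

```python
# Back-of-envelope estimate of one-off embedding costs for a corpus.
# NOTE: the per-token price below is an illustrative assumption,
# not official pricing -- check OpenAI's current price list.

ILLUSTRATIVE_PRICE_PER_MILLION_TOKENS = 0.02  # USD, assumed


def embedding_cost_usd(num_docs: int, avg_tokens_per_doc: int,
                       price_per_million: float = ILLUSTRATIVE_PRICE_PER_MILLION_TOKENS) -> float:
    """Estimated cost of embedding num_docs documents once."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million


# Embedding 100,000 documents of ~500 tokens each is 50M tokens total.
cost = embedding_cost_usd(100_000, 500)
print(f"${cost:.2f}")
```

At an assumed $0.02 per million tokens, embedding a 100,000-document knowledge base costs on the order of a dollar, which is why retrieval-style use cases that were once ruled out on cost grounds are now in reach.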
The Chatbot Arms Race
ChatGPT was released in late 2022 – but the chatbot market is becoming increasingly crowded with rivals such as Gemini from Google and Claude from Anthropic.
Jarvis described the field as an “arms race,” highlighting that the top text-focused chatbots boast similar levels of intelligence.
He said the market’s diverse range of high-performing models will persist, with every provider looking to one-up the others, pushing their bots’ performance by a few percentage points.
“The thing that will be interesting to see over the next year is whether somebody manages to make another GPT-3 to GPT-4 jump in terms of the capabilities of these models. I would expect this to continue, with more providers and a more fragmented, diverse market,” he said.
Increased Model Customization
Traditionally, businesses would take a foundation model and then fine-tune it to their use case or application.
However, language models are limited in how much they can be fine-tuned, and building atop an open source model requires considerable technical skill and computational resources.
Jarvis forecasts that businesses will increasingly look to take a base model and then post-train through reinforcement learning, for it to become an expert in a relevant field or subject.
“That will bring with itself a lot of safety concerns, but it will also bring with it a lot of really cool use cases where you could make like an agricultural expert or a legal expert,” Jarvis said.
Models trained to be experts could prove invaluable in customer service applications, with Jarvis citing such uses as providing businesses with “fairly proven value from generative AI so far.”
Grounded language models could automate certain customer service functions while acting as a support for human staff, Jarvis explained.
“The more complex the process is, the more you want the human involved, the more you want an assistant experience where the human and AI are working together. And the less complex it is, the more likely you are to automate it,” Jarvis said. “Getting the human to stay in the loop is not a cop-out with AI experiences, in a lot of use cases, it leads to a better experience for the user as well.”
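The routing logic Jarvis describes, automating simple requests while keeping a human in the loop for complex ones, can be sketched as a trivial dispatcher. Everything here is hypothetical: the complexity score, the threshold, and the function names are illustrative and not from the talk.

```python
# Hypothetical sketch of complexity-based routing for customer service:
# simple requests are automated, complex ones go to a human-assisted
# experience. The 0-1 complexity score and threshold are assumptions.

from enum import Enum


class Handling(Enum):
    AUTOMATE = "automate"  # simple: let the model respond directly
    ASSIST = "assist"      # complex: AI drafts, human stays in the loop


def route(complexity_score: float, threshold: float = 0.5) -> Handling:
    """Route a request by an assumed 0-1 complexity score."""
    return Handling.ASSIST if complexity_score >= threshold else Handling.AUTOMATE


print(route(0.2).value)  # a simple request is automated
print(route(0.8).value)  # a complex one keeps the human in the loop
```

The design choice mirrors the quote: the threshold is a dial between full automation and an assistant experience, and raising it keeps humans involved in more cases rather than fewer.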
Increasing Modalities: Reducing Costs
When ChatGPT came out, it handled simple text and code. Now, through updates like the GPT-4o model, it can handle images, text, code and more.
Jarvis said models like GPT-4o let businesses run inputs through a single API call, rather than separate calls for each modality – thereby reducing costs to run the model.
“This is making stuff a lot faster,” he said. “This is where a whole new raft of user experiences that depend on low-latency interaction across modalities becomes accessible with this change.”
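The single-call point can be illustrated with the shape of a multimodal request. The sketch below builds a request payload in the OpenAI chat-completions format, where text and an image travel as content parts of one message; it constructs the dict locally and sends nothing, and the question and image URL are placeholders.

```python
# Sketch of a single multimodal request in the OpenAI chat-completions
# payload shape: text and an image are content parts of one message,
# so one API call replaces separate per-modality calls. Nothing is sent.

def build_multimodal_request(question: str, image_url: str) -> dict:
    """Assemble one request carrying both a text and an image input."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


req = build_multimodal_request(
    "What is shown in this chart?",          # placeholder question
    "https://example.com/chart.png",         # placeholder image URL
)
# Two modalities, one message, one round trip.
print(len(req["messages"][0]["content"]))
```

Collapsing modalities into one call is what removes the extra round trips, which is where the latency and cost savings Jarvis describes come from.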
OpenAI demoed interactive multimodal chatbots at its spring event, and the company’s chief architect said they represent the next shift for language models: more modalities under one model.
“Are we eventually going to see a model where I can talk into it and it produces a video based on what I said? The modalities stop being a barrier; I just accept that I can interact with this API in the way I want,” Jarvis said.