Edge Processing Enables Generative AI for IoT Devices

Generative AI processing must be enabled on edge devices and engineered to be cost-effective for the technology to become truly ubiquitous

Amihai Kidron, Vice president of software engineering, Hailo

February 6, 2025


Today, generative AI is a novelty used mainly by early adopters; tomorrow, it will be an inseparable part of our daily routines. To get there, it must be accessible on a wide range of consumer devices, independent of cloud-based processing, and available to everyone, not just to those who can afford cloud AI subscriptions or high-end computers, smartphones, and cars.

If automobile makers mean to include generative AI in their vehicles (and many do), for example, in-car processors must be able to handle it even when the car can't connect to the internet. As a democratizing technology, generative AI should be available across a wide variety of car models, not just luxury vehicles.

There are two main engineering pieces to the always-available consumer generative AI puzzle: enabling generative AI at the edge and architecting it in a way that isn't costly.

Let's break them down.

Generative AI at the Edge

Since the introduction of ChatGPT, software developers have been turning out applications and services that take advantage of large language models (LLMs) so consumers can create AI-generated content. Many Microsoft Windows users, for example, logged on one day to find Copilot in their Taskbar. But when they switched their laptop to airplane mode or worked on a presentation in a coffee shop with spotty Wi-Fi, they got the message: "You're offline."


That is, of course, because the LLM and AI processor farm needed to interact with Copilot resides in a data center, not on the user's laptop. Going forward, manufacturers of computers and smartphones are seeking to make generative AI a feature of the devices themselves, not a capability that exists solely in the cloud.

There are several reasons generative AI should be available at the edge, beyond ensuring consumers can use it when their devices are offline. The first is application performance. The current paradigm of generative AI services reaching back to the cloud necessarily introduces latency. For an app like an AI-powered language translator, or a computer vision system making sense of what the user's camera sees, AI processing is best handled on the device to ensure real-time performance.
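To see why cloud round trips break real-time use cases, consider a simple latency budget. The sketch below uses illustrative numbers (a 30 fps camera feed, an 80 ms mobile-network round trip, 15 ms of on-device inference), which are assumptions for the example, not figures from this article:

```python
# Rough real-time latency budget for processing a live camera feed.
# All timing figures here are illustrative assumptions.

def frame_budget_ms(fps: int) -> float:
    """Time available to process each frame, in milliseconds."""
    return 1000 / fps

budget = frame_budget_ms(30)    # 30 fps camera feed
cloud_round_trip_ms = 80        # assumed mobile-network round trip, before any inference
local_inference_ms = 15         # assumed on-device inference time

print(f"Per-frame budget: {budget:.1f} ms")
print(f"Cloud round trip alone ({cloud_round_trip_ms} ms) "
      f"blows the budget: {cloud_round_trip_ms > budget}")
print(f"Local inference ({local_inference_ms} ms) "
      f"fits the budget: {local_inference_ms < budget}")
```

Under these assumptions, the network round trip alone exceeds the per-frame budget before any model computation even starts, which is why real-time vision and translation workloads favor on-device processing.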

The second is user privacy. AI apps that can summarize a video call for the user, generate email responses, edit unwanted objects out of photos, or diagnose potential health conditions may best be handled locally so the user's data doesn't have to move to the cloud.

Then there's the issue of infrastructure demands. As the number of generative AI users grows, so does the demand for cloud processing. The data center processors used for generative AI are in such demand that one executive once told Congress that the fewer people used his company's AI tools, the better. Processing generative AI on edge devices can load-balance growing workloads, allowing applications to scale more stably and sustainably while relieving cloud data centers of costly processing.


Plus, by shifting more generative AI processing to the edge, we reduce the need for cloud-based subscriptions to access applications, thereby lowering the cost to consumers and enabling more ubiquitous generative AI services.

The Need for Edge AI Processing

To achieve generative AI processing on edge devices, developers need to create LLMs that can run on a laptop, smartphone, or other edge device, and they need an edge AI processor designed for the task.

The first is accomplished through leaner models. A cloud-based model of 60 billion parameters can't reasonably run on edge devices. The industry is starting to see more 4 billion-parameter models fine-tuned to specific generative AI tasks: translation services, computer vision, interactive user manuals in cars, and so on. A specific AI app would have its own smaller LLM, updated the way apps are updated today so it remains lean and current.
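A back-of-the-envelope calculation shows why the parameter count matters so much for edge devices. The sketch below assumes 16-bit weights for the large cloud model and 4-bit quantized weights for the edge model; the quantization scheme is an illustrative assumption, not something specified in this article:

```python
# Approximate memory needed just to hold LLM weights.
# Assumes fp16 (2 bytes/parameter) for the cloud-scale model and
# int4 quantization (0.5 bytes/parameter) for the edge model;
# both precision choices are illustrative assumptions.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory footprint of the model weights alone, in gigabytes."""
    return num_params * bytes_per_param / 1e9

cloud_model_gb = weight_memory_gb(60e9, 2.0)  # 60B parameters, fp16
edge_model_gb = weight_memory_gb(4e9, 0.5)    # 4B parameters, int4

print(f"60B fp16 model: ~{cloud_model_gb:.0f} GB of weights")
print(f"4B int4 model:  ~{edge_model_gb:.0f} GB of weights")
```

Roughly 120 GB of weights versus roughly 2 GB: the first is far beyond the memory of a laptop or smartphone, while the second fits comfortably alongside other apps, which is the practical argument for smaller task-specific models at the edge.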

However, beyond developing edge-targeted AI models, consumers also need devices equipped with dedicated generative AI processors, such as neural processing units (NPUs), designed to handle AI's unique demands. Unlike GPUs, often repurposed for AI in high-end systems, NPUs provide a more efficient, cost-effective solution for edge devices. To fully unlock the potential of generative AI and integrate it seamlessly into everyday life, these processors must offer the right performance at the right price across various form factors. This also requires a new power-efficient architecture that's powerful enough to run generative AI tasks without overtaxing the battery.

Architected for Generative AI at the Edge

Dedicated edge AI processors are already coming to market. The newest generations of chips from leading processor manufacturers include high-performance parts that function as both CPUs and NPUs. Most of these are expensive and tax devices' memory bandwidth, making them most applicable at the high end.

An alternative to costly CPU-NPU combinations is the dedicated generative AI accelerator, architected from the ground up to scale performance and power consumption to fit the device and application. One example is Hailo's Hailo-10H generative AI accelerator, which is capable of up to 40 tera-operations per second (TOPS) and typically consumes less than 3.5W. It will be available in multiple form factors with tightly integrated memory to meet different performance levels and price points.
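The figures quoted above translate into a simple efficiency metric, TOPS per watt, which is how edge accelerators are commonly compared. This sketch just applies the two numbers from the paragraph:

```python
# Compute-per-watt efficiency from the figures quoted above:
# up to 40 TOPS at a typical power draw under 3.5 W.

def tops_per_watt(tops: float, watts: float) -> float:
    """Throughput efficiency: tera-operations per second per watt."""
    return tops / watts

hailo_10h_efficiency = tops_per_watt(40, 3.5)
print(f"Hailo-10H: ~{hailo_10h_efficiency:.1f} TOPS/W")
```

At roughly 11 TOPS/W, a battery-powered device can sustain generative AI workloads without the thermal and power budget of a repurposed GPU, which is the core of the cost and form-factor argument.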

Hailo uses a scalable, distributed dataflow architecture. As a true NPU, the accelerator is optimized for generative AI processing, enabling edge devices to run LLMs efficiently and effectively. It can run in PCs, smartphones, cars, home security systems and more at a lower cost than alternative solutions.

In the near future, generative AI will be a basic function available on as many devices as possible, accessible whenever people need it, without the added costs of premium hardware or generative AI subscription services.

This article was first published in IoT World Today's sister publication AI Business.

About the Author

Amihai Kidron

Vice president of software engineering, Hailo

Amihai Kidron has been the vice president of software engineering at Hailo for the past seven years and is now leading the AI accelerator product development. Prior to joining Hailo, he was at Intel Corporation and Texas Instruments, where he worked on various engineering teams. He earned a degree in math and computer science from Ben-Gurion University of the Negev.

