Friday, January 12, 2024
A new system-on-chip (SoC) demonstrated at CES 2024 in Las Vegas is claimed to run multi-modal large language models (LLMs) at a fraction of the power per inference of leading GPU solutions. Ambarella is targeting the SoC at edge endpoint devices and on-premise hardware for video security analysis, robotics, and a range of industrial applications.
According to the Santa Clara, California-based chip developer, its N1-series SoCs are up to 3x more power-efficient per generated token than GPUs and standalone AI accelerators. Ambarella will initially offer optimized generative AI processing on its mid- to high-end SoCs for on-device performance under 5 W, and it will also release a server-grade SoC in the N1 series that draws under 50 W.
Ambarella claims that its SoC architecture is natively suited to processing video and AI simultaneously at very low power. Unlike a standalone AI accelerator, the N1 SoCs can process multi-modal LLMs efficiently while still performing all other system functions. Examples of the on-device LLM and multi-modal processing these SoCs enable include smart contextual searches of security footage, robots controlled with natural-language commands, and AI assistants that handle anything from code generation to text and image generation.
Les Kohn, CTO and co-founder of Ambarella, says that generative AI networks are enabling new functions across these applications that simply were not possible before. “All edge devices are about to get a lot smarter with chips enabling multi-modal LLM processing in a very attractive power/price envelope.”
Alexander Harrowell, principal analyst for advanced computing at Omdia, agrees and expects virtually every edge application to be enhanced by generative AI over the next 18 months. “When moving generative AI workloads to the edge, the game becomes all about performance per watt and integration with the rest of the edge ecosystem, not just raw throughput,” he added.
The AI chips are supported by the company’s Cooper Developer Platform, in which Ambarella has pre-ported and optimized popular LLMs. That includes Llama-2 as well as the Large Language and Vision Assistant (LLaVA) model running on N1 SoCs for multi-modal vision analysis of up to 32 camera sources. These pre-trained and fine-tuned models will be available for developers to download from the Cooper Model Garden.
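To illustrate the kind of multi-modal query involved in a contextual search of camera footage, here is a minimal sketch using the open-source Hugging Face LLaVA checkpoint. It is not Ambarella’s Cooper SDK, and the model ID, image path, and prompt are placeholders chosen only to show the workload in question.

```python
# Illustrative only: public LLaVA 1.5 checkpoint via Hugging Face transformers,
# not Ambarella's Cooper port. Model ID, image path, and prompt are placeholders.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# A contextual question about a single camera frame, similar in spirit to the
# "smart contextual search" use case described above.
image = Image.open("camera_frame.jpg")
prompt = "USER: <image>\nIs anyone carrying a package in this frame? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```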
Ambarella also claims that its N1 SoCs are well suited to application-specific LLMs, which are typically fine-tuned for each scenario at the edge. That contrasts with the classical server approach of using ever larger, more power-hungry LLMs to cover every use case.
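The article does not describe Ambarella’s fine-tuning toolchain, but the general technique for specializing a compact LLM to one scenario is parameter-efficient fine-tuning such as LoRA. The sketch below uses the Hugging Face `peft` library with a placeholder base model, dataset, and hyperparameters, purely as an assumed illustration of that approach.

```python
# Hypothetical sketch of LoRA fine-tuning to specialize a compact LLM for one
# scenario; base model, dataset file, and hyperparameters are placeholders and
# this is not Ambarella's tooling.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base_model = "meta-llama/Llama-2-7b-hf"        # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Attach low-rank adapters so only a small fraction of the weights are trained.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Placeholder dataset of scenario-specific examples with a single "text" column.
data = load_dataset("json", data_files="scenario_examples.jsonl")["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="scenario-adapter",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```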
With these features, Ambarella is confident that its chips can help OEMs quickly deploy generative AI into any power-sensitive application, from an on-premise AI box to a delivery robot. The company is demonstrating its SoC solutions for AI applications at CES in Las Vegas, 9-12 January 2024.
By: DocMemory Copyright © 2023 CST, Inc. All Rights Reserved