Monday, November 20, 2023
About 10 years ago, when smartphones first started to use their on-board sensors, like the camera and the microphone, to interact with the real world around them, I opined that we as an industry were starting to make the transition from smartphones to ‘genius-phones’. Looking back now, it’s clear that those were really just the initial steps on that evolutionary path. Qualcomm introduced the next steps with the launch of its Snapdragon 8 Gen 3 mobile SoC on the first day of the 2023 Snapdragon Summit last week.
This latest mobile SoC is purpose-built for on-device generative AI. Qualcomm previously demonstrated an approximately 1-billion-parameter Stable Diffusion text-to-image generative AI (genAI) model running on its earlier-generation mobile SoC. This latest iteration supports genAI models with over 10 billion parameters on phones and over 13 billion parameters on PCs. Besides model size, another metric of note is how quickly the device can return outputs from those models. The Snapdragon 8 Gen 3 delivers 15-30 tokens per second, depending on the model, and generates images in less than one second.
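For a rough sense of what those throughput numbers mean in practice, here is a quick back-of-the-envelope sketch. The roughly 1.3-tokens-per-English-word ratio is a common rule of thumb for LLM tokenizers, not a Qualcomm figure:

```python
# Back-of-the-envelope check (not a Qualcomm benchmark): how long a text
# reply takes at the quoted decode rates, assuming ~1.3 tokens per English
# word, a common rule of thumb for LLM tokenizers.
def reply_latency_s(words: int, tokens_per_s: float) -> float:
    tokens = words * 1.3
    return tokens / tokens_per_s

for rate in (15, 30):
    print(f"{rate} tok/s -> 100-word reply in ~{reply_latency_s(100, rate):.1f} s")
# 15 tok/s -> 100-word reply in ~8.7 s
# 30 tok/s -> 100-word reply in ~4.3 s
```

In other words, even at the low end of the quoted range, a typical chat-length reply streams back in well under ten seconds, entirely on-device.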
These capabilities are enabled by generational improvements in Qualcomm’s latest AI Engine and AI Stack.
The AI Engine in the Snapdragon 8 Gen 3 combines the latest Adreno GPU, Kryo CPU, Hexagon neural processing unit (NPU), Sensing Hub and supporting memory architecture, a combination that, according to Qualcomm, delivers up to 98% faster AI performance while also improving power efficiency by 40% over the previous generation.
The GPU has been upgraded so that its real-time hardware-accelerated ray tracing now includes global illumination support, alongside various gaming enhancements and hardware-accelerated image and video encoding/decoding improvements.
The Kryo CPU, based on the 64-bit Arm Cortex-X4 architecture, has been upgraded from the previous 4+3 configuration to five performance cores and two efficiency cores. The single prime core can be clocked at up to 3.3 GHz, the performance cores run at up to 3.2 GHz, and the efficiency cores support clock rates of up to 2.3 GHz.
The NPU consists of scalar, vector, and tensor accelerators and has been upgraded with an enhanced power-delivery system and micro tile inferencing to balance performance against power. Micro tile inferencing is the technique Qualcomm uses in the Hexagon NPU’s scalar accelerator. Ignacio Contreras, Qualcomm’s senior director of product marketing, explained that with micro tile inferencing, they can “slice neural network layers up in even smaller micro tiles to speed up the inferencing process of deep and complex neural networks and achieve even better power savings.” Additionally, the NPU supports INT4, INT8, INT16 and FP16 data types, which enables multiple model modalities and lets developers trade off model size against accuracy and latency.
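To see why those low-precision data types matter, here is a minimal sketch of symmetric INT8 quantization, the kind of storage-versus-accuracy trade that the NPU’s integer formats make possible. This is my own illustration, not Qualcomm’s implementation; INT4 would halve the storage again at the cost of more error:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0                  # map the largest weight to ±127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)         # stand-in for one weight tensor
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"INT8 uses 4x less storage than FP32; mean abs error ≈ {err:.5f}")
```

Shrinking a 10-billion-parameter model from FP32 to INT4 is an 8× reduction in weight storage, which is exactly what makes models of that size feasible in a phone’s memory budget.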
Last but certainly not least, on-board sensors like the cameras and microphones play a critical role in personalizing the on-device generative AI experience. Accordingly, the Sensing Hub has been updated to deliver up to 3.5× the AI performance of the previous generation, with two micro NPUs, 30% more memory and two always-sensing image sensor processors (ISPs). The Snapdragon 8 Gen 3 is also equipped with a 12-layer cognitive ISP.
Upgraded intelligence
Ultimately, it’s not just about performance or increased intelligence; it’s about what you can really do with it. Support for larger models, the Sensing Hub and ISP improvements, and the upgraded memory architecture together enable one of the most important features for maximizing AI capability and interacting with it naturally: multi-modal model support. Just as humans interact with each other and the world around them through a combination of speech, sight, feel and hearing, interactions with on-device AI should also support voice/audio, text, image, and physical sensor sampling, such as with infrared sensors and video. Multi-modal capability allows a device to ingest all of these types of input prompts, as well as output different types of content, from the written or spoken word to pictures and video clips.
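To make the idea concrete, here is a minimal sketch of what a multi-modal prompt could look like as a data structure. Everything here, including the commented-out OnDeviceModel call, is invented for illustration and is not a real Qualcomm API:

```python
from dataclasses import dataclass, field

@dataclass
class MultiModalPrompt:
    """One request combining several input modalities (Python 3.10+)."""
    text: str | None = None            # typed or transcribed speech
    image_path: str | None = None      # camera input
    audio_path: str | None = None      # raw microphone capture
    sensor_readings: dict[str, float] = field(default_factory=dict)  # e.g. ToF, IR

prompt = MultiModalPrompt(
    text="Is the air quality safe for a run right now?",
    image_path="sky_photo.jpg",
    sensor_readings={"tof_particle_count": 412.0},
)
# reply = OnDeviceModel("assistant-10b-int4").generate(prompt)  # hypothetical runtime
```

The point is simply that one prompt can carry text, pixels, audio and raw sensor samples at once, and the model can answer with any of those modalities as well.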
For example, increased performance and multi-modal model support enables on-device photo expansion. This feature comes in handy when the user wants to resize an image without distorting it or reducing the image resolution. Take, for instance, a user who shot a picture in portrait mode for use in a social media post. Using the same image as a banner advertisement now requires nothing more than a finger tap or even a speech prompt to direct the device to expand the image with a new aspect ratio and fill the empty expanded space with new content that seamlessly matches the existing background through the use of generative AI.
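As a point of reference, the same generative-fill technique can be sketched with the open-source diffusers library on a desktop GPU. This is not Qualcomm’s on-device pipeline, but the idea is identical: pad the canvas to the new aspect ratio, mask the empty region, and let an inpainting model fill it to match the background:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

portrait = Image.open("portrait_shot.png").convert("RGB").resize((512, 512))

# Widen the canvas: portrait on the left, blank space on the right.
canvas = Image.new("RGB", (1024, 512))
canvas.paste(portrait, (0, 0))

# White = region for the model to generate; black = keep the original pixels.
mask = Image.new("L", (1024, 512), 255)
mask.paste(0, (0, 0, 512, 512))

banner = pipe(
    prompt="seamless continuation of the existing background",
    image=canvas, mask_image=mask, width=1024, height=512,
).images[0]
banner.save("banner.png")
```

Running this class of model in under a second on a phone, rather than on a discrete GPU, is precisely the kind of workload the new NPU is built for.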
Another example is visual object removal. While previous Snapdragon 8 generations could remove unwanted objects from pictures, the Snapdragon 8 Gen 3 can now remove objects from video as well.
The digital versions of speech, sight and hearing are relatively straightforward. However, how can a device use ‘feel’ as an input? The answer is through different sensors, such as time-of-flight and infrared sensors. One example Qualcomm demonstrated at the recent Snapdragon Summit was the use of a time-of-flight sensor to sample the number of particles present in the air and, using AI, determine the air quality to see if it’s safe to exercise outside. Another demonstration used infrared sensors with an AI model running on a phone to determine an individual’s hydration level from a touch of their skin, or to judge a cookie’s freshness after it had been left out.
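For illustration only, the ‘feel’ pathway can be reduced to something like the sketch below. The particle counts, bands and verdicts are invented for this example; Qualcomm’s demonstration used a learned AI model rather than fixed thresholds:

```python
def air_quality_verdict(particle_count: float) -> str:
    """Map a time-of-flight particle sample to a coarse exercise verdict.

    Thresholds are made up for illustration; a real system would use a
    calibrated sensor and a trained model instead of fixed bands.
    """
    if particle_count < 150:
        return "good: safe to exercise outside"
    if particle_count < 400:
        return "moderate: sensitive groups should take care"
    return "poor: consider exercising indoors"

print(air_quality_verdict(412.0))  # poor: consider exercising indoors
```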
Next steps
The first wave of devices based on the new Snapdragon 8 Gen 3 SoC will appear this quarter from smartphone OEMs. However, this is just the beginning of the transition from smartphones to genius-phones. Ongoing research and development, as well as the competitive landscape, will deliver even more performance, efficiency and, ultimately, AI capability over the next few years.
As the industry and use cases mature and become more sophisticated, the AI experience will also be refined. We will carry personalized AI models that we have fine-tuned simply by using them from device to device. It’s not difficult to imagine a situation where the AI models on our devices learn our preferences and interests, and we won’t want to start from scratch when we move between phone, laptop and car.
As we continue down this path, it’s clear that the next generation of genius, AI-powered phones is an evolutionary step towards a whole new world of use cases and experiences. It will be exciting to see by this time next year how big the next step will be.
By: DocMemory Copyright © 2023 CST, Inc. All Rights Reserved