Thursday, October 20, 2011
On Wednesday, ARM formally unveiled its next-generation smartphone processor, the Cortex A7, codenamed "Kingfisher." But there was much more to the A7's launch than just the unveiling of a new processor architecture for smartphones. The chip company also announced plans to pair the A7 with the much larger and more powerful Cortex A15 in phones and tablets, using a technique called heterogeneous multiprocessing (or "big.LITTLE", as ARM prefers to call it) to dynamically move lighter workloads from the larger, more power-hungry A15 to the leaner A7 in order to extend mobile battery life.
When used in a dual-core configuration, the A7 will bring the performance characteristics of what is currently a $500 phone to the $100 "feature phones" of 2013. These future feature phones will have the same capabilities as today's high-end smartphones, but they'll have the low prices and long battery life that the feature phone market demands. For the high-end "superphones" and tablets of 2013, the A7 will be paired with the much larger and more powerful A15 core to yield a processor that sips power like a feature phone when all you're doing is some light web surfing, but can crank up the juice when you're gaming.
ARM claims that the A7 will double the performance of its existing Cortex A8 family through a combination of process shrinks and improvements at the level of microarchitecture. Or, as ARM processor division chief Mike Inglis put it at the launch event, "Outpacing Moore's Law with microarchitectural innovation is what we've been working on with A7 as a product." Though Inglis never mentioned this specifically, I learned that we can actually thank Google's "open" smartphone OS, Android, for some of that innovation.
The A7's design improvements over the older A8 core are possible because ARM has had the past three years to carefully study how the Android OS uses existing ARM chips in the course of normal usage. Peter Greenhalgh, the chip architect behind the A7's design, told me that his team did detailed profiling in order to learn exactly how different apps and parts of the Android OS stress the CPU, with the result that the team could design the A7 to fit the needs and characteristics of real-world smartphones. So in a sense, the A7 is the first CPU that's quite literally tailor-made for Android, although those same microarchitectural optimizations will benefit any other smartphone OS that runs on the design.
The high-level block diagram for the A7 released at the event reveals an in-order design with an 8-stage integer pipeline. At the front of the pipeline, ARM has added three predecode stages, so that the instructions in the L1 are appropriately marked up before they go into the decode phase. Greenhalgh told me that A7 has extremely hefty branch prediction resources for a design this lean, so I'm guessing that the predecode phase involves tagging the branches and doing other work to cut down on mispredicts.
(Note that branch prediction is one of the best places to spend transistor resources, because it buys not only greatly improved performance but also improved power efficiency. The power of branch prediction for boosting performance/watt was one of the major revelations that Intel's Banias (Pentium M) team first brought to the Intel product line. So it makes sense that the A7 has gone all-out here.)
After the decode phase, two instructions per cycle can issue through one of five issue ports to the machine's execution core. This execution core consists of an asymmetric integer arithmetic-logic unit (ALU), where one pipe is a full ALU and the other is limited to simpler operations. There's also a multiply pipe for complex integer operations, a floating-point NEON pipe for floating-point and SIMD ops, and a Load/Store pipe for memory ops.
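The issue stage described above can be pictured with a toy model. The Python sketch below is only an illustration under stated assumptions: the instruction classes, the mapping of classes to ports, and the `schedule` function are all hypothetical, and the model ignores data dependencies and latencies entirely. It shows only how an in-order, two-wide machine with one full and one simple integer pipe might fill its five ports.

```python
# Port names follow the five pipes described in the article.
FULL_ALU, SIMPLE_ALU, MULTIPLY, FP_NEON, LOAD_STORE = (
    "full_alu", "simple_alu", "multiply", "fp_neon", "load_store")

# Hypothetical mapping from instruction class to usable ports, in
# preference order. Not taken from ARM documentation.
PORTS_FOR = {
    "alu_simple": [SIMPLE_ALU, FULL_ALU],  # either integer pipe will do
    "alu_complex": [FULL_ALU],             # only the full ALU pipe
    "mul": [MULTIPLY],
    "fp": [FP_NEON],
    "mem": [LOAD_STORE],
}

def schedule(instrs, width=2):
    """Return a list of cycles; each cycle is a list of (instr, port)."""
    cycles = []
    i = 0
    while i < len(instrs):
        used = set()   # ports already taken in this cycle
        cycle = []
        while i < len(instrs) and len(cycle) < width:
            port = next(
                (p for p in PORTS_FOR[instrs[i]] if p not in used), None)
            if port is None:
                # In-order issue: a blocked instruction also blocks
                # everything behind it until the next cycle.
                break
            used.add(port)
            cycle.append((instrs[i], port))
            i += 1
        cycles.append(cycle)
    return cycles
```

In this model, two back-to-back `"alu_complex"` instructions cannot pair because both need the full ALU pipe, while two `"alu_simple"` instructions can issue together, one on each integer pipe.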
The feature set for the A7 is identical to that of the Cortex A15. This is critical, because when A7 is paired with A15 in a big.LITTLE configuration the two cores have to be identical from a software perspective.
As important as the launch of a new core design is, ARM's heterogeneous multiprocessing plans are perhaps the biggest news to come out of Wednesday's event. big.LITTLE links a dual-core A15 and a dual-core A7 with a cache-coherent interconnect, and it covers the pair with a layer of open-source firmware that dynamically moves tasks among the cores depending on those tasks' performance and power needs.
The OS doesn't actually need to be modified or to be at all aware of the smaller A7 cores in order to take advantage of the technology. All popular mobile and desktop OSes now ship with dynamic voltage and frequency scaling (DVFS) capabilities, so that they can tell the CPU when they need more horsepower and when they need less. For lighter workloads, a typical CPU responds to the OS's signal by throttling back its operating frequency and lowering its voltage, thereby saving power; for heavier workloads, it can burst the frequency and voltage higher temporarily to provide a performance boost. The open-source firmware layer that will sit between the OS and a big.LITTLE chip can take these standard signals and, instead of downclocking the A15 when the OS asks for less horsepower, simply move the workload onto the A7 cores. So while it will be possible to modify an OS to be big.LITTLE-aware, it's not necessary in order to take advantage of the capability.
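To make the idea concrete, here is a minimal Python sketch of how such a layer might turn an ordinary DVFS request into a cluster switch. It is not ARM's firmware: the `Switcher` class, its method name, and the 1000 MHz threshold are all hypothetical, chosen only to illustrate that the OS asks for a performance level and never names a core.

```python
from dataclasses import dataclass

@dataclass
class Switcher:
    """Toy model of a layer sitting between the OS and the two clusters."""
    # Hypothetical cutoff: requests below this go to the A7 cluster.
    # The value is an illustrative assumption, not an ARM figure.
    a15_min_mhz: int = 1000
    active_cluster: str = "A15"

    def handle_dvfs_request(self, requested_mhz: int) -> str:
        # The OS issues a normal "I need this much performance" request.
        if requested_mhz < self.a15_min_mhz:
            # Rather than downclocking the A15, move the work to the A7.
            self.active_cluster = "A7"
        else:
            self.active_cluster = "A15"
        return self.active_cluster


switcher = Switcher()
light = switcher.handle_dvfs_request(400)    # light web surfing
heavy = switcher.handle_dvfs_request(1500)   # gaming
```

A real implementation would also have to migrate processor state across the cache-coherent interconnect and avoid flip-flopping between clusters; the sketch leaves both out.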
By: DocMemory