Thursday, August 29, 2024
IBM has unveiled a more powerful processor for its famed mainframe systems, promising enhanced on-chip AI acceleration for inferencing plus an integrated data processing unit (DPU) to boost IO handling.
There is also a separate AI accelerator intended to support inferencing at greater scale.
Announced at the Hot Chips 2024 conference in Palo Alto, the Telum II processor is expected to bring significant performance improvements to the mainframe, according to Big Blue. The company also gave a preview of the Spyre AI Accelerator, and said it expects both chips to be available with next-generation IBM Z systems coming in the first half of 2025.
If IBM can be believed, roughly 70 percent of the entire world's transactions by value run through its mainframes, and it said the developments it is showcasing at Hot Chips will enable it to bring generative AI to these mission-critical workloads.
Telum II will be an eight-core chip like its predecessor, but in the new silicon the cores run at a higher 5.5GHz clock speed. There are ten 36 MB Level-2 caches: one for each core, one for the DPU, and a tenth serving as overall chip cache. With the virtual L3 and virtual L4 growing to 360 MB and 2.88 GB respectively, this represents a 40 percent increase in cache size, IBM said.
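The cache figures can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: it assumes decimal units (1 GB = 1,000 MB), and the reading that the virtual L4 corresponds to eight chips' worth of virtual L3 is inferred from the numbers, not something IBM is quoted as saying here. All variable names are made up for the example.

```python
# Back-of-envelope check of the quoted Telum II cache figures.
# Assumptions (not from IBM): decimal units, and that the virtual L3
# is simply the sum of the ten on-chip L2 caches.

L2_CACHES_PER_CHIP = 10   # eight cores + DPU + one shared chip cache
L2_SIZE_MB = 36
VIRTUAL_L4_MB = 2880      # 2.88 GB expressed in MB
CLAIMED_INCREASE = 0.40   # "40 percent increase in cache size"

# Sum of all L2 caches on one chip, which matches the quoted virtual L3.
virtual_l3_mb = L2_CACHES_PER_CHIP * L2_SIZE_MB

# How many chips' worth of virtual L3 fit in the quoted virtual L4.
chips_per_virtual_l4 = VIRTUAL_L4_MB // virtual_l3_mb

# Per-chip cache size implied for the previous generation by the 40% claim.
implied_previous_l3_mb = virtual_l3_mb / (1 + CLAIMED_INCREASE)

print(f"virtual L3 per chip: {virtual_l3_mb} MB")
print(f"chips per virtual L4: {chips_per_virtual_l4}")
print(f"implied previous L3: {implied_previous_l3_mb:.0f} MB")
```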
The first Telum processor brought built-in AI inferencing to the z16 when it was launched in 2022. It is capable of running real-time fraud detection checks against financial transactions while they are being processed.
Big Blue says it has significantly enhanced the AI accelerator features on the Telum II processor, reaching 24 trillion operations per second (TOPS). But, as The Register has explained before, TOPS can be a misleading metric. Support for INT8 as a data type has been added, and IBM says the Telum II is engineered to let model runtimes operate side by side with the most demanding enterprise workloads.
The on-chip DPU has been added to help meet the ever-increasing demands of workloads, particularly with an eye to future AI workloads and the coming Spyre Accelerator for the Z systems.
According to the Armonk outfit, each DPU includes four processing clusters, each with eight programmable microcontrollers and an IO accelerator that manages those processing clusters plus the IO subsystem for two IO drawer domains. The DPU also features a separate L1 cache and a request manager to track outstanding requests.
The DPU sits between the main processor fabric and the PCIe fabric. The aim of directly attaching it to the fabric like this is to greatly reduce the overhead for data transfers while improving throughput and power efficiency.
IBM said that as a maximum configuration, future Z systems might have up to 32 Telum II processors and 12 IO cages, where each cage has up to 16 PCIe slots, allowing the system to support a total of up to 192 PCIe cards, greatly expanding IO capacity.
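The IO headline number follows directly from the cage and slot counts. A minimal Python sketch of that calculation is below; the function name and parameters are hypothetical, chosen only for illustration.

```python
# Maximum PCIe card count for the configuration described above.
# max_pcie_cards() is an illustrative helper, not an IBM tool or API.

def max_pcie_cards(io_cages: int, slots_per_cage: int) -> int:
    """Return the total number of PCIe slots across all IO cages."""
    if io_cages < 0 or slots_per_cage < 0:
        raise ValueError("counts must be non-negative")
    return io_cages * slots_per_cage


# Figures quoted by IBM for a maximum configuration.
total_cards = max_pcie_cards(io_cages=12, slots_per_cage=16)
print(f"maximum PCIe cards: {total_cards}")
```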
The Spyre Accelerator will contain 32 cores with a similar architecture to the AI accelerator integrated into the Telum II chip itself. An IBM Z could be configured with multiple Spyre Accelerators, fitted via PCIe, to scale AI acceleration as required. A cluster of eight cards would add 256 accelerator cores to a single IBM Z system, for example.
Both Telum II and the Spyre Accelerator are designed to support what IBM refers to as ensemble AI, which it describes as using multiple AI models to improve the performance and accuracy of predictions compared with individual models.
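To make the ensemble idea concrete, here is a minimal Python sketch that combines the scores of two toy fraud models with a weighted average. Everything in it is hypothetical: the model functions, the transaction fields, and the weighting scheme are invented for illustration and do not represent IBM's models or any Telum II or Spyre API.

```python
# Toy illustration of ensemble scoring: several models each rate a
# transaction, and their scores are combined into one prediction.
# All names and rules here are hypothetical, not IBM's implementation.
from typing import Callable, Dict, Sequence

Transaction = Dict[str, object]
Model = Callable[[Transaction], float]  # returns a fraud score in [0, 1]


def amount_model(tx: Transaction) -> float:
    """Score rises linearly with the amount, capped at 1.0."""
    return min(float(tx["amount"]) / 10_000, 1.0)


def location_model(tx: Transaction) -> float:
    """Score is high when the transaction country differs from home."""
    return 0.9 if tx["country"] != tx["home_country"] else 0.1


def ensemble_score(tx: Transaction,
                   models: Sequence[Model],
                   weights: Sequence[float]) -> float:
    """Weighted average of the individual model scores."""
    if len(models) != len(weights) or not models:
        raise ValueError("need one weight per model")
    total_weight = sum(weights)
    return sum(w * m(tx) for m, w in zip(models, weights)) / total_weight


tx = {"amount": 5_000, "country": "FR", "home_country": "US"}
score = ensemble_score(tx, [amount_model, location_model], [1.0, 1.0])
print(f"ensemble fraud score: {score:.2f}")
```

A weighted average is only one way to combine models; voting or routing uncertain cases to a larger model are other common choices.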
"The Telum II Processor and Spyre Accelerator are designed to deliver high-performance, secured, and more power efficient enterprise computing solutions," said Tina Tarquinio, Big Blue's VP of Product Management for IBM Z and LinuxONE, in a supplied remark.
"After years in development, these innovations will be introduced in our next-generation IBM Z platform so clients can leverage LLMs and generative AI at scale," she added.
Big Blue is looking to move beyond inferencing to perform fine-tuning and even potentially training of models on its mainframes as well. This would allow clients such as banks, and other businesses that wish to keep data securely held on their own premises, to train and deploy models entirely within their organization, it said.
Both the Telum II and the Spyre Accelerator will be manufactured for IBM by Samsung using a 5 nm process node.
By: DocMemory