Wednesday, April 24, 2024
High-bandwidth memory (HBM) has become the artificial-intelligence memory of choice, and as HBM3 goes into volume production, more attention is being paid to power consumption in the wake of 2023’s generative AI boom.
Lou Ternullo, senior director of product marketing for silicon IP at Rambus, told EE Times in an exclusive interview that the increasing demand for memory bandwidth from AI is directly correlated to increasing HBM bandwidth. “Across the market, we have seen datasets and training models getting larger and larger, and 2023’s GenAI boom only accelerated this,” he said.
Performance needs, memory bandwidth and memory sizes are growing exponentially, Ternullo said, putting higher expectations and pressure on the next generation of HBM.
While bandwidth per watt as it relates to HBM is not particularly new, he said, energy consumption by data centers has been on the rise. “The huge investments and deployments into generative AI in 2023 have some predicting that data center electricity use will double by 2026.”
Ternullo said these rapidly increasing power costs mean that bandwidth per watt is becoming a more important metric for enterprises who need to monitor operational costs—even more so with the increasing focus on sustainability initiatives.
The high costs associated with implementing HBM, plus the price tag of the memory itself, mean that total cost of ownership becomes the deciding factor when determining whether this high-performance memory is necessary for an application. Ternullo said the process for customers to decide which memory they need starts with technical requirements like density, performance and power.
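As a rough illustration of how bandwidth per watt and total cost of ownership feed into that decision, the Python sketch below compares a hypothetical HBM-class option against a cheaper conventional alternative on GB/s per watt and dollars per GB/s. Every figure in it is a placeholder assumption for the sake of the example, not a vendor specification.

```python
# Illustrative comparison of memory options on bandwidth/watt and cost per bandwidth.
# All figures below are placeholder assumptions, not vendor specifications.

from dataclasses import dataclass

@dataclass
class MemoryOption:
    name: str
    bandwidth_gbs: float   # peak bandwidth per device/stack, GB/s (assumed)
    power_w: float         # estimated power per device/stack, W (assumed)
    cost_usd: float        # estimated unit cost, USD (assumed)

    @property
    def gbs_per_watt(self) -> float:
        return self.bandwidth_gbs / self.power_w

    @property
    def usd_per_gbs(self) -> float:
        return self.cost_usd / self.bandwidth_gbs

options = [
    MemoryOption("HBM-class stack (assumed)", bandwidth_gbs=800.0, power_w=15.0, cost_usd=120.0),
    MemoryOption("Conventional DRAM device (assumed)", bandwidth_gbs=64.0, power_w=2.5, cost_usd=15.0),
]

for opt in options:
    print(f"{opt.name}: {opt.gbs_per_watt:.1f} GB/s per W, ${opt.usd_per_gbs:.2f} per GB/s")
```

For these made-up numbers the HBM-class option delivers far more bandwidth per watt but also costs more per unit of bandwidth, which is exactly the trade-off that pushes the decision back to total cost of ownership.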
AI performance demands have no ceiling
AI/machine-learning training is one of the very few applications that can monetize the value of the more expensive HBM compared with other memories, Ternullo said. “Some applications like AI have an insatiable thirst for memory bandwidth and offer a higher ROI for the business, which justifies the higher cost of HBM.”
It’s not exactly a direct line. AI is driving the use of GPUs, which often require HBM to meet system performance expectations. Jim Handy, principal analyst with Objective Analysis, said in an exclusive interview with EE Times that HBM requires a clear cost justification, noting that for some graphics applications, companies like AMD will use GDDR with some GPUs because it’s cheaper.
Outside of AI, Handy said, GPUs are mainly used for graphics, especially for gaming and for computer animation and post-production effects, such as adding explosions where there were none. “Those companies all use GPUs, and significant numbers of them,” he said. “They’ll have a big data center that’s just chock full of GPUs.”
While GDDR was originally designed for graphics work, other emerging applications have created competing demand for the memory technology over the years.
Similarly, expensive HBM is hard to come by right now given all the AI activity, Graham Allan, senior staff product manager at Synopsys, told EE Times in an exclusive interview. While there remain fringe applications for HBM, the bulk of the opportunities are in the AI space, he said.
Even as the third iteration of HBM goes into high-volume production, Allan wouldn’t describe the technology as mature. “HBM is unique from the DRAM side in that it’s the only DRAM that doesn’t go on the motherboard beside your processor.” Instead, it goes inside the package on an interposer—the 2.5D element of HBM—which creates challenges because it requires an extra technological step, he said. “People are not comfortable using it.”
HBM requires integration support
Allan said implementing DRAM is straightforward: If you want to design an SoC that has a DDR5 interface, you can look at any reference design that’s in the public domain, find the DDR5 DIMMs approved by Intel and get all the part numbers, he said. “This is a mature technology.”
But with HBM, everything is inside the SoC package, including DRAM, which can be selected from a variety of vendors, such as Micron, Samsung and SK Hynix, Allan said, while decisions must be made as to how to design the interposer and address various elements, including signal path and integrity.
Synopsys offers the enabling IP needed for customers to implement HBM, including controllers, PHY and verification IP. Allan said customers are looking for HBM expertise and assistance with specific reference designs. “We share reference designs and some of the most common interposer-type technologies.” Synopsys also aids testing in silicon, including the interposer and its connection, he said. “We could go down the rabbit hole of doing a completely custom test chip for a customer.”
Testing is especially important with HBM because once you commit to a design and put it into a system, it’s very time-consuming to change it, Allan said.
He said HBM is maturing, but it’s still nowhere near as mature as DDR and LPDDR technologies: moving from DDR4 to DDR5 is a big leap, whereas HBM4 takes a logical approach similar to HBM3’s. Allan said opting to use HBM is a big commitment, and customers are looking to de-risk the decision as much as possible. “It is more complicated and it’s a lower-volume product. There are some pitfalls that you can discover if you’re going out on your own.”
Allan said that customers are opting for HBM because nothing else will satisfy their requirements. Below HBM, GDDR memory might be sufficient for some applications, with GDDR7 doubling the capacity and increasing the data rate of GDDR6. He said GDDR’s data rates are high because of its small bus. “You can get to higher data rates, but you have to be very careful about how you design your system because you’re operating at very high speed.”
GDDR7 is a 2026 technology, however, and HBM3, which came out last year, has 3× higher bandwidth potential, Allan said. “The runway for bandwidth is just off the chart.”
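The bandwidth gap Allan describes comes down to interface width multiplied by per-pin data rate. The sketch below works through that arithmetic for a single GDDR6 device and a single HBM3 stack; the widths and pin rates used are typical published figures, but treat them as illustrative assumptions rather than guaranteed specifications.

```python
# Peak bandwidth = interface width (bits) x per-pin data rate (Gb/s) / 8 bits per byte.
# Width and rate values are typical published figures, used here for illustration.

def peak_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    return bus_width_bits * pin_rate_gbps / 8.0

# GDDR6: narrow 32-bit bus per device, so it needs very high per-pin speeds.
gddr6 = peak_bandwidth_gbs(bus_width_bits=32, pin_rate_gbps=16.0)    # ~64 GB/s per device

# HBM3: very wide 1,024-bit bus per stack at a more modest per-pin rate.
hbm3 = peak_bandwidth_gbs(bus_width_bits=1024, pin_rate_gbps=6.4)    # ~819 GB/s per stack

print(f"GDDR6 device: {gddr6:.0f} GB/s")
print(f"HBM3 stack:   {hbm3:.0f} GB/s")
```

This is also why Allan stresses careful system design for GDDR: the narrow bus only keeps up by running each pin at very high speed, which makes signal integrity harder.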
That doesn’t mean it’s enough for bandwidth-hungry AI, he said, but there are other factors that come into play as to how much the entire system can accomplish. The interposer can potentially become a bottleneck, for example. “If you have a poorly routed PCB and you’re getting too much crosstalk, that could end up being the thing that throttles you,” Allan said.
JEDEC is currently working on the HBM4 specification, but the technology standards association did not want to discuss timelines or progress to date. During a keynote at Semicon Korea 2024, SK Hynix vice president Kim Chun-hwan did reveal that the company plans to begin mass production of HBM4 by 2026.
Micron Technology recently began volume production of its HBM3E memory, and it’s already sold out for 2024. The company’s first HBM3E product is an 8-high 24-GB stack with a 1,024-bit interface, 9.2-GT/s data-transfer rate and a total bandwidth of 1.2 TB/s.
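Those figures are self-consistent: a 1,024-bit interface at 9.2 GT/s moves 1,024 × 9.2 ÷ 8 ≈ 1,178 GB/s, which rounds to the quoted 1.2 TB/s.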
Data centers are more power-conscious
When HBM first came to the market, Micron reviewed the applicable workloads and decided to aim for 30% better performance than what the industry appeared to require, Girish Cherussery, senior director of product management at Micron, told EE Times in an exclusive interview. “We were future-proofing ourselves.”
Cherussery said performance per watt is also a critical metric because there is a power boundary condition. “We focused on making sure that the performance per watt is significantly better.” Customers also want the HBM close to compute, he said.
Cherussery said many AI workloads, including large language models, are becoming more and more memory-bound rather than compute-bound: once there is enough compute, memory bandwidth and capacity become the constraint. AI workloads are putting a lot of strain on data centers, he said.
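A common way to reason about whether a workload is memory-bound is to compare its arithmetic intensity (FLOPs per byte of memory traffic) against the machine balance of the hardware (peak FLOPS divided by peak memory bandwidth). The sketch below shows that check; the accelerator and workload numbers are illustrative assumptions, not any specific product’s specifications.

```python
# Roofline-style check: a kernel is memory-bound when its arithmetic intensity
# (FLOPs per byte of memory traffic) falls below the machine balance
# (peak FLOPS / peak memory bandwidth). All hardware numbers are illustrative.

def is_memory_bound(flops: float, bytes_moved: float,
                    peak_flops: float, peak_bw_bytes_per_s: float) -> bool:
    arithmetic_intensity = flops / bytes_moved          # FLOPs per byte
    machine_balance = peak_flops / peak_bw_bytes_per_s  # FLOPs per byte
    return arithmetic_intensity < machine_balance

# Example: a large matrix-vector multiply during LLM token generation does
# ~2 FLOPs per weight while reading each 2-byte weight once, so its intensity
# is roughly 1 FLOP per byte.
print(is_memory_bound(
    flops=2e12,                # ~2 TFLOPs of work for the step (assumed)
    bytes_moved=2e12,          # ~2 TB of weights/activations read (assumed)
    peak_flops=500e12,         # 500-TFLOPS accelerator (assumed)
    peak_bw_bytes_per_s=3e12,  # 3 TB/s of HBM bandwidth (assumed)
))  # True: intensity (~1) is far below machine balance (~167), so memory-bound
```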
With memory utilization high, memory accounts for much of the power being burned in the data center, Cherussery said, so a 5-W savings in power can add up; increasingly, data centers are being defined by their wattage rather than by how many racks they hold. Cooling is also a significant factor when using HBM because it is a stacked memory. “This heat needs to be dissipated,” Cherussery said.
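To see how a few watts per stack compounds at data-center scale, here is a back-of-the-envelope calculation; the stack, accelerator and server counts are made-up assumptions purely for illustration.

```python
# Back-of-the-envelope: how a small per-stack power saving scales across a fleet.
# Every count below is an assumption for illustration only.

savings_per_stack_w = 5        # the 5-W saving mentioned above
stacks_per_accelerator = 6     # assumed HBM stacks per accelerator
accelerators_per_server = 8    # assumed accelerators per server
servers = 1_000                # assumed fleet size

total_kw = (savings_per_stack_w * stacks_per_accelerator *
            accelerators_per_server * servers) / 1_000
print(f"Fleet-wide saving: {total_kw:.0f} kW")   # 240 kW for these assumptions
```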
Beyond bandwidth, power and overall thermal profile, he said, ease of integration is the most critical characteristic of any HBM. Micron has its own proprietary approach for integrating its HBM into host systems.
He said the industry is ready for HBM3E, which can be easily slotted into a system that uses HBM. “Our product seamlessly fits into that same socket with no changes needed. It’s the same footprint as the previous generation.”
Cherussery said higher bandwidth and increased capacity will characterize HBM4. “What we are seeing is, as the AI models grow, there is a linear scaling on capacity and bandwidth requirements.”
The memory industry overall is in an interesting phase because it has never before been in a spot where a certain workload, in this case generative AI and AI in general, scales linearly with memory bandwidth and memory capacity, Cherussery said. “It’s going to mean that compute and memory will have to start thinking about the systems slightly differently from what they have been looking at in the past. The data center itself is becoming more and more heterogeneous.”
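The linear scaling Cherussery describes can be approximated with a simple rule of thumb: at 16-bit precision a model needs roughly two bytes of capacity per parameter, and a memory-bound token-generation step streams those weights once per token. The sketch below applies that rule; it is a coarse estimate that ignores KV caches, batching and other overheads.

```python
# Rough rule of thumb for LLM inference (ignores KV cache, batching, overheads):
#   capacity  ~= parameters x bytes per parameter
#   bandwidth ~= capacity x tokens generated per second (weights streamed per token)

def capacity_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    # 1e9 parameters x bytes per parameter gives gigabytes directly.
    return params_billion * bytes_per_param

def bandwidth_gbs(params_billion: float, tokens_per_s: float,
                  bytes_per_param: float = 2.0) -> float:
    return capacity_gb(params_billion, bytes_per_param) * tokens_per_s

for params in (7, 70, 700):   # model sizes in billions of parameters (illustrative)
    print(f"{params}B params: ~{capacity_gb(params):.0f} GB capacity, "
          f"~{bandwidth_gbs(params, tokens_per_s=20):.0f} GB/s for 20 tokens/s")
```

Doubling the parameter count doubles both the capacity and the bandwidth required, which is the linear relationship the memory makers are planning around.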
Samsung Electronics is also witnessing a notable expansion of heterogeneous computing and more AI-focused services in data centers, according to Indong Kim, the company’s vice president of product planning and business enabling. “This growth seems to correspond with the rise of hyperscalers that offer both direct and indirect AI solutions,” he said.
Data centers are evolving to use computing resources to their maximum potential for specific workloads, including AI, with an emphasis on DRAM bandwidth and capacity, Kim said. What’s especially exciting, he said, is that these heterogeneous architectures employing two dissimilar types of processors—CPUs and accelerators—are seeking the same goal when it comes to memory. “We’re confident that this trend will offer tremendous growth opportunities for DRAM manufacturers.”
At Memcon 2024, Samsung demonstrated what the company said is the world’s first 12-stack HBM3E DRAM, which uses Samsung’s advanced thermal-compression non-conductive film (TC NCF) technology to enhance vertical density of the chip by more than 20% compared with its predecessor while also improving product yield.
As massive parallel computing is becoming more accessible in high-performance computing (HPC) environments, Kim said there has been a surge in HBM demand.
HBM intersects with CXL
Samsung’s HBM3E DRAM is designed to meet the needs of HPC and demanding AI applications. The company also used Memcon as an opportunity to introduce its Compute Express Link (CXL) Memory Module—Box (CMM-B), designed to support applications that need high-capacity memory, such as AI, in-memory databases and data analytics. The CMM-B also supports memory pooling, which is a critical element of heterogeneous computing.
Kim said ever-increasing capacity and bandwidth requirements for AI and ongoing expansion of model sizes will benefit from the multiple tiers that the memory industry offers, which is where the rapidly expanding CXL protocol will intersect with HBM.
“We believe CXL will be a perfect complement for ever-increasing capacity needs, providing optimal characteristics that bridge the existing DRAM-SSD hierarchy,” he said.
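As a purely illustrative sketch of the tiering idea Kim describes, the function below picks a memory tier for a dataset based on how much capacity and bandwidth it needs, falling back to pooling when no single tier fits. The tier names, capacities and bandwidths are assumptions for illustration, not Samsung product figures.

```python
# Purely illustrative memory-tier picker for a heterogeneous system.
# Tier characteristics below are assumptions, not product specifications.

TIERS = [
    # (name, assumed max capacity in GB, assumed bandwidth in GB/s)
    ("HBM (on-package)",       192,  3000),
    ("DDR5 (motherboard)",    2048,   400),
    ("CXL-attached memory",  16384,   100),
]

def pick_tier(needed_capacity_gb: float, needed_bandwidth_gbs: float) -> str:
    """Return the first (fastest) tier that satisfies both requirements."""
    for name, max_capacity_gb, bandwidth_gbs in TIERS:
        if needed_capacity_gb <= max_capacity_gb and needed_bandwidth_gbs <= bandwidth_gbs:
            return name
    return "needs sharding or pooling across devices"

print(pick_tier(needed_capacity_gb=96,   needed_bandwidth_gbs=2000))  # HBM (on-package)
print(pick_tier(needed_capacity_gb=4096, needed_bandwidth_gbs=50))    # CXL-attached memory
```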