Cadence, Nvidia team up for power profiling in AI chips


Thursday, August 21, 2025

The massive complexity and computational needs of today’s most advanced AI chips pose a big challenge for designers: They’re often unable to accurately predict their power consumption under realistic conditions.

To address that, Cadence introduced a Dynamic Power Analysis (DPA) tool that enables more precise power analysis of high-performance chips, particularly GPUs and other big AI chips, before the physical silicon is ready. Cadence said the DPA app, developed through a close collaboration with NVIDIA, runs on top of its Palladium Z3 hardware emulation platform to evaluate the dynamic power consumption of chip designs with billions of gates. The DPA can handle billions of cycles in hours instead of days with up to 97% accuracy.

The EDA giant said the innovation allows chip designers and system engineers to address power issues before fabricating the final chip, resulting in more efficient hardware while accelerating time-to-market.

NVIDIA VP of Hardware Engineering Narendra Konda said, “By combining NVIDIA’s accelerated computing expertise with Cadence’s EDA leadership, we’re advancing hardware-accelerated power profiling to enable more precise efficiency in accelerated computing platforms.”

Why Predicting Power Behavior is Getting Harder for AI Chips

The collaboration comes as the latest AI training chips become increasingly power-hungry. These chips can contain 100 billion or more transistors built on advanced nodes like 4 nm and 3 nm. While each transistor draws a relatively small amount of current during operation, the power quickly adds up. And although the supply voltage of the transistors is as low as 0.7 V, the chips can burn through more than 1,000 A, or even 1,500 A, of current, putting continuous power (also called the thermal design power, or TDP) at over 1,000 W.
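The arithmetic behind these figures is simple electrical power, P = V × I. A minimal sketch using the values quoted above (a ~0.7-V supply and a ~1,500-A current draw):

```python
# Back-of-the-envelope power estimate using the figures above:
# a ~0.7 V supply voltage and a current draw of ~1,500 A.
supply_voltage_v = 0.7   # core supply voltage, in volts
current_draw_a = 1500    # current draw, in amperes

# Electrical power: P = V * I
power_w = supply_voltage_v * current_draw_a
print(f"Continuous power (TDP): {power_w:.0f} W")  # about 1,050 W
```

This is why sub-1-V chips still land at kilowatt-class TDPs: the voltage is tiny, but the current is enormous.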

High-performance GPUs such as NVIDIA’s Blackwell B100 and B200 consume as much as 1,200 W to handle computationally intensive jobs such as AI training and, to a lesser extent, inferencing. Things are trending up with NVIDIA's next-generation Rubin GPU, which is estimated to consume 700 W per 3-nm accelerator die. The multi-die Rubin GPU could use 1,800 W when surrounded by high-bandwidth memory (HBM). The Rubin Ultra, which will double the number of chiplets, could have a TDP of 3,600 W.

Cadence is primarily known for its electronic design automation (EDA) tools, which are employed by companies such as AMD, Apple, Intel, and NVIDIA to aid in the development of next-gen chips. But it’s also a hardware vendor, supplying platforms such as Palladium that can emulate these designs and enable engineers to see how they perform — and how much power they draw — while running real workloads. By enabling pre-silicon verification and validation, hardware emulation helps companies deliver products to market at a faster pace, said Cadence.

According to NVIDIA, it used these tools for pre-silicon verification and validation of its Blackwell family of AI chips, and the company is utilizing the latest generation—the Palladium Z3—for some of its future processor designs like Rubin.

“In the context of pre-silicon verification and validation, having timely and accurate power estimation for actual workloads enables engineers to re-design or re-optimize hardware/software to strike the best balance of power and performance,” said Michael Young, director of product marketing at Cadence.

But accurately projecting the power consumption of AI chips such as the Blackwell is becoming more difficult. Technically, this is due to the challenges of modeling power as smaller transistors are packed more closely.

These ultra-dense chip designs make it inherently harder to predict power under dynamic conditions. In addition, AI workloads have highly variable power characteristics, straining different parts of the chip at different times. Given that, it’s important to emulate the entire chip over as many cycles as possible.

There are also practical reasons. Under tightening deadlines, engineers rarely have time to perform full-chip dynamic power analysis across billions of cycles. That frequently leads to discrepancies between pre-silicon and post-silicon power dissipation.

Dynamic Power Analysis: Closing the Pre- Versus Post-Silicon Gap

Traditional power-analysis tools struggle to scale up to more than several hundred thousand cycles without requiring impractical amounts of time, according to Cadence. “In the past, users could do small slices of a power estimation window (in hundreds or thousands of cycles), which limited the power profiling view and, hence, the accuracy of the real power used to process a particular workload,” noted Young.

The company worked closely with NVIDIA to overcome these challenges, using hardware-assisted power acceleration and parallel processing innovations to enable more fine-grained power analysis on Palladium. “We now have the ability to measure power estimation at real workloads at length [with] billions of cycles, instead of a few thousand cycles, which was the practical limitation in the past due to the time it takes to process,” said Young.

He added that the DPA can more accurately estimate power consumption under real-world workloads, allowing power and performance to be verified before being locked into silicon, while the chip design can still be optimized. Particularly useful for power-hungry GPUs and other AI accelerators, this early power modeling helps improve efficiency while preventing the delays that come from over-designing or under-designing them.

“In the context of pre-silicon validation, engineers can gain very accurate power data,” stated Young. “This allows engineers to re-think what could be optimized while silicon design is still in the pre-silicon stage.”

The DPA is also integrated into Cadence's analysis and implementation solution, so designers can perform power estimation, reduction, and signoff throughout the design process, yielding more energy-efficient and easier-to-cool silicon.

As data centers consume ever more electricity, accurately predicting chip-level power consumption is becoming a bigger deal. While power densities at the server level are pushing past 3 kW, rack-level requirements are also scaling rapidly — from 30 to 40 kW today to more than 120 kW for a single rack equipped with NVIDIA’s Grace Blackwell superchips. Underestimating power doesn’t just risk inefficiency; it can also overload the power supply or cause thermal failures in other components.
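To put those rack-level figures in perspective, here is a rough budgeting sketch using the numbers quoted in this article (~3 kW per server, ~1,200 W per Blackwell-class GPU, 120 kW per rack). These are illustrative ceilings only; real racks also budget power for CPUs, memory, networking, and cooling.

```python
# Rough rack-level power budgeting with the figures quoted above.
# Illustrative only: overhead for CPUs, memory, networking, and
# cooling is deliberately ignored here.
rack_budget_w = 120_000   # modern high-density rack budget
server_power_w = 3_000    # per-server power density
gpu_power_w = 1_200       # Blackwell-class GPU draw

servers_per_rack = rack_budget_w // server_power_w  # 40 servers
gpu_upper_bound = rack_budget_w // gpu_power_w      # 100 GPUs, at most
print(f"~{servers_per_rack} servers or at most {gpu_upper_bound} GPUs per rack")
```

Even with these generous simplifications, a single mis-estimated GPU power figure multiplies across a hundred devices per rack, which is why pre-silicon accuracy matters at the data-center scale.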

“With the advanced AI/ML type of designs, the need to fully emulate the entire silicon design with realistic ML workloads is at the heart of system validation,” said Young. “Palladium DPA adds the extra power dimension to the functional validation.” By evaluating hardware and software at the same time and factoring them both into the power profile, Palladium gives engineers “a more accurate model of silicon validation.”

By: DocMemory
Copyright © 2023 CST, Inc. All Rights Reserved
