Why Move To 2nm?


Thursday, February 5, 2026

The rollout of 2nm process nodes and beyond will require new approaches for managing power and heat. But it also will enable greater flexibility in designs and more options for improving performance and optimizing costs.

Power, performance, and area/cost are still the key metrics for chipmakers, yet how those metrics are weighted and implemented can vary significantly. In the past, the chip market was split between extremely low-power chips used in smartphones and other mobile devices, and those targeted for servers and powerful plug-in workstations. But with the spread of AI across nearly everything electronic, applications are becoming much more granular and targeted. Which processing elements work best for different data types or workloads may be very different from one chipmaker or systems vendor to the next. And what works best in one region may not be an option in another due to power grid limitations, uneven and often unpredictable availability of essential components or materials, as well as geopolitical regulations.

Disaggregation into multi-die assemblies enables different processors and functions to be prioritized, while simplifying contingency plans in case of less-critical component shortages. And rather than cramming every component onto a reticle-sized SoC at the most advanced node, different dies can be developed at whatever node makes sense.

Shrinking features is still important for some logic, but what gets scaled to the most advanced nodes is becoming a progressively smaller fraction of the overall design. The flip side is that more transistors in the form of chiplets can be added to boost performance, so long as data movement in and out of processors and memories is fast enough to handle the exploding volume of AI data.
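
A rough way to reason about that constraint is a roofline-style estimate: added compute only pays off if the memory and interconnect bandwidth can feed it. The sketch below is purely illustrative; the peak-compute, bandwidth, and arithmetic-intensity numbers are assumptions, not figures for any real product.

```python
# Illustrative roofline-style check: added compute only helps if the
# interconnect and memory system can feed it. All numbers are assumed.

def attainable_tflops(peak_tflops: float, bandwidth_tbs: float,
                      arithmetic_intensity: float) -> float:
    """Attainable throughput = min(peak compute, bandwidth * FLOPs per byte)."""
    return min(peak_tflops, bandwidth_tbs * arithmetic_intensity)

base = attainable_tflops(peak_tflops=500, bandwidth_tbs=4.0, arithmetic_intensity=50)
more_compute = attainable_tflops(peak_tflops=1000, bandwidth_tbs=4.0, arithmetic_intensity=50)
more_both = attainable_tflops(peak_tflops=1000, bandwidth_tbs=8.0, arithmetic_intensity=50)

print(base, more_compute, more_both)
# 200.0 200.0 400.0 -> doubling compute alone changes nothing for this
# bandwidth-bound workload; data movement has to scale with it.
```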

“Honing in specifically on 2nm, there will likely only be a few parts of these complex integrated systems that are at the most advanced technology node,” said David Fried, corporate vice president at Lam Research. “It ends up being an optimization. You want to use the optimal technology for each element of the system. We used to optimize power, performance, area, and cost for monolithic integration of all these things. What advanced packaging has done is enable us to optimize power, performance, area and cost for individual subsystems. The result of that is usually different technologies that come together through heterogeneous integration. Chiplets are a natural evolution of hierarchical system integration.”

This is a different approach to scaling. “Today, many applications are gaining optimization through de-integration,” Fried said. “It’s separating logic from memory, and separating I/O from logic, and separating the memory controller from memory. A lot of the products we’re seeing right now are optimizing through disaggregation or de-integration, moving toward a more complex advanced packaging flow. That’s how they’re optimizing PPAC.”

That has broad implications for the whole semiconductor supply chain. “We are going to bring more flexibility and customization,” said Rozalia Beica, field CTO for packaging technologies at Rapidus, which has licensed IBM’s 2nm process technology. “Some of the packages that we are working on with customers will have 2nm, and they will also have other technology that is not that advanced. And we definitely will have to partner with other companies in the industry because we are not going to make 4nm or 7nm chiplets. We only provide the 2nm chiplets, and will partner with other foundries, if possible, or with OSATs, to bring 2nm and other technologies to the package.”

This may sound straightforward enough. It’s easier to design and manufacture a chiplet than a full SoC. But integrating the various pieces isn’t easy.

“There is this notion of a hybrid design where you can mix and match different standard cells — mixing high-performance standard cells with low power standard cells, and maybe high-density ones,” said Abhijeet Chakraborty, vice president of engineering at Synopsys. “So you have more flavors of these standard cells available, and the EDA tools have to choose them judiciously to maximize your benefits. If you use high-performance standard cells everywhere, because you’re trying to meet very aggressive performance targets for an HPC AI design, then you’re going to pay a price in power and perhaps other metrics. But this mix is very important.”
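
One way to picture how a tool might choose cells “judiciously” is a simple slack-based assignment: fast, leaky cells only where timing is tight, and low-power or high-density cells everywhere else. This is a toy sketch with invented cell flavors and numbers, not how any particular EDA engine actually works.

```python
# Toy slack-based cell-flavor assignment. Cell libraries and numbers are
# invented; real tools weigh timing, power, area, and placement together.

CELL_FLAVORS = {
    "high_performance": {"delay_ps": 10, "leakage_nw": 50},
    "low_power":        {"delay_ps": 16, "leakage_nw": 10},
    "high_density":     {"delay_ps": 20, "leakage_nw": 15},
}

def pick_flavor(path_slack_ps: float) -> str:
    """Spend leakage only where the timing slack demands it."""
    if path_slack_ps < 5:      # near-critical path: buy speed
        return "high_performance"
    if path_slack_ps < 30:     # comfortable slack: save power
        return "low_power"
    return "high_density"      # lots of slack: save area too

paths = {"fpu_mul": 2.0, "decode": 18.0, "debug_bus": 120.0}
for name, slack in paths.items():
    print(name, "->", pick_flavor(slack))
```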

Flexible options, customizable metrics

That’s just the starting point. “It gets even more interesting,” Chakraborty said. “You could have a homogeneous system where you have all 2nm dies. They all have to connect together. So then you get into challenges or opportunities with advanced packaging, hybrid bonding, bond pitches, and things like that. How do you connect these dies together? There are a lot of advancements in inter-die connectivity, improving the interconnect density and pitches, and the signal integrity performance, as well. The other interesting element with multi-dies is you can mix and match, too. You can have a 28nm die mixed with a 2nm die. That’s a way of alleviating the challenges around cost and yield, and the barriers to using these advanced nodes.”

Initially, at least, this new breed of multi-die assemblies is being developed for large AI data centers and the upper end of the smartphone and PC markets. Putting the various pieces together and crunching the numbers — PPA/C, time to market, design and verification time, time in the fab or packaging house — involves some intensive design and verification, including multiple test chips and fine-tuning based on how and where the technology will be used.

“The performance and power benefits are real, but they are conditional,” said Evelyn Landman, CTO at proteanTecs. “Node transitions no longer deliver linear gains by default. The real value comes from how close the system can safely operate to the true physical limits of the silicon. This is already visible in large-scale AI platforms, where performance per watt is the dominant constraint rather than raw frequency. At 2nm, the economics depend entirely on intelligent guard-band management. Leave too much guard-band and the investment fails. Remove it blindly and reliability fails. The winners will be those who can measure, understand, and manage guard-bands, dynamically, continuously, across workloads and lifetime.”
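
A minimal sketch of what dynamic guard-band management can look like: rather than carrying a fixed worst-case voltage margin, the margin tracks the worst timing margin reported by on-die monitors for the current workload. The monitor readings, thresholds, and voltage steps below are assumptions for illustration, not proteanTecs’ actual scheme.

```python
# Illustrative adaptive guard-band loop. Sensor values, thresholds, and
# voltage steps are assumed; this is not any vendor's actual scheme.

STATIC_GUARDBAND_MV = 60.0   # worst-case margin a static design might carry

def dynamic_guardband_mv(worst_margin_ps: float) -> float:
    """Scale the voltage margin with the worst measured timing margin."""
    if worst_margin_ps < 5:      # almost no slack: keep full margin
        return STATIC_GUARDBAND_MV
    if worst_margin_ps < 20:     # moderate slack: trim the margin
        return 35.0
    return 20.0                  # healthy slack: run close to the limit

# Per-workload worst-case timing margins reported by (hypothetical) monitors.
for workload, margin_ps in {"training": 4, "inference": 15, "idle": 40}.items():
    margin_mv = dynamic_guardband_mv(margin_ps)
    saved = STATIC_GUARDBAND_MV - margin_mv
    print(f"{workload}: guard-band {margin_mv} mV "
          f"(~{saved} mV of static margin recovered)")
```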

This is an expensive and engineering-intensive process. But for AI data centers, being able to process more data faster by packing more transistors into a multi-die assembly using less power is a winning formula. And for high-end phones and PCs, one chip design can be amortized across huge volumes. So while it may cost $100 million or more to develop a new chip, that may be acceptable, particularly if there is a possibility of re-using many parts of a design when faster or lower-power logic, denser memories, and/or photonics interconnects become more widely available.
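
The amortization math behind that argument is simple. The $100 million development figure comes from the paragraph above; the per-unit cost and volumes in the sketch below are assumptions chosen only to show how quickly the design investment dilutes at phone and PC volumes.

```python
# NRE amortization: the $100M development figure is from the article;
# the unit cost and volume figures are assumptions for illustration.

NRE = 100e6          # one-time design/verification cost in dollars
UNIT_COST = 40.0     # assumed manufacturing + packaging cost per chip

for volume in (1e6, 10e6, 100e6):
    per_chip = NRE / volume + UNIT_COST
    print(f"{int(volume):>11,} units -> ${per_chip:,.2f} per chip "
          f"(${NRE / volume:,.2f} of that is amortized NRE)")
# 1M units carries $100 of NRE per chip; at 100M units it is $1.
```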

“Overall, what we’re seeing on 2nm nodes is a regular progression in terms of increased power density,” said Ben Sell, vice president and general manager of logic technology development at Intel. “When we design a technology, the metrics we focus on are power, performance, and area/cost. But this is not just all about performance. A lot of this is performance per watt and the amount of area scaling that you can get.”

Intel’s Panther Lake, introduced in January 2026, uses an 18 angstrom process. “It has an interposer and a bunch of chiplets on top of Panther Lake, and the compute chiplet is in 18A,” according to Sell. “We also have other products coming out next year that are more classical packaging — so not necessarily stacked, but in a multi-chip package. We are now working on the roadmaps beyond this to include 14A, as well. Panther Lake is a client product, but even there we have different chiplets with different needs. We have the compute tile that is performance-based, but we also have a lot of performance per watt or power-efficient metrics so that you get good battery life. We also have a graphics tile, which is a lot more focused on power reduction and power/performance tradeoffs. And then you have other chiplets that are the more classical kind of chipset applications, which do all the interfacing with the rest of your compute system. And then there are server products, which are extremely power-sensitive.”

Performance improvements vary per node and by foundry process, but the days of 30% improvements in both performance and power at each new node are long gone.

“From a design perspective, the expectation of a customer who wants to adopt 2nm, if they’re coming from 3nm to 2nm, is an average of 10% to 15% faster performance, and 20% to 30% lower power consumption — and, of course, 15% or so higher transistor density,” said Synopsys’ Chakraborty. “But then there’s also the challenge of whether you can attain those. The lower power is especially compelling for a lot of applications that care about performance per watt and higher transistor density. A lot of innovation and investment that Synopsys has done is to maximize what you can get from 2nm. But there are real-world challenges that lead to yield and manufacturing.”
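
Taken together, those figures imply a larger gain in performance per watt than either number suggests on its own. A quick back-of-the-envelope calculation using the midpoints of the ranges quoted above (roughly 12.5% faster at 25% lower power):

```python
# Back-of-the-envelope performance-per-watt gain for a 3nm -> 2nm move,
# using the midpoints of the ranges quoted above.

perf_gain = 1.125      # ~10-15% faster
power_ratio = 0.75     # ~20-30% lower power

perf_per_watt_gain = perf_gain / power_ratio
print(f"Performance/watt improvement: {perf_per_watt_gain:.2f}x")   # ~1.50x
```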

Unlike in the past, yield for leading-edge dies is effectively no longer determined by final test. A die still needs to be assembled into some type of advanced package, and it needs to function within spec in the field over time.

“At 2nm and 18A, the dominant challenge is no longer transistor scaling alone,” said proteanTecs’ Landman. “It is uncertainty management across the entire lifecycle of the silicon. As architectures move to nanosheets and new power delivery schemes, the margin for error collapses across device physics, manufacturing, packaging, and real workloads. Effects that were once second-order, such as local voltage droop, thermal gradients, aging, and workload-driven stress, are now amplified continuously and locally. This is already evident in the early ramp behavior, where variability must be understood not just statistically, but spatially and dynamically. Static assumptions and worst-case guard-bands are no longer sufficient because the most dangerous conditions are not fixed corners. They are transient, workload-dependent, and often invisible until the system is running. The industry is crossing an inflection point where correctness must be managed continuously, rather than assumed at sign-off.”

Endless tradeoffs

To understand just how complex this can become, consider performance, which has a direct impact on heat. The higher the utilization of an AI server, the greater the incentive to use higher-performance logic, because it delivers more work per watt. But operating at a higher frequency also generates more heat, which has to be dissipated somehow. If passive heat sinks aren’t sufficient, more active, energy-intensive approaches are required.

With 2nm processes, more transistors can be crammed into a given space than at 3nm. That means higher transistor density, and with it higher power density, but it also enables more processing to be done faster using the same amount of power. As a result, each new node can save power for a given workload. But if the utilization increases too much, the heat rises to the point where the die either requires more complex cooling — heat is more difficult to extract from inside a die with higher power, transistor, and thermal density — or performance throttling, which may negate the whole reason for moving to 2nm in the first place.
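
The tradeoff follows from the standard dynamic-power relationship, P ≈ α·C·V²·f, combined with a fixed cooling budget per square millimeter. The constants in the sketch below are invented purely to show the shape of the curve: push utilization or frequency past the cooling limit and the only options left are better cooling or throttling.

```python
# Power density vs. cooling budget. P ~ alpha * C * V^2 * f per unit area.
# All constants are invented to illustrate the tradeoff, not real node data.

def power_density_w_mm2(alpha: float, c_eff_nf_mm2: float,
                        vdd: float, freq_ghz: float) -> float:
    """Dynamic power per mm^2: activity * switched capacitance * V^2 * f."""
    return alpha * (c_eff_nf_mm2 * 1e-9) * vdd**2 * (freq_ghz * 1e9)

COOLING_LIMIT_W_MM2 = 0.5   # what the (assumed) package/cooling can remove

for util, freq in ((0.3, 3.0), (0.6, 3.0), (0.6, 3.6)):
    pd = power_density_w_mm2(alpha=util, c_eff_nf_mm2=0.5, vdd=0.7, freq_ghz=freq)
    action = "OK" if pd <= COOLING_LIMIT_W_MM2 else "throttle or add cooling"
    print(f"utilization {util:.0%} @ {freq} GHz -> {pd:.2f} W/mm^2 ({action})")
```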

At each new node after 20nm (16/14nm for TSMC and Samsung), thermal issues became increasingly difficult to manage, resulting in a seemingly endless series of tradeoffs. While the introduction of finFETs reduced gate leakage, thermal density increased with more transistors. At 7nm, and each successive node after that, gate leakage once again became an issue, adding to the thermal issues stemming from increasing dynamic power density.

Gate leakage will be addressed once again with gate-all-around FETs at 2nm, and later with complementary FETs at some future node, along with new materials such as molybdenum and even 2D materials. But power density will continue to be a problem if logic utilization is too high. So how leading-edge logic is used may require some complex tradeoffs in a multi-die assembly, and in where data is physically processed or pre-processed within a system.

There are other factors that enter into this economic formula, as well, such as the time it takes to get a chip from initial conception to final test. “Some customers will want to do the design themselves, and for us to bring silicon, packaging, and integrate everything together,” said Rapidus’ Beica. “Our manufacturing is focused only on single-wafer processing. We don’t have batch processing. That gives us the ability to get a lot of different data from each wafer that goes back into design. So we have design and manufacturing co-optimization, and with the input that comes from the customer, combined with our internal optimization, we can provide the customization the customer will need. What’s going to be very important is the turnaround time.”

Time is money for AI data centers, but the economics can be as complex as the mix and interactions of dies in a package. Logic can be disaggregated into chiplets and connected through a large silicon interposer using a 2.5D approach. But the larger the interposer, the higher the cost, the longer the distance signals need to travel, and the bigger the impact on performance.

Chiplets also can be stacked on top of each other in a 3D-IC or a 3.5D package, but that requires more development time. And these assemblies can include a mix of CPUs, GPUs, NPUs, TPUs, or any other variation, developed at the same or different process nodes. But integration requires a deep understanding of the physical effects of each die and a complex balancing act.

Conclusion

The reasons for moving to the next process nodes are no longer about one or even two things. They can vary by market segment, by workload, or by the standard PPA/C metrics. Scaling any one of them may be sufficient for some applications, while others will require optimizing for all of them. But in an increasing number of cases, the final design will include a mix of nodes, along with new ways to trade off PPA/C that balance priorities across a larger system.

“If you look historically over the last 40 years, some nodes were really good for power scaling, or for performance scaling, or area scaling,” said Lam Research’s Fried. “But at the end of the day, all of them put together make a node more valuable. Area scaling and performance scaling have slowed down a bit. Power scaling is still doing quite well as we go to these advanced device architectures, and cost scaling will be a fundamental driver of node value. If you can get maybe 1.7X the number of chips per wafer, and you get some performance and power, that becomes the play for scaling. But the end application dictates if you care most about power, performance, area or cost. For example, wearable technology will be much more area and cost sensitive than power and performance. Or, if it has to run on a battery and we never plug it in, that will be much more power-centric than area and cost.”
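
Fried’s 1.7X figure translates into die cost in a straightforward way once a wafer price is assumed. In the sketch below, the 1.7X chips-per-wafer factor comes from the quote above, while the wafer prices are assumptions used only to show how the math works.

```python
# Cost-per-die effect of getting ~1.7x chips per wafer (figure from the quote).
# Wafer prices are assumed for illustration; real prices vary by foundry and node.

old_dies_per_wafer = 400
old_wafer_cost = 17_000          # assumed price for the previous node
new_dies_per_wafer = int(400 * 1.7)
new_wafer_cost = 25_000          # assumed (higher) price for the new node

old_cost_per_die = old_wafer_cost / old_dies_per_wafer
new_cost_per_die = new_wafer_cost / new_dies_per_wafer

print(f"old: ${old_cost_per_die:.2f}/die, new: ${new_cost_per_die:.2f}/die "
      f"({1 - new_cost_per_die / old_cost_per_die:.0%} cheaper per die)")
# Even with a ~47% more expensive wafer, cost per die drops ~13% in this example.
```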

By: DocMemory