Monday, August 1, 2022
Alchip Technologies, a contract designer of AI silicon, punches above its weight in leading process nodes. The company expects to join much larger fabless companies with the world’s first 3nm test chips early next year.
Alchip and other customers of Taiwan Semiconductor Manufacturing Co. (TSMC), such as Nvidia and Qualcomm, are using TSMC’s N3E process design kit (PDK) to evaluate the new node. N3E is an extension of TSMC’s 3nm process, the latter of which will enter production in the second half of 2022.
TSMC’s Fab 18 is its main 3nm production facility (Source: TSMC)
“We have many tier-one high-performance computing (HPC), AI, and GPU customers across all of our geographic markets,” Leo Cheng, senior VP of engineering at Alchip, said in an interview with EE Times. “Particularly those working on data center applications who see power as a very critical concern.”
While non-disclosure agreements bar him from naming Alchip’s customers, Cheng says one U.S. client is among the largest data center providers of infrastructure as a service. Alchip’s roster also includes one of Japan’s largest AI companies and a Chinese company that is its top HPC client.
HPC is one of the fastest growing segments of the chip industry, yet data center and cloud computing providers that use HPC chips are top contributors to global warming because of their huge energy consumption. As a result, energy efficiency has become a priority for Alchip customers.
Clients typically give the company energy-efficiency targets expressed in metrics such as teraflops per watt.
“Customers care about even a very tiny voltage compensation on the regulator side,” Cheng said. “For example, for a nominal voltage of 0.85 V operation, a 4%, 35 mV deviation is very critical for the voltage compensation in a data center. It will actually save a lot of energy.”
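To put rough numbers on that example, the short Python sketch below applies the standard first-order CMOS model, in which dynamic power scales with the square of the supply voltage. It assumes the 4% deviation translates directly into a 4% lower supply voltage while frequency and switched capacitance stay fixed; that assumption and the function name are illustrative, not Alchip's method. Note that 4% of 0.85 V is 34 mV, in line with the rounded 35 mV figure in the quote.

```python
def dynamic_power_ratio(v_nominal: float, v_delta: float) -> float:
    """Dynamic power at (v_nominal - v_delta) relative to power at v_nominal.

    Uses the first-order model P_dyn ~ C * V^2 * f with C and f held constant
    (an assumption made for this illustration).
    """
    v_new = v_nominal - v_delta
    return (v_new / v_nominal) ** 2


V_NOMINAL = 0.85                   # volts, from the quoted example
deviation = 0.04 * V_NOMINAL       # 4% of nominal, i.e. 0.034 V
ratio = dynamic_power_ratio(V_NOMINAL, deviation)
saving = 1.0 - ratio               # fraction of dynamic power saved

print(f"deviation: {deviation * 1000:.0f} mV")
print(f"dynamic power saving: {saving:.1%}")
```

Under these assumptions, trimming 4% of supply voltage removes close to 8% of dynamic power, which is why a few tens of millivolts matter across a data center.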
Energy consumption can be cut mainly at two stages, front-end and back-end design, according to Cheng. At the front end, a better architecture incorporating parallel or distributed processing helps. One Japanese customer used a unique approach.
“The chip actually doesn’t run very fast, only like 500 megahertz to 1 gigahertz, but they could still compete in the so-called Green 500 supercomputer competition,” Cheng said. “They won and were actually in the top three.”
For the back end, or physical design, clock design is the focus, according to Cheng. Alchip offers its mesh-type Fishbone clock structure, which provides advantages in on-chip variation, skew control, routability, and yield.
“With a good clock structure like Fishbone, we don’t need to overdesign by adding too much margin or logic,” he said. “The result is a low-power clock network that reduces overall chip power consumption.”
The company also helps customers re-characterize libraries for dynamic voltage and frequency scaling designs to achieve optimum trade-offs among performance, frequency, and power consumption. Alchip sees a lot of re-characterization activity across HPC, graphics processing, and AI applications to find the best mix, according to Cheng.
Another constraint is the package and its maximum power tolerance.
“For example, one package may tolerate, say, 400 watts,” Cheng said. “We design from there really to find out the better optimization point for energy and performance. Many years ago, people were just aiming for a frequency like 3 gigahertz or higher. But nowadays, you can clearly see that power is number one. They probably want to squeeze in more cores, engines inside any single chip.”
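The trade-off Cheng describes can be illustrated with a toy search over operating points under a package power cap. Everything in the Python sketch below apart from the 400 W limit is hypothetical: the voltage and frequency pairs, the core counts, the lumped power coefficient, and the crude assumption that throughput scales with cores times frequency for a perfectly parallel workload. It is a sketch of the reasoning, not Alchip's optimization flow.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class OperatingPoint(NamedTuple):
    voltage_v: float
    freq_ghz: float


def chip_power_w(op: OperatingPoint, cores: int, coeff_w: float) -> float:
    """First-order dynamic power: cores * k * V^2 * f.

    coeff_w is a hypothetical lumped constant (watts per core per V^2 * GHz).
    """
    return cores * coeff_w * op.voltage_v ** 2 * op.freq_ghz


def best_config(
    points: Sequence[OperatingPoint],
    core_options: Sequence[int],
    coeff_w: float,
    package_limit_w: float,
) -> Optional[Tuple[OperatingPoint, int]]:
    """Return the (operating point, core count) with the highest cores * GHz
    that stays within the package power limit, or None if nothing fits."""
    best = None
    best_throughput = 0.0
    for op in points:
        for cores in core_options:
            if chip_power_w(op, cores, coeff_w) > package_limit_w:
                continue  # exceeds what the package can tolerate
            throughput = cores * op.freq_ghz
            if throughput > best_throughput:
                best, best_throughput = (op, cores), throughput
    return best


# Hypothetical inputs for illustration only.
POINTS = [OperatingPoint(0.70, 1.2), OperatingPoint(0.85, 2.0), OperatingPoint(1.00, 3.0)]
CORE_OPTIONS = [64, 128, 256]
COEFF_W = 2.0
PACKAGE_LIMIT_W = 400.0  # the package tolerance from the quoted example

best = best_config(POINTS, CORE_OPTIONS, COEFF_W, PACKAGE_LIMIT_W)
print(best)
```

With these made-up numbers, the 3 GHz point fits only 64 cores inside 400 W, while the slowest, lowest-voltage point fits 256 cores and delivers more aggregate throughput, which mirrors the shift from chasing frequency to packing in more cores.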
The company sees chiplets as the next wave. With the migration to 3nm, chiplet solutions can achieve better yield and save costs while minimizing time-to-market, he explained.
Combining chiplets from different companies in one SoC is the tricky part. The key is probably the I/O interface, according to Cheng. That’s why the industry recently proposed the Universal Chiplet Interconnect Express (UCIe) die-to-die (D2D) interconnect standard, he added.
Compared with TSMC’s 5nm node, N3 can cut leakage power by more than 20%, according to Cheng. For dynamic power, the improvement is slightly over 10%.
At advanced nodes, Alchip runs performance-power-area (PPA) comparisons for customers because N3 isn’t necessarily the best choice for every design.
As one of the early adopters of N3, Alchip started using TSMC’s PDK at version 0.7. At advanced nodes, Alchip sets up its design methodology even before the EDA tools are ready.
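How those two figures combine depends on how a chip's power splits between leakage and switching. The Python sketch below takes the quoted values as round lower bounds (20% for leakage, 10% for dynamic power) and weights them by a leakage share that is purely hypothetical; real shares vary widely by design and workload.

```python
def total_power_saving(
    leakage_share: float,
    leakage_saving: float = 0.20,   # "more than 20%", taken as a lower bound
    dynamic_saving: float = 0.10,   # "slightly over 10%", taken as a lower bound
) -> float:
    """Weighted total power saving when moving from the older to the newer node.

    leakage_share is the fraction of total power due to leakage on the older node.
    """
    if not 0.0 <= leakage_share <= 1.0:
        raise ValueError("leakage_share must be between 0 and 1")
    dynamic_share = 1.0 - leakage_share
    return leakage_share * leakage_saving + dynamic_share * dynamic_saving


LEAKAGE_SHARE = 0.30  # hypothetical: 30% of total power is leakage
combined = total_power_saving(LEAKAGE_SHARE)
print(f"estimated total power saving: {combined:.0%}")
```

For a design where leakage is 30% of total power, this works out to roughly 13% overall; the result always falls between the dynamic and leakage figures.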
“We’re entrusted to do advanced-node designs with early-adopter EDA tool versions,” Cheng said. “We work with EDA tool partners to find and resolve weaknesses. Advanced nodes, because of the nature of their materials and physics, always present an array of new challenges.”
At the 0.9 version of a PDK, Alchip usually will tape out a design, he added. “We need to actually understand the process very well to report back to our customers whether this is the real PPA number and the real performance or power number.”
With a silicon correlation number in hand, the company helps customers evaluate whether their performance or energy efficiency targets are feasible.
Sole 3nm source
Although TSMC rival Samsung earlier this year became the world’s first to offer a 3nm process to foundry customers, Alchip plans to rely on TSMC at the most advanced node, just as it has at 7nm and 5nm.
“There’s no other foundry at the moment that can compete with TSMC for readiness or yield control,” Cheng said. “Samsung or even Intel, they are actually approaching us. So far, we are still sticking with TSMC.”
“For Alchip, our main business is actually the turnkey business. It’s not only a design service. We really want to support our customers to go for mass production. If that’s the goal, we want to have a very good yield, and also all the ecosystems must be there. TSMC is still holding that position well.”