Tenstorrent to Challenge NVIDIA with RISC-V


Friday, June 20, 2025

Fresh off the launch of the company’s Blackhole chip, Tenstorrent CEO Jim Keller is feeling happy.

“I told my leadership team this is my best day at Tenstorrent in four years,” Keller told EE Times in an exclusive interview. “Our CPU team is kicking ass, we delivered the Beta drop for [RISC-V CPU IP core] Ascalon, our first training computer is up and running, we’re shipping Blackhole computers, and we just talked to this kid who’s building a compiler on top of our technology, who thinks what we’re doing is great.”

Outside Keller’s office door in Santa Clara is a whiteboard on which he has written, “We’re going to win!” in big letters, alongside a tally of models currently running with “useful” levels of performance and reliability on Blackhole-generation hardware, which currently stands at five for the company’s TT-NN compiler stack. Another 15 models are “coming soon,” he said.

The Forge team, building an MLIR compiler for the Tenstorrent stack, is further behind, but Keller said he has “hundreds” of engineers working on it.

“MLIR is a really big win,” he said. “That was a really good choice, it felt great.”

Contributing to MLIR, an open-source project, is something Keller is only too happy to embrace. Tenstorrent’s entire software stack is open-source; Keller said this has been a popular decision both inside and outside the company, even helping recruitment since the open-source ethos appeals to engineers.

“We’re finding traction in all kinds of ways, because people can build software on top of our stuff and it works, or we fix it, so they’re pulling for us,” he said. “They like the fact that there’s a real open-source software stack.”

Tenstorrent silicon and IP are also built on the open-source instruction set RISC-V. Fed up with the pace of some decisions in the RISC-V world, Keller said the company is now leading the way in some areas.

“We’re investing in [RISC-V] compiler technology,” he said. “We lifted the performance of LLVM by 10%, which we contributed to open source. The operating system, drivers, tool chains—everything is getting better. I’m happy; RISC-V is good, we made a good decision and we’re going to make money on it.”

RISC-V will win in the long run over ISAs that allow zero or limited customization, Keller said.

“AI code generation is going to change code—it’s going to be way more parallel and it’s going to change CPU architecture,” he said. “With [other ISAs], you have no control over that, and with RISC-V I do, and we are leaning into it.”

Market leader Nvidia recently announced it would license its NVLink IP to selected companies building custom CPUs or accelerators; the company is notoriously proprietary and this was seen by some as a move towards building a multi-vendor ecosystem around some Nvidia technologies. Asked whether he is concerned about a more open version of NVLink, Keller said he simply does not care.

“People ask me, ‘What are you doing about that?’ [The answer is] literally nothing,” he said. “Why would I? I literally don’t need that technology, I don’t care about it…I don’t think it’s a good idea. We are not building it.”

Tenstorrent chips are linked by the well-established open standard Ethernet, which Keller said is more than sufficient.

“Let’s just make a list of what Nvidia does, and we’ll do the opposite,” Keller joked. “Ethernet is fine! Smaller, lower cost chips are a good idea. Simpler servers are a good idea. Open-source software is a good idea.”

Keller also highlighted Tenstorrent’s focus on cheaper chip packaging; the company avoids HBM in favour of GDDR6.

“If you copy the leader exactly, you’ll get 20% of the market, but at a price discount and you won’t create a new market,” he said.

Chinese market

Keller knows of at least one company that is using Tenstorrent’s open-source stack for its own AI hardware. This company, based in China, submitted bug reports, which Keller had no problem with the Tenstorrent team fixing. This is part of the nature of open-source software, he said, even if it means potentially helping a Chinese competitor.

Tenstorrent does and will continue to address the Chinese market. Previous-gen Wormhole hardware can be shipped to China under current U.S. export regulations, Keller said, but Blackhole will need to be de-featured, provisions for which are built into every part of the silicon. Ascalon CPU IP also has to be de-featured for Chinese customers. “I don’t think it does the U.S. any good to [regulate exports of AI technology],” Keller said, noting that export controls on semiconductor manufacturing equipment have meant China has doubled down on developing this technology internally. “As best as I can tell, in the last five years of regulation of semiconductor equipment into China, that’s sped China up about five years.”

“You win through innovation, not regulation. I think that’s been pretty obvious for a really long time,” he added.

Tenstorrent currently has European offices in Serbia, Germany and Poland. The company is due to open an office in Cyprus after the government reached out. This office will work on joint projects with Cypriot universities using Tenstorrent computers.

Countries want to retain control of their AI technologies, rather than rely on U.S. hyperscalers, Keller said.

“They like the fact that our software is open source, so they can do stuff,” he said.

Spain is another likely future office location, given the country’s supply of RISC-V talent and support from the government, Keller said.

In Japan, work with fledgling foundry Rapidus is proceeding. Rapidus’ 2-nm pilot line is up and running, and the foundry delivered a PDK ahead of schedule, Keller said.

“We synthesized part of our CPUs and we are sending them feedback,” Keller said. “The numbers were about what we expected.”

Training cluster

Now that Blackhole chips are available, Tenstorrent is working on building successively bigger training clusters. So far, a training cluster of six Blackhole Galaxies (192 chips) has been built, with bigger clusters coming over the next six months. The eventual aim is a data plane engine of 16 Galaxy servers, with another 16 Galaxies for switching (Tenstorrent uses its own chips as switches) and another 16 as an optimizer. This includes a certain amount of redundancy.

“The demo is going to be [the cluster] running and then you pull any cable out and it doesn’t stop running,” he said. “In principle, we could turn off any server we want and it would keep running.”

Korean company Moreh is building a training stack for Tenstorrent hardware, but that is for their own product, Keller said. The stack for Tenstorrent’s own training clusters is being developed in-house.

Fast inference can be achieved with four Galaxies (128 chips), Keller said, noting that reasoning is expected to require a million times more tokens than today’s LLM workloads (the real upper bound on the amount of inference compute required is the limit on how many unique questions humans can come up with, he said). Even though inference is likely to be the bigger market in the long term, Keller firmly dismissed any suggestion that the company would or should focus entirely on inference.

“I have a mission to radically lower the cost of training,” he said. “The fun challenge is how do you become a proper development platform to do new things, I’m really interested in that. How do you do new things if you can’t train?”

Training is an essential part of the stack, just like building an in-house CPU is, Keller said.

“I want to build great AI computers,” he said. “That’s the problem with doing part of a thing. [If we focused on serving models from the cloud, for example], what if somebody wants to develop models, where they want to put their own AI technology in their own product? I’m interested in that bigger picture.”

This is despite the fact that training and inference are separate, uniquely difficult problems to solve—especially at scale.

“Everybody says AI hardware is just a matrix multiplier with some software—that’s the stupidest thing I’ve ever heard,” he said. “Yes, there’s a matrix multiplier. Yes, there’s some software. But making it work at scale…”

The biggest version of the training computer Tenstorrent is currently planning will have two million RISC-V cores operating in parallel, programmed from a single program.

“AI is stylized, it’s not general-purpose HPC programming, but it’s still fairly dramatic, co-ordinating millions of processors from a simple program with tens of people is pretty good,” Keller said.

“We are going to build stupidly big computers,” he added. “It’s really fun.”

By: DocMemory
Copyright © 2023 CST, Inc. All Rights Reserved