Wednesday, September 28, 2016
Calling for 100x faster processors, China Web giant Baidu has released DeepBench, an open-source benchmark that measures how fast processors run the low-level operations used to train neural networks for machine learning.
DeepBench is available online along with first results from Intel and Nvidia processors running it. The benchmark tests low-level operations such as matrix multiplication, convolutions, handling recurrent layers and the time it takes to share data across all processors in a cluster, an operation known as all-reduce.
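DeepBench itself is C++ code that times vendor kernels such as cuBLAS and cuDNN directly; the sketch below is only an illustration of the idea behind its matrix-multiply (GEMM) test, with NumPy standing in for a tuned vendor library and the `time_gemm` helper and matrix sizes invented for this example rather than taken from the published benchmark:

```python
# A minimal sketch, not DeepBench code: DeepBench is C++ that calls vendor
# libraries (e.g., cuBLAS, cuDNN) directly. This illustrates the idea behind
# its GEMM test: time a dense matrix multiply and report achieved throughput.
import time
import numpy as np

def time_gemm(m, n, k, repeats=10):
    """Return best-case GFLOP/s for C = A @ B, with A (m x k), B (k x n)."""
    a = np.random.rand(m, k).astype(np.float32)
    b = np.random.rand(k, n).astype(np.float32)
    a @ b  # warm-up run so one-time setup cost isn't measured
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    # Each of the m*n outputs takes k multiply-adds, i.e., 2 flops apiece.
    return 2.0 * m * n * k / best / 1e9

if __name__ == "__main__":
    # Sizes here are illustrative stand-ins for real network-layer shapes.
    for m, n, k in [(1760, 128, 1760), (2560, 64, 2560)]:
        print(f"GEMM {m}x{n}x{k}: {time_gemm(m, n, k):.1f} GFLOP/s")
```

Running the same fixed set of shapes on every processor is what lets the benchmark compare chips on equal footing, since, as the researchers note below, performance varies with the sizes of the operations even on one device.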
Machine learning has emerged as a critical workload for Web giants such as Baidu, Google, Facebook and others. The workloads come in many flavors serving applications such as speech, object and video recognition and automatic language translation.
Today the job of training machine learning models “is limited by compute, if we had faster processors we’d run bigger models…in practice we train on a reasonable subset of data that can finish in a matter of months,” said Greg Diamos, a senior researcher at Baidu’s Silicon Valley AI Lab.
The lab has found, for example, that it can cut errors in automatic language translation by 40% for every order-of-magnitude improvement in compute performance; at that rate, a 100x speedup would compound to cut errors by nearly two-thirds. “We could use improvements of several orders of magnitude--100x or greater,” said Diamos.
No striking results emerged from initial tests on Nvidia TitanX, TitanX Pascal and M40 GPUs and Intel Xeon Phi processors, according to the Baidu researchers.
“Even for the same type of operations like matrix multiplies, depending on the sizes of the models and the ways they are used, performance varies even on the same processor,” said Diamos. “We aren’t as concerned about minor differences between these processors--we want both to be 10 times faster,” he said.
Nevertheless, one executive took occasion to claim bragging rights.
“Baidu’s DeepBench results clearly highlight Pascal as the performance leader across all deep learning workloads,” said Ian Buck, vice president of accelerated computing at Nvidia. “When full applications are benchmarked, such as Caffe AlexNet or Caffe VGG-19, our internal testing shows Pascal is up to 6x faster than [Intel's] Knights Landing,” he added.
The Baidu researchers also hope other processor vendors and data center operators will contribute to expanding the benchmark and run their own chips on it.
“I’d be personally interested in results on AMD GPUs and ASICs from startups with custom hardware for whom it may be difficult to run full models -- this might be an easier way for them to get their capabilities out there,” Diamos said, noting the lab currently uses systems with eight Nvidia TitanX GPUs to run its speech recognition models.
DeepBench tests low-level hardware libraries, not the higher-level AI frameworks data center operators build on top of them, such as Baidu’s PaddlePaddle and Google’s TensorFlow.
“At the framework level, there’s a huge amount of difference in the models, and different models for different apps, but below the frameworks they use a few common operations,” said Sharan Narang, a software engineer in Baidu’s AI lab. “The hope is we can find the core common operations that are more actionable for hardware makers,” he said.
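As an illustration of that layering (a minimal sketch with made-up shapes, not code from any framework), a fully-connected layer in a framework like PaddlePaddle or TensorFlow ultimately bottoms out in the same matrix-multiply primitive DeepBench times:

```python
# Minimal sketch: a framework-level dense (fully-connected) layer is,
# underneath, one call to the GEMM primitive that DeepBench measures.
import numpy as np

def dense_forward(x, weights, bias):
    """Forward pass of a dense layer: a matrix multiply plus a bias add."""
    return x @ weights + bias  # the GEMM dominates the runtime

# Made-up shapes for illustration: a batch of 64 inputs, 1024 -> 512 units.
x = np.random.rand(64, 1024).astype(np.float32)
w = np.random.rand(1024, 512).astype(np.float32)
b = np.zeros(512, dtype=np.float32)
print(dense_forward(x, w, b).shape)  # prints (64, 512)
```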
The researchers are considering whether they need a separate benchmark for inference, the distinct job of applying trained models to find patterns in new data. Today Baidu and Microsoft use FPGAs on servers to accelerate that less compute-intensive work.
Whether competing data center operators collaborate on DeepBench remains to be seen.