DDR vs Rambus
Monday, November 1, 1999
With the long anticipated release of Intel's new 820 chip set, the world has its first opportunity to evaluate the system level performance impact of DRDRAM. Until now, publicly available performance reports on DRDRAM have compared Rambus to PC100 or PC133 SDRAM.
According to these reviews, the benchmarkable performance difference between Rambus and SDRAM is usually between 0 and 2%. When differences attributable to DRAM performance are discovered, the advantage seems to flip-flop unpredictably between SDRAM and Rambus. Amid this confusion and lack of clarity, the majority of users will tend to gravitate toward the low cost solution.
Yet, there are key market segments that will seek and pay for premium performance in DRAM (and other aspects of platform performance). For example, the overwhelming majority of server vendors have chosen DDR SDRAM to enable the next level of performance. Similarly, 3D graphics chip makers have selected DDR for their upcoming high end products. The force behind these decisions has been a combination of cost, performance, capacity and infrastructure continuity.
However, there are two remaining performance driven market segments for which a clear transition direction has not yet emerged. To be specific, workstation users and PC enthusiasts have not yet embarked on a clear migration toward PC133, DDR or Rambus. Users in these market segments have demonstrated a reluctant willingness to sacrifice infrastructure continuity and cost in the name of performance. Currently, they are seeking the performance advantage that would move them in one direction or another.
The next generation SDRAM standard, DDR, uses a double data rate clocking technique to push its peak burst bandwidth to 2.1 GB/s, as compared to 1 GB/s for PC133 and 1.6 GB/s for Rambus. To the extent that some assume DRAM bandwidth to be a possible future inhibitor to system performance, DDR should be able to fill the bill. But most will agree that future performance potential, or "headroom", is not nearly as important as a solid and consistent performance advantage that can be demonstrated on today's processors, platforms and applications.
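The peak figures quoted above follow from simple arithmetic: transfers per second multiplied by bytes per transfer. The sketch below reproduces them under assumed bus parameters (a 64-bit data bus for SDRAM and DDR, a 16-bit channel for RDRAM, and transfer rates of 133, 266 and 800 million per second). These parameters come from the commonly published specifications rather than from this article, and the function name is purely illustrative.

```python
# Peak burst bandwidth = transfers per second x bytes per transfer.
# The bus widths and transfer rates below are assumptions based on
# commonly published specifications, not figures stated in the article.

def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits):
    """Return peak burst bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

# (million transfers per second, data bus width in bits)
MEMORY_TYPES = {
    "PC133 SDRAM": (133, 64),
    "PC266 DDR SDRAM": (266, 64),
    "PC800 RDRAM": (800, 16),
}

for name, (rate, width) in MEMORY_TYPES.items():
    print(f"{name}: {peak_bandwidth_gb_s(rate, width):.2f} GB/s")
```

These are theoretical ceilings only. As the following paragraphs argue, peak burst bandwidth says little about the performance a processor actually sees.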
To clarify the point, "headroom" is marketing gobbledygook for DRAM bandwidth that significantly exceeds the practical demand of current processors. This is not necessarily synonymous with an immediate performance advantage. When it comes to DRAM, immediate performance advantages are usually derived through optimized DRAM access latency.
With the anticipation that DDR SDRAM might potentially offer both (increased headroom and an immediate performance advantage), the market has been keenly awaiting DDR platforms for a thorough performance evaluation.
For weeks, it has been rumored and reported that Micron Technology has been developing a high performance DDR based north bridge chip for use with Intel's PIII class processors. Such a product would seem well suited to a broad range of platforms and market segments including workstations, small workgroup servers, application servers and the high-end enthusiast PC market.
Micron's commitment to this development effort seems fueled by its interest in enabling DDR SDRAM in the market. As such, this north bridge IC will probably find its way to market through a number of different yet complementary avenues. This will begin to unfold before the end of 1999.
Micron supplied InQuest with an early development board based on first silicon of the north bridge IC and first silicon Micron DDR SDRAMs. Though the silicon is still very fresh, the system was rock solid at 266MHz with three of the four memory slots occupied with buffered 64MB DIMMs.
Based on industry specifications, it should be possible to manufacture unbuffered systems with two or three DIMM slots. Buffered DIMMs will be required for 266MHz operation using 4 DIMMs. Hitachi is currently manufacturing 256Mbit DDR SDRAMs that will enable single DIMM capacities of 512Mbytes and stacked DIMMs of 1Gbyte each.
Intel supplied InQuest with a pre-production 3 RIMM version of its Vancouver motherboard and PIII-733 processor. Samsung offered two double sided 256Mbyte 800MHz RIMMs, each containing 16 128Mbit RDRAMs. The 820+DRDRAM platform has nearly reached production worthiness, and we expect that 820 based systems should become generally available in Q1 of 2000, more than 14 months and perhaps six revisions after its first silicon.
To evaluate bandwidth headroom to the processor, we chose the well-known StreamD benchmark released by the University of Virginia. It is a popular cross platform benchmark that evaluates effective bandwidth from DRAM to the processor. StreamD is the most reproducible and precise benchmark of its kind. Its margin of error is regularly under 1%.
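For readers unfamiliar with the Stream family of benchmarks, the sketch below shows the four classic kernels (Copy, Scale, Add, Triad) and the way an effective bandwidth figure is derived from the bytes each kernel moves. It is an illustration only: it is not the StreamD source, it does not model the Copy32 and Copy64 variants discussed in this report, and pure Python measures interpreter overhead far more than DRAM speed. The function names and the array size are assumptions made for the example.

```python
import time

WORD_BYTES = 8  # one double-precision element


def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, max(elapsed, 1e-9)  # guard against a zero reading


def run_kernels(n=100_000, scalar=3.0):
    """Run the four Stream-style kernels over n-element arrays.

    Returns the final arrays and a dict of nominal MB/s per kernel.
    Copy and Scale touch two arrays; Add and Triad touch three.
    """
    a = [1.0] * n
    b = [2.0] * n
    rates = {}

    c, t = timed(lambda: [x for x in a])                 # Copy:  c = a
    rates["Copy"] = 2 * n * WORD_BYTES / t / 1e6

    b, t = timed(lambda: [scalar * x for x in c])        # Scale: b = s * c
    rates["Scale"] = 2 * n * WORD_BYTES / t / 1e6

    c, t = timed(lambda: [x + y for x, y in zip(a, b)])  # Add:   c = a + b
    rates["Add"] = 3 * n * WORD_BYTES / t / 1e6

    a, t = timed(lambda: [y + scalar * z for y, z in zip(b, c)])  # Triad: a = b + s * c
    rates["Triad"] = 3 * n * WORD_BYTES / t / 1e6

    return a, b, c, rates


if __name__ == "__main__":
    _, _, _, rates = run_kernels()
    for kernel, mb_s in rates.items():
        print(f"{kernel:5s} {mb_s:10.1f} MB/s (nominal)")
```

A real bandwidth benchmark uses arrays much larger than the processor caches and reports the best of several timed passes, which is part of why its results can be so reproducible.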
The results here are remarkably decisive. DDR beats the 820+Rambus by a significant margin in all of the tests, exceeding 30% in some cases and averaging to a 24.4% performance advantage for this benchmark.
It is interesting to note DDR's significant performance advantage of 20% and 34% in the Copy32 and Copy64 functions. It is popularly believed that one of the weaknesses of SDRAM (including DDR) is its longer bus turnaround latency. As such, it is somewhat surprising that Rambus does not produce a more competitive score on these tests. Also, when comparing platform bandwidth figures for Copy32 vs. Copy64, one would expect that Copy64 data rates should be equal to or higher than the Copy32 figure. This is true for DDR, but in the 820 platform, Copy64 is actually 10% slower than Copy32.
Recently, a Windows version of the Stream benchmark has been developed, known as WSTREAM.EXE. The precision and consistency of this test are not nearly as high as those of StreamD. It regularly suffers from a compound precision error rate of up to 30%. In addition, this program is documented by its developer as being inaccurate under Windows NT Server 4.0. Though I have less confidence in these numbers, I include them here for completeness.
Scores were recorded for 10 benchmark runs on each platform with 200 iterations. The benchmark was launched several times after a clean Win98 boot, then again after loading and unloading numerous large applications. The widely varying results from all of these tests were averaged to generate the figures in the chart above.
In this benchmark DDR exceeds the performance of the 820 platform by an average of 2.7%. The only test where DDR falls behind (by 0.5%) is the TRIAD test. It is probably no coincidence that Intel refers only to the TRIAD test when making reference to this benchmark in its presentation material at IDF and in its white papers. (Also, for an unknown reason Intel refers to the WSTREAM.EXE benchmark as StreamNT.)
Even though the platform performance differences seem muted in this version of the benchmark, it is still clear that DDR pulls ahead, showing particular strength again in the copy function.
Intel Platform Tests
Intel's Platform Test program includes two bandwidth evaluation programs. The first is a concurrent CPU/AGP/DRAM bandwidth test, and the second is an AGP bandwidth stress test.
The first test has proven to be unreliable in the past, but Intel has released an updated version (v1.2). Versions 1.0 and 1.1 report results that are clearly in error. Though this is not an exhaustive evaluation, it seems from the results gathered here that the test may no longer report results that are mathematically impossible.
The Platform Bandwidth test reports results that are ostensibly the same for the DDR platform and the 512Mbyte Camino configuration. It is interesting to observe, however, that the Camino score improves by a very reproducible 2% after reducing its configuration to 256Mbytes of RDRAM. More on this later.
At the Fall '99 IDF, Intel offered its effective bandwidth analysis for DDR and Rambus as shown on the left portion of the chart above. If we use Intel's Platform Test results as an indication, Intel's estimates may be in need of serious revision. Micron's PC266 DDR outperformed Intel's estimates for 200MHz DDR by a whopping 58%, while the 820 under-performed by 15%. For clarification, on the right half of the diagram above, InQuest offers an enhanced version of Intel's chart reflecting the actual test results reported in this document using Intel's benchmarks.
AGP Bandwidth Analysis
The second part of Intel's Platform Test program evaluates AGP to DRAM bandwidth by saturating the AGP bus with Execute Mode texturing activity. Initially, this test seemed uninteresting because it demonstrated no significant performance differentiation between the various platforms available at the time of its original release. Essentially, all AGP2x systems scored about 30fps, while all AGP4x systems scored about 40fps.
Micron's DDR platform adds a bit of spice to the situation. This benchmark reveals some very interesting performance differences between DDR and Rambus, with an interesting twist in performance based on RDRAM capacity.
Here it can be seen that the DDR platform significantly outperforms Camino+Rambus. For a long time, it has been widely presumed that Intel's implementation of AGP and its data path to DRAM is better than any other chip set or architecture available. In this case DDR outperforms the 820 with 256MB of RDRAM by 13.1%, and outperforms the 512MB configuration by 19.8%.
A side note - here again is evidence that Camino performance diminishes when configured with 512Mbytes, as compared to 256Mbytes of RDRAM. In this case the loss is 5.6%.
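The three percentages quoted for this test can be cross-checked against one another. If DDR leads the 256MB configuration by 13.1% and the 512MB configuration by 19.8%, then the gap between the two RDRAM configurations follows directly from those two leads. The helper below is a small sketch of that arithmetic; its name and parameters are illustrative and appear nowhere in the original test material.

```python
def relative_loss_pct(ddr_lead_over_first_pct, ddr_lead_over_second_pct):
    """Percentage by which the second configuration trails the first.

    Both inputs are DDR's lead, in percent, over the respective
    configuration, measured on the same benchmark.
    """
    # Express each configuration's score as a fraction of the DDR score.
    first = 1.0 / (1.0 + ddr_lead_over_first_pct / 100.0)
    second = 1.0 / (1.0 + ddr_lead_over_second_pct / 100.0)
    return (1.0 - second / first) * 100.0


# DDR leads the 256MB RDRAM system by 13.1% and the 512MB system by 19.8%.
loss = relative_loss_pct(13.1, 19.8)
print(f"512MB vs 256MB RDRAM: {loss:.1f}% slower")  # prints 5.6% slower
```

The result agrees with the 5.6% loss reported above, so the three figures are internally consistent.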
Next, in order to ensure that the huge DDR advantage demonstrated in the test above was not a fluke, I also ran tests using Intel's IBASES v1.5. This program evaluates execute mode frame rates as it copies multiple textures per frame via AGP to the display buffer. Ranging from one texture per frame all the way up to 256 textures per frame, the DDR platform delivers a solid performance advantage at every step as AGP texturing demand is increased. The median improvement that DDR offers over RDRAM is 11.6%.
This solidly substantiates the results observed using the AGP portion of Intel's Platform Tests. It can be stated with certainty that Micron's DDR platform offers significantly better AGP to DRAM bandwidth than Camino with Rambus.
Other Observations: DDR vs. Rambus
An application benchmark analysis of DDR vs. RDRAM will be forthcoming. A quick pass on both platforms with several function specific benchmarks such as CPUmark99, 3Dmark Max, Intel Media Benchmark and games such as Expendable has revealed a very small but consistently measurable performance advantage for DDR over RDRAM. This work will be conducted more exhaustively in the near future.
Considering the early state of this DDR development and validation platform, there are several pending optimizations that could further expand its performance lead. Foremost on the list is the use of unbuffered DIMMs rather than the buffered DIMMs used in these tests. This will have the effect of removing one clock of latency from the DRAM subsystem. It is reasonable to expect many benchmarks to show an immediate benefit from such a change.
Also, the use of larger DDR SDRAM configurations will have a positive impact on system performance for application benchmarks (as compared to the 128M and 192M configurations used in this exercise).
In the same vein, it has been interesting to observe that enlarging RDRAM configurations on the 820 platform seems to have the opposite effect. This is very likely due to the limitation in the 820 that prevents it from maintaining more than 16 memory chips in the 'on' state. When more than 16 RDRAMs are used in a system, a performance penalty can arise due to power management as demonstrated in these benchmarks. This may be a difficult problem to circumvent in the performance sensitive markets.
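The device counts behind this observation are easy to tally. Each of the RIMMs used here carries 16 RDRAMs of 128Mbit, so the sketch below counts how many devices fall beyond the 16 device limit described above. It assumes the 256Mbyte configuration was a single RIMM and the 512Mbyte configuration was both RIMMs, which the test description implies but does not state outright. The function names are illustrative.

```python
# Figures taken from the hardware description in this report.
RDRAM_DEVICE_MBIT = 128   # capacity of one RDRAM device
DEVICES_PER_RIMM = 16     # devices on each double sided RIMM
ACTIVE_DEVICE_LIMIT = 16  # devices the 820 can keep in the 'on' state


def rimm_capacity_mbytes(devices=DEVICES_PER_RIMM, device_mbit=RDRAM_DEVICE_MBIT):
    """Capacity of one RIMM in Mbytes (8 bits per byte)."""
    return devices * device_mbit // 8


def devices_beyond_limit(rimm_count, limit=ACTIVE_DEVICE_LIMIT):
    """Number of devices that cannot all stay in the 'on' state."""
    return max(0, rimm_count * DEVICES_PER_RIMM - limit)


# Assumption: 256MB means one RIMM installed, 512MB means two.
for rimms in (1, 2):
    total_mb = rimms * rimm_capacity_mbytes()
    print(f"{total_mb} Mbytes: {devices_beyond_limit(rimms)} devices beyond the limit")
```

Under these assumptions a single RIMM sits exactly at the limit, while the second RIMM pushes 16 devices beyond it. That would be consistent with the slowdown observed when moving from 256Mbytes to 512Mbytes.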
The 10-30% performance advantage for DDR over Rambus demonstrated in the benchmark scores above is truly astounding. This performance advantage will immediately be appreciated in server platforms, in high-end graphics and in other high performance systems.
Even the low end PC may stand to benefit very soon from DDR technology as UMA (shared memory) platforms appear in the year 2000. Indeed, DDR may be the catalyst that enables UMA to grow from the bottom of the market to the mainstream beginning in the second half of 2000.
Regardless of the application or target platform, the single key element in the DRAM performance equation is to deliver superior memory performance without adding significantly to system cost. DDR may be the technology to deliver this combination to the mainstream.
By: Bert McComas - InQuest Market Research
Copyright © 2019 CST, Inc. All Rights Reserved