P3 + DDR Performance Analysis
Tuesday, October 24, 2000
VIA’s Apollo Pro266 DDR Chip Set
Introduction
The industry’s spotlight is shining on DDR in a big way. With a 12-month legacy of unqualified success in the high-end graphics market, users and manufacturers alike are expecting DDR main memory to be a very popular item. In contrast, RDRAM related platforms have been marred by 12 months of technical troubles, high prices, recalls, questionable performance and weak demand.
In this article, we will take a close look under the hood at VIA’s newest offering, the Apollo Pro266 DDR chip set for the Pentium3. This chip set is likely to become the biggest seller for DDR among P6 based platforms.
The very positive popular sentiment for DDR is reflected in a poll taken several months ago at www.AnandTech.com in the wake of a pair of controversial DRAM related articles published there last summer.
This poll does not pretend to be a random sample indicator of public opinion. Instead, it is a sample of the opinions of a group of well-educated enthusiasts. The poll seems to indicate that users are pre-sold on DDR as the next standard. Since this poll was taken, the tide has turned even further away from RDRAM with a final one-two punch from Intel: admitting that the 815 outperforms the 820 and, more importantly, Craig Barrett publicly confessing that Intel’s RDRAM strategy was a mistake.
The promoters of DDR could not hope for a better environment in which to launch their new DDR chip sets and platforms. We expect many new announcements over the coming months, and VIA will be a leader among them with Pentium3 and Athlon DDR chip sets that follow in the footsteps of their currently very successful PC133 designs.
Next year, P3 platforms will undergo a significant transition. As the performance markets are left to Willamette and Athlon, the P3 is beginning its downward migration into the mainstream. To take best advantage of this transition, OEMs must plan to consolidate formerly diverse P6 platforms and prepare to take a leadership role in a much more cost sensitive market. In concert with an appropriate cost reduction strategy, OEMs must deliver the performance and feature set enhancements demanded by the market.
Making the right platform decision at this juncture is critical. No manufacturer can afford to repeat any past mistakes by undershooting or overshooting the target.
The availability of VIA’s Apollo Pro266 defines the market window for DDR SDRAM on P6 platforms. It will be followed closely by VIA’s Athlon KT266 DDR, which should begin sampling in November. An advance peek at VIA’s roadmap for DDR reveals new chip sets for P3 and Athlon, pin compatible versions with SMA graphics, as well as notebook solutions.
DDR is performance optimized for the near term and cost optimized for the long term. It delivers an immediate boost in raw performance for the enthusiast, it will make SMA graphics performance acceptable for the mainstream, and it will offer power consumption advantages for portables.
Though DDR will carry a small cost premium initially, it is expected to achieve price parity with PC133 sometime in the year 2001. In that timeframe, all DRAM manufacturers will have switched DRAM wafer production to a single DRAM IC design that can be used to manufacture both SDRAM and DDR. As this is accomplished, DDR and SDR will have exactly the same die size and manufacturing cost. When price parity is achieved, even value PC buyers will demand a transition to this new, higher performance technology.
DDR Performance
To evaluate performance, we compared VIA’s Pro266 reference platforms vs. Intel’s i815EA and a major manufacturer’s 820 based motherboard. All systems were configured with a 933MHz processor, 256MB of DRAM, an ATA100 HD and GeForce2 DDR accelerator. All SDRAM based systems were specifically configured at CAS Latency 3 in order to concentrate the comparison on bandwidth differences, rather than to over-emphasize the inherent latency advantage of SDRAM vs. RDRAM.
Further detailed system configuration information is shown in the table above. We must point out that the 820 board was not enabled with ATA100 support. We believe that this difference had no impact on benchmark results, except in the case of Sysmark.
We compared performance under a wide variety of conditions, including application benchmarks, 3D games, synthetic 3D graphics tests and CAD workloads. When evaluating the discrete performance impact of memory, it is also appropriate to use synthetic DRAM and processor performance tests.
The Pro266 system stability seemed excellent. We used 2 sticks of Micron 128MB DDR. These are double sided modules, which represent the largest load that can be configured with 2 DIMMs. In addition to all of the tests reported below (at CL3) we ran hours of stress tests configured at faster CL2 settings with no problems.
Synthetic DRAM Performance Tests
Initially, we examined maximum system memory bandwidth under synthetic loads, using StreamD, Wstream and WinTune98. Using the Stream benchmarks, nine test results are produced – five under DOS and four under Windows. DDR generally trounced PC133 and RDRAM. On average, the 815 comes in 15% slower, and the 820 is 10% slower than VIA’s Apollo Pro266.
The WinTune98 synthetic memory performance test produces similar results, demonstrating a more than 13% lead for DDR. These synthetic DRAM tests exist for only one purpose – that is to saturate the front side bus and DRAM bus as much as possible with read and write activity. It is difficult even in these tests to see the maximum advantage of DDR because the P3 bus at 133MHz will run out of gas before DDR does. VIA’s advantage is attributable to DDR, but also to its advanced internal buffering technique that optimizes write performance and minimizes latency penalties associated with bus turnaround.
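The remark that the P3 bus runs out of gas before DDR does can be illustrated with nominal peak figures. The sketch below is a back-of-the-envelope calculation, not a measurement. The bus widths and clocks are the commonly quoted nominal values, PC800 is assumed for the 820's RDRAM (the article does not state the speed grade), and the function and dictionary names are purely illustrative.

```python
def peak_mb_per_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Nominal peak bandwidth in MB/s: bytes per transfer x transfers per second."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


# Nominal figures only; real sustained bandwidth is lower on every bus.
NOMINAL_PEAK = {
    "p3_fsb_133": peak_mb_per_s(64, 133),            # 64-bit front side bus at 133MHz
    "pc133_sdram": peak_mb_per_s(64, 133),           # 64-bit SDRAM at 133MHz
    "ddr266": peak_mb_per_s(64, 133, 2),             # same clock, two transfers per clock
    "pc800_rdram_1ch": peak_mb_per_s(16, 400, 2),    # ASSUMED PC800, one 16-bit channel
}

for name, mb_s in NOMINAL_PEAK.items():
    print(f"{name:>16}: {mb_s:7.1f} MB/s")
```

On these nominal numbers, DDR266 has roughly twice the peak bandwidth of the 133MHz front side bus, so a CPU-driven memory test cannot expose all of it by itself.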
3D Game Performance
Game performance is a factor that is closely scrutinized in the market today. Two of the more commonly benchmarked games are Quake3 and Expendable. Default resolution modes were used in both cases in order to remove the question of possible accelerator bottlenecks. The Apollo Pro266 performance advantage is a healthy 4-7% as seen in the charts below. These benchmarks stress the processor’s floating-point unit, DRAM performance and CPU to AGP command traffic for the accelerator. DDR’s high bandwidth and low latency benefit all of these functions, delivering a performance advantage that is nearly equivalent to one CPU speed grade. This is an attractive advantage for P3 game enthusiasts.
ZD Benchmarks
CPUmark99 and the 3D WinBench Processor Test from Ziff Davis, shown in the chart below, also indicate a consistent advantage for DDR. CPUmark99 is a very processor centric benchmark. Though it does produce some DRAM activity, its results are muted slightly compared to many of the other benchmarks. On the other hand, 3D WinBench produces results that are consistent with, or even magnify, the observations from the 3D game benchmarks.
3D Mark 2000 - AGP Texture Tests
Beyond what can be observed from games and 3D Winbench, it is useful to get an idea about the impact of DDR on AGP bandwidth. 3D Mark 2000 includes a large texture test that can demonstrate AGP to DRAM performance differences. In these tests the Apollo Pro266 outperforms the other platforms by approximately 3-10% depending on texture size. This is an indication that the Pro266 can deliver a performance advantage in high resolution games, when AGP texture activity is at its peak.
SysMark 2000 Application Performance
SysMark 2000 was run at medium resolution in order to offset the possibility of graphics accelerator performance bottlenecks. Also, the standard 256MB DRAM configurations helped to minimize HD activity during the test, thus concentrating the performance comparison on the processor and DRAM.
For the purposes of comparing all 12 application scores in one chart, the Apollo Pro266 is used as a baseline, overlaying the results of the other two platforms charted as a percentage relative to the Pro266 score. The Pro266 delivers scores that are equal to or better than those of the other platforms in the overwhelming majority of cases, sometimes by a very significant margin.
On average the DDR platform exceeded the 815 by more than 3%, and exceeded the 820 by more than 7%. Though in some cases the 820 may have suffered from its lack of ATA100 support, previous tests have revealed that the 820 and RDRAM often lag in many of these tests because of the bursty nature of the DRAM accesses generated by these programs. SDRAM will often come out ahead when the transactions on the external bus are random, and not stream oriented. This is the case for most of the applications in Sysmark.
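The baseline normalization used for the chart can be sketched as follows. This is a minimal illustration, assuming per-test scores are available as lists. The scores shown are hypothetical placeholders, not the SysMark 2000 results from this article, and the function names are invented for the example.

```python
def relative_to_baseline(scores, baseline):
    """Express every platform's per-test score as a percentage of the baseline's score."""
    base = scores[baseline]
    return {
        platform: [100.0 * s / b for s, b in zip(values, base)]
        for platform, values in scores.items()
    }


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# HYPOTHETICAL scores for three tests, for illustration only.
example_scores = {
    "Pro266": [200.0, 150.0, 100.0],
    "815": [190.0, 150.0, 98.0],
    "820": [180.0, 141.0, 95.0],
}

relative = relative_to_baseline(example_scores, "Pro266")
for platform, percents in relative.items():
    print(platform, [round(p, 1) for p in percents], "average:", round(mean(percents), 1))
```

The baseline platform always charts at 100%, so any bar below 100% marks a test where the other platform trails the Pro266.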
ViewPerf CAD Performance
ViewPerf version 6.12 is a very popular workstation benchmark. In these tests the Apollo Pro266 with DDR edges out RDRAM by a small margin, while the 815 with PC133 lags behind by a larger margin (average of about -9%) as shown below.
At any resolution, this benchmark is very CPU and DRAM centric. Most of these tests use 3D scene models that significantly exceed the size of the L2 cache. While the geometry processing code fits neatly in the instruction cache, the 3D model datasets badly thrash the processor’s data cache and L2. As this thrashing occurs, the CPU chunks through huge amounts of floating point data in a somewhat serial fashion. This benchmark is dominated by pipelined data reads on the external bus. For this reason ViewPerf is very sensitive to DRAM bandwidth. This makes Viewperf a good benchmark to demonstrate the common strengths of RDRAM and DDR.
Overall, these benchmarks seem to indicate an attractive performance advantage for DDR and VIA’s Apollo Pro266 chip set. This performance advantage cannot be attributed to latency differences alone, since both SDRAM systems were configured to the slowest CL3 timings. In fact, there is still plenty of room for further DDR performance optimization, as CL2.5 and CL2.0 become the dominant speed grades.
Impact of DDR + V-Link on I/O Performance
DDR has proven to be very good for processor performance and for graphics performance. But VIA’s goal is to leverage DDR to enhance I/O performance as well. In order to fulfill this ambition, VIA has developed its V-Link architecture. V-Link seems similar in many respects to Intel’s HubLink approach.
DDR seems to solve any possible bandwidth shortage in the north bridge. But in the south bridge, a different bottleneck is brewing. In essence, the PCI bus is gradually becoming over-crowded with high-speed peripherals that are capable of generating bandwidth demand in excess of PCI’s capacity. PCI could become a performance bottleneck for high speed master mode peripherals such as ATA100, Ethernet, USB, USB2, 1394, etc.
For example, the ATA100 disk interface is theoretically capable of data bursts at up to 100MB/s. New ATA100 systems can be benchmarked to achieve around 85MB/s for short bursts, though long-term average throughput is closer to about 35MB/s for sequential read activity. Below is a table that shows the peak burst bandwidth of the several high-speed interfaces that are now, or soon will be, part of the PC platform. In addition to peak burst bandwidth, we have also indicated reasonable estimates of sustainable bandwidth based on superficial analysis from various sources. These sustainable bandwidth figures are not meant to be entirely conclusive, but merely an indication of what might be expected based on different workload models.
From this table, it is easy to imagine how the combined potential bandwidth demand of a few of the peripherals listed here could saturate PCI. To offset this potential bottleneck, VIA relies on its V-Link architecture. V-Link is a narrow high-speed local interface between its north and south bridge chips that can sustain 266MB/s peak burst bandwidth.
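The saturation argument can be checked with simple arithmetic. The sketch below does not reproduce the article's table. It uses the standard nominal signalling rates of each interface, assumes 100Mbit Ethernet and 400Mbit 1394 (the article does not specify either), and takes 133MB/s as the nominal peak of 32-bit, 33MHz PCI. All names are illustrative.

```python
PCI_PEAK_MB_S = 133.0     # nominal peak of 32-bit, 33MHz PCI
V_LINK_PEAK_MB_S = 266.0  # peak burst figure quoted in the article

# Nominal peak rates in MB/s (Mbit/s divided by 8 where applicable).
NOMINAL_PEAK_MB_S = {
    "ATA100": 100.0,
    "USB 1.1": 12 / 8,
    "USB2": 480 / 8,
    "1394": 400 / 8,            # ASSUMED 400Mbit/s variant
    "Fast Ethernet": 100 / 8,   # ASSUMED 100Mbit/s
}


def combined_demand(names):
    """Sum of nominal peak rates for the chosen peripherals, in MB/s."""
    return sum(NOMINAL_PEAK_MB_S[name] for name in names)


# Even two busy peripherals can exceed what PCI offers on paper.
pair = combined_demand(["ATA100", "USB2"])
everything = combined_demand(NOMINAL_PEAK_MB_S)

print(f"ATA100 + USB2: {pair:.1f} MB/s, exceeds PCI: {pair > PCI_PEAK_MB_S}")
print(f"All listed:    {everything:.1f} MB/s, fits in V-Link: {everything <= V_LINK_PEAK_MB_S}")
```

Simultaneous peak bursts on every interface are a worst case rather than a typical workload, so this is an upper bound on demand, not a prediction.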
In order to test VIA’s theory, we synthesized a 1GB disk-to-disk file copy benchmark for ATA100, comparing the performance of integrated controllers vs. the reputable Promise Ultra ATA100 PCI based controller.
Two identical IBM ATA100 drives were installed as master devices, one on each of the two IDE interfaces for each system. A 970MB file was copied from a 2GB FAT16 partition located on the innermost tracks of one drive, to an identical empty FAT16 partition on the other drive. A short batch program was launched from a DOS window to execute and time the copy procedure. Results were quite reproducible. Data rates were calculated based on run times.
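Deriving a data rate from a timed copy is a one-line calculation, sketched below together with a helper for the relative difference between two rates. The 970MB file size comes from the test description. The run times are hypothetical placeholders, not the measured values, and the function names are invented for the example.

```python
FILE_SIZE_MB = 970.0  # size of the copied file, from the test description


def data_rate_mb_s(run_time_s, file_size_mb=FILE_SIZE_MB):
    """Average throughput of a timed copy, in MB/s."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return file_size_mb / run_time_s


def percent_change(new, old):
    """Relative change from old to new, in percent (negative means slower)."""
    return 100.0 * (new - old) / old


# HYPOTHETICAL run times in seconds, for illustration only.
integrated = data_rate_mb_s(50.0)
pci_card = data_rate_mb_s(62.5)

print(f"integrated: {integrated:.2f} MB/s")
print(f"PCI card:   {pci_card:.2f} MB/s ({percent_change(pci_card, integrated):+.1f}%)")
```

Because both drives and the file are identical across runs, differences in run time map directly onto differences in sustained throughput.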
When comparing the Apollo Pro266 vs. 815 platforms with their integrated controllers, the 815 based platform shows a performance disadvantage of about 9% (denoted by the purple arrow in the chart below). This difference would be attributable to some combination of DRAM performance, ATA100 controller performance, and other chip set architectural attributes. On the 815, there was no difference in score when tested with Intel’s IDE driver or with the standard Win98 IDE driver.
When we moved the ATA100 controller to PCI, throughput was reduced by 24% for VIA, and 37% for Intel (denoted by the yellow arrows in the chart above). This data suggests that PCI may be a performance bottleneck, but we cannot be sure until we have the opportunity to test DDR chip sets using a PCI south bridge with integrated ATA100 support. Until then, at least, this data seems to show that the I/O throughput of VIA’s V-Link and DDR is significantly better than PCI and also superior to Intel’s HubLink.
Summary
Main memory system production will begin very soon. Manufacturers seem intent on delivering DDR into every market segment over the next 8 to 12 months. Even with all of this activity, Intel appears to be on the sidelines, languishing under a contract it wishes it had not signed. Intel’s next generation P3 platform, Almador, is often rumored to have DDR capability, but Intel is not forthcoming. All indications are that Intel may yet withhold its own P3 DDR platform until the second half of 2001. This leaves a huge window of opportunity for VIA’s customers to open the market, take the performance lead, build momentum and cost reduce DDR platforms for the mainstream, perhaps before Intel even samples a DDR chip set.
Authored by Bert McComas
InQuest Market Research
Oct 24, 2000
Copyright © 2023 CST, Inc. All Rights Reserved