Wednesday, August 23, 2000
DRAM Test Background
How does DRAM fail?
It is important to make the distinction between a physical defect in a DRAM and a memory failure. A physical defect is anything within the physical structure that deviates from what was intended, such as the presence of unwanted material, the absence of desired material, and imperfections in the lattice of the substrate. A physical defect may or may not lead to a failure, a situation in which the device behaves in such a way that violates its specifications. Physical defects that are not serious enough to immediately cause failure are known as latent defects. Latent defects may worsen with time and eventually reach the point where they do cause failure.
Plotting the failure rate vs. time for integrated circuits (ICs) that operated properly after manufacturing yields the so-called bathtub curve. The bathtub curve has three main regions. Initially, the failure rate is relatively high, because many latent defects develop into actual failures. This high failure rate drops off to yield a long, flat period on the curve in which the failure rate is low. In this period the reliability of the device is high, e.g. 10 to 100 failures in 10^9 hours under normal operating conditions. The third period of the curve represents the end of the IC's lifetime. The failure rate rises as long-term, slowly developing failure mechanisms (such as electromigration, corrosion, and mechanical stress) begin to be significant.
To compensate for the relatively high failure rate at the beginning of an IC's lifetime, manufacturers typically utilize a burn-in period when the ICs are operated continuously for several hours at high temperature and possibly high voltage. This is designed to accelerate the mechanisms that cause early failures. The devices are tested following burn-in, and manufacturers therefore ship devices that are theoretically at the beginning of their high-reliability period. It may be possible to eliminate the burn-in step with advances and greater control in the fabrication process; this is a matter of debate.
Incorrect behavior is described at some convenient level of abstraction as a fault. Faults with similar behaviors are grouped into fault types, and a set of faults that supposedly describes all types of faulty behaviors is known as a fault model. The important point is that accurately modeling all of the possible faults and devising tests that can detect all of them is not a simple, straightforward procedure. For instance, a common type of fault is a stuck-at fault, where a node is assumed to be stuck at either a 0 or a 1 value. Detecting whether any of the bits in a memory array are stuck at 0 or 1 is fairly straightforward: write all 0's, then read all of the values back and check that they are all 0; next write all 1's, and read all of the values back and check that they are all 1. But this would not detect a fault where the contents of one cell affected another cell nearby (possibly through a short circuit or by capacitive coupling). Detecting such pattern-sensitive faults can be much more difficult. Some other examples of faults are access time failures (caused by signal transitions in the address decoders that are too slow or too fast), transition faults (where a cell can be written from 0 to 1 but not from 1 to 0, or the other way around), and data retention faults (when a cell loses its value in less time than the DRAM refresh interval).
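The limitation of the all-0s/all-1s check can be illustrated with a small simulation. The sketch below is a minimal Python model, not a description of any real tester: the FaultyMemory class, its fault-injection parameters, and the particular coupling behavior (a 0-to-1 write on an aggressor cell forces a victim cell to 1) are all assumptions made for illustration.

```python
class FaultyMemory:
    """Bit-addressable memory model with optional injected faults (illustrative only)."""

    def __init__(self, size, stuck_at=None, coupling=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}   # {address: value the cell always reads as}
        self.coupling = coupling or {}   # {aggressor address: victim address}

    def write(self, addr, value):
        old = self.cells[addr]
        self.cells[addr] = value
        # Assumed coupling model: a 0 -> 1 transition on the aggressor
        # forces the victim cell to 1.
        if old == 0 and value == 1 and addr in self.coupling:
            self.cells[self.coupling[addr]] = 1

    def read(self, addr):
        # A stuck-at cell ignores whatever was written to it.
        return self.stuck_at.get(addr, self.cells[addr])


def zero_one_test(mem, size):
    """Write all 0s and read back, then write all 1s and read back."""
    for value in (0, 1):
        for addr in range(size):
            mem.write(addr, value)
        for addr in range(size):
            if mem.read(addr) != value:
                return False
    return True


SIZE = 8
print("fault-free:", zero_one_test(FaultyMemory(SIZE), SIZE))
print("stuck-at-0:", zero_one_test(FaultyMemory(SIZE, stuck_at={3: 0}), SIZE))
print("coupling  :", zero_one_test(FaultyMemory(SIZE, coupling={2: 5}), SIZE))
```

Under these assumptions the stuck-at fault is caught, while the coupling fault passes unnoticed: by the time any cell is read, the test has already written the same value to every cell, so the victim holds the expected value regardless of the interference.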
Standard Memory Test Methods
Memory tests can be grouped into the following categories:
DC parametric tests - measure static analog characteristics of the I/O interface.
AC parametric tests - measure dynamic parameters of the I/O interface.
Functional (or Boolean) tests - verify whether or not the memory performs the correct logical functions.
Dynamic tests - detect timing faults affecting the circuitry within the device under test (DUT).
IDDQ tests - measure the quiescent current from the power supply terminal for each test vector in a functional test. All IDDQ readings should be very low in parts with no defects. This can be used to detect latent defects.
Test Patterns and Test Time Trade-off
Some examples of memory tests are the 0-1 test, checkerboard tests, columns and bars, sliding diagonal, walking 1s and 0s, galloping 1s and 0s, and the march test. Tests can either be designed ad hoc (the traditional method) or derived from fault models (which can be difficult and expensive). Some people have investigated probabilistic or random testing, in which some portion of the test structure is determined by a random number generator. As memories get exponentially larger, only tests whose lengths grow with n or n log2 n (where n is the size of an n x 1 memory) will be economically feasible. (A counterexample is a galloping test, which has length 4*n^2.) The number of tests used (and therefore the total test time) drops as a memory design matures and its erroneous behaviors are better understood. Over the several years of a production run, the total test time can drop by as much as a factor of 50.
DRAMs typically improve yield by using spare rows and columns to replace those occupied by defective cells. The repairs are performed using current-blown fuses, laser-blown fuses, or laser-annealed resistor connections. One case example showed that memory repair increased the yield from 1% to 51%.
A typical production run usually involves testing an IC several times (the probe and test steps in the list below):
Wafer fabrication
Wafer probe - includes determining location of faulty cells for repair
Repair - this may require transferring the wafers to another piece of equipment
Post-repair wafer probe (Optional)
Die separation and packaging
Pre-burn-in test (Optional)
Burn-in
Final test - includes speed select
Shipment
Built-in Tests
Design for testability (DFT) refers to including test considerations in the design specifications. DFT includes using design rules that forbid the use of certain hard-to-test circuit forms and/or require using inherently testable forms. It attempts to ensure that internal nodes are sufficiently controllable and observable. Specific DFT techniques include providing parallel test modes to test multiple arrays simultaneously, built-in self-test (BiST), built-in self-repair (BiSR), and error correcting codes (ECCs). BiST and BiSR will be discussed later. One important consideration with ECC is that it must be possible to disable the ECC during testing so that bona fide errors are not missed due to the correction.
Cost and time of DRAM test
There seems to be a general consensus (by too many sources to make it worthy of listing them all) that testing can be a significant portion of the cost of a DRAM. DRAM capacities are growing exponentially (roughly x4 every 3 years), and test times will grow in proportion to the size of the DRAM unless steps are taken to reduce test time. If test times grow too large, testing could become a limiting cost in the production of a DRAM.
Testing can be expensive because the test equipment is very expensive (a production memory tester costs approximately US$500,000), and the cost of the equipment must be amortized over all of the good chips that are produced. Each second a chip is on a tester is estimated to cost several cents. It is difficult to say exactly how much of the cost of a DRAM is from test, both because detailed data is not available from the manufacturers and because, as stated earlier, the test time for a given generation of DRAM declines as the process matures. An economical test time for a mature 1 Mb DRAM is roughly 1 minute or less. By some estimates, testing accounts for up to 50% of the cost of a DRAM; others estimate that the test cost is only 10-20%. Test cost for a 64 Mb DRAM is projected to be 20-25% of the total cost. The following predictions are made for the future, normalized to the cost of testing a 1 Mb DRAM, under the assumption that nothing is done differently in the testing methodology relative to what was done for a 1 Mb DRAM:
Reducing test cost
One proposal is to use a Testing Acceleration Chip (TAC), in which functional tests for a memory are performed by a special chip. The tester cost would then decrease by a factor of 10, the number of devices that could be tested simultaneously would increase by a factor of 10, and the test cost would therefore decrease by a factor of 100. This leads to a revised estimate for the ratio of the test cost to total cost as follows:
A number of test modes have been proposed and/or utilized to take advantage of the inherent parallelism within a DRAM to reduce test time. Multi-bit test (MBT), introduced in 1 Mb and 4 Mb DRAMs, tests from 4 to 16 bits on each cycle, thereby reducing test time to between 1/4 and 1/16 of the bit-by-bit time. Line-mode test (LMT), introduced in 16 Mb DRAMs, reduces the test time to 1/1K. However, it requires an additional comparator or register at each bit-line pair, causing a somewhat large area penalty. Furthermore, identifying failed addresses requires the use of a conventional bit-by-bit test sequence. This has been addressed by some modifications to the column decoders, but this causes an even larger area penalty.
Another test method is column address-maskable parallel-test (CMT). CMT requires no additional circuitry and no additional routing lines in the memory array in order to perform parallel test. Instead, the test control circuits are placed in the peripheral circuitry. The test mode therefore does not impact the layout pitch, which matters because increasing the layout pitch can degrade performance and significantly increase chip area. CMT provides for various kinds of test patterns, including those to detect pattern-sensitive faults. It also has the capability to quickly search for failed column addresses. CMT was implemented in an experimental 64 Mb DRAM (by Mitsubishi), reducing test time to 1/16K with an area penalty of less than 0.1%.
A further test method is merged match-line test (MMT). MMT minimizes the area penalty by utilizing the read data line as the match line during test mode operation. The memory array therefore requires no additional comparators and match lines. Further examination of this technique leads me to believe that this is just another name for CMT, or possibly a slight variation. Besides mentioning the 1/16K test time reduction claimed by CMT, the MMT work also presents an extended, parallel mode of MMT which reduces test time to 1/64K.
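The effect of these parallel test modes on cycle count can be sketched with simple arithmetic. In the example below, the 10 operations per cell and the idea of applying every mode to the same 64 Mb array are simplifying assumptions for illustration (the modes were reported on different DRAM generations); only the parallelism factors themselves come from the text above.

```python
# Parallelism factors quoted in the text; labels are informal.
PARALLELISM = {
    "bit-by-bit": 1,
    "MBT (x4)": 4,
    "MBT (x16)": 16,
    "LMT": 1024,
    "CMT": 16 * 1024,
    "MMT (extended)": 64 * 1024,
}


def cycles(n_bits, ops_per_cell, parallelism):
    """Test cycles needed when `parallelism` cells are handled per cycle."""
    total_ops = n_bits * ops_per_cell
    return (total_ops + parallelism - 1) // parallelism  # ceiling division


N_BITS = 64 * 2**20   # 64 Mb array
OPS_PER_CELL = 10     # hypothetical length of a linear (N-pattern) test

for mode, factor in PARALLELISM.items():
    print(f"{mode:<16} {cycles(N_BITS, OPS_PER_CELL, factor):>12} cycles")
```

This ignores any setup overhead for entering the test mode and the extra bit-by-bit passes that some modes need to locate failed addresses, so it should be read as a best case.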
The possible parallelization of testing can be divided into two categories: architecture-based parallelism and DFT-based parallelism. Examples of the former are: 1) A RAM is often implemented as a collection of arrays, and it may be possible to simultaneously test all of the arrays; 2) A RAM may be more than a bit wide, and therefore multiple bits may be tested simultaneously. Examples of the latter are: 1) Modifying a decoder so that many word lines can be activated simultaneously; 2) Modifying write data registers so that many cells of a RAM can be written to in a single write cycle.
Besides minimizing test time, another way to reduce test cost is to maximize throughput. This can be accomplished by testing multiple memories in parallel.
What Makes Memory Test Different From Logic Test?
Memory test patterns are purely algorithmic. Each algorithm is designed to address one or more particular failure mechanisms. This allows for the use of automatic test pattern generation, where the tester is given an algorithm and generates the actual I/O vectors at the pins. Logic, on the other hand, cannot generally be tested in an algorithmic fashion, and the tests must be stored as I/O vectors. Memory test is usually done at a slower speed than logic test and therefore can use cheaper testers.
Testing the random logic in a processor core is orders of magnitude more difficult than testing memory - it is harder to write patterns for, to simulate, and to debug. Many more patterns are required compared to the relatively small number of algorithms that can be used to test a memory. The time to test a processor relative to testing memory depends on the size of the memory. For smaller memories, it would take longer to test a processor, but for large enough memories, it would take less time. It is estimated that the crossover point may be around 4-16 Mb. The complexity of testing a processor can be reduced by using a full-scan design, in which all internal state is on a scan chain and accessible in a test mode. This would impact the performance relative to die size by roughly 5-10%.
The four most expensive attributes of a tester, roughly in order, are:
Edge formatting/timing flexibility
Frequency
Memory depth
Accuracy
Logic testing has greater requirements than memory testing for frequency (#2, because memory is typically slower than logic) and memory depth (#3, because memory testers can use algorithmic patterns). I'm not sure about edge formatting/timing flexibility (#1) and accuracy (#4).
Memory vendors are used to long test times and to test being a big factor in cost. As mentioned earlier, test times can vary considerably over the life of a part, and an order of magnitude reduction in test time is reasonable to expect. For current 16 Mb DRAMs, test times are on the order of 300 seconds. Microprocessor vendors are used to a few seconds of test time, and testing is a smaller percentage of the total cost. Memory testers test many parts in parallel, 32 to 64 for final test. Because microprocessors have many pins, parallel testing is impractical.
Testing integrated logic and memory (Embedded memory testing)
Both Synchronous DRAMs (SDRAM) and Cache DRAMs (CDRAM) have to deal with the mismatch between the speed of the logic to test and the limitations of a memory tester. Production memory testers can only test up to 60 MHz, yet an SDRAM operates at 100 MHz and above. This is compensated for by multiplexing 2 pins to 1. This cuts the available pin count in half and makes the test programming and test development more difficult. Pin multiplexing can also be used to test 100 MHz SDRAMs with two 50 MHz pattern generators. It is also pointed out that noise is more of a problem for high speed testing, and therefore a better quality of load board and load circuit (what interfaces between the device under test and the tester) is required.
IBM has dealt with testing integrated logic and memory in a 1 MB L2 Cache. While observing that the addition of logic to memory did make the testing environment much more complex, the logic does not significantly increase the test time. It takes roughly 1 minute to test the DRAM and only about 1 second to test the logic. The logic has the capability for full-scan.
Dick Foss of MOSAID points out the importance of being able to independently test the memory array in any chip that combines logic and memory. It is necessary to be able to address each memory element from the pads for testing and for the identification of failing elements for the utilization of redundancy. An embedded DRAM for an ATM switch by MOSAID provides a test mode that allows direct access to the DRAM.
The strategy for testing Rambus DRAMs is to minimize added test cost by co-existing within the standard manufacturing flow as much as possible. A special test mode is included in the Rambus interface that allows direct access to the core from the device pins. This allows existing equipment to test the core with RAS/CAS-like access. A PLL bypass allows the protocol logic to be functionally tested at wafer probe at low speed. The Rambus DRAM is therefore tested at probe with conventional memory testers. The only major change to the test flow is at final test. At final test, at-speed operation of the Rambus interface requires a tester with 100 ps accuracy and an 800 MHz vector rate.
All of the vendors who are currently producing RDRAMs have purchased HP83000/F660 testers specifically to do the interface test. This is not particularly desirable from the viewpoint of the DRAM manufacturer for a number of reasons. Having to do an additional socket insertion can reduce yield, because there is a risk every time a part is handled. Also, the HP83000/F660 is not particularly useful in a DRAM fab for testing devices other than RDRAMs. Still, the total added test cost is claimed to be only a few cents per device.
Mitsubishi has built a 32-bit multimedia microprocessor integrated with a 16 Mb DRAM. The chip is tested in two stages. First, a test mode is utilized which allows for the reading and writing of the DRAM transparently from the outside. This is performed with a memory tester. Next, a logic tester is used to test the CPU itself and the communication between the CPU and the DRAM and between the CPU and the external interface. Test programs are loaded into the DRAM for this purpose.
One question with integrating memory and logic is whether the addition of logic to a DRAM process will require the purchasing of new hardware for testing, namely a logic tester to augment the usual memory tester. Mr. Foss feels that this is not the case, claiming that MOSAID memory testers, which are used by many memory vendors, can handle testing logic and memory. It should be noted that this somewhat hypothetical claim contradicts the actual experiences of both Rambus and Mitsubishi that are mentioned above.
A complex and high bandwidth interface between a chip and its external environment increases test cost due to the need for a high frequency, highly flexible tester. If the combination of a processor and memory would simplify the external interface complexity and bandwidth, that could help to reduce testing costs.
Foss presents an interesting idea regarding redundancy in a chip that incorporates logic and memory. If the logic is laid out to match the bit pitch of the memory array, then it is possible to incorporate redundancy of the logic in the same scheme that gives redundancy to the memory array with spare columns. This "pitch-matching" can both reduce area and increase yield. While this seems like a good idea for something like a SIMD architecture, I'm not sure how much this could be applied to a general purpose processor.
Use of Built-in Self-Test and Built-in Self-Repair
It may be possible to reduce test cost through the use of techniques such as Built-in Self-Test (BiST) and Built-in Self-Repair (BiSR). BiST refers to on-chip circuitry which automatically tests a memory array. This could be either just a pass/fail test, or it might also include the identification of failing elements for the subsequent blowing of fuses to utilize redundant rows and/or columns. BiSR extends BiST by performing a soft-repair: automatically utilizing the redundant elements as needed (if possible) within the memory array without the need to utilize an external machine (such as a laser). There is not universal agreement, however, on whether or not this is a recommended future direction in memory testing.
Due to the very regular structure of a RAM, most memory test algorithms use repetitive steps and do not require complex logic to implement in hardware. They could be implemented based on either random logic or microprogrammed control. An advantage of the microprogrammed approach is that it is flexible and can be used to add additional tests and test algorithms to achieve better fault coverage with minimal additional area penalty. The area overhead for implementing BiST on very large RAMs is estimated to be less than 0.1%. Even if an algorithm is performed using BiST, the test could still take a long time on a very large RAM. It is therefore recommended that some form of parallelism be used to reduce test time. There are advantages to implementing the test in BiST, even if it does take a while. Whatever is tested using BiST can proceed in parallel with other checks. It can also be done during the burn-in cycle of the RAM chip without the need for any expensive test equipment.
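The repetitive structure of such algorithms can be illustrated in software with a table-driven march test, loosely analogous to a microprogrammed controller stepping through a small instruction store. The sketch below uses March C-, a well-known march algorithm from the testing literature; it is not claimed to be the algorithm used by any BiST implementation discussed here. The make_memory helper and its coupling-fault model (a 0-to-1 write on an aggressor forces a victim to 1) are assumptions for illustration.

```python
# Each march element: (address order, operations applied to every cell).
# "w0"/"w1" write a value; "r0"/"r1" read and expect that value.
MARCH_C_MINUS = [
    ("up",   ["w0"]),
    ("up",   ["r0", "w1"]),
    ("up",   ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up",   ["r0"]),
]


def run_march(read, write, size, program=MARCH_C_MINUS):
    """Run a march program; return (passed, number of operations performed)."""
    ops = 0
    for direction, element in program:
        addrs = range(size) if direction == "up" else range(size - 1, -1, -1)
        for addr in addrs:
            for op in element:
                ops += 1
                value = int(op[1])
                if op[0] == "w":
                    write(addr, value)
                elif read(addr) != value:
                    return False, ops
    return True, ops


def make_memory(size, coupling=None):
    """Hypothetical memory model; coupling maps aggressor -> victim address."""
    cells = [0] * size
    coupling = coupling or {}

    def write(addr, value):
        rising = cells[addr] == 0 and value == 1
        cells[addr] = value
        if rising and addr in coupling:
            cells[coupling[addr]] = 1  # assumed fault: victim forced to 1

    def read(addr):
        return cells[addr]

    return read, write


SIZE = 8
print("fault-free:", run_march(*make_memory(SIZE), SIZE))
print("coupling  :", run_march(*make_memory(SIZE, coupling={2: 5}), SIZE))
```

The test length is 10 operations per cell, so it grows linearly with memory size. In this software model, adding another march algorithm means adding another table; in hardware, a new test may also need supporting circuitry beyond the instruction store.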
There are numerous papers describing implementations of BiST and BiSR on various generations of DRAMs from NEC. A motivation for the work is that BiST would allow for the simple, simultaneous testing of multiple memory chips, thereby reducing test costs. The BiST circuitry consists of a small ROM, some counters, generators, and a comparator. The ROM stores test procedures as micro-coded instructions; it implements a marching test and a scan WRITE/READ test with a checkerboard pattern in an 18 word x 10b ROM. The BiST circuits occupy 0.01% of the transistor count and 0.8% of the area in a 16 Mb DRAM. The access time loss in normal operation is negligible (but not necessarily zero) because "only" one (but again, not zero) extra gate is installed in the DRAM critical path. The claim is made that any more complicated test pattern could be stored by increasing the number of instruction steps in the ROM. A galloping test would approximately double the area for self-test circuits (requiring some additional logic, and not just more ROM). The ability to use multiple tests, possibly customized for each user, is compared to that of memory testers. However, the following problems remain:
- More practical test patterns, which can detect errors caused by interference among memory cells, were not implemented.
- Retention tests were not included.
- The testing of the BiST circuits was not considered.
The following ways are proposed (but not actually implemented) to solve these problems:
- The previously presented tests were N-pattern tests, meaning that the number of operations required is proportional to N, the number of memory cells in the DRAM. A galloping test, which is an N^2-pattern test, is detailed. The claim is made that N^2-pattern tests can detect most DRAM faults and are therefore the most severe test patterns that would be possible. So, showing that an N^2-pattern test is feasible means that any kind of test is possible.
- A data retention test is detailed, using the addition of a timer.
- Finally, a method for testing the BiST circuits themselves is proposed. This involves interrupting the BiST mode, modifying the data in the DRAM with a memory tester, and resuming BiST. The BiST should be able to detect this artificially induced failure.
Experimental results are presented via Shmoo plots to show that the pass/fail status using a memory tester and using BiST are very close (but not identical).
Some observations can be made concerning the NEC scheme:
- While the microprogrammed ROM gives the initial appearance that it is general and easily expandable, in reality the different tests require different supporting circuitry, so adding another test is not necessarily as simple as just adding some more lines to the ROM.
- The proposed method of testing the BiST circuits means that the tester cannot be removed entirely.
- It appears (but is not entirely clear) that this test can only report pass/fail status, and not the location of faults. This information is needed for utilizing redundancy. It is not clear how difficult it would be to add this capability.
- No information is provided to attempt to quantify the overall impact of the BiST on test time and/or cost.
The previously reported BiST scheme is augmented by NEC with BiSR in a 64 Mb DRAM. During self-test, if a faulty cell is detected, the BiST circuits send an error signal to the BiSR circuits, which store the faulty cell address in the fail address memory (FAM). During later normal READ/WRITE operation, the BiSR always compares input addresses with stored addresses. If they match, the BiSR generates a signal which disconnects the core DRAM cells from the I/O buffers and connects the BiSR circuits to the I/O buffers, allowing the I/O data to be transferred between the spare memory in the BiSR circuits and the I/O buffers. Faulty cells are therefore automatically repaired.
The FAM and spare memory are implemented with SRAM cells, and they are placed at the center of the die. The total access time of the FAM and spare memory is shorter than that of the core DRAM array. Therefore, there is no access time overhead for BiSR operations.
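The address-matching behavior of this kind of soft repair can be sketched in a few lines. The model below is a simplified illustration under stated assumptions, not NEC's design: the class name, the stuck-at-0 defect model, the per-cell write/read self-test, and the one-spare-cell-per-failed-address mapping are all hypothetical.

```python
class SelfRepairingMemory:
    """Toy model of BiST plus BiSR with a fail address memory (FAM)."""

    def __init__(self, size, spare_cells, bad_cells=()):
        self.core = [0] * size
        self.bad = set(bad_cells)        # assumed defect: these cells always read 0
        self.spare = [0] * spare_cells   # spare memory held in the BiSR block
        self.fam = {}                    # faulty address -> index into spare memory

    def _core_write(self, addr, value):
        self.core[addr] = value

    def _core_read(self, addr):
        return 0 if addr in self.bad else self.core[addr]

    def self_test(self):
        """BiST phase: record each failing address in the FAM.

        Returns False if there are more failures than spare cells.
        """
        for addr in range(len(self.core)):
            failed = False
            for value in (1, 0):
                self._core_write(addr, value)
                if self._core_read(addr) != value:
                    failed = True
            if failed:
                if len(self.fam) >= len(self.spare):
                    return False         # not repairable with available spares
                self.fam[addr] = len(self.fam)
        return True

    def write(self, addr, value):
        # Normal operation: every address is compared against the FAM.
        if addr in self.fam:
            self.spare[self.fam[addr]] = value
        else:
            self._core_write(addr, value)

    def read(self, addr):
        if addr in self.fam:
            return self.spare[self.fam[addr]]
        return self._core_read(addr)


mem = SelfRepairingMemory(size=16, spare_cells=2, bad_cells={3, 9})
print("repairable:", mem.self_test())
mem.write(3, 1)
print("read back from repaired address 3:", mem.read(3))
```

Because the FAM here is ordinary volatile state, the self-test has to be rerun whenever that state is lost, which is one way to see the difference between this soft repair and a fuse-based repair performed once in fabrication.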
In addition to the BiSR circuits, the DRAM has conventional row and column redundancy, which are used in fabrication. While there have been many papers from NEC discussing BiST and BiSR, it is not entirely clear whether this is still just a research idea or if it is used in any of their production DRAMs.
Mitsubishi has included BiST on a 1 Gb synchronous DRAM. The BiST performs timing margin tests using a PLL and AC timing comparators. Two stages of linear feedback shift registers (LFSRs) are used to compact the test results. At the end of a test, a final 16-bit output is compared against a known reference value. The use of BiST reduces the test time required on expensive testers. Also, high-speed AC tests can be performed on-chip without the need for high-accuracy DUT boards. The area penalty for the BiST is 0.8%. While the author makes the claim that BiST reduces the test cost, he would not quantify how much that reduction was. There is no mechanism to test that the BiST circuits are functioning properly.
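The general idea of compacting a long stream of test results into a short signature can be sketched as follows. This is a generic single-stage, serial-input 16-bit LFSR, not the two-stage arrangement described above, whose details are not given. The feedback polynomial x^16 + x^12 + x^5 + 1 is an assumption chosen only because it is a widely used 16-bit polynomial.

```python
POLY = 0x1021  # assumed feedback polynomial: x^16 + x^12 + x^5 + 1
WIDTH = 16


def signature(bits, poly=POLY, width=WIDTH):
    """Compact a sequence of result bits into a `width`-bit signature."""
    mask = (1 << width) - 1
    reg = 0
    for bit in bits:
        # Feedback is the bit shifted out of the register XOR the incoming bit.
        feedback = ((reg >> (width - 1)) & 1) ^ (bit & 1)
        reg = (reg << 1) & mask
        if feedback:
            reg ^= poly
    return reg


# Hypothetical stream of read-back bits from a fault-free device.
good_stream = [1, 0, 1, 1, 0, 0, 1, 0] * 8
reference = signature(good_stream)

# The same stream with one bit flipped, as a single faulty read would produce.
bad_stream = list(good_stream)
bad_stream[10] ^= 1

print("reference signature:", hex(reference))
print("faulty run matches reference:", signature(bad_stream) == reference)
```

Only the final signature needs to be compared with the reference, which is what keeps the on-chip comparison small. The trade-off is that compaction is lossy: some multi-bit error patterns can alias to the reference value, and the signature alone does not say which address failed.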
Dick Foss does not feel that BiST is a viable solution to memory testing. He questions how to test the BiST logic if one is relying on BiST to test the memory array. Also, he points out that it is hard to replicate all of the types of tests that are performed during memory testing with BiST.
Another view is that BiST is useful for determining the use of redundant memory elements, but that it raises several questions: Can BiST maintain testing reliability? How does one reliably test the BiST circuits? Is a high performance/high speed tester required to test the BiST?
It is believed that BiST will be able to reduce test time and test cost by allowing for the simultaneous testing of many memory chips in a system. The prediction is made that BiST will be widely used and make expensive testers unnecessary, and that the use of BiST is applicable to microprocessors and ASICs, and not just memory.
BiST will not necessarily reduce test time, since the same algorithm has to execute in either the on-chip BiST circuitry or in the tester's algorithmic pattern generator. So it is important to question whether the cost of the extra die area is paid for by the simpler tester that might result from the use of BiST.
While BiSR can be used during the test procedure, it would not be advisable, due to reliability and repeatability considerations, to use it to completely replace laser-repairable fuses as a means of implementing redundancy in a memory array. On the Digital Alpha 21164 microprocessor, BiST/BiSR is used during manufacturing test of the SRAM arrays in the instruction cache, but fuses are still blown to identify the use of redundant rows prior to shipping parts. There is no reason why implementing BiST/BiSR should be any more difficult on a DRAM than an SRAM. It is questionable, however, whether it would be as beneficial in the high-volume production environment of RAM testing, which can integrate memory testers and laser zap machines, as it is in a logic testing environment.
By: DocMemory Copyright © 2023 CST, Inc. All Rights Reserved