Friday, August 12, 2016
Flash memory is being drawn into the mainstream of enterprise storage, but its tendency to deteriorate with use remains an Achilles' heel. A paper released at the Aug. 9 start of the Flash Memory Summit in Santa Clara, Calif., finds that machine learning can counteract that deterioration and drastically extend its life cycle.
The paper was written by Tom Coughlin, president of Coughlin Associates, a solid state storage consulting firm in Atascadero, Calif. He is also general chairman of the summit. The paper was sponsored by NVMdurance, a Limerick, Ireland, firm that applies machine learning in the software it creates for managing solid state devices.
Using machine learning to prolong the useful life of high-capacity SSD systems is a new field.
Using flash memory cells physically degrades them as holders of electronic charges (which get translated into digital bits), and that damage can't be reversed. However, Coughlin argues that machine learning can learn the pattern in which a solid state device is being used and rejigger registers and voltages to maximize device longevity.
With the complexity and scale of today's SSDs, "the task becomes impossible to do manually," Coughlin wrote in his introduction to the machine learning concept.
Flash memory works by storing a charge on a floating gate, which can be described as a charge trap. To load the trap, a known voltage level is needed to push electrons through a layer of insulation that allows the cell to hold the charge after the current is taken away.
A key characteristic of flash is that less voltage is needed to load the trap when the memory cell is new. The voltage used may range between 7 and 12 volts, Coughlin noted. Use of the cell tends to degrade the insulation layer, "making it harder to keep electrons on the floating gate," he continued. Higher voltages are needed as the cell ages, but they result in more degradation of the insulation.
"As electrons leak off the gate over time, this changes the voltage on the floating gate and also leads to bit errors," Coughlin reports. Knowing the rate of leakage becomes a way to predict how long the data in the cell will remain intact. The more frequently the cell is programmed and erased, the weaker the insulation layer becomes, and the life of the device as a whole is gradually shortened.
Electrons that tunnel out of a charged cell through the insulation and into a neighboring cell add noise, which lowers the cell's signal-to-noise ratio (SNR). The SNR must stay high enough for the flash device to know its data is intact and can be read accurately. Device makers invest heavily in error correction codes that can overcome the noise levels and confirm accurate data is being transferred.
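How an error correction code recovers a flipped bit can be shown with a Hamming(7,4) code, which protects four data bits with three parity bits. This is a textbook stand-in chosen for brevity; production NAND controllers typically use much stronger codes such as BCH or LDPC, and nothing here is taken from the paper.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 1-based index of the flipped bit
    if error_position:
        c[error_position - 1] ^= 1  # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], error_position


stored = hamming74_encode([1, 0, 1, 1])
corrupted = list(stored)
corrupted[4] ^= 1  # simulate one bit disturbed by charge leakage
recovered, position = hamming74_decode(corrupted)
print(recovered, position)
```

The code corrects any single flipped bit per codeword; two or more errors in the same codeword exceed what it can repair, which is one reason real devices rely on stronger codes.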
The issue affects the NAND devices in wide use today, Coughlin wrote.
"NAND cells are susceptible to bit errors," he wrote, so NAND device makers add parity bits that tell the controller -- a microprocessor built into the flash device -- what the data being retrieved should look like. An error detection and correction engine on the controller can then verify the bits or recover small amounts of leakage-disrupted data.
The controller relies on registers that tell it which sequences of data were stored and which sectors they occupy on the memory chip. A single-level flash device may have 50 registers -- a complex set to manage but still within reach of a human programmer.
But today flash is being manufactured with multi-level cells that hold four distinct charge levels in each cell, or even triple-level cells that hold eight, leading to the need for 1,000 registers of critical location information, a number likely to be beyond the grasp of a single programmer.
All the issues of flash operation are exacerbated by the growing complexity of the memory chip. Multi-level cells have a lower tolerance for noise; the more levels, the lower the tolerance, Coughlin wrote.
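A rough calculation shows why more levels mean less tolerance. A cell storing b bits must distinguish 2^b charge levels, so a fixed voltage window gets divided into narrower gaps. The 6-volt window below is a hypothetical figure chosen for illustration, and evenly spaced levels are a simplifying assumption.

```python
def level_spacing(bits_per_cell, window_v):
    """Gap between adjacent charge levels when 2**bits levels share one voltage window.

    Assumes evenly spaced levels, which is a simplification.
    """
    levels = 2 ** bits_per_cell
    return window_v / (levels - 1)


WINDOW_V = 6.0  # hypothetical usable voltage window, not a figure from the paper

cell_types = (("single-level", 1), ("multi-level", 2), ("triple-level", 3))
spacings = {name: level_spacing(bits, WINDOW_V) for name, bits in cell_types}

for name, gap in spacings.items():
    print(f"{name}: {gap:.2f} V between adjacent levels")
```

With these assumed numbers the gap shrinks from 6 volts for a single-level cell to under 1 volt for a cell storing three bits, so the same amount of charge drift is far more likely to push a cell into a neighboring level.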
"As the levels of charge on the floating gates becomes smaller and smaller, the impact of a few electrons migrating from one floating gate to another becomes more and more significant."
By applying machine learning to the characteristics of the memory chip and the patterns in which the data is being stored, a model can be built of how the chip is functioning and how it might function in a revised pattern that could extend its life.
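The idea of using a model of the chip to pick better settings can be illustrated with a deliberately simple sketch. It is a plain grid search over a made-up cost function, not machine learning proper and not the method described in the paper. The function `toy_error_rate`, its coefficients, and the 3,000-cycle wear limit are all hypothetical; only the 7 to 12 volt range comes from the article.

```python
def toy_error_rate(program_v, pe_cycles):
    """Hypothetical stand-in for a learned model of the chip; NOT real device physics."""
    # Assumption: the voltage needed to program reliably rises linearly with wear.
    ideal_v = 7.0 + 5.0 * min(pe_cycles, 3000) / 3000
    misprogram_cost = (program_v - ideal_v) ** 2  # penalty for missing the needed voltage
    stress_cost = 0.05 * (program_v - 7.0)  # higher voltage degrades insulation faster
    return misprogram_cost + stress_cost


def best_voltage(pe_cycles, candidates):
    """Pick the candidate programming voltage with the lowest modeled cost."""
    return min(candidates, key=lambda v: toy_error_rate(v, pe_cycles))


# Candidate settings from 7.0 V to 12.0 V in 0.5 V steps.
candidates = [7.0 + 0.5 * i for i in range(11)]

# Choose a setting for three stages of the device's life.
schedule = {cycles: best_voltage(cycles, candidates) for cycles in (0, 1500, 3000)}
print(schedule)
```

In this toy setup the chosen voltage starts low for a new device and rises as the cycle count grows, which avoids stressing fresh cells with voltages they do not yet need. A real system would replace the made-up cost function with a model fitted to measurements and would search over many registers at once.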
By: DocMemory