Memory Errors and Their Detection
All internal elements of a flash device are exposed to radiation, which can
cause different types of upsets or errors. The main errors that can occur in
flash memory devices are single event upsets, single event functional
interrupts, single event latch-up and bad block development. The effects of
these errors are described below.
Single Event Upsets (SEU):
SEUs can occur due to transient effects of radiation in semiconductor devices.
If left unprotected, these soft errors can corrupt the data to such an extent
that it becomes unusable.
Since SEUs are soft errors, no permanent damage is done to the physical memory;
the main concern is corrupted data. Data can be protected from corruption
either by encoding it or by storing multiple copies of it. The latter option is
infeasible for a satellite memory system due to the size and power restrictions
that are common in satellite specifications, so encoding the data is the
obvious choice. This entails encoding the received data in a way that allows
its integrity to be checked: if an error occurs, the code detects it and
corrects it if possible.
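As an illustration of the encoding approach, here is a minimal Python sketch of
a Hamming(7,4) code, which stores 4 data bits in a 7-bit codeword and can
correct any single flipped bit. The text does not say which code a real system
would use, so the choice of Hamming(7,4) and the function names
hamming74_encode and hamming74_decode are assumptions made for this example
only.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword (illustrative only)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    # Parity bits sit at codeword positions 1, 2 and 4 (1-indexed).
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p4 = d[1] ^ d[2] ^ d[3]
    # Codeword layout, positions 1..7: p1 p2 d0 p4 d1 d2 d3
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(codeword: int) -> Tuple[int, bool]:
    """Return (data nibble, whether a single-bit error was corrected)."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # Each syndrome bit re-checks one parity group; together they give the
    # 1-indexed position of the flipped bit, or 0 if no error is seen.
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    corrected = syndrome != 0
    if corrected:
        bits[syndrome - 1] ^= 1  # flip the faulty bit back
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, corrected
```

A codeword with one upset bit decodes to the original data, for example
hamming74_decode(hamming74_encode(0b1011) ^ 0b0000100) recovers 0b1011. Two or
more upsets in the same codeword are beyond what this particular code can
correct.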
Single Event Functional Interrupt (SEFI):
Single event functional interrupts are more complex than simple SEUs. SEFIs
occur as a result of transient errors in the control circuitry of the flash
chip, which cause the chip to malfunction and behave unpredictably. With SEUs,
the error can be identified and corrected, and in many cases the upset
cross-sections are representative of the geometrical areas of the sensitive
regions. A SEFI, however, is caused by an SEU in a sensitive section of a
microcircuit device to which there is no direct access. Since the exact
location of the upset cannot be identified, only the failure of the device
function can be observed. There are two main types of SEFI: regular SEFIs and
irregular SEFIs.
Single Event Latch-up (SEL):
Single event latch-up is the most destructive of flash device errors. It can
destroy the device if the current is not limited or removed within the
allowable time. The operating flash current has been observed to rise from the
expected value of 20 mA to 430 mA. Latch-up is detected by monitoring the
current into the flash. The current levels of SEFIs and latch-up must be
distinguished; however, both errors are treated similarly.
When latch-up is detected, the power to that chip must be cycled. After the
reset, the power levels should return to normal. If they continue to exceed
acceptable levels, power should be permanently removed from the device and its
failure reported.
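The recovery procedure above can be sketched in Python as follows. This is only
an outline of the decision logic: the 100 mA trip threshold, the single retry,
and the callables read_current_ma, power_off and power_on are hypothetical
stand-ins for whatever current sensor and power switch a real design provides.
Only the 20 mA and 430 mA figures come from the text.

```python
from enum import Enum
from typing import Callable

NOMINAL_MA = 20.0          # expected operating current (from the text)
OBSERVED_LATCHUP_MA = 430.0  # current observed during latch-up (from the text)
TRIP_LIMIT_MA = 100.0      # ASSUMED threshold between the two, for illustration


class ChipState(Enum):
    OK = "ok"                # current within limits, nothing done
    RECOVERED = "recovered"  # power cycle brought the current back to normal
    FAILED = "failed"        # still over the limit; power removed for good


def check_latchup(
    read_current_ma: Callable[[], float],
    power_off: Callable[[], None],
    power_on: Callable[[], None],
    limit_ma: float = TRIP_LIMIT_MA,
    max_cycles: int = 1,
) -> ChipState:
    """Sample the supply current once and power-cycle the chip if it is too high."""
    if read_current_ma() <= limit_ma:
        return ChipState.OK
    for _ in range(max_cycles):
        # Cycle the power to clear the latch-up, then re-check the current.
        power_off()
        power_on()
        if read_current_ma() <= limit_ma:
            return ChipState.RECOVERED
    # The current is still excessive: remove power permanently. The caller
    # is expected to report the failure.
    power_off()
    return ChipState.FAILED
```

A real implementation would also need a settling delay between switching the
power and re-reading the current, and would have to run fast enough to remove
power within the allowable time; both are omitted here.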
Bad Blocks:
Invalid blocks are defined as blocks that contain one or more invalid bits
whose reliability cannot be guaranteed. Invalid blocks have the same AC and DC
characteristics as valid blocks and do not affect the performance of valid
blocks, because they are isolated from the bit line and common-source line by a
select transistor.
Bad blocks (invalid blocks) can also develop over time due to radiation or
overuse. Using a bad block could result in invalid information being stored or
read, so hardware measures should be taken to mitigate the effects of bad
blocks. Once a block has become unreliable and has been marked as bad, it
cannot be recovered or used reliably again. Because each block can only be
read, programmed and erased a finite number of times, more and more blocks will
become invalid after extensive use.
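The bookkeeping that bad block handling implies can be sketched as a simple
table. The Python below is for illustration only and is not the hardware
mechanism the text calls for; the class name BadBlockTable and its methods are
hypothetical. The one property taken from the text is that a block marked bad
is never returned to service, so the table offers no way to clear a mark.

```python
from typing import Iterable, Optional, Set


class BadBlockTable:
    """Tracks which flash blocks must not be used (illustrative sketch)."""

    def __init__(self, num_blocks: int, factory_bad: Iterable[int] = ()) -> None:
        self.num_blocks = num_blocks
        self._bad: Set[int] = set()
        # Blocks that were already invalid when the device was shipped.
        for block in factory_bad:
            self.mark_bad(block)

    def _check(self, block: int) -> None:
        if not 0 <= block < self.num_blocks:
            raise IndexError(f"block {block} out of range")

    def mark_bad(self, block: int) -> None:
        """Record a block as bad. There is deliberately no way to undo this."""
        self._check(block)
        self._bad.add(block)

    def is_bad(self, block: int) -> bool:
        self._check(block)
        return block in self._bad

    def next_good(self, start: int = 0) -> Optional[int]:
        """Return the first usable block at or after start, or None."""
        for block in range(start, self.num_blocks):
            if block not in self._bad:
                return block
        return None

    def good_count(self) -> int:
        """Number of blocks still available for use."""
        return self.num_blocks - len(self._bad)
```

A writer would call next_good to skip over bad blocks and mark_bad whenever a
block is found to be unreliable; good_count shrinks over the life of the
device as more blocks wear out.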