Memory Errors and Their Detection

All internal elements of a flash device are exposed to radiation, which can cause different types of upsets or errors. The main errors that can occur in flash memory devices are single event upsets, single event functional interrupts, single event latch-up, and bad block development. The effects of these errors are described below.

Single Event Upsets (SEU):

Single event upsets are caused by transient effects of radiation in semiconductor devices. If left unprotected, these soft errors can corrupt the data to such an extent that it becomes unusable.
Since SEUs are soft errors, no permanent damage is done to the physical memory; the main concern is corrupted data. Data can be protected from corruption either by encoding it or by storing multiple copies of it. The latter option is infeasible for a satellite memory system because of the size and power restrictions that are common in satellite specifications, so encoding the data is the obvious choice. This entails encoding the received data with redundant information so that, if an error occurs, the code detects the error and corrects it where possible.
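
The text does not say which code is used, so the following is only a minimal sketch of the general idea, assuming a Hamming(7,4) code: four data bits are stored with three parity bits, which is enough to locate and correct any single flipped bit in the seven-bit codeword. The function names hamming74_encode and hamming74_decode are hypothetical and chosen for illustration.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Assumed layout (1-indexed positions): 1=p1, 2=p2, 3=d0, 4=p4,
    5=d1, 6=d2, 7=d3. Bit i of the result holds position i + 1.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(bit << i for i, bit in enumerate(bits))


def hamming74_decode(codeword: int) -> tuple[int, bool]:
    """Return (data nibble, corrected flag) for a 7-bit codeword.

    A single flipped bit is located by the syndrome and flipped back.
    Two or more flipped bits are beyond this code and would be
    miscorrected.
    """
    bits = [(codeword >> i) & 1 for i in range(7)]
    # XOR of the positions of all set bits; zero for a valid codeword,
    # otherwise the position of the single flipped bit.
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position
    if syndrome:
        bits[syndrome - 1] ^= 1
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome != 0
```

For example, encoding a nibble, flipping any one of the seven stored bits to imitate an upset, and decoding returns the original nibble with the corrected flag set. A real memory system would likely use a stronger code over larger words; this sketch only shows the detect-and-correct principle.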

Single Event Functional Interrupt (SEFI):

Single event functional interrupts are more complex than simple SEUs. SEFIs result from transient errors in the control circuitry of the flash chip, which cause the chip to malfunction and behave unpredictably. With SEUs, the error can be identified and corrected, and in many cases the upset cross-sections are representative of the geometrical areas of the sensitive regions. A SEFI, however, is caused by an SEU in a sensitive section of the microcircuit to which there is no direct access. Since the exact location of that area cannot be identified, only the failure of the device function can be observed. There are two main types of SEFI: regular SEFIs and irregular SEFIs.

Single Event Latch-up (SEL):

Single event latch-up is the most destructive of the flash device errors: it can destroy the device if the current is not limited or removed within the allowable time. The flash operating current has been observed to rise from the expected value of 20 mA to 430 mA. Latch-up is detected by monitoring the current into the flash. The current levels that indicate a SEFI must be distinguished from those that indicate latch-up; however, both errors are treated similarly.
When latch-up is detected, the power to that chip must be cycled. After the reset, the power levels should return to normal. If the power levels continue to exceed acceptable levels, power should be permanently removed from the device and its failure reported.
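
The detect, cycle, and isolate sequence above can be sketched as follows. This is an illustration only: the trip threshold of 100 mA is an assumed value somewhere between the 20 mA and 430 mA figures quoted above, and read_current_ma and set_power are hypothetical callbacks standing in for whatever current sensor and power switch the hardware provides.

```python
from typing import Callable

NOMINAL_MA = 20.0             # expected operating current quoted in the text
LATCHUP_THRESHOLD_MA = 100.0  # assumed trip level, not taken from the source


def handle_overcurrent(
    read_current_ma: Callable[[], float],
    set_power: Callable[[bool], None],
    max_cycles: int = 1,
) -> str:
    """Check the chip current and power-cycle the chip if it is too high.

    Returns "ok" if no overcurrent was seen, "recovered" if a power
    cycle brought the current back under the threshold, and "failed"
    if the chip was left powered off because the overcurrent persisted.
    """
    if read_current_ma() < LATCHUP_THRESHOLD_MA:
        return "ok"
    for _ in range(max_cycles):
        set_power(False)  # remove power to clear the latch-up
        set_power(True)   # restore power
        if read_current_ma() < LATCHUP_THRESHOLD_MA:
            return "recovered"
    set_power(False)      # overcurrent persists: isolate the device
    return "failed"
```

A caller would report the chip as failed whenever the function returns "failed". A real controller would also need a hold-off time between removing and restoring power, which is omitted here.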

Bad Blocks:

Invalid blocks are defined as blocks that contain one or more invalid bits whose reliability cannot be guaranteed. Invalid blocks have the same AC and DC characteristics as valid blocks and do not affect the performance of valid blocks, because they are separated from the bit line and common-source line by a select transistor.
Bad blocks (invalid blocks) can also develop over time due to radiation or overuse. Using a bad block could result in invalid information being stored or read, so hardware measures should be taken to mitigate the effects of bad blocks. Once a block has become unreliable and been marked as a bad block, it cannot be recovered or used reliably again. Because a block can only be read, programmed, and erased a finite number of times, more and more blocks become invalid after extensive use.
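
One common way to keep bad blocks out of use is a bad block table that is consulted before every access. The sketch below shows that bookkeeping in software purely for illustration; the text calls for hardware measures and does not describe its mechanism, so the BadBlockTable class and its method names are hypothetical.

```python
from typing import Iterable, Optional


class BadBlockTable:
    """Tracks blocks that must not be used (hypothetical helper)."""

    def __init__(self, num_blocks: int, initial_bad: Iterable[int] = ()):
        self.num_blocks = num_blocks
        self._bad: set[int] = set()
        for block in initial_bad:
            self.mark_bad(block)

    def mark_bad(self, block: int) -> None:
        """Retire a block. There is deliberately no way to undo this,
        since a bad block cannot be used reliably again."""
        if not 0 <= block < self.num_blocks:
            raise ValueError(f"block {block} out of range")
        self._bad.add(block)

    def is_bad(self, block: int) -> bool:
        return block in self._bad

    def next_good(self, start: int = 0) -> Optional[int]:
        """Return the first usable block at or after start, or None."""
        for block in range(start, self.num_blocks):
            if block not in self._bad:
                return block
        return None
```

A write routine would call next_good to skip over retired blocks and call mark_bad whenever a program or erase operation on a block fails.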

