Abstract:
Systems, methods, and computer programs are disclosed for kernel masking dynamic random access memory (DRAM) defects. One such method comprises: detecting and correcting a single-bit error associated with a physical address in a dynamic random access memory (DRAM); receiving error data associated with the physical address from the DRAM; storing the received error data in a failed address table located in a non-volatile memory; and retiring a kernel page corresponding to the physical address if a number of errors associated with the physical address exceeds an error count threshold.
Abstract:
Systems, methods, and computer programs are disclosed for recovering from dynamic random access memory (DRAM) defects. One method comprises determining that an uncorrected bit error has occurred for a physical codeword address associated with a dynamic random access memory (DRAM) device coupled to a system on chip (SoC). A kernel page associated with a DRAM page comprising the physical codeword address is identified as a bad page. Recovery from the uncorrected bit error is provided by rebooting a system comprising the SoC and the DRAM device. In response to the rebooting, the identified kernel page is excluded from being allocated for DRAM operation.