Abstract:
A processing system employs techniques for enhancing dynamic random access memory (DRAM) page retirement to facilitate identification and retirement of pages affected by multi-page DRAM faults. In response to detecting an uncorrectable error at a first page of DRAM, the processing system identifies a second page of the DRAM for potential retirement based on one or more of: physical proximity to the first page, inclusion in a range of addresses stored at a fault map that tracks addresses of DRAM pages having detected faults, and a prediction of a set of pages to check for faults based on misses at a translation lookaside buffer (TLB).
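For illustration only, the following Python sketch shows one way such candidate pages might be gathered; the pages-per-row geometry, fault-map layout, and TLB-miss feed are assumptions made for the sketch, not details from the abstract.

PROXIMITY_ROWS = 2  # assumed "blast radius" of rows around the failing page

def candidate_pages(failing_page, fault_map_ranges, tlb_miss_pages, pages_per_row=4):
    """Gather additional pages to check or retire after an uncorrectable
    error at failing_page (all values are page numbers in this sketch)."""
    candidates = set()

    # 1. Physical proximity: pages in the same row as the failing page or in
    #    nearby rows.
    failing_row = failing_page // pages_per_row
    for row in range(max(failing_row - PROXIMITY_ROWS, 0), failing_row + PROXIMITY_ROWS + 1):
        candidates.update(row * pages_per_row + k for k in range(pages_per_row))

    # 2. Fault map: any tracked address range containing the failing page
    #    contributes all of its pages.
    for lo, hi in fault_map_ranges:
        if lo <= failing_page <= hi:
            candidates.update(range(lo, hi + 1))

    # 3. TLB-miss prediction: pages recently seen in TLB misses are checked too.
    candidates.update(tlb_miss_pages)

    candidates.discard(failing_page)  # the first page is handled separately
    return candidates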
Abstract:
A memory module includes one or more programmable ECC engines that may be programmed by a host processing element with a particular ECC implementation. As used herein, the term “ECC implementation” refers to ECC functionality for performing error detection and subsequent processing, for example, using the results of the error detection to perform error correction and to encode corrupted data that cannot be corrected. The approach allows an SoC designer or company to program and reprogram ECC engines in memory modules in a secure manner without having to disclose the particular ECC implementations used by the ECC engines to memory vendors or third parties.
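As a rough sketch of the idea, the Python snippet below models a host authenticating and installing an opaque ECC implementation into such an engine; the shared-key HMAC scheme, class interface, and method names are illustrative assumptions, not the mechanism claimed.

import hashlib
import hmac

class ProgrammableEccEngine:
    """Toy model of an ECC engine that accepts a host-supplied implementation."""

    def __init__(self, shared_key: bytes):
        self._key = shared_key    # assumed to be provisioned out of band
        self._impl = None         # opaque ECC implementation blob

    def load(self, impl_blob: bytes, tag: bytes) -> None:
        # Authenticate the blob so only the SoC designer's implementation can
        # be installed; the plaintext algorithm is never exposed to the vendor.
        expected = hmac.new(self._key, impl_blob, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            raise PermissionError("ECC implementation rejected")
        self._impl = impl_blob    # real hardware would configure the engine here

    def encode(self, data: bytes) -> bytes:
        if self._impl is None:
            raise RuntimeError("no ECC implementation programmed")
        return data               # placeholder: the programmed logic would run here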
Abstract:
Error handling for resilient software includes: receiving data indicating a region of resilient memory; detecting an error associated with a region of memory; and preventing raising an exception for the error, in response to the region of memory falling within the region of resilient memory, by preventing the region of memory from being identified as including the error.
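A minimal Python sketch of that flow follows; the error-record set and handler hook are hypothetical stand-ins for the hardware and operating-system structures involved.

resilient_regions = []   # (base, limit) ranges registered for resilient software
error_records = set()    # addresses that would normally be flagged as erroneous

def register_resilient_region(base, limit):
    resilient_regions.append((base, limit))

def handle_memory_error(addr):
    """Return True if an exception should be raised for the error at addr."""
    for base, limit in resilient_regions:
        if base <= addr < limit:
            # Do not identify this region as containing the error, which in
            # turn prevents an exception from being raised for it.
            return False
    error_records.add(addr)   # otherwise record the error and raise
    return True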
Abstract:
A system is described that performs training operations for a neural network, the system including an analog circuit element functional block with an array of analog circuit elements, a memory, and a controller. The controller monitors error values computed using an output from each of one or more initial iterations of a neural network training operation, the one or more initial iterations being performed using neural network data acquired from the memory. When one or more error values are less than a threshold, the controller uses the neural network data from the memory to configure the analog circuit element functional block to perform remaining iterations of the neural network training operation. The controller then causes the analog circuit element functional block to perform the remaining iterations.
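One possible reading of that control flow is sketched in Python below; the callable names, threshold value, and iteration counts are assumptions, and the controller's actual behavior may differ.

def train_with_analog_handoff(network, data, analog_block, training_step,
                              error_threshold=0.05, total_iters=1000):
    """training_step(network, data) performs one iteration and returns its
    error value; analog_block stands in for the analog circuit element
    functional block. All names and values here are illustrative."""
    iters_done = 0
    # Initial iteration(s): monitor the error value computed from each output.
    for _ in range(total_iters):
        error = training_step(network, data)
        iters_done += 1
        if error < error_threshold:
            break
    # Configure the analog array from the neural network data, then let it
    # perform the remaining iterations.
    analog_block.configure(network)
    for _ in range(total_iters - iters_done):
        analog_block.training_step(data)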
Abstract:
Systems, apparatuses, and methods for routing traffic through vertically stacked memory are disclosed. A computing system includes a host processor die and multiple vertically stacked memory dies. The host processor die generates memory access requests for the data stored in the multiple memory array banks in the memory dies. At least one memory die uses an on-die network switch with a programmable routing table for routing packets corresponding to the generated memory requests. Routes use both vertical hops and horizontal hops to reach the target memory array bank and to avoid any congested or failed resources along the route. The vertically stacked memory dies use through-silicon via (TSV) interconnects, and at least one via does not traverse through all of the memory dies. Accordingly, the host processor die does not have a direct connection to one or more of the multiple memory dies.
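For a sense of how a programmable routing table on such a switch might behave, here is a toy Python model; the hop encoding, table format, and congestion/failure flags are invented for illustration only.

class DieSwitch:
    """Toy on-die network switch with a programmable routing table."""

    def __init__(self, die_id):
        self.die_id = die_id
        # routing_table[target_bank] -> ordered candidate next hops, each
        # encoded as ("vertical", die_id) or ("horizontal", bank_id).
        self.routing_table = {}
        self.unavailable = set()   # hops marked congested or failed

    def program_route(self, target_bank, hops):
        self.routing_table[target_bank] = list(hops)

    def next_hop(self, target_bank):
        # Take the first programmed hop that is neither congested nor failed.
        for hop in self.routing_table.get(target_bank, []):
            if hop not in self.unavailable:
                return hop
        raise RuntimeError(f"no available route to bank {target_bank}")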
Abstract:
A method, a system, and a non-transitory computer readable medium for generating application code to be executed on an active storage device are presented. The parts of an application that can be executed on the active storage device are determined. The parts of the application that will not be executed on the active storage device are converted into code to be executed on a host device. The parts of the application that will be executed on the active storage device are converted into code of an instruction set architecture of a processor in the active storage device.
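A simplified Python sketch of that partitioning step follows; the eligibility check and the "compilers" are stand-in stubs, and the ISA names are placeholders rather than details from the abstract.

def runs_on_device(part):
    # Stand-in analysis: in practice this would inspect the part's data
    # access patterns, dependencies, and so on.
    return part.get("offloadable", False)

def compile_for(part, isa):
    # Stand-in for a code generator targeting the given instruction set.
    return {"name": part["name"], "isa": isa}

def partition_and_generate(parts, device_isa="active-storage-isa", host_isa="host-isa"):
    """Split application parts between host code and active-storage code."""
    host_code, device_code = [], []
    for part in parts:
        if runs_on_device(part):
            device_code.append(compile_for(part, device_isa))
        else:
            host_code.append(compile_for(part, host_isa))
    return host_code, device_code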
Abstract:
A processing system includes a processing device coupled to a memory configured to check for and correct faults in requested data. In response to correcting the faults of the requested data, the memory sends the corrected data and unused check bits to the processing device as a plurality of fetch returns. The memory also sends to the processing device a parity fetch based on the corrected data and one or more operations. After receiving the plurality of fetch returns and the unused check bits, the processing device checks each fetch return for faults based on the unused check bits. In response to determining that a fetch return includes a fault, the processing device erases the fetch return and reconstructs the fetch return based on one or more other received fetch returns and the parity fetch.
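The reconstruction step resembles single-erasure parity recovery. The Python sketch below assumes the parity fetch is a simple XOR across equal-length fetch returns and that at most one return is faulty; both are simplifying assumptions rather than details from the abstract.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def recover_fetch_returns(fetch_returns, parity_fetch, is_faulty):
    """fetch_returns: equal-length byte strings; parity_fetch: their XOR as
    produced by the memory; is_faulty(fr): a caller-supplied check against
    the unused check bits."""
    recovered = []
    for i, fr in enumerate(fetch_returns):
        if not is_faulty(fr):
            recovered.append(fr)
            continue
        # Erase the faulty return and rebuild it as the XOR of the parity
        # fetch with every other (good) fetch return.
        rebuilt = parity_fetch
        for j, other in enumerate(fetch_returns):
            if j != i:
                rebuilt = xor_bytes(rebuilt, other)
        recovered.append(rebuilt)
    return recovered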
Abstract:
Selecting an error correction code type for a memory device includes: selecting, by the memory device in dependence upon predefined selection criteria, one of a plurality of error correction code types; and carrying out memory access requests utilizing the selected error correction code type.
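A hedged sketch of such a selection policy is below; the particular criteria (an observed error rate and a low-power flag) and the ECC type names are examples chosen for illustration, not the predefined criteria of the claim.

ECC_TYPES = ("SEC-DED", "chipkill-style", "none")

def select_ecc_type(observed_error_rate, low_power_mode):
    # Example criteria: escalate protection as the observed error rate rises,
    # unless a low-power policy disables ECC entirely.
    if low_power_mode:
        return "none"
    if observed_error_rate > 1e-6:
        return "chipkill-style"
    return "SEC-DED"

def carry_out_access(request, observed_error_rate=1e-9, low_power_mode=False):
    ecc_type = select_ecc_type(observed_error_rate, low_power_mode)
    # The memory access would then be carried out using the selected ECC type.
    return {"request": request, "ecc_type": ecc_type}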
Abstract:
A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels in instances of input image data removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data and may itself use the sub-networks for that purpose.
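As an illustration of the pruning idea, the Python sketch below builds one sub-network per single color channel from an assumed per-channel weight layout; real sub-networks could keep any subset of channels and would remove internal elements throughout the network, not only at the input layer.

def make_subnetworks(first_layer_weights, channels=("R", "G", "B")):
    """first_layer_weights: dict mapping a channel name to that channel's
    input weights (an assumed layout). Returns one sub-network description
    per kept channel, with the other channels' elements removed."""
    subnetworks = {}
    for keep in channels:
        subnetworks[keep] = {
            "first_layer": {keep: first_layer_weights[keep]},
            "removed_channels": [c for c in channels if c != keep],
        }
    return subnetworks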