Abstract:
To efficiently transfer data from a cache to a memory, it is desirable that more data corresponding to the same page in the memory be loaded in a row buffer. Writing data to a memory page that is not currently loaded in a row buffer requires closing an old page and opening a new page. Both operations consume energy and clock cycles and potentially delay more critical memory read requests. Hence, it is desirable to have more than one write going to the same DRAM page to amortize the cost of opening and closing DRAM pages. A desirable approach is to batch write backs to the same DRAM page by retaining modified blocks in the cache until a sufficient number of modified blocks belonging to the same memory page are ready for write back.
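A minimal sketch of this batching idea, assuming a hypothetical write-back buffer keyed by DRAM page address; the 4 KiB page size, the flush threshold, and the WriteBackBatcher/FlushPage names are illustrative, not taken from the abstract:

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

// Hypothetical parameters; the abstract does not specify sizes or thresholds.
constexpr uint64_t kPageBytes = 4096;     // one DRAM row (page)
constexpr std::size_t kFlushBatch = 4;    // dirty blocks needed before a flush

class WriteBackBatcher {
 public:
  // Record a dirty block; flush the whole group once enough blocks
  // belonging to the same DRAM page have accumulated.
  void OnDirtyBlock(uint64_t block_addr) {
    uint64_t page = block_addr / kPageBytes;
    auto& blocks = pending_[page];
    blocks.push_back(block_addr);
    if (blocks.size() >= kFlushBatch) {
      FlushPage(page, blocks);  // open the row once, drain all writes
      pending_.erase(page);
    }
  }

 private:
  void FlushPage(uint64_t page, const std::vector<uint64_t>& blocks) {
    std::cout << "open page " << page << ", write " << blocks.size()
              << " blocks, close page\n";  // stand-in for the DRAM commands
  }
  std::unordered_map<uint64_t, std::vector<uint64_t>> pending_;
};

int main() {
  WriteBackBatcher b;
  // Four dirty blocks in the same 4 KiB page trigger one batched flush.
  for (uint64_t a : {0x1000, 0x1040, 0x1080, 0x10C0}) b.OnDirtyBlock(a);
}
```

With a threshold of four, the example groups four dirty blocks from the same page into a single open-write-close sequence, amortizing the row activation cost across all four writes.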
Abstract:
The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism performs a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the resource is not available for performing the operation and until another resource is selected for performing the operation, the selection mechanism identifies a next resource in the table and selects the next resource for performing the operation when the next resource is available for performing the operation.
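A minimal sketch of the described selection loop, assuming the table holds resource indices and availability is a simple flag; the wrap-around traversal order and the SelectResource name are assumptions for illustration:

```cpp
#include <cstddef>
#include <vector>

// Each table entry names a resource; availability is modeled as a flag.
struct Resource { bool available; };

// Starting from a chosen table entry, advance to the next resource until
// an available one is found (wrap-around order is an assumption here).
int SelectResource(const std::vector<int>& table,
                   const std::vector<Resource>& pool,
                   std::size_t start) {
  for (std::size_t i = 0; i < table.size(); ++i) {
    int candidate = table[(start + i) % table.size()];
    if (pool[candidate].available)
      return candidate;            // first available resource is selected
  }
  return -1;                       // no resource currently available
}
```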
Abstract:
Embodiments include methods, systems, and computer-readable media directed to cache bypassing based on prefetch streams. A first cache receives a memory access request. The request references data in the memory. The data comprises non-reuse data. After a determination of a miss in the first cache, the first cache forwards the memory access request to cache control logic. The detection of the non-reuse data instructs the cache control logic to allocate a block only in a second cache and to bypass allocating a block in the first cache. The first cache is closer to the memory than the second cache.
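An illustrative sketch of the allocation decision; the Cache type, the non_reuse flag, and the dual-allocation path for reusable data are assumptions, since the abstract only specifies that non-reuse data is allocated in the second cache and bypasses the first:

```cpp
#include <cstdint>

struct MemAccess {
  uint64_t addr;
  bool non_reuse;  // e.g., set when the data belongs to a prefetch stream
};

struct Cache {
  void Allocate(uint64_t /*addr*/) { /* install a block for the address */ }
};

void OnFirstCacheMiss(const MemAccess& req, Cache& first, Cache& second) {
  if (req.non_reuse) {
    second.Allocate(req.addr);  // allocate only in the second cache
    return;                     // bypass allocation in the first cache
  }
  first.Allocate(req.addr);     // assumed normal path: allocate in both
  second.Allocate(req.addr);
}
```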
Abstract:
The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism is configured to perform a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the identified resource is not available for performing the operation and until a resource is selected for performing the operation, the selection mechanism is configured to identify a next resource in the table and select the next resource for performing the operation when the next resource is available for performing the operation.
Abstract:
A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, wherein a first bank is powered down. In response to a write request to a second bank for data indicated to be stored in the powered-down first bank, the cache controller determines a respective bypass condition for the data. If the bypass condition exceeds a threshold, then the cache controller invalidates any copy of the data stored in the second bank. If the bypass condition does not exceed the threshold, then the cache controller stores the data with a clean state in the second bank. In both cases, the cache controller writes the data to a lower-level memory.
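A sketch of the described decision, with a hypothetical numeric bypass-condition score; the abstract does not define how the condition is computed:

```cpp
enum class CopyState { Invalid, Clean };

struct Decision {
  CopyState second_bank_state;  // what happens to the copy in the second bank
};

Decision OnWriteToPoweredDownBank(int bypass_condition, int threshold) {
  Decision d;
  if (bypass_condition > threshold)
    d.second_bank_state = CopyState::Invalid;  // invalidate any cached copy
  else
    d.second_bank_state = CopyState::Clean;    // keep a clean copy
  // In both cases the controller also writes the data to lower-level memory.
  return d;
}
```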
Abstract:
A processor discards spill data from a memory hierarchy in response to determining that the final access to the spill data has been performed by a compiled program executing at the processor. In some embodiments, the final access is determined based on a special-purpose load instruction configured for this purpose. In some embodiments, the determination is made based on the location of a stack pointer indicating that a method of the executing program has returned, so that data of the returned method that remains in the stack frame is no longer to be accessed. Because the spill data is discarded after the final access, it is not transferred through the memory hierarchy.
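A sketch of the stack-pointer variant, assuming a downward-growing stack and a map of dirty spill lines; when the pointer moves above a returned frame, lines in the dead frame are discarded rather than written back:

```cpp
#include <cstdint>
#include <map>

class SpillFilter {
 public:
  void OnSpillStore(uint64_t addr) { dirty_[addr] = true; }

  // Called when a method returns and the stack pointer rises past its frame.
  void OnMethodReturn(uint64_t new_sp) {
    // Addresses below new_sp belong to the dead frame: discard, don't flush.
    dirty_.erase(dirty_.begin(), dirty_.lower_bound(new_sp));
  }

 private:
  std::map<uint64_t, bool> dirty_;  // dirty spill lines, keyed by address
};
```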
Abstract:
A method of partitioning a data cache comprising a plurality of sets, the plurality of sets comprising a plurality of ways, is provided. Responsive to a stack data request, the method stores a cache line associated with the stack data in one of a plurality of designated ways of the data cache, wherein the plurality of designated ways is configured to store all requested stack data.
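A sketch of the partition, assuming an 8-way cache with two designated ways and an LRU victim order; confining non-stack data to the remaining ways is an added assumption, since the abstract only requires that all stack data land in the designated ways:

```cpp
#include <vector>

constexpr int kStackWays = 2;  // ways 0..1 are designated for stack data

// lru_order lists all ways from least to most recently used.
int PickVictimWay(bool is_stack_data, const std::vector<int>& lru_order) {
  for (int way : lru_order) {
    bool designated = way < kStackWays;
    if (is_stack_data ? designated : !designated)
      return way;  // stack data -> designated ways; other data -> the rest
  }
  return lru_order.front();  // unreachable when every way appears in the order
}
```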
Abstract:
A technique for processing qubits in a quantum computing device is provided. The technique includes determining that, in a first cycle, a first quantum processing region is to perform a first quantum operation that does not use a qubit that is stored in the first quantum processing region, identifying a second quantum processing region that is to perform a second quantum operation at a second cycle that is later than the first cycle, wherein the second quantum operation uses the qubit, determining that between the first cycle and the second cycle, no quantum operations are performed in the second quantum processing region, and moving the qubit from the first quantum processing region to the second quantum processing region.
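A sketch of the idle-window test, with an illustrative per-region set of busy cycles; the qubit movement mechanics themselves are elided:

```cpp
#include <set>

struct Region {
  std::set<int> busy_cycles;  // cycles at which operations are scheduled

  // True if no operation is scheduled strictly between cycles c1 and c2.
  bool IdleBetween(int c1, int c2) const {
    auto it = busy_cycles.upper_bound(c1);
    return it == busy_cycles.end() || *it >= c2;
  }
};

// Move the qubit early so it is already in place when dst needs it at c2.
bool CanStageQubit(const Region& dst, int c1, int c2) {
  return dst.IdleBetween(c1, c2);  // only safe if dst stays idle meanwhile
}
```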
Abstract:
A dynamically configurable overprovisioned microprocessor optimally supports a variety of different compute application workloads, with the capability to trade off among compute performance, energy consumption, and clock frequency on a per-compute-application basis, using general-purpose microprocessor designs. In some embodiments, the overprovisioned microprocessor comprises a physical compute resource and dynamic configuration logic configured to: detect an activation-warranting operating condition; undarken the physical compute resource responsive to detecting the activation-warranting operating condition; detect a configuration-warranting operating condition; and configure the overprovisioned microprocessor to use the undarkened physical compute resource responsive to detecting the configuration-warranting operating condition.
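A sketch of the two-step activation sequence, with placeholder condition checks standing in for whatever operating conditions an implementation would monitor:

```cpp
struct PhysicalResource {
  bool powered = false;  // "dark" (power-gated) when false
  bool in_use = false;
};

struct DynamicConfigLogic {
  bool ActivationWarranted() const    { return true; /* placeholder check */ }
  bool ConfigurationWarranted() const { return true; /* placeholder check */ }

  void Tick(PhysicalResource& r) {
    if (!r.powered && ActivationWarranted())
      r.powered = true;  // undarken the physical compute resource
    if (r.powered && !r.in_use && ConfigurationWarranted())
      r.in_use = true;   // configure the processor to use the resource
  }
};
```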
Abstract:
An approach is provided for implementing register based single instruction, multiple data (SIMD) lookup table operations. According to the approach, an instruction set architecture (ISA) can support one or more SIMD instructions that enable vectors or multiple values in source data registers to be processed in parallel using a lookup table or truth table stored in one or more function registers. The SIMD instructions can be flexibly configured to support functions with inputs and outputs of various sizes and data formats. Various approaches are also described for supporting very large lookup tables that span multiple registers.
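A scalar emulation of one plausible form of such an instruction: a 16-entry table held in a function register is applied to every byte lane of a source register in parallel. The nibble indexing and the popcount table are illustrative, not the patent's encoding:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>

using Vec16 = std::array<uint8_t, 16>;  // one 128-bit register, byte lanes

// Apply the lookup table to every lane in parallel (lane-wise LUT).
Vec16 SimdLookup(const Vec16& table, const Vec16& src) {
  Vec16 dst;
  for (std::size_t lane = 0; lane < dst.size(); ++lane)
    dst[lane] = table[src[lane] & 0x0F];  // low nibble indexes the table
  return dst;
}

int main() {
  Vec16 popcount4{};  // function register: popcount of a 4-bit value
  for (int i = 0; i < 16; ++i)
    popcount4[i] = (i & 1) + ((i >> 1) & 1) + ((i >> 2) & 1) + ((i >> 3) & 1);

  Vec16 src{0, 1, 3, 7, 15, 2, 4, 8, 5, 6, 9, 10, 11, 12, 13, 14};
  Vec16 out = SimdLookup(popcount4, src);
  std::printf("lane 4: popcount(15) = %u\n", unsigned(out[4]));  // prints 4
}
```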