Abstract:
In response to a processor core exiting a low-power state, a cache is set to a minimum size so that fewer than all of the cache's entries are available to store data, thus reducing the cache's power consumption. Over time, the size of the cache can be increased to account for heightened processor activity, thus ensuring that processing efficiency is not significantly impacted by a reduced cache size. In some embodiments, the cache size is increased based on a measured processor performance metric, such as an eviction rate of the cache. In some embodiments, the cache size is increased at regular intervals until a maximum size is reached.
Abstract:
A system and method for efficient predicting and processing of memory access dependencies. A computing system includes control logic that marks a detected load instruction as a first type responsive to predicting the load instruction has high locality and is a candidate for store-to-load (STL) data forwarding. The control logic marks the detected load instruction as a second type responsive to predicting the load instruction has low locality and is not a candidate for STL data forwarding. The control logic processes a load instruction marked as the first type as if the load instruction is dependent on an older store operation. The control logic processes a load instruction marked as the second type as if the load instruction is independent on any older store operation.
Abstract:
In some embodiments, a method of managing cache memory includes identifying a group of cache lines in a cache memory, based on a correlation between the cache lines. The method also includes tracking evictions of cache lines in the group from the cache memory and, in response to a determination that a criterion regarding eviction of cache lines in the group from the cache memory is satisfied, selecting one or more (e.g., all) remaining cache lines in the group for eviction.
Abstract:
A device for disabling faulty cores using proxy virtual machines includes a processor, a faulty core, and a physical memory. The processor is responsible for executing a hypervisor that is configured to assign a proxy virtual machine to the faulty core. The assigned proxy virtual machine also includes a minimal workload. Various other methods, systems, and computer-readable media are also disclosed.
Abstract:
Power gating decisions can be made based on measures of cache dirtiness. Analyzer logic can selectively power gate a component of a processor system based on a cache dirtiness of one or more caches associated with the component. The analyzer logic may power gate the component when the cache dirtiness exceeds a threshold and may maintains the component in an idle state when the cache dirtiness does not exceed the threshold. Idle time prediction logic may be used to predict a duration of an idle time of the component. The analyzer logic may then selectively power gates the component based on the cache dirtiness and the predicted idle time.
Abstract:
To efficiently transfer of data from a cache to a memory, it is desirable that more data corresponding to the same page in the memory be loaded in a line buffer. Writing data to a memory page that is not currently loaded in a row buffer requires closing an old page and opening a new page. Both operations consume energy and clock cycles and potentially delay more critical memory read requests. Hence it is desirable to have more than one write going to the same DRAM page to amortize the cost of opening and closing DRAM pages. A desirable approach is batch write backs to the same DRAM page by retaining modified blocks in the cache until a sufficient number of modified blocks belonging to the same memory page are ready for write backs.
Abstract:
A circuit includes a plurality of synchronizers to adapt a signal from a first clock domain to a second clock domain. Each synchronizer of the plurality of synchronizers includes a synchronizer input to receive the signal from the first clock domain and a synchronizer output to provide the signal as adapted to the second clock domain. The circuit also includes a multiplexer (mux) that includes a plurality of mux inputs and a mux output. Each mux input is coupled to the synchronizer output of a respective synchronizer of the plurality of synchronizers. The mux output provides the signal, as adapted to the second clock domain, from the synchronizer output of a selected synchronizer of the plurality of synchronizers.
Abstract:
A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, wherein a first bank is powered down. In response a write request to a second bank for data indicated to be stored in the powered down first bank, the cache controller determines a respective bypass condition for the data. If the bypass condition exceeds a threshold, then the cache controller invalidates any copy of the data stored in the second bank. If the bypass condition does not exceed the threshold, then the cache controller stores the data with a clean state in the second bank. The cache controller writes the data in a lower-level memory for both cases.
Abstract:
A processor discards spill data from a memory hierarchy in response to the final access to the spill data has been performed by a compiled program executing at the processor. In some embodiments, the final access determined based on a special-purpose load instruction configured for this purpose. In some embodiments the determination is made based on the location of a stack pointer indicating that a method of the executing program has returned, so that data of the returned method that remains in the stack frame is no longer to be accessed. Because the spill data is discarded after the final access, it is not transferred through the memory hierarchy.
Abstract:
The described embodiments include a computing device with a first entity and a second entity. In the computing device, a management controller dynamically sets a power-state limit for the first entity based on a performance coupling and a thermal coupling between the first entity and the second entity.