Abstract:
Various aspects include methods for retaining high-locality data in a higher-level cache memory on a computing device. Various aspects may include receiving a cache access request for a first cache line in the higher-level cache memory that indicates a locality of the first cache line, determining whether the access request indicates high locality, and setting a high-locality indicator of the first cache line in response to determining that the cache access request indicates high locality. Various aspects may include determining whether a lower-level cache memory hit counter of a first cache line of a first cache exceeds a lower-level cache locality threshold, setting a high-locality indicator of the first cache line in response to determining that the lower-level cache memory hit counter exceeds the lower-level cache locality threshold, and resetting the lower-level cache memory hit counter of the first cache line.
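As a minimal sketch, assuming hypothetical names (cache_line_t, on_access, on_lower_level_check) and an illustrative threshold value, the two ways of setting the high-locality indicator described above might look like:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical cache-line metadata; field names are illustrative. */
    typedef struct {
        uint32_t hit_counter;    /* hits observed in the lower-level cache */
        bool     high_locality;  /* retention hint for the higher-level cache */
    } cache_line_t;

    #define LOWER_LEVEL_LOCALITY_THRESHOLD 4   /* assumed tuning parameter */

    /* Case 1: the access request itself carries a locality hint. */
    void on_access(cache_line_t *line, bool request_indicates_high_locality) {
        if (request_indicates_high_locality)
            line->high_locality = true;
    }

    /* Case 2: the lower-level hit counter exceeds the locality threshold. */
    void on_lower_level_check(cache_line_t *line) {
        if (line->hit_counter > LOWER_LEVEL_LOCALITY_THRESHOLD) {
            line->high_locality = true;  /* mark for retention at the higher level */
            line->hit_counter = 0;       /* reset the counter, per the abstract */
        }
    }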
Abstract:
Systems and methods pertain to a multiprocessor system comprising disunited cache structures. A first private-information cache is coupled to a first processor of the multiprocessor system. The first private-information cache is configured to store information that is private to the first processor. A first shared-information cache which is disunited from the first private-information cache is also coupled to the first processor. The first shared-information cache is configured to store information that is shared/shareable between the first processor and one or more other processors of the multiprocessor system.
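A minimal sketch of how lookups might probe the two disunited structures, assuming a hypothetical helper (cache_lookup) and that a given line resides in exactly one of the two caches:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct cache cache_t;   /* opaque cache structure (assumed) */
    extern bool cache_lookup(cache_t *c, uint64_t addr, void *out);

    /* Per-processor pair of disunited caches. */
    typedef struct {
        cache_t *private_cache;  /* information private to this processor */
        cache_t *shared_cache;   /* information shared/shareable with others */
    } core_caches_t;

    bool core_lookup(core_caches_t *cc, uint64_t addr, void *out) {
        if (cache_lookup(cc->private_cache, addr, out))
            return true;                                   /* private hit */
        return cache_lookup(cc->shared_cache, addr, out);  /* else try shared */
    }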
Abstract:
Ephemeral data stored in a cache is read when needed but is not written to system memory, saving power and bandwidth. In an embodiment, a no-writeback bit associated with the ephemeral data is set in response to a read-no-writeback instruction. Data in a cache line whose no-writeback bit has been set is not written back to system memory. Accordingly, when cache lines are evicted, a cache line whose no-writeback bit is set is discarded without being written back to system memory.
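A minimal sketch of the eviction rule, with hypothetical names (cache_line_t, write_back_to_memory):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        bool    dirty;
        bool    no_writeback;  /* set by the read-no-writeback instruction */
        uint8_t data[64];
    } cache_line_t;

    extern void write_back_to_memory(const cache_line_t *line);

    /* Mark the line's data as ephemeral on a read-no-writeback. */
    void read_no_writeback(cache_line_t *line) {
        line->no_writeback = true;
    }

    /* On eviction, ephemeral data is discarded rather than written back. */
    void evict(cache_line_t *line) {
        if (line->dirty && !line->no_writeback)
            write_back_to_memory(line);
        /* otherwise the data is dropped, saving power and bandwidth */
    }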
Abstract:
A device includes a memory configured to store video data. The device also includes a video decoder coupled to the memory and to a cache. The video decoder is configured to decode an input frame of the video data to generate a first video frame and includes an inline downscaler configured to generate a second video frame corresponding to the first video frame downscaled for display output.
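The abstract does not specify the downscaling ratio or filter; purely as an illustration, a 2:1 box-filter pass over a luma plane could be written as:

    #include <stdint.h>

    /* Illustrative 2:1 box-filter downscale; the real inline downscaler's
     * ratio and filtering are not specified by the abstract. */
    void downscale_2x(const uint8_t *src, int src_w, int src_h, int src_stride,
                      uint8_t *dst, int dst_stride) {
        for (int y = 0; y < src_h / 2; y++) {
            for (int x = 0; x < src_w / 2; x++) {
                const uint8_t *p = src + (2 * y) * src_stride + 2 * x;
                /* Average the 2x2 source block, rounding to nearest. */
                dst[y * dst_stride + x] =
                    (p[0] + p[1] + p[src_stride] + p[src_stride + 1] + 2) / 4;
            }
        }
    }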
Abstract:
The energy consumed by data transfer in a computing device may be reduced by transferring data that has been encoded in a manner that reduces the number of one “1” data values, the number of signal level transitions, or both. A data destination component of the computing device may receive data encoded in such a manner from a data source component of the computing device over a data communication interconnect, such as an off-chip interconnect. The data may be encoded using minimum Hamming weight encoding, which reduces the number of one “1” data values. The received data may be decoded using minimum Hamming weight decoding. For computing devices in which reducing the number of zero “0” data values reduces energy consumption, the data may instead be encoded using maximum Hamming weight encoding, which increases the number of one “1” data values while reducing the number of zero “0” data values.
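One common way to realize minimum Hamming weight encoding is invert coding: if a word contains more ones than zeros, its complement is transmitted along with an invert flag. The flag-bit scheme below is an assumption for illustration, not the patented format:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t word;       /* value driven onto the interconnect */
        bool     inverted;   /* side-band flag consumed by the decoder */
    } encoded_t;

    encoded_t mhw_encode(uint32_t w) {
        encoded_t e;
        e.inverted = __builtin_popcount(w) > 16;  /* more 1s than 0s? */
        e.word = e.inverted ? ~w : w;             /* Hamming weight now <= 16 */
        return e;
    }

    uint32_t mhw_decode(encoded_t e) {
        return e.inverted ? ~e.word : e.word;
    }

Maximum Hamming weight encoding would be the mirror image, inverting when the word contains more zeros than ones.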
Abstract:
In one aspect, space in a tile-unaware cache associated with an address aperture may be managed in different ways depending on whether a processing component initiating an access request through the aperture to a tile-based memory is tile-unaware or tile-aware. Upon a full-tile read by a tile-aware process, data may be evicted from the cache, or space may not be allocated. Upon a full-tile write by a tile-aware process, data may be evicted from the cache. In another aspect, a tile-unaware process may be supplemented with tile-aware features by generating a full tile of addresses in response to a partial-tile access. Upon a partial-tile read by the tile-unaware process, the generated addresses may be used to pre-fetch data. Upon a partial-tile write, the addresses may be used to evict data. Upon a bit block transfer, the addresses may be used in dividing the bit block transfer into units of tiles.
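A minimal sketch of generating the full tile of addresses from one partially accessed address, assuming an illustrative tile geometry and a linear row pitch:

    #include <stddef.h>
    #include <stdint.h>

    #define TILE_W 16   /* bytes per tile row (assumed) */
    #define TILE_H 4    /* rows per tile      (assumed) */

    /* Given one address touched by a tile-unaware partial access, emit the
     * start address of each row of the enclosing tile. These addresses can
     * drive pre-fetch (reads), eviction (writes), or tile-sized splitting
     * of a bit block transfer. */
    void full_tile_addresses(uint64_t addr, uint64_t row_pitch,
                             uint64_t out[TILE_H]) {
        uint64_t col  = (addr % row_pitch) / TILE_W * TILE_W;  /* tile column */
        uint64_t row0 = (addr / row_pitch) / TILE_H * TILE_H;  /* first row   */
        for (size_t r = 0; r < TILE_H; r++)
            out[r] = (row0 + r) * row_pitch + col;
    }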
Abstract:
Aspects presented herein relate to methods and devices for data or graphics processing including an apparatus, e.g., a graphics processing unit (GPU). The apparatus may obtain an indication of a data write for data associated with data processing. The apparatus may write, based on the indication, the data associated with the data processing to a memory address. The apparatus may receive a read request for the data including the memory address, where the read request is associated with a read with invalidate process. The apparatus may retrieve, based on the read request, the data from at least one of a first cache or at least one second memory, where the retrieval of the data is based on a timing of the indication of the data write. The apparatus may output the retrieved data from at least one of the first cache or the at least one second memory.
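A minimal sketch of the read-with-invalidate path, with hypothetical helpers (cache_read, cache_invalidate, memory_read); which level serves the data depends on when the write landed relative to the read request:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct cache cache_t;   /* opaque first cache (assumed) */
    extern bool cache_read(cache_t *c, uint64_t addr, void *out);
    extern void cache_invalidate(cache_t *c, uint64_t addr);
    extern void memory_read(uint64_t addr, void *out);

    void read_with_invalidate(cache_t *first_cache, uint64_t addr, void *out) {
        if (cache_read(first_cache, addr, out)) {
            /* The write was still resident in the first cache: return the
             * data and invalidate the line, per the read with invalidate
             * process. */
            cache_invalidate(first_cache, addr);
        } else {
            /* The data had already reached the second memory. */
            memory_read(addr, out);
        }
    }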
Abstract:
Systems and methods relate to managing shared resources in a multithreaded processor comprising two or more processing threads. Danger levels for the two or more threads are determined, wherein the danger level of a thread is based on a potential failure of the thread to meet a deadline due to unavailability of a shared resource. Priority levels associated with the two or more threads are also determined, wherein the priority level is higher for a thread whose failure to meet a deadline is unacceptable and the priority level is lower for a thread whose failure to meet a deadline is acceptable. The two or more threads are scheduled based at least on the determined danger levels for the two or more threads and priority levels associated with the two or more threads.
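As a minimal sketch, assuming a placeholder combination rule (the abstract only says scheduling is based at least on danger and priority), thread selection might look like:

    #include <stdint.h>

    /* Illustrative per-thread record; field names are assumptions. */
    typedef struct {
        int      id;
        uint32_t danger;    /* risk of missing a deadline due to an
                               unavailable shared resource */
        uint32_t priority;  /* higher when missing a deadline is unacceptable */
    } thread_info_t;

    /* Rank threads by danger and priority together; the product is a
     * placeholder weighting, not the patented policy. */
    int pick_next(const thread_info_t *t, int n) {
        int best = 0;
        for (int i = 1; i < n; i++) {
            uint64_t s  = (uint64_t)t[i].danger * t[i].priority;
            uint64_t sb = (uint64_t)t[best].danger * t[best].priority;
            if (s > sb)
                best = i;
        }
        return t[best].id;
    }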
Abstract:
Aspects include computing devices, systems, and methods for implementing cache memory access requests for compressed data using cache bank spreading. In an aspect, cache bank spreading may include determining whether the compressed data of the cache memory access request fits on a single cache bank. In response to determining that the compressed data fits on a single cache bank, a cache bank spreading value may be calculated to reinstate bank selection bits of the physical address of the cache memory access request that may have been cleared during data compression. A cache bank spreading address in the physical space of the cache memory may include the physical address of the cache memory access request plus the reinstated bank selection bits. The cache bank spreading address may be used to read compressed data from, or write compressed data to, the cache memory.
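A minimal sketch of forming the cache bank spreading address, assuming illustrative bank-bit positions and a placeholder spreading function (fits_single_bank is a hypothetical helper):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define BANK_SHIFT 6   /* position of bank-select bits (assumed) */
    #define BANK_BITS  2   /* number of bank-select bits   (assumed) */
    #define BANK_MASK  (((1u << BANK_BITS) - 1) << BANK_SHIFT)

    extern bool fits_single_bank(uint64_t addr, size_t len);

    /* If the compressed block would land on a single bank, reinstate
     * bank-select bits so accesses spread across the cache banks. */
    uint64_t bank_spread_address(uint64_t phys_addr, size_t len) {
        if (!fits_single_bank(phys_addr, len))
            return phys_addr;                    /* already spans banks */
        /* Derive the spreading value from higher address bits (assumed). */
        uint64_t spread = (phys_addr >> 12) & ((1u << BANK_BITS) - 1);
        return (phys_addr & ~(uint64_t)BANK_MASK) | (spread << BANK_SHIFT);
    }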