Abstract:
Aspects include computing devices, systems, and methods for implementing cache memory access requests for data smaller than a cache line and eliminating overfetching from a main memory by combining the data with padding data whose size is the difference between the size of a cache line and the size of the data. A processor may determine whether the data, uncompressed or compressed, is smaller than a cache line using a size of the data or a compression ratio of the data. The processor may generate the padding data using constant data values or a pattern of data values. The processor may send a write cache memory access request for the combined data to a cache memory controller, which may write the combined data to a cache memory. The cache memory controller may send a write memory access request to a memory controller, which may write the combined data to a memory.
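A minimal C sketch of the padding step, assuming a 64-byte line; the names (build_padded_line, CACHE_LINE_SIZE) and the pattern handling are illustrative, not taken from the disclosure:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define CACHE_LINE_SIZE 64  /* assumed line size */

    /* Copy the sub-line data into a line-sized buffer and fill the
     * remaining (line size - data size) bytes from a repeating pattern;
     * a constant value is the one-byte pattern case. The sub-line check
     * itself may use the data size directly or derive it from a
     * compression ratio. Assumes data_size <= CACHE_LINE_SIZE and
     * pattern_size >= 1. */
    static void build_padded_line(uint8_t line[CACHE_LINE_SIZE],
                                  const uint8_t *data, size_t data_size,
                                  const uint8_t *pattern, size_t pattern_size)
    {
        memcpy(line, data, data_size);
        for (size_t i = data_size; i < CACHE_LINE_SIZE; i++)
            line[i] = pattern[(i - data_size) % pattern_size];
    }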
Abstract:
Aspects include computing devices, systems, and methods for implementing cache memory access requests for data smaller than a cache line and eliminating overfetching from a main memory by writing supplemental data to the unfilled portions of the cache line. A cache memory controller may receive a cache memory access request with a supplemental write command for data smaller than a cache line. The cache memory controller may write supplemental data to the portions of the cache line not filled by the data in response to a write cache memory access request or a cache miss during a read cache memory access request. In the event of a cache miss, the cache memory controller may retrieve the data from the main memory, excluding any overfetch data, and write the data and the supplemental data to the cache line. Eliminating overfetching reduces the bandwidth and power required to retrieve data from main memory.
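One way the miss path could look in C, with fetch_from_memory standing in for a narrow (non-overfetching) main-memory read; all names are hypothetical:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define LINE_SIZE 64  /* assumed line size */

    /* Hypothetical platform hook: fetch exactly 'size' bytes from main
     * memory rather than a full line, so nothing is overfetched. */
    extern void fetch_from_memory(uint64_t addr, uint8_t *dst, size_t size);

    /* On a read miss for a sub-line datum, fetch only the datum and fill
     * the remainder of the line with the supplemental data value. */
    static void fill_line_on_miss(uint8_t line[LINE_SIZE], uint64_t addr,
                                  size_t data_size, uint8_t supplemental)
    {
        fetch_from_memory(addr, line, data_size);  /* no overfetch */
        memset(line + data_size, supplemental, LINE_SIZE - data_size);
    }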
Abstract:
Methods, devices, and non-transitory processor-readable storage media for compacting data within cache lines of a cache. An aspect method may include identifying, by a processor of a computing device, a base address (e.g., a physical or virtual cache address) for a first data segment, identifying a data size (e.g., based on a compression ratio) for the first data segment, obtaining a base offset based on the identified data size and the base address of the first data segment, and calculating an offset address by offsetting the base address with the obtained base offset, wherein the calculated offset address is associated with a second data segment. In some aspects, the method may include identifying a parity value for the first data segment based on the base address and obtaining the base offset by performing a lookup on a stored table using the identified data size and identified parity value.
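A sketch of the table lookup, assuming the offset table is indexed by a size class (derived from the compression ratio) and a one-bit parity of the line address; the table contents and names here are invented for illustration:

    #include <stdint.h>

    /* Hypothetical base-offset table: rows are size classes 0..3
     * (e.g., different compression ratios), columns are the parity bit.
     * The real table contents are implementation-specific. */
    static const int64_t k_base_offset[4][2] = {
        { 0, 32 }, { 0, 16 }, { 0, 8 }, { 0, 4 },
    };

    /* Offset the base address by the table entry selected by the size
     * class and the parity of the 64-byte line index. Assumes
     * size_class < 4. */
    static uint64_t offset_address(uint64_t base_addr, unsigned size_class)
    {
        unsigned parity = (base_addr >> 6) & 1;
        return base_addr + (uint64_t)k_base_offset[size_class][parity];
    }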
Abstract:
A method of controlling power within a multicore central processing unit (CPU) is disclosed. The method may include monitoring a die temperature, determining a degree of parallelism within a workload of the CPU, and powering one or more cores of the CPU up or down based on the degree of parallelism, the die temperature, or a combination thereof.
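A sketch of that decision loop in C, with hypothetical platform hooks for temperature, parallelism, and per-core power:

    #include <stdbool.h>

    extern int  read_die_temp_c(void);          /* die temperature, deg C */
    extern int  measure_parallelism(void);      /* runnable threads in workload */
    extern void power_core(int core, bool on);  /* power a core up or down */

    #define NUM_CORES  4    /* assumed core count */
    #define TEMP_LIMIT 85   /* assumed thermal threshold */

    /* Power up as many cores as the workload's parallelism can use,
     * shedding one when the die is hot, and keep at least one core up. */
    static void rebalance_cores(void)
    {
        int want = measure_parallelism();
        if (read_die_temp_c() > TEMP_LIMIT && want > 1)
            want--;
        if (want > NUM_CORES) want = NUM_CORES;
        if (want < 1)         want = 1;
        for (int core = 0; core < NUM_CORES; core++)
            power_core(core, core < want);
    }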
Abstract:
Methods, systems and devices that include a dynamic clock and voltage scaling (DCVS) solution configured to compute and enforce performance guarantees for a group of processors to ensure that the processors do not remain in a busy state (e.g., due to transient workloads) for a combined period that is more than a predetermined amount of time above that which is required for one of the processors to complete its pre-computed steady state workload. The DCVS solution may adjust the frequency and/or voltage of one or more of the processors based on a variable delay to ensure that the multiprocessor system only falls behind its steady state workload by, at most, a predefined maximum amount of work, irrespective of the operating frequency or voltage of the processors.
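A sketch of the guarantee in C, assuming hypothetical hooks that report scheduled versus completed work; MAX_LAG stands in for the predefined maximum amount of work:

    extern unsigned long steady_state_work_due(void);  /* work scheduled so far */
    extern unsigned long work_completed(void);         /* work actually done */
    extern void raise_frequency(void);
    extern void lower_frequency(void);

    #define MAX_LAG 1000  /* assumed bound on how far the system may fall behind */

    /* Raise frequency whenever completed work trails the pre-computed
     * steady-state schedule by more than MAX_LAG, regardless of the
     * current operating point; relax when caught up. */
    static void dcvs_tick(void)
    {
        unsigned long due  = steady_state_work_due();
        unsigned long done = work_completed();
        if (due > done && due - done > MAX_LAG)
            raise_frequency();   /* busy too long: enforce the guarantee */
        else if (done >= due)
            lower_frequency();   /* ahead of schedule: save power */
    }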
Abstract:
A processor-based system employing a safety island architecture for fail-safe operation and related methods are disclosed. The processor-based system includes a main domain for controlling a device. The main domain receives and processes vehicle information from a vehicle network. The main domain communicates with vehicle modules to control operation of the vehicle. Such operation may include different autonomous driving use cases. The processor-based system also includes a safety island domain that contains fewer hardware circuits than the main domain. The safety island domain is configured to checkpoint vehicle information processed by both the main and safety island domains and to monitor errors originating in both the main and safety island domains.
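One plausible shape for the checkpoint in C, comparing digests computed independently by the two domains; the digest functions and fault hook are assumptions, not the disclosed design:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical hooks: each domain independently digests the same
     * vehicle information; a mismatch indicates an error in one of them. */
    extern uint32_t main_domain_digest(const void *info, unsigned len);
    extern uint32_t safety_island_digest(const void *info, unsigned len);
    extern void raise_fault(const char *source);

    /* Compare the two domains' results at a checkpoint and flag any
     * divergence so a fail-safe path can be taken. */
    static bool checkpoint(const void *vehicle_info, unsigned len)
    {
        if (main_domain_digest(vehicle_info, len) !=
            safety_island_digest(vehicle_info, len)) {
            raise_fault("main/safety-island mismatch");
            return false;
        }
        return true;
    }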
Abstract:
In one aspect, space in a tile-unaware cache associated with an address aperture may be managed in different ways depending on whether a processing component initiating an access request through the aperture to a tile-based memory is tile-unaware or tile-aware. Upon a full-tile read by a tile-aware process, data may be evicted from the cache, or space may not be allocated. Upon a full-tile write by a tile-aware process, data may be evicted from the cache. In another aspect, a tile-unaware process may be supplemented with tile-aware features by generating a full tile of addresses in response to a partial-tile access. Upon a partial-tile read by the tile-unaware process, the generated addresses may be used to pre-fetch data. Upon a partial-tile write, the addresses may be used to evict data. Upon a bit block transfer, the addresses may be used in dividing the bit block transfer into units of tiles.
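A sketch of the address generation in C, assuming tiles of 4x4 cache lines aligned to the tile size; the geometry and names are illustrative:

    #include <stdint.h>

    #define TILE_W     4    /* lines per tile row (assumed) */
    #define TILE_H     4    /* tile rows (assumed) */
    #define LINE_BYTES 64   /* assumed line size */

    /* Given one address inside a tile, emit every line address of that
     * tile, for use in prefetching (partial-tile reads) or eviction
     * (partial-tile writes). Assumes tiles are aligned to their size. */
    static int full_tile_addresses(uint64_t addr, uint64_t out[TILE_W * TILE_H])
    {
        uint64_t tile_bytes = (uint64_t)TILE_W * TILE_H * LINE_BYTES;
        uint64_t tile_base  = addr - (addr % tile_bytes);
        for (int i = 0; i < TILE_W * TILE_H; i++)
            out[i] = tile_base + (uint64_t)i * LINE_BYTES;
        return TILE_W * TILE_H;
    }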
Abstract:
An intelligent tile-based prefetching solution executed by a compression address aperture services linearly addressed data requests from a processor for data stored in a memory component having a tile-based address structure. The aperture monitors tile reads and seeks to match the tile read pattern to a predefined pattern. If a match is found, the aperture executes a prefetching algorithm uniquely and optimally associated with that predefined tile read pattern. In this way, tile overfetch is mitigated while the latency of first-line data reads is reduced.
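A sketch of the matching step in C for two example traversal orders (raster and column); the pattern set, history depth, and prefetch hook are assumptions:

    enum pattern { PAT_NONE, PAT_RASTER, PAT_COLUMN };

    #define HISTORY 3  /* assumed depth of the tile-read history */

    /* Classify the last few tile indices read through the aperture. */
    static enum pattern classify(const int tiles[HISTORY], int tiles_per_row)
    {
        if (tiles[1] == tiles[0] + 1 && tiles[2] == tiles[1] + 1)
            return PAT_RASTER;   /* left-to-right scan */
        if (tiles[1] == tiles[0] + tiles_per_row &&
            tiles[2] == tiles[1] + tiles_per_row)
            return PAT_COLUMN;   /* top-to-bottom scan */
        return PAT_NONE;
    }

    extern void prefetch_tile(int tile);  /* hypothetical aperture hook */

    /* Run the prefetcher associated with the matched pattern; on no
     * match, prefetch nothing, avoiding tile overfetch. */
    static void maybe_prefetch(const int tiles[HISTORY], int tiles_per_row)
    {
        switch (classify(tiles, tiles_per_row)) {
        case PAT_RASTER: prefetch_tile(tiles[HISTORY - 1] + 1); break;
        case PAT_COLUMN: prefetch_tile(tiles[HISTORY - 1] + tiles_per_row); break;
        default: break;
        }
    }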
Abstract:
Methods and systems are disclosed for full-hardware management of power and clock domains related to a distributed virtual memory (DVM) network. An aspect includes transmitting, from a DVM initiator to a DVM network, a DVM operation, broadcasting, by the DVM network to a plurality of DVM targets, the DVM operation, and, based on the DVM operation being broadcasted to the plurality of DVM targets by the DVM network, performing one or more hardware optimizations comprising: turning on a clock domain coupled to the DVM network or a DVM target of the plurality of DVM targets that is a target of the DVM operation, increasing a frequency of the clock domain, turning on a power domain coupled to the DVM target based on the power domain being turned off, or terminating the DVM operation to the DVM target based on the DVM target being turned off.
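A sketch of the broadcast path in C, with hypothetical hooks for the clock and power domains; whether an off target is powered on or has the operation terminated on its behalf is modeled as a per-target policy bit:

    #include <stdbool.h>

    struct dvm_target {
        bool powered;        /* power domain currently on */
        bool wake_on_dvm;    /* policy: power up rather than terminate */
    };

    extern void clock_enable_and_boost(void);  /* turn on / speed up clock domain */
    extern void power_on(struct dvm_target *t);
    extern void deliver(struct dvm_target *t, int op);
    extern void terminate_for(struct dvm_target *t, int op);  /* complete op for off target */

    /* Broadcast a DVM operation, applying the hardware optimizations:
     * enable and boost the clock, power up willing targets that are off,
     * and terminate the operation for targets that stay off. */
    static void dvm_broadcast(struct dvm_target *targets, int n, int op)
    {
        clock_enable_and_boost();
        for (int i = 0; i < n; i++) {
            if (!targets[i].powered) {
                if (targets[i].wake_on_dvm)
                    power_on(&targets[i]);
                else {
                    terminate_for(&targets[i], op);
                    continue;
                }
            }
            deliver(&targets[i], op);
        }
    }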