Abstract:
Aspects include computing devices, apparatus, and methods implemented by the apparatus for implementing a hybrid input/output (I/O) coherent write request on a computing device, including receiving an I/O coherent write request, generating a first hybrid I/O coherent write request and a second hybrid I/O coherent write request from the I/O coherent write request, sending the first hybrid I/O coherent write request and I/O coherent write data of the I/O coherent write request to a shared memory, and sending the second hybrid I/O coherent write request without the I/O coherent write data of the I/O coherent write request to a coherency domain.
Abstract:
Aspects include computing devices, apparatus, and methods implemented by the apparatus for implementing dynamic input/output (I/O) coherent workload processing on a computing device. Aspect methods may include offloading, by a processing device, a workload to a hardware accelerator for execution using an I/O coherent mode, detecting a dynamic trigger for switching from the I/O coherent mode to a non-I/O coherent mode while the workload is executed by the hardware accelerator, and switching from the I/O coherent mode to a non-I/O coherent mode while the workload is executed by the hardware accelerator.
Abstract:
Aspects include computing devices, apparatus, and methods for accelerating distributive virtual memory (DVM) message processing in a computing device. DVM message interceptors may be positioned in various locations within a DVM network of a computing device so that DVM messages may be intercepted before reaching certain DVM destinations. A DVM message interceptor may receive a broadcast DVM message from first DVM source. The DVM message interceptor may determine whether a preemptive DVM message response should be returned to the DVM source on behalf of the DVM destination. When certain criteria are met, the DVM message interceptor may generate a preemptive DVM message response to the broadcast DVM message, and send the preemptive DVM message response to the DVM source.
Abstract:
Aspects include computing devices, systems, and methods for implementing a cache memory access requests for data smaller than a cache line and eliminating overfetching from a main memory by combining the data with padding data of a size of a difference between a size of a cache line and the data. A processor may determine whether the data, uncompressed or compressed, is smaller than a cache line using a size of the data or a compression ratio of the data. The processor may generate the padding data using constant data values or a pattern of data values. The processor may send a write cache memory access request for the combined data to a cache memory controller, which may write the combined data to a cache memory. The cache memory controller may send a write memory access request to a memory controller, which may write the combined data to a memory.
Abstract:
Aspects include computing devices, apparatus, and methods for accelerating distributive virtual memory (DVM) message processing in a computing device. DVM message interceptors may be positioned in various locations within a DVM network of a computing device so that DVM messages may be intercepted before reaching certain DVM destinations. A DVM message interceptor may receive a broadcast DVM message from first DVM source. The DVM message interceptor may determine whether a preemptive DVM message response should be returned to the DVM source on behalf of the DVM destination. When certain criteria are met, the DVM message interceptor may generate a preemptive DVM message response to the broadcast DVM message, and send the preemptive DVM message response to the DVM source.
Abstract:
Systems and methods enable displaying a graphical representation of system resource usage in a resource utilization map to inform users about system resource utilization by applications and processes running on a computing device. Users may provide inputs to enable the system to adjust resource allocations based on user preferences. This may enable users to improve the overall operational performance of the device consistent with their current personal preferences by identifying applications or processes of most or least interest so the device processor to prioritize system resources accordingly. Some aspects transmit resource allocation data based on such user input to a central server to enable community based resource allocation schemes. Community based resource allocation schemes may be transmitted to computing devices for use as default or preliminary resource allocations for particular applications, websites or device operating states.
Abstract:
A method of interleaving a memory by mapping address bits of the memory to a number N of memory channels iteratively in successive rounds, wherein in each round except the last round: selecting a unique subset of address bits, determining a maximum number (L) of unique combinations possible based on the selected subset of address bits, mapping combinations to the N memory channels a maximum number of times (F) possible where each of the N memory channels gets mapped to an equal number of combinations, and if and when a number of combinations remain (K, which is less than N) that cannot be mapped, one to each of the N memory channels, entering a next round. In the last round, mapping remaining most significant address bits, not used in the subsets in prior rounds, to each of the N memory channels.
Abstract:
Various embodiments of methods and systems for managing write transaction volume from a master component to a long term memory component in a system on a chip (“SoC”) are disclosed. Because power consumption and bus bandwidth are unnecessarily consumed when ephemeral data is written back to long term memory (such as a double data rate “DDR” memory) from a closely coupled memory component (such as a low level cache “LLC” memory) of a data generating master component, embodiments of the solutions seek to identify write transactions that contain ephemeral data and prevent the ephemeral data from being written to DDR.
Abstract:
Systems and methods for uniformly interleaving memory accesses across physical channels of a memory space with a non-uniform storage capacity across the physical channels are disclosed. An interleaver is arranged in communication with one or more processors and a system memory. The interleaver identifies locations in a memory space supported by the memory channels and is responsive to logic that defines virtual sectors having a desired storage capacity. The interleaver accesses the asymmetric storage capacity uniformly across the virtual sectors in response to requests to access the memory space.
Abstract:
Removing invalid literal load values, and related circuits, methods, and computer-readable media are disclosed. In one aspect, an instruction processing circuit provides a literal load table containing one or more entries comprising an address and a cached literal load value. Upon detecting a literal load instruction in an instruction stream, the instruction processing circuit determines whether the literal load table contains an entry having an address of the literal load instruction. If so, the instruction processing circuit removes the literal load instruction from the instruction stream, and provides the cached literal load value stored in the entry to at least one dependent instruction. The instruction processing circuit further determines whether an invalidity indicator for the literal load table has been received. If so, the instruction processing circuit flushes the literal load table. The invalidity indicator may be generated responsive to modification of a constant table.