摘要:
A processor includes a core, a hardware prefetcher, and a prefetcher control module. The hardware prefetcher includes logic to make speculative prefetch requests, through a memory subsystem, for elements for execution by the core, and logic to store prefetched elements in a cache. The prefetcher control module includes logic to selectively suppress, based on a hardware-prefetch suppression instruction executed by the core, a speculative prefetch request to be made by the hardware prefetcher.
摘要:
An apparatus and method for determining whether to execute an atomic operation locally or remotely. For example, one embodiment of a processor comprises: a decoder to decode an atomic operation on a local core; prediction logic on the local core to estimate a cost associated with execution of the atomic operation on the local core and a cost associated with execution of the atomic operation on a remote core; and the remote core to execute the atomic operation remotely if the prediction logic determines that the cost for execution on the local core is relatively greater than the cost for execution on the remote core; and the local core to execute the atomic operation locally if the prediction logic determines that the cost for local execution on the local core is relatively less than the cost for execution on the remote core.
摘要:
Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.
摘要:
A processing device comprises a processing device cache and a cache controller. The cache controller initiates a cache line eviction process and determines determine an object liveness value associated with a cache line in the processing device cache. The cache controller applies the object liveness value to a cache line eviction policy and evicts the cache line from the processing device cache based on the object liveness value and the cache line eviction policy.
摘要:
An apparatus and method for implementing a scratchpad memory within a cache using priority hints. For example, a method according to one embodiment comprises: providing a priority hint for a scratchpad memory implemented using a portion of a cache; determining a page replacement priority based on the priority hint; storing the page replacement priority in a page table entry (PTE) associated with the page; and using the page replacement priority to determine whether to evict one or more cache lines associated with the scratchpad memory from the cache.
摘要:
According to one embodiment, a processor includes an instruction decoder to decode an instruction to read a plurality of data elements from memory, the instruction having a first operand specifying a storage location, a second operand specifying a bitmask having one or more bits, each bit corresponding to one of the data elements, and a third operand specifying a memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the instruction, to read one or more data elements speculatively, based on the bitmask specified by the second operand, from a memory location based on the memory address indicated by the third operand, and to store the one or more data elements in the storage location indicated by the first operand.
摘要:
An apparatus and method are described for selecting elements to be used in a vector computation. For example, a method according to one embodiment includes the following operations: specifying whether to identify the first, last or next after last active element of an input mask register using an immediate value; identifying the first, last or next after last active element in the input mask register according to the immediate value; reading a value from an input vector register corresponding to the identified first, last or next after last active element in the input mask register; and writing the value to an output vector register.
摘要:
A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion.
摘要:
A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion.
摘要:
A method, medium, and system encoding/decoding video data using a binary arithmetic coding adaptive to a compression bit rate of the video data. The system may include a bitrate adaptation unit determining a maximum length of a prefix using a compression bitrate of the video data, a binarization unit dividing the video data into a prefix and a suffix according to the determined maximum length of the prefix and binarizing the video data, and an arithmetic encoding unit performing an arithmetic encoding on the binarized video data. The video data may be encoded/decoded using binary arithmetic encoding/decoding by determining the maximum length of the prefix, an order of an exponential Golomb code, and the number of contexts based on the compression bitrate. Accordingly, it is possible to obtain high encoding efficiency regardless of a range of the desired compression bitrate.