Abstract:
Systems and methods to accelerate compression and decompression with a search unit implemented in the processor core. According to an embodiment, a search unit may be implemented to perform compression or decompression on an input stream of data. The search unit may use a look-up table to identify appropriate compression or decompression symbols. The look-up table may be populated with a table derived using the variable length coding symbols of a sequence of vertices to be compressed or extracted from a received data stream to be decompressed. A comparator and a finite state machine may be implemented in the search unit to facilitate traversal of the look-up table.
Abstract:
A texture unit may be used to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to the texture unit. The texture unit may use linear interpolators in order to perform the dot product calculations.
Abstract:
Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.
Abstract:
A texture unit may be used to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to the texture unit. The texture unit may use linear interpolators in order to perform the dot product calculations.
Abstract:
Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.
Abstract:
Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.
Abstract:
In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.
Abstract:
Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.
Abstract:
Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.
Abstract:
Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.