Abstract:
According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
Abstract:
An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.
Abstract:
A processing core is described having execution unit logic circuitry having a first register to store a first vector input operand, a second register to a store a second vector input operand and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further include a multiplier to perform the operation (a*(first vector input operand))+(b*(second vector operand))+c.
Abstract:
An apparatus and method are described for performing arbitrary logical operations specified by a table. For example, one embodiment of a method for performing a logical operation on a computer processor comprises: reading data from each of two or more source operands; combining the data read from the source operands to generate an index value, the index value identifying a subset of bits within an immediate value transmitted with an instruction; reading the bits from the immediate value; and storing the bits read from the immediate value within a destination register to generate a result of the instruction.
Abstract:
A vector compare-and-exchange operation is performed by: decoding by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location; and responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.
Abstract:
Instructions and logic provide vector linear interpolation functionality. In some embodiments, responsive to an instruction specifying: a first operand from a set of vector registers, a size of each of the vector elements, a portion of the vector elements upon which to compute linear interpolations, a second operand from a set of vector registers, and a third operand; an execution unit, reads a first, a second and a third value of the size of vector elements from corresponding data fields in the first, the second and the third operand respectively and computes an interpolated value as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value.
Abstract:
A method of processing an instruction is described that includes fetching and decoding the instruction. The instruction has separate destination address, first operand source address and second operand source address components. The first operand source address identifies a location of a first mask pattern in mask register space. The second operand source address identifies a location of a second mask pattern in the mask register space. The method further includes fetching the first mask pattern from the mask register space; fetching the second mask pattern from the mask register space; merging the first and second mask patterns into a merged mask pattern; and, storing the merged mask pattern at a storage location identified by the destination address.
Abstract:
A method is described that includes fetching an instruction and decoding the instruction. The method further includes fetching a first mask vector from a first mask register space location identified by the instruction. The method further includes fetching a second mask vector from a second mask register space location identified by the instruction. The method also includes executing the instruction by merging the first and second mask vectors into a single data structure and causing the single data structure to be written into a memory location identified by the instruction.
Abstract:
A method and system may include a chip having graphics rendering hardware, a cache and a processor to execute an application with texture allocation logic to receive notification of a page miss from the graphics rendering hardware. The logic can map the page miss to a tile of a texture image, store the tile as an entry to the cache, and map the entry to a virtual address space of a virtual image corresponding to the texture image. The system may also include off-chip memory to store the texture image.