Abstract:
Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.
Abstract:
Systems, methods, and apparatuses relating to a configurable accelerator having dataflow execution circuits are described. In one embodiment, a hardware accelerator includes a plurality of dataflow execution circuits that each comprise a register file, a plurality of execution circuits, and a graph station circuit comprising a plurality of dataflow operation entries that each include a respective ready field that indicates when an input operand for a dataflow operation is available in the register file, and the graph station circuit is to select for execution a first dataflow operation entry when its input operands are available, and clear ready fields of the input operands in the first dataflow operation entry when a result of the execution is stored in the register file; a cross dependence network coupled between the plurality of dataflow execution circuits to send data between the plurality of dataflow execution circuits according to a second dataflow operation entry; and a memory execution interface coupled between the plurality of dataflow execution circuits and a cache bank to send data between the plurality of dataflow execution circuits and the cache bank according to a third dataflow operation entry.
Abstract:
Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.
Abstract:
Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.
Abstract:
Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.