摘要:
Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-dimensional registers are discussed.
摘要:
Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.
摘要:
A parallel bit reversal device and method. The device includes a parallel bit reversal unit, a butterfly computation and control unit, and a memory. The butterfly computation and control unit is coupled to the memory via a data bus. The parallel bit reversal unit is configured to bit-reverse butterfly group data used by the butterfly computation and control unit. The parallel bit reversal unit includes an address reversing logic coupled to the butterfly computation and control unit, and configured to perform mirror reversal and right-shift operations on a read address from the butterfly computation and control unit.
摘要:
An area efficient data shifter/rotator using a barrel shifter. The invention is a circuit, which uses a single barrel shifter and is controllable to implement either a left or right shift or rotation of bits of a digital data word. The circuit is dynamically controllable to implement left or right shift of bits of the digital data word (both logical and arithmetic) and rotation (to the left or right) of bits of the word. The proposed circuit produces the required output in a single cycle.
摘要:
A system, apparatus and a method for routing data over fewer switches and interconnections among reconfigurable logic elements, and for adapting routing resources to dynamically perform complex bit-level permutations, such as shifting and bit reversal operations. In one embodiment, an exemplary silo routing circuit is formed upon a semiconductor substrate and routes data among a number of reconfigurable computational elements. The silo routing circuit comprises a plurality of input terminals and a plurality of output terminals. Further, the silo routing circuit includes a multi-stage interconnection network (“MIN”) of switches configurable to form data paths from any input terminal to any output terminal.
摘要:
A method of providing wiring efficiency in a permute unit. Multiple selectors receive input data and shared control signals from multiple register files. The permute unit includes multiple multiplexors (MUXs) coupled to multiple logical AND gates. The multiple logical AND gates are coupled to multiple logical OR gates. The logical AND gates are physically separated from the logical OR gates. The logical AND gates receive input from one or more output data signals from the selectors. The logical OR gates combine the one or more output signals from the logical AND gates and provide output data from the permute unit.
摘要:
A system, apparatus and a method for routing data over fewer switches and interconnections among reconfigurable logic elements, and for adapting routing resources to dynamically perform complex bit-level permutations, such as shifting and bit reversal operations. In one embodiment, an exemplary silo routing circuit is formed upon a semiconductor substrate and routes data among a number of reconfigurable computational elements. The silo routing circuit comprises a plurality of input terminals and a plurality of output terminals. Further, the silo routing circuit includes a multi-stage interconnection network (“MIN”) of switches configurable to form data paths from any input terminal to any output terminal.
摘要:
A processing engine, such as a digital signal processor, includes an execution mechanism, a repeat count register and a repeat count index register. The execution mechanism is operable for a repeat instruction to initialize the repeat count index register with the content of the repeat count register, and to modify the content of the repeat count register. The repeat instruction comprises two parts, the first of which initializes the repeat count index register and initiates repeat of a subsequent instruction, and the second part of which modifies the content of the repeat count register.
摘要:
A data processing apparatus, method and computer program product. The apparatus comprising: a register data store comprising at least three general purpose registers each operable to store a plurality of data elements; an instruction decoder operable to decode a multiplex instruction; a data processor operable to process said plurality of data elements in parallel, said data processor being controlled by said instruction decoder; and in response to said decoded multiplex instruction, said data processor being operable to specify: two of said at least three general-purpose registers as source registers, each operable to store a plurality of source data elements; a further one of said at least three registers as a control register operable to store a plurality of control values; and one of said control, or said two source registers as a destination register operable to store a plurality of resultant data elements; wherein in response to each of said plurality of control values said data processor is operable to select a corresponding data element from one of said two source registers, and to store said corresponding data element as a resultant data element in said destination register.
摘要:
A method and apparatus for performing a shift operation on a packed data element having multiple values. The apparatus having multiple muxes, each of the multiple muxes having a first input, a second input, a select input and an output. Each of the multiple bits that represent a shifted packed intermediate result on a first bus is coupled to the corresponding first input. Each of the multiple bits representing a replacement bit for one of the multiple values is coupled to a corresponding second input. Each of the multiple bits driven by a correction circuit is coupled to a corresponding select input. Each output corresponds to a bit of a shifted packed result.