摘要:
A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel. Dual transfer execution units within each transfer controller, together with independent transfer counters, are employed to allow decoupling of source and destination address generation and to allow multiple transfer instructions in one transfer execution unit to operate in parallel with a single transfer instruction in the other transfer unit. Improved flow control of data between a source and destination is provided through the use of special semaphore operations, signals and message synchronization which may be invoked explicitly using SIGNAL and WAIT type instructions or implicitly through the use of special “event-action” registers. Transfer controllers are also described which can cooperate to perform “DMA-to-DMA” transfers. Message-level synchronization can be used by transfer controllers to synchronize with each other.
摘要:
A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel. Dual transfer execution units within each transfer controller, together with independent transfer counters, are employed to allow decoupling of source and destination address generation and to allow multiple transfer instructions in one transfer execution unit to operate in parallel with a single transfer instruction in the other transfer unit. Improved flow control of data between a source and destination is provided through the use of special semaphore operations, signals and message synchronization which may be invoked explicitly using SIGNAL and WAIT type instructions or implicitly through the use of special “event-action” registers. Transfer controllers are also described which can cooperate to perform “DMA-to-DMA” transfers. Message-level synchronization can be used by transfer controllers to synchronize with each other.
摘要:
A scalable reconfigurable register file (SRRF) containing multiple register files, read and write multiplexer complexes, and a control unit operating in response to instructions is described. Multiple address configurations of the register files are supported by each instruction and different configurations are operable simultaneously during a single instruction execution. For example, with separate files of the size 32×32 supported configurations of 128×32 bit s, 64x64 bit s and 32×128 bit s can be in operation each cycle. Single width, double width, quad width operands are optimally supported without increasing the register file size and without increasing the number of register file read or write ports.
摘要:
A system core having an internal memory which transfers data from an external device to the internal memory is described. To this end, the system core includes a processor, a direct memory access (DMA) controller, an instruction memory and a plurality of memories. The instruction memory contains processor instructions and DMA instructions. The DMA controller fetches DMA instructions from the instruction memory. The DMA controller executes the fetched DMA instructions and thus populates the plurality of memories with data from the external device. The processor then operates on the data found in the populated memories.
摘要:
Low power architecture features and techniques are provided in a scalable array indirect VLIW processor. These features and techniques include power control of a reconfigurable register file, conditional power control of multi-cycle operations and indirect VLIW utilization, and power control of VLIW-based vector processing using the ManArray register file indexing mechanism. These techniques are applicable to all processing elements (PEs) and the array controller sequence processor (SP) to provide substantial power savings.
摘要:
Techniques for adding more complex instructions and their attendant multi-cycle execution units with a single instruction multiple data stream (SIMD) very long instruction word (VLIW) processing framework are described. In one aspect, an initiation mechanism also acts as a resynchronization mechanism to read the results of multi-cycle execution. This multi-purpose mechanism operates with a short instruction word (SIW) issue of the multi-cycle instruction, in a sequence processor (SP) alone, with a VLIW, and across all processing elements (PEs) individually or as an array of PEs. A number of advantageous floating point instructions are also described.
摘要:
A reconfigurable register file system is described. The reconfigurable register file system includes an instruction register for storing an instruction specifying an operational requirement, a reconfigurable register file comprising an odd register file having at least one data read port, and an even register file having at least one data read port. The reconfigurable register file system may further suitably include an execution unit connected to the data read ports of the odd and even register files and port usage control logic connected to the instruction register and the reconfigurable register file to control the odd register file and the even register file port address input so that data read port lines change only as needed to support the operational requirement specified by the instruction. The port usage control logic may further include a gating circuit connected to the reconfigurable register files and a clock input, the gating circuit being operable for gating the clock off so no change of state of the reconfigurable register files occurs for each cycle when change is not necessary and gating the clock on so new data is clocked into the reconfigurable register files for each cycle when change is desired.
摘要:
Details of a highly cost effective and efficient implementation of a manifold array (ManArray) architecture and instruction syntax for use therewith are described herein. Various aspects of this approach include the regularity of the syntax, the relative ease with which the instruction set can be represented in database form, the ready ability with which tools can be created, the ready generation of self-checking codes and parameterized testcases. Parameterizations can be fairly easily mapped and system maintenance is significantly simplified.
摘要:
A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel. Dual transfer execution units within each transfer controller, together with independent transfer counters, are employed to allow decoupling of source and destination address generation and to allow multiple transfer instructions in one transfer execution unit to operate in parallel with a single transfer instruction in the other transfer unit. Improved flow control of data between a source and destination is provided through the use of special semaphore operations, signals and message synchronization which may be invoked explicitly using SIGNAL and WAIT type instructions or implicitly through the use of special “event-action” registers. Transfer controllers are also described which can cooperate to perform “DMA-to-DMA” transfers. Message-level synchronization can be used by transfer controllers to synchronize with each other.
摘要:
Port priorities are defined on a 32-bit word, 16-bit half-word, and 8-bit byte basis to control the write enable signals to a compute register file (CRF). With a manifold array (ManArray) reconfigurable register file, it is possible to have double-word 64-bit and single word 32-bit data-type instructions mixed with other double-word, single-word, half-word, or byte data-type instructions within the same very long instruction word (VLIW). By resolving a write priority conflict on the byte, half-word, or word that is in conflict during the VLIW execution, it is possible to have partial operations complete that provide a useful function. For. example, a load half-word to the half-word H0 portion of a 32-bit register R0 can have priority to complete its operation while a 64-bit shift of the register pair R0 and R1 will complete its operation on the non-conflicting half-word portions of the 64-bit register R0 and R1. Other unique capabilities result from the present approach to assigning port priorities that improve the performance of the ManArray indirect VLIW processor.