Abstract:
A message-passing protocol for accommodating early arrival messages passed between source and destination nodes in a computer system with a plurality of asynchronous computing nodes interconnected by bidirectional asynchronous communications channels. The protocol includes transmitting the message from sender to receiver without waiting for a request for the message from the receiver; determining at the receiver if a receive buffer has been posted for the message; and if the receive buffer has not been posted for the message, then either truncating the message by storing its message header in an early arrival queue at the receiver and discarding its data or allocating a temporary receive buffer at the receiver to hold the message data. Upon the receiver being ready to post a receive buffer for an early arrival message, the receiver checks the early arrival queue for the corresponding message header, and if the message header is in the early arrival queue and the message data has been discarded, then the receiver sends a pull request to the sender to retransmit the message to the receiver.
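The early-arrival handling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the `tag` used to match messages to receive buffers, and the size threshold `temp_buffer_limit` that chooses between truncation and a temporary buffer are all assumptions made for the example (the abstract only says that one of the two options is taken).

```python
from dataclasses import dataclass


@dataclass
class Message:
    tag: int      # hypothetical header field used to match a posted receive
    data: bytes


class Sender:
    """Keeps sent messages so a truncated one can be retransmitted on a pull request."""

    def __init__(self):
        self.sent = {}

    def send(self, receiver, msg):
        self.sent[msg.tag] = msg
        receiver.on_arrival(self, msg)  # eager send: no request from the receiver

    def on_pull(self, receiver, tag):
        receiver.on_arrival(self, self.sent[tag])  # retransmit the full message


class Receiver:
    def __init__(self, temp_buffer_limit):
        self.temp_buffer_limit = temp_buffer_limit  # assumed policy knob
        self.posted = {}  # tag -> posted receive buffer (a list)
        self.early = {}   # early arrival queue: tag -> (sender, data or None)
        self.pull_requests = 0

    def on_arrival(self, sender, msg):
        if msg.tag in self.posted:
            # A receive buffer was already posted: deliver directly.
            self.posted.pop(msg.tag).append(msg.data)
        elif len(msg.data) <= self.temp_buffer_limit:
            # Early arrival, small enough: hold the data in a temporary buffer.
            self.early[msg.tag] = (sender, msg.data)
        else:
            # Early arrival, too large: keep only the header, discard the data.
            self.early[msg.tag] = (sender, None)

    def post_receive(self, tag):
        buffer = []
        if tag in self.early:
            sender, data = self.early.pop(tag)
            if data is not None:
                buffer.append(data)          # copy out of the temporary buffer
            else:
                self.posted[tag] = buffer
                self.pull_requests += 1
                sender.on_pull(self, tag)    # pull request: ask for retransmission
        else:
            self.posted[tag] = buffer        # message has not arrived yet
        return buffer
```

In this sketch a message that arrives after its buffer is posted takes the direct path, while an early message costs either a temporary copy or one extra round trip for the pull request.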
Abstract:
A parallel array processor for massively parallel applications is formed with low-power CMOS with DRAM processing while incorporating processing elements on a single chip. Eight processors on a single chip each have their own associated processing element, significant memory, and I/O, and are interconnected with a hypercube-based, but modified, topology. These nodes are then interconnected by a hypercube, modified hypercube, ring, or ring-within-ring network topology. Conventional microprocessor MMPs consume pins and time going to memory. The new architecture merges processor and memory with multiple PMEs (eight 16-bit processors with 32K and I/O) in DRAM, has no memory access delays, and uses all the pins for networking. The chip can be a single node of a fine-grained parallel processor. Each chip will have eight 16-bit processors, each processor providing 5 MIPS performance. I/O has three internal ports and one external port shared by the plural processors on the chip. Significant software flexibility is provided to enable quick implementation of existing programs written in common languages. It is a developable and expandable technology without need to develop new pinouts, new software, or new utilities as chip density increases and new hardware is provided for a chip function. The scalable chip PME has internal and external connections for broadcast and asynchronous SIMD, MIMD and SIMIMD (SIMD/MIMD) with dynamic switching of modes. The chip can be used in systems which employ 32, 64 or 128,000 processors, and can be used for lower, intermediate and higher ranges. Local and global memory functions can all be provided by the chips themselves, and the system can connect to and support other global memories and DASD. The chip can be used as a microprocessor accelerator, in personal computer applications, as a vision or avionics computer system, or as a workstation or supercomputer. There is program compatibility for the fully scalable system.
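For orientation, the neighbor relation in a plain binary hypercube can be sketched as below. The abstract describes a modified hypercube whose details are not given here, so this shows only the unmodified base topology; the function name and the integer node numbering are assumptions for illustration.

```python
def hypercube_neighbors(node, dimensions):
    """Return the nodes directly linked to `node` in a plain binary hypercube.

    Nodes are numbered 0 .. 2**dimensions - 1; two nodes are linked when
    their numbers differ in exactly one bit.
    """
    if not 0 <= node < 2 ** dimensions:
        raise ValueError("node number out of range for this hypercube")
    return [node ^ (1 << bit) for bit in range(dimensions)]
```

With eight processing elements (three dimensions), each element has three direct neighbors under this scheme.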
Abstract:
In order to determine a transfer path of a message to a receiving-end processor group, a processor includes a routing bit generation circuit, and an exchange switch includes partial broadcast path control circuits and a path control information alteration circuit. In order to define the range of a receiving-end processor group, a network includes transfer control circuits. A crossbar switch includes transfer control circuits associated with output ports and a boundary register group. When a partial broadcast message is transferred from an input port in the downstream direction of an output port, it is decided whether a processor belonging to the partial broadcast range associated with the processor connected to the particular input port is connected to the particular output port, whereby the particular partial broadcast message is transferred from the same output port.
Abstract:
A computer system having a plurality of processors and memory including a plurality of scalable nodes having multiple like processor memory elements. Each of the processor memory elements has a plurality of communication paths for communication within a node to other like processor memory elements within the node. Each of the processor memory elements also has a communication path for communication external to the node to another like scalable node of the computer system.
Abstract:
A data transfer control device comprising an instruction decoding unit for receiving a transfer instruction from an arithmetic processing unit provided in the cluster and decoding the content thereof, a plurality of instruction storage units for storing the transfer instruction, a shared memory access unit for reading and writing data through access to the shared memory provided in the cluster, a data transfer unit for delivering the data read out by the shared memory access unit to the network among clusters, as well as delivering data received through the network among clusters to the shared memory access unit, and a transfer control unit for controlling the shared memory access unit and the data transfer unit according to the transfer instruction which is read out from the instruction storage unit, wherein the instruction decoding unit classifies the transfer instruction as an urgent transfer instruction or a non-urgent transfer instruction on the basis of the decoded result thereof, so as to store it separately in one of the instruction storage units, and the transfer control unit reads out the transfer instruction preferentially from the instruction storage unit which stores the urgent transfer instruction.
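The classify-then-prioritize behavior can be sketched in a few lines. This is an illustrative model only: the opcode names in `URGENT_OPCODES`, the dictionary form of an instruction, and the class and method names are hypothetical, since the abstract does not say how urgency is encoded.

```python
from collections import deque

# Hypothetical opcodes treated as urgent; the abstract does not list any.
URGENT_OPCODES = {"SYNC", "ACK"}


class TransferController:
    """Two instruction storage units: one for urgent, one for non-urgent transfers."""

    def __init__(self):
        self.urgent = deque()
        self.non_urgent = deque()

    def decode_and_store(self, instruction):
        # Decoding step: classify by opcode, then store in the matching unit.
        if instruction["opcode"] in URGENT_OPCODES:
            self.urgent.append(instruction)
        else:
            self.non_urgent.append(instruction)

    def next_instruction(self):
        # Transfer control step: the urgent unit is always drained first.
        if self.urgent:
            return self.urgent.popleft()
        if self.non_urgent:
            return self.non_urgent.popleft()
        return None
```

Strict priority as shown can starve non-urgent transfers under a steady stream of urgent ones; whether the device guards against that is not stated in the abstract.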
Abstract:
A computer system comprises a plurality of processing elements interconnected by a communications network. The communications network has a series of network addresses each identifying a location in the network, and each processing element has an associated network address in the series. The communications network transfers messages transmitted by the processing elements in accordance with a respective address portion associated with each message, thereby to transfer the messages among the processing elements. Each processing element includes a message generator and a message transmitter. The message generator generates, during a message transfer operation, a series of messages for transmission over the communications network to others of the processing elements in the system, each message having an address portion whose contents enable the communications network to transfer the message from the processing element generating the message to a processing element to receive the message. The message generator generates the messages such that successive messages in the series are addressed to processing elements having successive network addresses. The message transmitter iteratively transmits the series of messages generated by the message generator, the message transmitters of the processing elements which have successive network addresses selecting, as initial messages to be transmitted, messages addressed to processing elements which have succeeding network addresses, thereby to effect a skewing of the messages transferred over the communications network.
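One way to read the skewing described above is that element i starts its series with the message for element i+1 and then walks through successive addresses, wrapping around. The sketch below follows that reading; it is an interpretation for illustration, and the function name and the use of consecutive integers as network addresses are assumptions.

```python
def skewed_schedule(num_elements):
    """Build a skewed all-to-all transmission schedule.

    Returns a list of rounds. In round k, entry `src` of the round is the
    destination address that processing element `src` transmits to.
    Element `src` starts with destination src + 1 and advances by one
    address per round, wrapping around the address series.
    """
    schedule = []
    for step in range(num_elements - 1):
        schedule.append(
            [(src + 1 + step) % num_elements for src in range(num_elements)]
        )
    return schedule
```

Under this reading the destinations within each round form a permutation of the addresses, so no two elements target the same receiver in the same round.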
Abstract:
The computer system has parallel and serial implementations, and serial and parallel network and multi-processor configurations, with tight and loose coupling among processors. The computer system has a CAM coupled to the computer system or embedded therein. CAM requests may be processed serially, or as parallel queries and coupled with PAPS (Parallel Associative Processor System) capabilities (P-CAM). The computer system may be configured as an expert system preferably having combined tuple space (TS) and CAM (content addressable memory) resources, an inference engine and a knowledge base. As an expert system, improvements for production processing are provided which surpass prior art performance represented by RETE and CLIPS. An inferencing process for production systems is disclosed, and a process for working memory element assertions. The computer system is provided with a language construct which is language independent in the form of a sub-set paradigm having three basic operators and three basic extensions. The basic primitive sub-set paradigm includes OUT(), IN() and READ(). Extensions of said basic sub-set are Sample(), SampleList() and ReadList(). These primitives may be used with LINDA, and with various compilers. EVAL of LINDA is not used; instead the sub-set paradigm is used with CAM for tuple space operations in data base applications. The language construct paradigm is used to envelop and control CAM operations.
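The three basic operators can be illustrated with a small in-memory tuple space. This is a simplified sketch, not the CAM-backed construct of the abstract: the class name, the use of `None` as a wildcard field, and the non-blocking behavior (returning `None` when nothing matches, where LINDA's operations would wait) are assumptions. The extensions Sample(), SampleList() and ReadList() are left out because the abstract does not define their semantics.

```python
class TupleSpace:
    """Minimal tuple space with OUT, IN and READ; matching is by content."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """OUT: place a tuple into the space."""
        self._tuples.append(tuple(fields))

    def _find(self, pattern):
        # A pattern matches a tuple of the same length whose fields are equal
        # wherever the pattern field is not None (None acts as a wildcard).
        for index, candidate in enumerate(self._tuples):
            if len(candidate) == len(pattern) and all(
                want is None or want == have
                for want, have in zip(pattern, candidate)
            ):
                return index
        return None

    def read(self, *pattern):
        """READ: return the first matching tuple, leaving it in the space."""
        index = self._find(pattern)
        return None if index is None else self._tuples[index]

    def in_(self, *pattern):
        """IN: return the first matching tuple and remove it from the space."""
        index = self._find(pattern)
        return None if index is None else self._tuples.pop(index)
```

The linear scan in `_find` stands in for the associative lookup that a content addressable memory would perform in hardware.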
Abstract:
A computer system has parallel and serial implementations, and serial and parallel network and multi-processor configurations, with tight and loose coupling among processors. The computer system has a CAM coupled to the computer system or embedded therein. CAM requests may be processed serially, or as parallel queries and coupled with PAPS (Parallel Associative Processor System) capabilities (P-CAM). The computer system may be configured as an expert system preferably having combined tuple space (TS) and CAM (content addressable memory) resources, an inference engine and a knowledge base. As an expert system, improvements for production processing are provided which surpass prior art performance represented by RETE and CLIPS. An inferencing process for production systems is disclosed, and a process for working memory element assertions. The computer system is provided with a language construct which is language independent in the form of a sub-set paradigm having three basic operators and three basic extensions. The basic primitive sub-set paradigm includes OUT(), IN() and READ(). Extensions of said basic sub-set are Sample(), SampleList() and ReadList(). These primitives may be used with LINDA, and with various compilers. EVAL of LINDA is not used; instead the sub-set paradigm is used with CAM for tuple space operations in data base applications. The language construct paradigm is used to envelop and control CAM operations.
Abstract:
A data processing system having a plurality of processing units (C1, C2), a plurality of memory units (M1, M2) and a communication system providing communication between the processing units and the memory units. The processing units each have a plurality of register sets (R1, R2) allowing them to run a plurality of processes. When a process requires data from memory, which it receives over the communication system, its respective processing unit processes another of its processes until that process in turn requires data. Data is transmitted over the communication system, which may be configured as a grid, in the form of packets. The grid is configured from routing devices which include first-in-first-out devices for the buffering of packets. The system facilitates the construction of circuits integrated onto a single wafer of semiconducting material. Furthermore, the grid structure may also be employed as a local area network, and computers having a similar architecture may be connected to the network, providing a processing facility of considerable power.
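The switch-on-memory-request behavior can be modeled with generators standing in for register sets. This is a toy simulation under stated assumptions: the fixed `latency` in ticks, the dictionary used as memory, and the names `process` and `run` are all hypothetical, and the packet grid is reduced to a simple delay.

```python
from collections import deque


def process(name, loads, trace):
    """A process that requests each address in `loads` from memory in turn."""
    for address in loads:
        trace.append((name, "request", address))
        value = yield address          # the process stalls until data arrives
        trace.append((name, "got", value))


def run(processes, memory, latency=2):
    """Run processes on one processing unit, switching whenever one stalls.

    Returns the number of ticks until every process has finished.
    """
    ready = deque((proc, None) for proc in processes)
    waiting = []                       # entries: (ready_at_tick, process, value)
    tick = 0
    while ready or waiting:
        # Data that has crossed the communication system wakes its process.
        still_waiting = []
        for ready_at, proc, value in waiting:
            if ready_at <= tick:
                ready.append((proc, value))
            else:
                still_waiting.append((ready_at, proc, value))
        waiting = still_waiting
        if ready:
            proc, value = ready.popleft()
            try:
                address = proc.send(value)   # run until the next memory request
                waiting.append((tick + latency, proc, memory[address]))
            except StopIteration:
                pass                         # this process has finished
        tick += 1
    return tick
```

For example, with two processes that each issue one load, the second process issues its request while the first is still waiting for its data, which is the overlap the abstract describes.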