摘要:
A synchronization optimized queuing method and device to minimize software/hardware interaction in network interface hardware during an end-of-initiative process, including network adapter queue implementations for network interface hardware for optimized communication in a computer system. An end-of-initiative procedure to ensure that the network interface hardware has received an interrupt enable and to recheck the interrupt queue is unnecessary in the present invention.
摘要:
A system for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, that are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.
摘要:
A system and structure for snooping cache memories of several snooping masters connected to a bus macro, wherein each non-originating snooping master has a cache memory, and wherein some, but less than all the cache memories, may have the data requested by an originating snooping master and wherein the needed data in an non-originating snooping master is marked as updated, and wherein a main memory having addresses for all data is connected to the bus macro. Only those non-originating snooping masters which may have the requested data are queried. All the non-originating snooping masters that have been queried reply. If a non-originating snooping master has the requested data marked as updated, that non-originating snooping master returns the updated data to the originating snooping master and possibly to the main memory. If none of the non-originating snooping masters has the requested data marked as updated, then the requested data is read from main memory.
摘要:
An improved method, device and data processing system are presented. In one embodiment, the method includes a source device sending a request for a bus grant to deliver data to a data bus connecting a source device and a destination device. The device receives the bus grant and logic within the device determines whether the bandwidth of the data bus allocated to the bus grant will be filled by the data. If the bandwidth of the data bus allocated to the bus grant will not be filled by the data, the device appends additional data to the first data and delivers the combined data to the data bus during the bus grant for the first data. When the bandwidth of the data bus allocated to the bus grant will be filled by the first data, the device delivers only the first data to the data bus during the bus grant.
摘要:
A method for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, that are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.
摘要:
A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
摘要:
A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
摘要:
Mechanisms for performing dynamic request routing based on broadcast depth queue information are provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide queue depth information to each of the other processor chips in the system. The queue depth information identifies a number of requests or amount of data in each of the queues of a processor chip that originated the heartbeat signal. The queue depth information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data.
摘要:
A mechanism is provided for collective acceleration unit tree flow control forms a logical tree (sub-network) among those processors and transfers “collective” packets on this tree. The system supports many collective trees, and each collective acceleration unit (CAU) includes resources to support a subset of the trees. Each CAU has limited buffer space, and the connection between two CAUs is not completely reliable. Therefore, to address the challenge of collective packets traversing on the tree without colliding with each other for buffer space and guaranteeing the end-to-end packet delivery, each CAU in the system effectively flow controls the packets, detects packet loss, and retransmits lost packets.
摘要:
A mechanism for performing dynamic request routing based on broadcast source request information is provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide source request information to each of the other processor chips in the system. The source request information identifies the number of active source requests sent by the processor chip that originated the heartbeat signal. The source request information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data.