摘要:
Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switch if the criteria for a thread context switch are satisfied, including executing the next thread of execution.
摘要:
Data communications may be carried out in a distributed computing environment that includes computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). Such data communications may be carried out by: issuing, by a sender to a receiver, an eager SEND data communications instruction to transfer SEND data, the instruction including information describing data location at the sender and data size; transmitting, by the sender to the receiver, the SEND data as eager data packets; discarding, by the receiver in dependence upon data flow conditions, eager data packets as they are received from the sender; and transferring, in dependence upon the data flow conditions, by the receiver from the sender's data location to a receive buffer by remote direct memory access (“RDMA”), the SEND data.
摘要:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.
摘要:
Executing a gather operation on a parallel computer that includes a plurality of compute nodes, including: dividing, by each task in an operational group of tasks, a send buffer containing contribution data into a plurality of chunks of data, each chunk of data located at an offset within the send buffer; sending, by each task in the operational group of tasks, one chunk of data to a root task through a data communications thread for each chunk of data; receiving the chunks of data by the root task; and storing, by the root task, each chunk of data in a receive buffer of the root task in dependence upon the offset of each chunk of data within the send buffer.
摘要:
Deterministic message processing in a direct memory access (DMA) adapter includes the DMA adapter incrementing from a sub-head pointer, the sub-tail pointer until encountering an out-of-sequence packet. The DMA adapter also consumes packets between the sub-head pointer and the sub-tail pointer including incrementing with the consumption of each packet, the sub-head pointer until determining that the sub-head pointer is equal to the sub-tail pointer. In response to determining that the sub-head pointer is equal to the sub-tail pointer, the DMA adapter determines that the next in-sequence packet is not in the first FIFO message queue. In response to determining that the next in-sequence packet is not in the first FIFO message queue and that the first FIFO message queue exceeds a threshold capacity, the DMA controller copies the contents of the first FIFO message queue into the second FIFO message queue.
摘要:
Data communications may be carried out in a distributed computing environment that includes a plurality of computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). In such an environment, data communications may include: issuing, by a sender to a receiver, an eager SEND data communications instruction to transfer SEND data, the instruction including information describing a location and size of a send buffer in which the SEND data is stored; transmitting, by the sender to the receiver, the SEND data as eager data packets; issuing, by the receiver to the sender in dependence upon data flow conditions, a STOP instruction, the STOP instruction including an order to stop transmitting the eager data packets; and transferring the SEND data by the receiver from the sender's data location to a receive buffer by remote direct memory access (“RDMA”).
摘要:
Data communications may be carried out in a distributed computing environment that includes computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). Such data communications may be carried out by: issuing, by a sender to a receiver, an eager SEND data communications instruction to transfer SEND data, the instruction including information describing data location at the sender and data size; transmitting, by the sender to the receiver, the SEND data as eager data packets; discarding, by the receiver in dependence upon data flow conditions, eager data packets as they are received from the sender; and transferring, in dependence upon the data flow conditions, by the receiver from the sender's data location to a receive buffer by remote direct memory access (“RDMA”), the SEND data.
摘要:
Data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a SEND instruction, the SEND instruction specifying a transmission of transfer data from the origin endpoint to a first target endpoint; transmitting from the origin endpoint to the first target endpoint a Request-To-Send (‘RTS’) message advising the first target endpoint of the location and size of the transfer data; assigning by the first target endpoint to each of a plurality of target endpoints separate portions of the transfer data; and receiving by the plurality of target endpoints the transfer data.
摘要:
Completion processing of data communications instructions in a distributed computing environment, including receiving, in an active messaging interface (‘AMI’) data communications instructions, at least one instruction specifying a callback function; injecting into an injection FIFO buffer of a data communication adapter, an injection descriptor, each slot in the injection FIFO buffer having a corresponding slot in a pending callback list; listing in the pending callback list any callback function specified by an instruction, incrementing a pending callback counter for each listed callback function; transferring payload data as per each injection descriptor, incrementing a transfer counter upon completion of each transfer; determining from counter values whether the pending callback list presently includes callback functions whose data transfers have been completed; calling by the AMI any such callback functions from the pending callback list, decrementing the pending callback counter for each callback function called.
摘要:
Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.