Data communications in a distributed computing environment

    Publication No.: US10277547B2

    Publication date: 2019-04-30

    Application No.: US14011375

    Filing date: 2013-08-27

    IPC classification: G06F9/54 H04L12/58

    Abstract: Data communications may be carried out in a distributed computing environment that includes a plurality of computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). In such an environment, data communications may include: receiving in the AMI from an application an eager SEND instruction that describes the location and size of SEND data in an application SEND buffer; copying by the AMI the SEND data from the application SEND buffer to a temporary AMI buffer; advising the application of completion of the SEND instruction before sending the SEND data to the receiver; and after advising the application of completion of the SEND instruction, sending the SEND data by the sender to the receiver.
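
    The ordering described in this abstract (copy, report completion, then transmit) can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `ActiveMessagingInterface`, the `transmit` and `on_complete` callbacks, and the `progress` method are all hypothetical names chosen for illustration.

```python
class ActiveMessagingInterface:
    """Toy AMI (hypothetical): copy the send data, report completion, transmit later."""

    def __init__(self, transmit):
        self._transmit = transmit   # assumed callable that delivers bytes to the receiver
        self._pending = []          # temporary AMI buffers awaiting transmission

    def eager_send(self, app_buffer, offset, size, on_complete):
        # Copy the send data out of the application's SEND buffer into a
        # temporary AMI buffer, so the application may reuse its own buffer.
        temp = bytes(app_buffer[offset:offset + size])
        self._pending.append(temp)
        # Advise the application of completion before any data is sent.
        on_complete()

    def progress(self):
        # Only now is the buffered SEND data actually sent to the receiver.
        while self._pending:
            self._transmit(self._pending.pop(0))
```

    Because the data is copied before completion is reported, the application can overwrite its SEND buffer immediately after `on_complete` fires without affecting what `progress` later transmits.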

    Routing data communications packets in a parallel computer
    5.
    Granted invention patent (status: in force)

    Publication No.: US09569399B2

    Publication date: 2017-02-14

    Application No.: US13668503

    Filing date: 2012-11-05

    Abstract: Routing data communications packets in a parallel computer that includes compute nodes organized for collective operations. Each compute node includes an operating system kernel and a system-level messaging module, a module of automated computing machinery that exposes a messaging interface to applications. Each compute node also includes a routing table that specifies, for each of a multiplicity of route identifiers, a data communications path through the compute node. Routing includes: receiving in a compute node a data communications packet that includes a route identifier value; retrieving from the routing table a specification of a data communications path through the compute node; and routing, by the compute node, the data communications packet according to the data communications path identified by the compute node's routing table entry for the data communications packet's route identifier value.
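
    The table lookup described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented design: the packet layout (a dict with `route_id` and `payload`), the representation of a path as a tuple of outgoing link names, and the `ComputeNode` class are all hypothetical.

```python
class ComputeNode:
    """Toy compute node (hypothetical) that routes packets by route identifier."""

    def __init__(self, routing_table):
        # Assumed layout: route identifier -> tuple of outgoing link names
        # that together describe the path through this node.
        self.routing_table = routing_table
        self.forwarded = []   # (link, payload) pairs, recorded for inspection

    def receive(self, packet):
        # Receive a packet that carries a route identifier value.
        route_id = packet["route_id"]
        # Retrieve the path specification for that identifier from the table.
        path = self.routing_table[route_id]
        # Route the packet according to the retrieved path.
        for link in path:
            self.forwarded.append((link, packet["payload"]))
        return path
```

    In this sketch the packet carries only an identifier, so the forwarding decision is a single dictionary lookup rather than a computation over destination addresses.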

    Data communications in a distributed computing environment
    6.
    Granted invention patent (status: in force)

    Publication No.: US09544261B2

    Publication date: 2017-01-10

    Application No.: US14011158

    Filing date: 2013-08-27

    Abstract: Data communications may be carried out in a distributed computing environment that includes a plurality of computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). In such an environment, data communications may include: issuing, by a sender to a receiver, an eager SEND data communications instruction to transfer SEND data, the instruction including information describing a location and size of a send buffer in which the SEND data is stored; transmitting, by the sender to the receiver, the SEND data as eager data packets; issuing, by the receiver to the sender in dependence upon data flow conditions, a STOP instruction, the STOP instruction including an order to stop transmitting the eager data packets; and transferring the SEND data by the receiver from the sender's data location to a receive buffer by remote direct memory access (“RDMA”).
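
    The switch from eager packets to an RDMA transfer can be simulated in plain Python. The sketch below makes several assumptions that the abstract does not state: the flow condition is modeled as a simple byte limit (`eager_limit`), RDMA is modeled as a direct slice copy from the sender's buffer, and only the bytes not yet received are fetched that way. The function name `transfer` is hypothetical.

```python
def transfer(send_buffer, packet_size, eager_limit):
    """Simulate one SEND; return (received_data, eager_bytes, rdma_bytes)."""
    receive_buffer = bytearray(len(send_buffer))
    received = 0
    stopped = False
    # The sender transmits the SEND data as eager packets until told to stop.
    for start in range(0, len(send_buffer), packet_size):
        if stopped:
            break
        packet = send_buffer[start:start + packet_size]
        receive_buffer[start:start + len(packet)] = packet
        received += len(packet)
        # Assumed flow condition: the receiver issues STOP once it has taken
        # eager_limit bytes and more data is still outstanding.
        if received >= eager_limit and received < len(send_buffer):
            stopped = True
    eager_bytes = received
    if stopped:
        # Simulated RDMA: the receiver copies the rest straight from the
        # sender's buffer location described in the SEND instruction.
        receive_buffer[received:] = send_buffer[received:]
    return bytes(receive_buffer), eager_bytes, len(send_buffer) - eager_bytes
```

    With a large `eager_limit` the whole message arrives as eager packets and no RDMA step occurs; with a small one, most of the message moves in the simulated RDMA step.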

    Administering virtual machines in a distributed computing environment
    7.
    Granted invention patent (status: in force)

    Publication No.: US09503514B2

    Publication date: 2016-11-22

    Application No.: US14260710

    Filing date: 2014-04-24

    IPC classification: G06F9/455 H04L29/08 H04L29/06

    Abstract: In a distributed computing environment that includes hosts which each execute a VMM, with each VMM supporting execution of one or more VMs, administering the VMs may include: assigning, by a VMM manager, the VMMs of the distributed computing environment to a logical tree topology, including assigning one of the VMMs as a root VMM of the tree topology; and executing, amongst the VMMs of the tree topology, an allgather operation, including: sending, by the root VMM, to other VMMs in the tree topology, a request to retrieve VMs supported by the other VMMs; pausing, by each of the other VMMs, a VM supported by the VMM; providing, by each of the other VMMs as a response to the root VMM's request, the paused VM; and broadcasting, by the root VMM to the other VMMs as a set of VMs, the received VMs.
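
    The allgather over a logical tree of VMMs can be sketched as a gather toward the root followed by a broadcast back down. The Python below is a toy model under assumptions not stated in the abstract: VMs are represented by name only, each VMM pauses the first VM in its list, and requests and responses are ordinary method calls rather than network messages. The `VMM` class and `allgather` function are hypothetical names.

```python
class VMM:
    """Toy VMM node (hypothetical) in a logical tree topology."""

    def __init__(self, name, vms):
        self.name = name
        self.vms = list(vms)   # names of the VMs this VMM supports
        self.children = []     # child VMMs in the tree
        self.paused = []       # VMs this VMM has paused
        self.gathered = None   # set of VMs received in the broadcast

    def handle_request(self):
        # Pause one supported VM (assumed: the first) and provide it,
        # together with the VMs provided by the subtree below this VMM.
        result = []
        if self.vms:
            self.paused.append(self.vms[0])
            result.append(self.vms[0])
        for child in self.children:
            result.extend(child.handle_request())
        return result

    def receive_broadcast(self, vm_set):
        self.gathered = list(vm_set)
        for child in self.children:
            child.receive_broadcast(vm_set)


def allgather(root):
    # The root requests VMs from the other VMMs and collects the responses.
    received = []
    for child in root.children:
        received.extend(child.handle_request())
    # The root then broadcasts the received VMs as a set to the other VMMs.
    root.gathered = list(received)
    for child in root.children:
        child.receive_broadcast(received)
    return received
```

    After `allgather(root)` returns, every VMM in the tree holds the same list in `gathered`, which is the defining property of an allgather.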

    Administering incomplete data communications messages in a parallel computer

    Publication No.: US09250987B2

    Publication date: 2016-02-02

    Application No.: US14269744

    Filing date: 2014-05-05

    IPC classification: G06F9/54

    CPC classification: G06F9/546 G06F15/17306

    Abstract: Administering incomplete data communications messages in a parallel computer that includes a plurality of compute nodes, with each compute node including a processor and a messaging accelerator, includes: transmitting, by a source messaging accelerator to a destination messaging accelerator, a message, including processing a message descriptor describing the message and setting, in the message descriptor, a flag indicating the message has been sent; transmitting, by the source messaging accelerator to the destination messaging accelerator responsive to processing an acknowledgment request descriptor corresponding to the message, a request for acknowledgment of receipt of the message; receiving, by the source messaging accelerator from the destination messaging accelerator, a negative acknowledgment (NACK) indicating that the message was not received at the destination messaging accelerator; and clearing, by the source messaging accelerator in the message descriptor, the flag indicating that the message has been sent.
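
    The flag handling described in this abstract can be modeled with two small Python classes. This is a sketch under assumptions: descriptors are plain dicts with hypothetical `id`, `payload`, and `sent` keys, message loss is simulated with a `drop_ids` set, and an ACK reply is included for contrast even though the abstract only describes the NACK case.

```python
class DestinationAccelerator:
    """Toy destination (hypothetical) that can lose selected messages once."""

    def __init__(self, drop_ids=()):
        self.drop_ids = set(drop_ids)   # message ids to drop on first delivery
        self.received = {}

    def deliver(self, msg_id, payload):
        if msg_id in self.drop_ids:
            self.drop_ids.discard(msg_id)   # simulate a single lost message
            return
        self.received[msg_id] = payload

    def acknowledge(self, msg_id):
        return "ACK" if msg_id in self.received else "NACK"


class SourceAccelerator:
    """Toy source (hypothetical) that tracks the sent flag in each descriptor."""

    def __init__(self, destination):
        self.destination = destination

    def send(self, descriptor):
        # Process the message descriptor and set its sent flag.
        self.destination.deliver(descriptor["id"], descriptor["payload"])
        descriptor["sent"] = True

    def request_ack(self, descriptor):
        # Ask the destination whether the message arrived.
        reply = self.destination.acknowledge(descriptor["id"])
        if reply == "NACK":
            # Clear the flag so the descriptor no longer reads as sent.
            descriptor["sent"] = False
        return reply
```

    Clearing the flag leaves the descriptor in the same state as before transmission. The sketch assumes that makes it eligible to be processed again, which the abstract does not state explicitly.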

    Collective operation protocol selection in a parallel computer
    9.
    Granted invention patent (status: in force)

    Publication No.: US09047091B2

    Publication date: 2015-06-02

    Application No.: US13683702

    Filing date: 2012-11-21

    Abstract: Collective operation protocol selection in a parallel computer that includes compute nodes may be carried out by: calling a collective operation with operating parameters; selecting a protocol for executing the operation; and executing the operation with the selected protocol. Selecting a protocol includes, iteratively, until a prospective protocol meets predetermined performance criteria: providing, to a protocol performance function for the prospective protocol, the operating parameters; determining whether the prospective protocol meets the predetermined performance criteria by evaluating a predefined performance fit equation, calculating a measure of performance of the protocol for the operating parameters; and determining that the prospective protocol meets the predetermined performance criteria and selecting the protocol for executing the operation only if the calculated measure of performance is greater than a predefined minimum performance threshold.
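
    The iterative selection loop in this abstract reduces to evaluating one performance function per candidate protocol until a result clears the threshold. The Python sketch below is illustrative only: the function name `select_protocol`, the `(name, function)` pair layout, and the choice to return `None` when no candidate qualifies are assumptions, and any fit equations passed in are invented by the caller.

```python
def select_protocol(protocols, params, min_performance):
    """Return the first protocol whose performance measure exceeds the threshold.

    protocols: iterable of (name, performance_fn) pairs, where performance_fn
    stands in for a predefined performance fit equation (hypothetical).
    params: dict of operating parameters passed to each performance function.
    """
    for name, performance_fn in protocols:
        # Provide the operating parameters to the protocol's performance
        # function and calculate a measure of performance.
        measure = performance_fn(**params)
        # Select the protocol only if the measure is strictly greater than
        # the predefined minimum performance threshold.
        if measure > min_performance:
            return name
    # Assumed behavior: no prospective protocol met the criteria.
    return None
```

    A caller would pass candidates in preference order, since the loop stops at the first protocol that meets the criteria rather than searching for the best one.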

    Data Communications In A Distributed Computing Environment
    10.
    Invention patent application (status: under examination, published)

    Publication No.: US20150067068A1

    Publication date: 2015-03-05

    Application No.: US14011375

    Filing date: 2013-08-27

    IPC classification: H04L12/58

    CPC classification: H04L51/18 G06F9/546 H04L51/34

    Abstract: Data communications may be carried out in a distributed computing environment that includes a plurality of computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). In such an environment, data communications may include: receiving in the AMI from an application an eager SEND instruction that describes the location and size of SEND data in an application SEND buffer; copying by the AMI the SEND data from the application SEND buffer to a temporary AMI buffer; advising the application of completion of the SEND instruction before sending the SEND data to the receiver; and after advising the application of completion of the SEND instruction, sending the SEND data by the sender to the receiver.
