Recoverable error detection for concurrent computing programs
    1.
    Granted invention patent (legal status: in force)

    Publication No.: US07925791B2

    Publication date: 2011-04-12

    Application No.: US11488432

    Filing date: 2006-07-17

    IPC classification: G06F15/16

    Abstract: The present invention provides a system and method for detecting communication errors among multiple nodes in a concurrent computing environment. Either a barrier synchronization point or regions are used to check for a communication mismatch. The barrier synchronization point can be placed anywhere in a concurrent computing program. If a communication error occurred before the barrier synchronization point, it is detected, at the latest, when a node enters the barrier synchronization point. Once a node has reached the barrier synchronization point, it is not allowed to communicate with another node regarding data that is needed to execute the concurrent computing program, even if the other node has not reached the barrier synchronization point. Regions can be used instead of barrier synchronization points to detect a communication mismatch. A concurrent program on each node is separated into one or more regions. Two nodes can communicate with each other only when their regions are compatible. If their regions are not compatible, there is a communication mismatch.
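The barrier rule in this abstract can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the patented implementation: the names `Node`, `CommunicationMismatch`, `send`, `receive`, and `enter_barrier` are hypothetical, and treating unreceived messages at barrier entry as evidence of an earlier mismatch is an assumption made for illustration.

```python
from dataclasses import dataclass, field


class CommunicationMismatch(Exception):
    """Signals a send/receive pairing error between nodes (hypothetical name)."""


@dataclass
class Node:
    name: str
    at_barrier: bool = False
    inbox: list = field(default_factory=list)

    def send(self, peer: "Node", payload) -> None:
        # A node that has reached the barrier may not exchange program data,
        # even if the other node has not reached the barrier yet.
        if self.at_barrier or peer.at_barrier:
            raise CommunicationMismatch(
                f"{self.name} -> {peer.name}: a node at the barrier cannot exchange data"
            )
        peer.inbox.append((self.name, payload))

    def receive(self):
        if self.at_barrier:
            raise CommunicationMismatch(f"{self.name} is at the barrier")
        # Single-threaded sketch: a real receive would block instead of failing.
        if not self.inbox:
            raise CommunicationMismatch(f"{self.name} has nothing to receive")
        return self.inbox.pop(0)

    def enter_barrier(self) -> None:
        self.at_barrier = True
        # Assumption: a message still unreceived at barrier entry means a send
        # was never matched by a receive, i.e. an earlier mismatch.
        if self.inbox:
            raise CommunicationMismatch(
                f"{self.name} entered the barrier with {len(self.inbox)} unreceived message(s)"
            )


if __name__ == "__main__":
    a, b = Node("a"), Node("b")
    a.send(b, "x")
    b.receive()
    b.enter_barrier()
    try:
        a.send(b, "y")  # b is already at the barrier
    except CommunicationMismatch as err:
        print("detected:", err)
```

Raising an exception rather than blocking is what makes the error observable in this sketch: the sending node learns about the mismatch at the point of the attempted exchange.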


    Recoverable error detection for concurrent computing programs
    5.
    Granted invention patent (legal status: in force)

    Publication No.: US08055940B2

    Publication date: 2011-11-08

    Application No.: US11879383

    Filing date: 2007-07-17

    IPC classification: G06F11/00

    Abstract: A system and method detect communication errors among multiple nodes in a concurrent computing environment. One or more barrier synchronization points/checkpoints or regions are used to check for a communication mismatch. The barrier synchronization point(s)/checkpoint(s) can be placed anywhere in the concurrent computing program. Once a node reaches a barrier synchronization point/checkpoint, it is not allowed to communicate with another node regarding data that is needed to execute the concurrent computing program, even if the other node has not reached the barrier synchronization point/checkpoint. Regions can be used in addition to, or instead of, barrier synchronization points/checkpoints to detect a communication mismatch. A concurrent program on each node is separated into one or more regions. Two nodes communicate with each other when their regions are compatible. If their regions are not compatible, a communication mismatch occurs.
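The region-based check described in this abstract might look like the following sketch. The abstract does not define when two regions are compatible, so equality of region identifiers is used here purely as an assumption; `RegionNode`, `RegionMismatch`, `regions_compatible`, and `exchange` are hypothetical names.

```python
from dataclasses import dataclass


class RegionMismatch(Exception):
    """Raised when two nodes in incompatible regions try to communicate."""


@dataclass
class RegionNode:
    name: str
    region: int = 0  # identifier of the program region the node is executing

    def enter_region(self, region_id: int) -> None:
        self.region = region_id


def regions_compatible(a: int, b: int) -> bool:
    # Assumption: regions are compatible only when their identifiers match.
    return a == b


def exchange(sender: RegionNode, receiver: RegionNode, payload):
    """Deliver payload only if both nodes are in compatible regions."""
    if not regions_compatible(sender.region, receiver.region):
        raise RegionMismatch(
            f"{sender.name} (region {sender.region}) cannot talk to "
            f"{receiver.name} (region {receiver.region})"
        )
    return payload


if __name__ == "__main__":
    a, b = RegionNode("a"), RegionNode("b")
    a.enter_region(1)
    b.enter_region(1)
    print(exchange(a, b, "ok"))  # both nodes are in region 1
    b.enter_region(2)
    try:
        exchange(a, b, "stale")  # a is still in region 1
    except RegionMismatch as err:
        print("detected:", err)
```

Unlike the barrier approach, this check needs no global rendezvous: each exchange carries enough information (the two region identifiers) to decide locally whether it is valid.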


    Media for performing parallel processing of distributed arrays
    9.
    Granted invention patent (legal status: in force)

    Publication No.: US08255890B2

    Publication date: 2012-08-28

    Application No.: US12254605

    Filing date: 2008-10-20

    IPC classification: G06F9/45, G06F9/44

    CPC classification: G06F9/5027, G06F8/314

    Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The media also store one or more instructions for transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The media further store one or more instructions for receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
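The flow in this abstract (identify candidate data distribution schemes, select one, allocate work to labs, collect the results) can be sketched as follows. Everything here is an illustrative assumption: the two schemes (`block_scheme`, `cyclic_scheme`), the load-imbalance criterion used to pick the "optimum" scheme, and the use of threads to stand in for labs are not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def block_scheme(n: int, labs: int) -> list:
    """Contiguous chunks: each lab gets one consecutive slice of indices."""
    size = -(-n // labs)  # ceiling division
    return [list(range(i * size, min((i + 1) * size, n))) for i in range(labs)]


def cyclic_scheme(n: int, labs: int) -> list:
    """Round-robin: index i goes to lab i % labs."""
    return [list(range(i, n, labs)) for i in range(labs)]


def imbalance(parts: list) -> int:
    """Hypothetical cost: gap between the largest and smallest partition."""
    sizes = [len(p) for p in parts]
    return max(sizes) - min(sizes)


def run_parallel(data: list, labs: int = 2):
    # 1. Identify candidate distribution schemes for this array.
    schemes = {"block": block_scheme, "cyclic": cyclic_scheme}
    candidates = {name: fn(len(data), labs) for name, fn in schemes.items()}
    # 2. Select the scheme with the lowest cost (ties go to the first listed).
    best = min(candidates, key=lambda name: imbalance(candidates[name]))
    parts = candidates[best]
    # 3. Allocate one partition per lab and compute partial sums in parallel.
    with ThreadPoolExecutor(max_workers=labs) as pool:
        partials = list(pool.map(lambda idx: sum(data[i] for i in idx), parts))
    # 4. Combine the labs' results and hand them back to the caller.
    return best, sum(partials)


if __name__ == "__main__":
    print(run_parallel(list(range(10)), labs=4))
```

With 10 elements on 4 labs, the block scheme yields partitions of sizes 3, 3, 3, 1 while the cyclic scheme yields 3, 3, 2, 2, so this cost function selects the cyclic scheme. A real system would likely weigh communication cost as well as balance.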
