摘要:
The present invention provides a system and method for detecting communication error among multiple nodes in a concurrent computing environment. A barrier synchronization point or regions are used to check for communication mismatch. The barrier synchronization can be placed anywhere in a concurrent computing program. If a communication error occurred before the barrier synchronization point, it would at least be detected when a node enters the barrier synchronization point. Once a node has reached the barrier synchronization point, it is not allowed to communicate with another node regarding data that is needed to execute the concurrent computing program, even if the other node has not reached the barrier synchronization point. Regions can also be used to detect a communication mismatch instead of barrier synchronization points. A concurrent program on each node is separated into one or more regions. Two nodes can only communicate with each other when their regions are compatible. If their regions are not compatible, then there is a communication mismatch.
摘要:
A device, for performing parallel processing, includes a processor to receive one or more portions of an inner context of a program created for a technical computing environment, and allocate one or more portions of the inner context of the program to two or more labs for parallel execution. The processor is also configured to receive one or more results associated with the parallel execution of the one or more portions from the two or more labs, and provide the one or more results to an outer context of the program.
摘要:
One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for receiving one or more portions of an inner context of a program created for a technical computing environment, allocating one or more portions of the inner context of the program to two or more labs for parallel execution, receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to an outer context of the program.
摘要:
One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for receiving one or more portions of an inner context of a program created for a technical computing environment, allocating one or more portions of the inner context of the program to two or more labs for parallel execution, receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to an outer context of the program.
摘要:
A system and method detects communication error among multiple nodes in a concurrent computing environment. One or more barrier synchronization points/checkpoints or regions are used to check for a communication mismatch. The barrier synchronization point(s)/checkpoint(s) can be placed anywhere in the concurrent computing program. Once a node reaches a barrier synchronization point/checkpoint, it is not allowed to communicate with another node regarding data that is needed to execute the concurrent computing program, even if the other node has not reached the barrier synchronization point/checkpoint. Regions can also, or alternatively, be used to detect a communication mismatch instead of barrier synchronization points/checkpoints. A concurrent program on each node is separated into one or more regions. Two nodes communicate with each other when their regions are compatible. If their regions are not compatible, a communication mismatch occurs.
摘要:
A computing device-implemented method includes receiving a program, analyzing and transforming the program, determining an inner context and an outer context of the program based on the analysis of the program, and allocating one or more portions of the inner context of the program to two or more labs for parallel execution. The method also includes receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to the outer context of the program.
摘要:
A computing device-implemented method includes receiving a program, analyzing and transforming the program, determining an inner context and an outer context of the program based on the analysis of the program, and allocating one or more portions of the inner context of the program to two or more labs for parallel execution. The method also includes receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to the outer context of the program.
摘要:
A device initiates a technical computing environment (TCE), and receives, via the TCE, a program command that permits the TCE to access a graphical processing unit that is remote to the device, where the program command permits the TCE to seamlessly transfer data to the remote GPU. The device transforms, via the TCE, the program command into a program command that is executable by the remote GPU, and provides the transformed program command to the remote GPU for execution. The device also receives, from the remote GPU, one or more results associated with execution of the transformed program command by the remote GPU, and utilizes the one or more results via the TCE.
摘要:
One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The media also store one or more instructions for transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The media further store one or more instructions for receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
摘要:
A device for performing parallel processing includes a processor to initiate a single programming language, and identify, via the single programming language, one or more data distribution schemes for executing a program. The processor also transforms, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocates the parallel program to two or more labs for parallel execution. The processor further receives one or more results associated with the parallel execution of the parallel program from the two or more labs, and provides the one or more results to the program.