Temperature threshold application signal trigger for real-time relocation of process
    2.
    Granted patent
    Status: lapsed

    Publication No.: US08250383B2

    Publication date: 2012-08-21

    Application No.: US12109579

    Filing date: 2008-04-25

    IPC classification: G06F1/26 G06F11/00

    CPC classification: G06F1/206

    Abstract: A method of managing a process relocation operation in a computing system is provided and includes determining respective operating temperatures of first, second and additional nodes of the system, where the first node has an elevated operating temperature and the second node has a normal operating temperature, notifying first and second kernels respectively associated with the first and second nodes, of a swapping condition, initially managing the first and second kernels to swap an application between the first and the second nodes while the swapping condition is in effect, and secondarily managing the first and second kernels to perform a barrier operation to end the swapping condition.
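
    The node-selection and swap steps in the abstract can be illustrated with a minimal sketch. It is not the patented method: the `Node` class, the `plan_swap` and `swap_applications` helpers, and the single fixed threshold are all assumptions made for illustration, and kernel notification and the closing barrier operation are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node record; the field names are assumptions."""
    name: str
    temperature_c: float
    application: Optional[str] = None


def plan_swap(nodes: List[Node], threshold_c: float) -> Optional[Tuple[Node, Node]]:
    """Pick the hottest node at or above the threshold and the coolest node below it.

    Returns None when no node is elevated or no normal node is available.
    """
    hot = [n for n in nodes if n.temperature_c >= threshold_c]
    normal = [n for n in nodes if n.temperature_c < threshold_c]
    if not hot or not normal:
        return None
    first = max(hot, key=lambda n: n.temperature_c)
    second = min(normal, key=lambda n: n.temperature_c)
    return first, second


def swap_applications(first: Node, second: Node) -> None:
    """Exchange the applications assigned to the two nodes."""
    first.application, second.application = second.application, first.application
```

    In this sketch, a caller would invoke `plan_swap` on each temperature sample and call `swap_applications` only when a pair is returned.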

    DISTRIBUTING PARALLEL ALGORITHMS OF A PARALLEL APPLICATION AMONG COMPUTE NODES OF AN OPERATIONAL GROUP IN A PARALLEL COMPUTER
    4.
    Patent application
    Status: in force

    Publication No.: US20090204789A1

    Publication date: 2009-08-13

    Application No.: US12029045

    Filing date: 2008-02-11

    IPC classification: G06F9/06

    Abstract: Methods, apparatus, and products for distributing parallel algorithms of a parallel application among compute nodes of an operational group in a parallel computer are disclosed that include establishing a hardware profile, the hardware profile describing thermal characteristics of each compute node in the operational group; establishing a hardware independent application profile, the application profile describing thermal characteristics of each parallel algorithm of the parallel application; and mapping, in dependence upon the hardware profile and application profile, each parallel algorithm of the parallel application to a compute node in the operational group.
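
    One possible reading of the mapping step is sketched below. The abstract does not state a mapping policy, so the greedy pairing (hottest algorithm to the node with the most thermal headroom), the one-algorithm-per-node restriction, and the names `map_algorithms`, `hardware_profile` and `application_profile` are assumptions for illustration only.

```python
from typing import Dict


def map_algorithms(hardware_profile: Dict[str, float],
                   application_profile: Dict[str, float]) -> Dict[str, str]:
    """Map each algorithm to a distinct compute node.

    hardware_profile: node name -> thermal headroom (larger means it tolerates more heat).
    application_profile: algorithm name -> expected heat load (larger means hotter).
    Both scales are hypothetical; only their ordering matters here.
    """
    if len(application_profile) > len(hardware_profile):
        raise ValueError("more algorithms than compute nodes")
    # Sort nodes by headroom and algorithms by heat load, both descending,
    # then pair them off in order.
    nodes = sorted(hardware_profile, key=hardware_profile.get, reverse=True)
    algorithms = sorted(application_profile, key=application_profile.get, reverse=True)
    return dict(zip(algorithms, nodes))
```

    Because the application profile is hardware independent, the same profile could be reused with a different `hardware_profile` to obtain a mapping for another operational group.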

    ADAPTIVE RECOVERY FOR PARALLEL REACTIVE POWER THROTTLING
    5.
    Patent application
    Status: in force

    Publication No.: US20130159575A1

    Publication date: 2013-06-20

    Application No.: US13327100

    Filing date: 2011-12-15

    IPC classification: G06F13/24

    Abstract: Power throttling may be used to conserve power and reduce heat in a parallel computing environment. Compute nodes in the parallel computing environment may be organized into groups based on, for example, whether they execute tasks of the same job or receive power from the same converter. Once one of compute nodes in the group detects that a parameter (i.e., temperature, current, power consumption, etc.) has exceeded a first threshold, power throttling on all the nodes in the group may be activated. However, before deactivating power throttling, a plurality of parameters associated with the group of compute nodes may be monitored to ensure they are all below a second threshold. If so, the power throttling for all of the compute nodes is deactivated.
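
    The two-threshold behavior described in the abstract resembles hysteresis, and can be sketched as follows. The `GroupThrottle` class, its parameter names, and the use of a single reading per node are assumptions for illustration; the abstract does not specify how readings are collected or how throttling is applied.

```python
from typing import Iterable


class GroupThrottle:
    """Tracks whether a group of compute nodes is currently throttled."""

    def __init__(self, activate_above: float, release_below: float) -> None:
        if release_below > activate_above:
            raise ValueError("release threshold must not exceed activation threshold")
        self.activate_above = activate_above  # first threshold
        self.release_below = release_below    # second threshold
        self.throttled = False

    def update(self, readings: Iterable[float]) -> bool:
        """Feed one reading per node and return the group's throttle state."""
        values = list(readings)
        if not self.throttled:
            # Any single node exceeding the first threshold throttles the whole group.
            if any(v > self.activate_above for v in values):
                self.throttled = True
        elif all(v < self.release_below for v in values):
            # Release only once every node is below the second threshold.
            self.throttled = False
        return self.throttled
```

    Keeping the second threshold below the first prevents the group from toggling rapidly when readings hover near a single limit.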

    Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job
    6.
    Granted patent
    Status: in force

    Publication No.: US08140889B2

    Publication date: 2012-03-20

    Application No.: US12861426

    Filing date: 2010-08-23

    IPC classification: G06F11/00

    CPC classification: G06F11/2035 G06F11/203

    Abstract: Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.
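
    The selection of an alternative connected node might look like the sketch below. The data layout (a dict from compute node to its I/O node and link state), the first-match policy, and the function name `select_connected_node` are assumptions for illustration; the abstract only requires that the chosen node be actively coupled to an active I/O node.

```python
from typing import Dict, Optional, Tuple


def select_connected_node(block: Dict[str, Tuple[str, bool]],
                          failed_node: str,
                          io_active: Dict[str, bool]) -> Optional[str]:
    """Return a replacement connected node for the block, or None if none qualifies.

    block: compute node name -> (supporting I/O node name, link is up).
    io_active: I/O node name -> whether that I/O node is active.
    """
    for node, (io_node, link_up) in block.items():
        if node == failed_node:
            continue  # skip the node whose connectivity failed
        if link_up and io_active.get(io_node, False):
            return node
    return None
```

    A job launcher in this sketch would record the returned node as the block's connected node before re-launching the job, and report the block as unusable if `None` is returned.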

    Adaptive recovery for parallel reactive power throttling
    7.
    Granted patent
    Status: in force

    Publication No.: US08799694B2

    Publication date: 2014-08-05

    Application No.: US13327100

    Filing date: 2011-12-15

    IPC classification: G06F1/32

    Abstract: Power throttling may be used to conserve power and reduce heat in a parallel computing environment. Compute nodes in the parallel computing environment may be organized into groups based on, for example, whether they execute tasks of the same job or receive power from the same converter. Once one of compute nodes in the group detects that a parameter (i.e., temperature, current, power consumption, etc.) has exceeded a first threshold, power throttling on all the nodes in the group may be activated. However, before deactivating power throttling, a plurality of parameters associated with the group of compute nodes may be monitored to ensure they are all below a second threshold. If so, the power throttling for all of the compute nodes is deactivated.

    Temperature Threshold Application Signal Trigger for Real-Time Relocation of Process
    8.
    Patent application
    Status: lapsed

    Publication No.: US20090271608A1

    Publication date: 2009-10-29

    Application No.: US12109579

    Filing date: 2008-04-25

    IPC classification: G06F1/24

    CPC classification: G06F1/206

    Abstract: A method of managing a process relocation operation in a computing system is provided and includes determining respective operating temperatures of first, second and additional nodes of the system, where the first node has an elevated operating temperature and the second node has a normal operating temperature, notifying first and second kernels respectively associated with the first and second nodes, of a swapping condition, initially managing the first and second kernels to swap an application between the first and the second nodes while the swapping condition is in effect, and secondarily managing the first and second kernels to perform a barrier operation to end the swapping condition.

    DYNAMICALLY REASSIGNING A CONNECTED NODE TO A BLOCK OF COMPUTE NODES FOR RE-LAUNCHING A FAILED JOB
    10.
    Patent application
    Status: in force

    Publication No.: US20120047393A1

    Publication date: 2012-02-23

    Application No.: US12861426

    Filing date: 2010-08-23

    IPC classification: G06F11/20

    CPC classification: G06F11/2035 G06F11/203

    Abstract: Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.