RE-FORMING AN APPLICATION CONTROL TREE WITHOUT TERMINATING THE APPLICATION
    1.
    Patent application
    Legal status: In force

    Publication No.: US20140281663A1

    Publication date: 2014-09-18

    Application No.: US13797342

    Filing date: 2013-03-12

    Applicant: CRAY INC.

    Inventor: Marlys Kohnke

    IPC classification: G06F11/14

    Abstract: A reconnection system re-forms a control tree for an application that is executed in parallel without terminating execution of the application. The reconnection system detects when a node of a control tree has failed and directs the nodes that have not failed to reconnect to effect the re-forming of the control tree without the failed node and without terminating the application. Upon being directed to reconnect, a node identifies new child nodes that are to be its child nodes in the re-formed control tree. The node maintains the existing connection with each of its current child nodes that is also a new child node, terminates the existing connection with each of its current child nodes that is not also a new child node, establishes a new connection with any new child node that is not a current child node, and directs each new child node to reconnect.
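
    The reconnect step described in this abstract can be sketched as a set comparison between a node's current children and its new children. The following is only an illustrative model, not the patented implementation: the Node class, the build_tree layout policy (a complete tree with fixed fan-out), and the log of actions are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the control tree (hypothetical model for illustration)."""
    name: str
    connections: set = field(default_factory=set)  # names of connected children
    log: list = field(default_factory=list)        # connection actions taken

    def reconnect(self, new_tree: dict, nodes: dict) -> None:
        """Re-form this node's subtree according to new_tree (parent -> children)."""
        new_children = set(new_tree.get(self.name, []))
        # Terminate connections to current children that are not new children.
        for child in sorted(self.connections - new_children):
            self.log.append(("terminate", child))
        # Establish connections to new children that are not current children.
        for child in sorted(new_children - self.connections):
            self.log.append(("establish", child))
        # Connections to children present in both sets are left untouched.
        self.connections = new_children
        # Direct each new child to reconnect in turn.
        for child in sorted(new_children):
            nodes[child].reconnect(new_tree, nodes)


def build_tree(names: list, fanout: int) -> dict:
    """Lay nodes out as a complete fanout-ary tree (an assumed layout policy)."""
    tree = {}
    for i, name in enumerate(names):
        tree[name] = names[i * fanout + 1 : i * fanout + 1 + fanout]
    return tree


names = ["n0", "n1", "n2", "n3", "n4", "n5", "n6"]
nodes = {n: Node(n) for n in names}
for parent, children in build_tree(names, fanout=2).items():
    nodes[parent].connections = set(children)

# n1 fails; the surviving nodes re-form the tree while the application keeps running.
survivors = [n for n in names if n != "n1"]
new_tree = build_tree(survivors, fanout=2)
nodes["n0"].reconnect(new_tree, nodes)
```

    In this run the root keeps its connection to n2, drops the failed n1, and adopts n3; n2 keeps n5, drops n6, and adopts n4. Only the changed edges are touched, which is the point of maintaining existing connections where the old and new child sets overlap.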

    FINAL FAULTY CORE RECOVERY MECHANISMS FOR A TWO-DIMENSIONAL NETWORK ON A PROCESSOR ARRAY
    2.
    Patent application
    Legal status: In force

    Publication No.: US20140095923A1

    Publication date: 2014-04-03

    Application No.: US13631496

    Filing date: 2012-09-28

    IPC classification: G06F11/20

    Abstract: Embodiments of the invention relate to faulty core recovery mechanisms for a two-dimensional (2-D) network on a processor array. One embodiment comprises a processor array including multiple processor core circuits, and a redundant routing system for routing packets between the core circuits. The redundant routing system comprises multiple switches, wherein each switch corresponds to one or more core circuits of the processor array. The redundant routing system further comprises multiple data paths interconnecting the switches, and a controller for selecting one or more data paths. Each selected data path is used to bypass at least one component failure of the processor array to facilitate full operation of the processor array.

    Reconfigurable computing machine and related systems and methods
    3.
    Granted patent
    Legal status: In force

    Publication No.: US07809982B2

    Publication date: 2010-10-05

    Application No.: US11243508

    Filing date: 2005-10-03

    IPC classification: G06F11/00

    Abstract: A computing machine comprises an electronic circuit operable to perform a function, a programmable integrated circuit such as an FPGA, and a processor. The processor is operable to detect a failure of the electronic circuit and to configure the programmable integrated circuit to perform the function of the electronic circuit in response to detecting the failure. Alternatively, the computing machine comprises a hardwired pipeline operable to perform a function and a processor operable to detect a failure of the pipeline and to perform the function in response to detecting the failure. By allowing a first type of circuit (e.g., an FPGA) to take over for a failed second type of circuit (e.g., a processor), such a computing machine can be fault-tolerant without having redundant versions of each component, and may thus be less expensive and smaller than computing machines of comparable computing power.

    Application cluster in security gateway for high availability and load sharing
    4.
    Granted patent
    Legal status: In force

    Publication No.: US07797566B2

    Publication date: 2010-09-14

    Application No.: US11456575

    Filing date: 2006-07-11

    Applicant: Amit Dror, Omer Schory

    Inventor: Amit Dror, Omer Schory

    IPC classification: G06F11/00

    Abstract: A method for load sharing and high availability in a cluster of computers. The cluster includes a first computer and a second computer which perform a task. An active application runs in the first computer and a standby application is installed in the second computer. The active application and the standby application are included in an application group. A first plurality of applications is installed in the first computer; the first plurality includes the running active application. The active application performs the task and stores state parameters and a policy in memory of the first computer. A synchronized copy of the state parameters and the policy pertaining to the task is maintained in memory of the second computer. Preferably, the cluster is in a security gateway between data networks and performs a task related to security of one or more of the networks.
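
    The active/standby arrangement in this abstract can be illustrated with a small model in which every change to the active application's state is copied to the standby. This is a sketch under stated assumptions: the Application and ApplicationGroup classes, the perform_task signature, and the fail_over step are hypothetical, and the abstract itself does not describe how promotion of the standby happens.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Application:
    """An application instance on one cluster computer (hypothetical model)."""
    host: str
    state: dict = field(default_factory=dict)   # state parameters of the task
    policy: dict = field(default_factory=dict)  # policy governing the task


class ApplicationGroup:
    """An active application plus its standby on a second computer."""

    def __init__(self, active: Application, standby: Application):
        self.active = active
        self.standby = standby

    def perform_task(self, key: str, value: str) -> None:
        """The active application does the work, then the copy is synchronized."""
        self.active.state[key] = value
        self.synchronize()

    def synchronize(self) -> None:
        """Keep a copy of state and policy in the second computer's memory."""
        self.standby.state = copy.deepcopy(self.active.state)
        self.standby.policy = copy.deepcopy(self.active.policy)

    def fail_over(self) -> None:
        """Assumed recovery step: promote the standby when the active host fails."""
        self.active, self.standby = self.standby, self.active


firewall = ApplicationGroup(Application("computer-1"), Application("computer-2"))
firewall.active.policy = {"default": "drop"}
firewall.perform_task("conn-42", "allowed")
firewall.fail_over()
```

    After the swap, the application on computer-2 is active and already holds the connection state and policy, so it can continue the task without rebuilding them.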

    METHOD FOR IMPLEMENTING DYNAMIC LIFETIME RELIABILITY EXTENSION FOR MICROPROCESSOR ARCHITECTURES
    6.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20090178051A1

    Publication date: 2009-07-09

    Application No.: US12118050

    Filing date: 2008-05-09

    IPC classification: G06F9/50

    Abstract: A method for implementing dynamic lifetime reliability extension for microprocessor architectures having a plurality of primary resources and a secondary resource pool of one or more secondary resources includes configuring a resource operational mode controller to selectively switch the primary and secondary resources between an operational mode and a non-operational mode, wherein the non-operational mode corresponds to a lifetime extension process; configuring a resource mapper associated with the secondary resource pool and in communication with the resource operational mode controller to map a secondary resource placed into the operational mode to a corresponding primary resource placed into the non-operational mode; and configuring a transaction decoder to receive incoming transaction requests and direct the requests to one of a primary resource in the operational mode and a secondary resource in the operational mode, the secondary resource mapped to an associated primary resource placed in the non-operational mode.
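
    The three roles named in this abstract (mode controller, resource mapper, transaction decoder) can be modeled in a few lines. The sketch below is an assumption-laden illustration rather than the claimed hardware design: the LifetimeExtensionUnit class, its method names, and the resource names are invented for the example.

```python
class LifetimeExtensionUnit:
    """Hypothetical model of the controller, mapper and decoder in the abstract."""

    def __init__(self, primaries, secondaries):
        self.operational = set(primaries)    # primary resources currently in service
        self.spare_pool = list(secondaries)  # secondary resources not in use
        self.mapping = {}                    # resting primary -> active secondary

    def rest(self, primary):
        """Mode controller: take a primary out of service and map a spare to it."""
        if primary not in self.operational or not self.spare_pool:
            return False
        secondary = self.spare_pool.pop(0)
        self.operational.discard(primary)
        self.mapping[primary] = secondary
        return True

    def resume(self, primary):
        """Return a rested primary to service and release its secondary."""
        secondary = self.mapping.pop(primary)
        self.spare_pool.append(secondary)
        self.operational.add(primary)

    def decode(self, target):
        """Transaction decoder: route a request to whichever resource is live."""
        if target in self.operational:
            return target
        return self.mapping[target]


unit = LifetimeExtensionUnit(["alu0", "alu1"], ["spare0"])
unit.rest("alu1")
routed_while_resting = unit.decode("alu1")   # served by the mapped secondary
unit.resume("alu1")
routed_after_resume = unit.decode("alu1")    # served by the primary again
```

    Requests keep naming the primary resource throughout; only the decoder's lookup changes while the primary is in its non-operational (lifetime extension) mode.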

    Network storage appliance with integrated redundant servers and storage controllers
    7.
    Granted patent
    Legal status: In force

    Publication No.: US07437604B2

    Publication date: 2008-10-14

    Application No.: US11673573

    Filing date: 2007-02-10

    IPC classification: G06F11/00, G06F11/20

    Abstract: A network storage appliance includes a chassis enclosing a storage controller and first and second servers. The storage controller has first and second I/O ports for coupling to first and second I/O links. The storage controller controls a plurality of physical disk drives and presents the plurality of physical disk drives as one or more logical disk drives on the first and second I/O links. The servers each have an I/O port for coupling to a respective one of the first and second I/O links. Each of the servers transmits packets to the storage controller over the respective I/O link. The packets include block-level protocol disk commands each identifying one of the logical disk drives, such as SCSI block level protocol commands each identifying one of said logical disk drives as a SCSI logical unit. The I/O links may be FibreChannel, Ethernet, or Infiniband links, for example.

    CORRELATING HARDWARE DEVICES BETWEEN LOCAL OPERATING SYSTEM AND GLOBAL MANAGEMENT ENTITY
    8.
    Patent application
    Legal status: In force

    Publication No.: US20080201603A1

    Publication date: 2008-08-21

    Application No.: US11675261

    Filing date: 2007-02-15

    IPC classification: G06F11/20

    Abstract: A method and apparatus for correlating the identities of hardware devices, such as processors and memory controllers, between a local operating system and a global management entity is described. When the operating system detects a faulting device, the operating system generates a fault message and transmits the fault message to the global management entity. The global management entity determines the identity of the faulting device based on information contained in the fault message, selects an appropriate replacement device, changes a routing table to map the replacement device to the identity of the faulting device, and transmits to the operating system a global identity of the replacement device. The operating system correlates the local identity of the replacement device with the global identity of the replacement device.
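
    The message exchange in this abstract can be sketched as two cooperating objects. Everything concrete here is an assumption made for illustration: the class names, the dictionary-based fault message, the identifier formats ("cpu1", "G-02"), and the idea that the operating system already knows a local name for the replacement device.

```python
class GlobalManagementEntity:
    """Hypothetical manager that owns the routing table and the spare devices."""

    def __init__(self, routing_table, spare_devices):
        self.routing_table = dict(routing_table)  # device identity -> global device
        self.spare_devices = list(spare_devices)

    def handle_fault(self, fault_message):
        # Determine the faulting device's identity from the fault message.
        faulting_identity = fault_message["faulting_device"]
        # Select a replacement (here simply the first spare; an assumed policy).
        replacement = self.spare_devices.pop(0)
        # Remap the faulting identity so that it now routes to the replacement.
        self.routing_table[faulting_identity] = replacement
        # Report the replacement's global identity back to the operating system.
        return replacement


class OperatingSystem:
    """Hypothetical local OS that tracks local-to-global device identities."""

    def __init__(self, manager, local_to_global):
        self.manager = manager
        self.local_to_global = dict(local_to_global)

    def on_device_fault(self, faulting_local_id, replacement_local_id):
        fault_message = {"faulting_device": self.local_to_global[faulting_local_id]}
        replacement_global_id = self.manager.handle_fault(fault_message)
        # Correlate the replacement's local identity with its global identity.
        self.local_to_global[replacement_local_id] = replacement_global_id
        return replacement_global_id


manager = GlobalManagementEntity({"G-01": "G-01", "G-02": "G-02"}, ["G-07"])
local_os = OperatingSystem(manager, {"cpu0": "G-01", "cpu1": "G-02"})
replacement = local_os.on_device_fault("cpu1", replacement_local_id="cpu2")
```

    After the exchange, traffic addressed to identity G-02 is routed to device G-07, and the operating system knows that its local device cpu2 is the global device G-07.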

    System and method of data transmission and method of selecting communication path for dual-controller system
    9.
    Patent application
    Legal status: In force

    Publication No.: US20080198846A1

    Publication date: 2008-08-21

    Application No.: US11708490

    Filing date: 2007-02-21

    IPC classification: H04L12/28

    Abstract: A data transmission system and method and a method of selecting a communication path for a dual-controller system are provided, which are applied in a first controller and a second controller of the dual-controller system. First, a corresponding transmission medium is selected according to a feature of a data request issued by a controller. The data request is then converted into a data format compatible with a medium interface corresponding to the selected transmission medium and is sent to a corresponding medium driving portion connected with the medium interface. Finally, the data request is sent to the other controller through the medium driving portion and a connected corresponding medium controller, so as to select a path of the highest transmission performance and realize the data transmission between the two controllers.
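
    The select-convert-send sequence in this abstract can be sketched as follows. The abstract does not say which request features or which media are involved, so the request kinds, the two link names, the size threshold, and the framing formats below are all hypothetical choices made only to show the flow.

```python
from dataclasses import dataclass


@dataclass
class DataRequest:
    kind: str       # assumed feature of the request: "control" or "data"
    payload: bytes


def frame_for_control_link(request):
    """Hypothetical framing for a low-latency link: 2-byte length prefix."""
    return b"CTL" + len(request.payload).to_bytes(2, "big") + request.payload


def frame_for_bulk_link(request):
    """Hypothetical framing for a high-throughput link: 4-byte length prefix."""
    return b"BLK" + len(request.payload).to_bytes(4, "big") + request.payload


# Medium name -> converter producing that medium interface's data format.
MEDIA = {"control_link": frame_for_control_link, "bulk_link": frame_for_bulk_link}


def select_medium(request, bulk_threshold=512):
    """Pick the medium from the request's features (assumed policy)."""
    if request.kind == "control" and len(request.payload) < bulk_threshold:
        return "control_link"
    return "bulk_link"


def send_to_peer(request):
    """Select a medium, convert the request, and return what would be sent."""
    medium = select_medium(request)
    frame = MEDIA[medium](request)
    return medium, frame


small_medium, small_frame = send_to_peer(DataRequest("control", b"ping"))
large_medium, large_frame = send_to_peer(DataRequest("data", b"x" * 1024))
```

    A real dual-controller system would hand the frame to the medium's driver instead of returning it; the return value here just makes the selection and conversion visible.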

    System and method for implementing dynamic lifetime reliability extension for microprocessor architectures
    10.
    Granted patent
    Legal status: In force

    Publication No.: US07386851B1

    Publication date: 2008-06-10

    Application No.: US11969413

    Filing date: 2008-01-04

    Abstract: A system for implementing dynamic lifetime reliability extension for microprocessor architectures having a plurality of primary resources and a secondary resource pool of one or more secondary resources includes a resource operational mode controller configured to selectively switch the primary and secondary resources between an operational mode and a non-operational mode, wherein the non-operational mode corresponds to a lifetime extension process; a resource mapper associated with the secondary resource pool and in communication with the resource operational mode controller, configured to map a secondary resource placed into the operational mode to a corresponding primary resource placed into the non-operational mode; and a transaction decoder configured to receive incoming transaction requests and direct the requests to one of a primary resource in the operational mode and a secondary resource in the operational mode, the secondary resource mapped to an associated primary resource placed in the non-operational mode.