Synchronizing device error information among nodes
    2.
    Granted invention patent
    Legal status: Active

    Publication number: US07904752B2

    Publication date: 2011-03-08

    Application number: US12132550

    Filing date: 2008-06-03

    IPC classification: G06F11/00

    CPC classification: H04L41/0654 H04L41/0677

    Abstract: Provided are a method, system, and article of manufacture for synchronizing device error information among nodes. A first node performs an action with respect to a first node error counter for a device in communication with the first node and a second node. The first node transmits a message to the second node indicating the device and the action performed with respect to the first node error counter for the device. The second node performs the action indicated in the message with respect to a second node error counter for the device indicated in the message, wherein the second node error counter corresponds to the first node error counter for the device.
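
The counter synchronization described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Node` class, the `increment` and `reset` action names, and the dictionary message format are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical node that keeps one error counter per device."""
    name: str
    counters: Dict[str, int] = field(default_factory=dict)
    peer: Optional["Node"] = None

    def apply(self, device: str, action: str) -> None:
        """Perform an action on this node's error counter for the device."""
        if action == "increment":
            self.counters[device] = self.counters.get(device, 0) + 1
        elif action == "reset":
            self.counters[device] = 0
        else:
            raise ValueError(f"unknown action: {action}")

    def perform_and_sync(self, device: str, action: str) -> None:
        """Act locally, then send a message naming the device and the action."""
        self.apply(device, action)
        if self.peer is not None:
            self.peer.receive({"device": device, "action": action})

    def receive(self, message: Dict[str, str]) -> None:
        """Replay the indicated action on the corresponding local counter."""
        self.apply(message["device"], message["action"])


first = Node("first")
second = Node("second")
first.peer = second

first.perform_and_sync("disk0", "increment")
first.perform_and_sync("disk0", "increment")
first.perform_and_sync("disk1", "increment")
first.perform_and_sync("disk1", "reset")
```

Sending the action rather than the counter value means the second node replays the same operation on its own counter, which is the behavior the abstract describes.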

    Method to adjust error thresholds in a data storage and retrieval system
    3.
    Granted invention patent
    Legal status: Lapsed

    Publication number: US07752488B2

    Publication date: 2010-07-06

    Application number: US11326652

    Filing date: 2006-01-06

    IPC classification: G06F11/00

    CPC classification: G06F11/008

    Abstract: A method is disclosed to adjust error thresholds in a data storage and retrieval system. The method supplies a data storage and retrieval system comprising memory and microcode, wherein that microcode comprises one or more default error thresholds. The method determines if the memory comprises one or more operational error thresholds. If the method determines that the memory comprises one or more operational error thresholds, then the method operates the data storage and retrieval system using those one or more operational error thresholds. Alternatively, if the method determines that the memory does not comprise one or more operational error thresholds, then the method sets the one or more default error thresholds as the one or more operational error thresholds.
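
The fallback from operational to default thresholds in this abstract might look like the sketch below. The threshold names, their values, and the use of a dictionary to stand in for system memory are assumptions for illustration only.

```python
from typing import Dict

# Hypothetical defaults carried in the microcode; the values are illustrative.
DEFAULT_ERROR_THRESHOLDS: Dict[str, int] = {"read_errors": 10, "write_errors": 5}

MEMORY_KEY = "operational_error_thresholds"  # assumed storage key


def select_operational_thresholds(memory: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Return the thresholds the system should operate with.

    If memory already holds operational thresholds, use them unchanged.
    Otherwise copy the microcode defaults into memory and use those.
    """
    stored = memory.get(MEMORY_KEY)
    if stored:
        return stored
    memory[MEMORY_KEY] = dict(DEFAULT_ERROR_THRESHOLDS)
    return memory[MEMORY_KEY]


empty_memory: Dict[str, Dict[str, int]] = {}
tuned_memory = {MEMORY_KEY: {"read_errors": 25, "write_errors": 8}}

from_defaults = select_operational_thresholds(empty_memory)
from_memory = select_operational_thresholds(tuned_memory)
```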

    SYNCHRONIZING DEVICE ERROR INFORMATION AMONG NODES
    4.
    Invention patent application
    Legal status: Active

    Publication number: US20090300436A1

    Publication date: 2009-12-03

    Application number: US12132550

    Filing date: 2008-06-03

    IPC classification: G06F11/07

    CPC classification: H04L41/0654 H04L41/0677

    Abstract: Provided are a method, system, and article of manufacture for synchronizing device error information among nodes. A first node performs an action with respect to a first node error counter for a device in communication with the first node and a second node. The first node transmits a message to the second node indicating the device and the action performed with respect to the first node error counter for the device. The second node performs the action indicated in the message with respect to a second node error counter for the device indicated in the message, wherein the second node error counter corresponds to the first node error counter for the device.

    Apparatus, system, and method for overriding resource controller lock ownership
    5.
    Granted invention patent
    Legal status: Lapsed

    Publication number: US07487277B2

    Publication date: 2009-02-03

    Application number: US11247465

    Filing date: 2005-10-11

    IPC classification: G06F12/14 G06F12/00 G06F11/00

    Abstract: An apparatus, system, and method are disclosed for autonomously overriding a global resource lock. The apparatus includes a determination module, an override module, and an assertion module. The determination module determines whether a global resource lock is owned by a peer resource controller and, in response to the peer resource controller owning the global resource lock, whether the peer resource controller is offline. The override module atomically overrides ownership of the global resource lock from the peer resource controller. The assertion module asserts active ownership of the global resource lock. The apparatus, system, and method provide an autonomous override of the global resource lock, minimizing system downtime and user intervention.

    Apparatus, system, and method for facilitating monitoring and responding to error events
    7.
    Granted invention patent
    Legal status: Lapsed

    Publication number: US07523359B2

    Publication date: 2009-04-21

    Application number: US11095062

    Filing date: 2005-03-31

    IPC classification: G06F11/00

    CPC classification: G06F11/076

    Abstract: An apparatus, system, and method are disclosed for facilitating monitoring and responding to error events. An apparatus may include a set of counters associated with a processing system resource, each counter associated with an error event and having attributes defining a count value, counter thresholds directly related to time, and empirical status information for the error event related to time. A user may adjust counter thresholds indirectly to set an error tolerance. An update module may update counters within the set based on an error event for the processing system resource. A management module persists and maintains a life cycle for counters based on counter attributes. Each counter may be one of two types: a fixed counter that counts error events from a start time for a defined duration, or a sliding counter that counts error events up to a predefined number of error events within a window of time.
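
The two counter types named in this abstract can be contrasted with a small sketch. The class names, the numeric timestamps, and the choice to report a threshold crossing as a boolean are assumptions for illustration, not details from the patent.

```python
from collections import deque
from typing import Deque


class FixedCounter:
    """Counts error events that fall in [start, start + duration)."""

    def __init__(self, start: float, duration: float, threshold: int) -> None:
        self.start = start
        self.duration = duration
        self.threshold = threshold
        self.count = 0

    def record(self, timestamp: float) -> bool:
        """Record an event; return True once the threshold is reached."""
        if self.start <= timestamp < self.start + self.duration:
            self.count += 1
        return self.count >= self.threshold


class SlidingCounter:
    """Counts error events inside a window that moves with the latest event."""

    def __init__(self, window: float, max_events: int) -> None:
        self.window = window
        self.max_events = max_events
        self.events: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record an event; return True when max_events fall inside the window."""
        self.events.append(timestamp)
        # Discard events that are older than the window ending at this event.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        return len(self.events) >= self.max_events


fixed = FixedCounter(start=0, duration=10, threshold=2)
fixed_results = [fixed.record(t) for t in (1, 12, 5)]

sliding = SlidingCounter(window=10, max_events=3)
sliding_results = [sliding.record(t) for t in (0, 5, 20, 22, 25)]
```

In this sketch the fixed counter ignores the event at time 12 because it falls outside its interval, while the sliding counter only trips once three events land within ten time units of each other.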

    Apparatus, system, and method for identifying a faulty communication module
    8.
    Granted invention patent
    Legal status: Lapsed

    Publication number: US07251753B2

    Publication date: 2007-07-31

    Application number: US10666660

    Filing date: 2003-09-17

    IPC classification: G06F11/00

    Abstract: An apparatus, method, and system associate an identifier with a data packet. The identifier uniquely identifies a communication module, such as a host interface card, within a data storage system. In operation, a computer host sends a data packet to a server. The communication module receives the data packet and associates an identifier, unique to the communication module, with the data packet. The data packet is stored in a disk array, such as a Redundant Array of Independent Disks (RAID) system. When the computer host later requests the stored data packet, a validation module, which may be implemented within a PCI adapter such as a host interface card, retrieves the data packet and determines whether the data packet is corrupt. If the data packet is corrupt, the validation module identifies which host interface card corrupted the data with the use of the unique identifier associated with the data packet. The faulty communication module may then be removed from operation in the data storage system.
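
A rough sketch of the tagging and validation flow follows. The abstract does not say how corruption is detected, so the host-supplied CRC32 used here is an assumption, as are the `StoredPacket` type, the function names, and the module identifiers.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredPacket:
    payload: bytes   # data as written to the disk array
    checksum: int    # CRC32 computed by the host (assumed integrity check)
    module_id: str   # identifier unique to the receiving communication module


def tag_packet(payload: bytes, host_checksum: int, module_id: str) -> StoredPacket:
    """Associate the receiving module's identifier with the packet before storage."""
    return StoredPacket(payload=payload, checksum=host_checksum, module_id=module_id)


def find_faulty_module(packet: StoredPacket) -> Optional[str]:
    """Return the identifier of the module that handled a corrupt packet, else None."""
    if zlib.crc32(packet.payload) != packet.checksum:
        return packet.module_id
    return None


data = b"hello"
host_crc = zlib.crc32(data)

# One module stores the payload intact; another stores a corrupted copy.
good_packet = tag_packet(data, host_crc, "HIC-1")
bad_packet = tag_packet(b"hellp", host_crc, "HIC-2")

good_result = find_faulty_module(good_packet)
bad_result = find_faulty_module(bad_packet)
```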

    Apparatus, system, and method for data tracking
    9.
    Granted invention patent
    Legal status: Active

    Publication number: US07826380B2

    Publication date: 2010-11-02

    Application number: US11093393

    Filing date: 2005-03-30

    IPC classification: G01R31/08

    Abstract: An apparatus, system, and method are disclosed for data tracking and, in particular, for facilitating failure management within an electronic data communication system. The apparatus includes a tracking module and an error analysis module. The tracking module stores an adapter identifier in a tracking array. The adapter identifier corresponds to a source adapter from which data is received. The error analysis module determines a source of a data failure in response to recognition of the data failure. The data failure may occur on a host adapter, a device adapter, a communication fabric, a multi-processor, or another communication device. The apparatus, system, and method may be implemented in place of or in addition to hardware-assisted data integrity checking within a data storage system.

    DETERMINING MODIFIED DATA IN CACHE FOR USE DURING A RECOVERY OPERATION
    10.
    Invention patent application
    Legal status: Under examination (published)

    Publication number: US20100174676A1

    Publication date: 2010-07-08

    Application number: US12349460

    Filing date: 2009-01-06

    IPC classification: G06F12/16 G06F17/30

    Abstract: Provided are a method, system, and article of manufacture for determining modified data in cache for use during a recovery operation. An event is detected during which processing of writes to a storage device is suspended. A cache including modified data not destaged to the storage device is scanned to determine the data units having modified data in response to detecting the event. The data units having the modified data are indicated in a backup storage. The indication of the data units having the modified data in the backup storage is used during a recovery operation.
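
The scan-and-record step in this abstract could be sketched as below. The cache layout (a dictionary keyed by data unit number with a `modified` flag), the set used as backup storage, and the function names are assumptions made for the example.

```python
from typing import Dict, List, Set


def scan_modified_units(cache: Dict[int, Dict[str, object]]) -> List[int]:
    """Return the data units whose cached copy has not been destaged."""
    return sorted(unit for unit, entry in cache.items() if entry["modified"])


def on_write_suspend_event(cache: Dict[int, Dict[str, object]], backup: Set[int]) -> None:
    """On an event that suspends writes, record the modified units in backup storage."""
    backup.clear()
    backup.update(scan_modified_units(cache))


def units_to_recover(backup: Set[int]) -> List[int]:
    """During recovery, read back which data units held modified data."""
    return sorted(backup)


cache = {
    7: {"data": b"a", "modified": True},
    8: {"data": b"b", "modified": False},
    9: {"data": b"c", "modified": True},
}
backup_storage: Set[int] = set()

on_write_suspend_event(cache, backup_storage)
recovery_list = units_to_recover(backup_storage)
```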
