Root cause detection and monitoring for storage systems

    Publication number: US09898357B1

    Publication date: 2018-02-20

    Application number: US14751047

    Application date: 2015-06-25

    CPC classification number: G06F11/079 G06F11/0727 G06F11/3452 G06F2201/815

    Abstract: Notification routines are described for implementation by a monitoring service. As part of an exemplary notification routine, a faulty storage volume is correlated at multiple logical storage levels of a storage system with other faulty storage volumes. The correlation pattern can follow a tree-based decision format, where each faulty storage volume is sequentially compared at a lower logical storage level. Advantageously, once a common logical storage component of a group of storage volumes is identified, a notification is issued about the group of faulty storage volumes sharing the common logical storage component. Additionally, notifications can be issued according to a severity level of the group of faulty storage volumes. In some embodiments, before issuing the notification, the group of faulty storage volumes can be compared to a time allowed for the group of faulty storage volumes to be at fault.
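
The tree-based correlation in this abstract can be illustrated with a minimal sketch. It is not taken from the patent: the level names (`datacenter`, `rack`, `host`), the dictionary layout, and the function name are all hypothetical, and the rule of reporting only the lowest level shared by two or more faulty volumes is one possible reading of "sequentially compared at a lower logical storage level".

```python
from collections import defaultdict

# Hypothetical level names, ordered from highest to lowest logical level.
LEVELS = ("datacenter", "rack", "host")


def group_by_common_component(faulty, levels=LEVELS):
    """Group faulty volumes under the lowest component shared by 2+ of them.

    `faulty` maps a volume id to a dict of {level name: component id}.
    Returns {(level, component): sorted list of volume ids}.
    """
    groups = {}

    def descend(volume_ids, depth, parent):
        # `parent` is the (level, component) shared by all of volume_ids,
        # or None at the root of the decision tree.
        found_deeper = False
        if depth < len(levels):
            buckets = defaultdict(list)
            for vid in volume_ids:
                buckets[faulty[vid][levels[depth]]].append(vid)
            for component, members in buckets.items():
                if len(members) > 1:
                    # Still shared at this level, so keep comparing lower down.
                    found_deeper = True
                    descend(members, depth + 1, (levels[depth], component))
        if not found_deeper and parent is not None:
            # Nothing is shared below `parent`: it is the common component.
            groups[parent] = sorted(volume_ids)

    descend(list(faulty), 0, None)
    return groups


faulty_volumes = {
    "v1": {"datacenter": "dc1", "rack": "r1", "host": "h1"},
    "v2": {"datacenter": "dc1", "rack": "r1", "host": "h1"},
    "v3": {"datacenter": "dc1", "rack": "r2", "host": "h2"},
    "v4": {"datacenter": "dc1", "rack": "r2", "host": "h3"},
    "v5": {"datacenter": "dc2", "rack": "r9", "host": "h7"},
}
groups = group_by_common_component(faulty_volumes)
```

In this sketch a single notification would be issued per returned group; a severity level could, for instance, be derived from the group size, which is again an assumption rather than something the abstract specifies.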

    Root cause detection and monitoring for storage systems

    Publication number: US10282245B1

    Publication date: 2019-05-07

    Application number: US14751028

    Application date: 2015-06-25

    Abstract: A storage system includes a monitoring service that identifies root causes of storage system issues using relationships. The monitoring service can use thresholds associated with the relationships to detect the root causes. Relationships can be based on correlation relationships between the different levels of the storage system. In various embodiments, relationships can also be based on events that affect multiple storage volumes or on short-term events. Once a relationship is identified, a threshold for that relationship is generated or updated. The monitoring service can make that threshold accessible to other components of the monitoring service or an operator of the storage system to be used in detecting root causes.
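
The "generate or update a threshold per relationship" step might be sketched as follows. The abstract does not say how a threshold is computed, so everything here is an assumption made for illustration: the class name, the use of a faulty-volume fraction as the measured quantity, and the exponential smoothing used for updates.

```python
class ThresholdRegistry:
    """Hypothetical store of one threshold per identified relationship."""

    def __init__(self, smoothing=0.5):
        # Assumed update rule: exponential smoothing with this weight.
        self.smoothing = smoothing
        self._thresholds = {}

    def observe(self, relationship, faulty_fraction):
        """Generate the threshold on first sight, otherwise update it."""
        old = self._thresholds.get(relationship)
        if old is None:
            new = faulty_fraction
        else:
            new = (1 - self.smoothing) * old + self.smoothing * faulty_fraction
        self._thresholds[relationship] = new
        return new

    def get(self, relationship):
        """Expose the threshold to other components or to an operator."""
        return self._thresholds.get(relationship)

    def exceeds(self, relationship, faulty_fraction):
        """True if the observed fraction reaches the stored threshold."""
        threshold = self._thresholds.get(relationship)
        return threshold is not None and faulty_fraction >= threshold


registry = ThresholdRegistry()
registry.observe(("rack", "r1"), 0.4)   # first observation generates the threshold
registry.observe(("rack", "r1"), 0.8)   # later observation updates it
```

Keying the registry by a `(level, component)` tuple is only one way to represent a relationship; event-based or short-term relationships would need a different key.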

    Root cause detection and monitoring for storage systems

    Publication number: US10223189B1

    Publication date: 2019-03-05

    Application number: US14751036

    Application date: 2015-06-25

    Abstract: Suppression routines are described for implementation by a monitoring service. The monitoring service uses collected data to identify faulty storage volumes. Advantageously, in some cases, the monitoring service can notify an operator of the storage system that certain storage volumes are faulty. In some embodiments, these notifications are to be suppressed because not all notifications of faulty volumes are necessary. Suppression rules can indicate that a faulty storage volume is at fault because it is a test volume, associated with a large power outage, or some other learned event from storage command metrics. The monitoring service can suppress notifications about these known system issues, among others.
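
As a rough illustration of the suppression rules named in this abstract (test volumes and power outages), the sketch below filters faulty volumes before notification. The record fields, rule names, and function name are hypothetical and are not drawn from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultyVolume:
    # Hypothetical record for a volume the monitoring service flagged as faulty.
    volume_id: str
    is_test_volume: bool = False
    events: frozenset = frozenset()


# Each rule maps a name (used as the suppression reason) to a predicate.
SUPPRESSION_RULES = {
    "test_volume": lambda v: v.is_test_volume,
    "power_outage": lambda v: "power_outage" in v.events,
}


def split_notifications(volumes, rules=SUPPRESSION_RULES):
    """Return (ids to notify about, {suppressed id: name of the matching rule})."""
    to_notify, suppressed = [], {}
    for volume in volumes:
        # The first matching rule wins; no match means the operator is notified.
        reason = next((name for name, rule in rules.items() if rule(volume)), None)
        if reason is None:
            to_notify.append(volume.volume_id)
        else:
            suppressed[volume.volume_id] = reason
    return to_notify, suppressed


volumes = [
    FaultyVolume("v1"),
    FaultyVolume("v2", is_test_volume=True),
    FaultyVolume("v3", events=frozenset({"power_outage"})),
]
to_notify, suppressed = split_notifications(volumes)
```

Rules learned from storage command metrics, which the abstract also mentions, could be added to the same dictionary as further predicates.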
