-
Publication number: US10223189B1
Publication date: 2019-03-05
Application number: US14751036
Application date: 2015-06-25
Applicant: Amazon Technologies, Inc.
Inventor: Ganesh Viswanathan, Vinayak Sasikumar, Artur Pop, Shuai Chang, Benjamin Ryan Zeghers
Abstract: Suppression routines are described for implementation by a monitoring service. The monitoring service uses collected data to identify faulty storage volumes. Advantageously, in some cases, the monitoring service can notify an operator of the storage system that certain storage volumes are faulty. In some embodiments, these notifications are to be suppressed because not all notifications of faulty volumes are necessary. Suppression rules can indicate that a faulty storage volume is at fault because it is a test volume, is associated with a large power outage, or is tied to some other event learned from storage command metrics. The monitoring service can suppress notifications about these known system issues, among others.
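The suppression step described in this abstract can be sketched as a list of rules checked against each faulty volume before a notification goes out. This is a minimal illustration under stated assumptions, not the patented implementation: the `FaultyVolume` fields, the rule names, and the `make_rules` and `filter_notifications` helpers are all hypothetical, and only the test-volume and power-outage cases named in the abstract are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class FaultyVolume:
    # Hypothetical fields; the abstract does not define a volume schema.
    volume_id: str
    is_test_volume: bool = False
    datacenter: str = ""


@dataclass(frozen=True)
class SuppressionRule:
    name: str
    matches: Callable[[FaultyVolume], bool]


def make_rules(outage_datacenters: Set[str]) -> List[SuppressionRule]:
    """Build the two rule kinds named in the abstract (assumed shapes)."""
    return [
        SuppressionRule("test-volume", lambda v: v.is_test_volume),
        SuppressionRule("power-outage", lambda v: v.datacenter in outage_datacenters),
    ]


def filter_notifications(
    volumes: Iterable[FaultyVolume], rules: List[SuppressionRule]
) -> Tuple[List[FaultyVolume], Dict[str, str]]:
    """Split faulty volumes into those to report and those suppressed.

    Returns (to_notify, suppressed), where suppressed maps a volume id
    to the name of the first rule that matched it.
    """
    to_notify: List[FaultyVolume] = []
    suppressed: Dict[str, str] = {}
    for volume in volumes:
        reason = next((r.name for r in rules if r.matches(volume)), None)
        if reason is None:
            to_notify.append(volume)
        else:
            suppressed[volume.volume_id] = reason
    return to_notify, suppressed
```

Recording the name of the matching rule alongside each suppressed volume keeps the decision auditable, which seems useful if an operator later asks why no notification was sent.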
-
Publication number: US09898357B1
Publication date: 2018-02-20
Application number: US14751047
Application date: 2015-06-25
Applicant: Amazon Technologies, Inc.
Inventor: Ganesh Viswanathan, Vinayak Sasikumar, Artur Pop, Shuai Chang, Benjamin Ryan Zeghers
CPC classification number: G06F11/079, G06F11/0727, G06F11/3452, G06F2201/815
Abstract: Notification routines are described for implementation by a monitoring service. As part of an exemplary notification routine, a faulty storage volume is correlated at multiple logical storage levels of a storage system with other faulty storage volumes. The correlation pattern can follow a tree-based decision format, where each faulty storage volume is sequentially compared at a lower logical storage level. Advantageously, once a common logical storage component of a group of storage volumes is identified, a notification is issued about the group of faulty storage volumes sharing the common logical storage component. Additionally, notifications can be issued according to a severity level of the group of faulty storage volumes. In some embodiments, before issuing the notification, the group of faulty storage volumes can be compared to a time allowed for the group of faulty storage volumes to be at fault.
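One way to picture the tree-based correlation in this abstract is a top-down descent through the logical levels: faulty volumes are bucketed by component at each level, and a bucket with at least two members is compared again one level lower. The sketch below is an assumption-laden illustration rather than the claimed method: the level names in `LEVELS`, the path tuples, and the `correlate` function are hypothetical, and severity levels and the allowed fault time are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical logical storage levels, ordered from highest to lowest.
LEVELS = ("datacenter", "rack", "server")

Group = Tuple[str, str, List[str]]  # (level name, component, volume ids)


def correlate(paths: Dict[str, Tuple[str, ...]]) -> Tuple[List[Group], List[str]]:
    """Group faulty volumes by the lowest logical component they share.

    `paths` maps a volume id to its component at each entry of LEVELS.
    Returns (groups, ungrouped): one candidate notification per group,
    plus the volumes that share no component with another faulty volume
    at the point where they split off.
    """
    groups: List[Group] = []
    ungrouped: List[str] = []

    def descend(vids: List[str], depth: int, label: Optional[Tuple[str, str]]) -> None:
        if depth == len(LEVELS):
            # No lower level left: everything here shares `label`.
            leftover = list(vids)
        else:
            buckets: Dict[str, List[str]] = defaultdict(list)
            for vid in vids:
                buckets[paths[vid][depth]].append(vid)
            leftover = []
            for component, members in buckets.items():
                if len(members) >= 2:
                    # Still correlated: compare again one level lower.
                    descend(members, depth + 1, (LEVELS[depth], component))
                else:
                    leftover.extend(members)
        if label is not None and len(leftover) >= 2:
            groups.append((label[0], label[1], sorted(leftover)))
        else:
            ungrouped.extend(leftover)

    descend(list(paths), 0, None)
    return groups, ungrouped
```

For example, two faulty volumes on the same server would be reported as one server-level group, while two volumes on different servers in the same rack would be reported as one rack-level group. A single volume that splits off from an otherwise correlated group is returned in `ungrouped` here; a real system might instead attach it to the parent component.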
-
Publication number: US10282245B1
Publication date: 2019-05-07
Application number: US14751028
Application date: 2015-06-25
Applicant: Amazon Technologies, Inc.
Inventor: Ganesh Viswanathan, Vinayak Sasikumar, Artur Pop, Shuai Chang, Benjamin Ryan Zeghers
Abstract: A storage system includes a monitoring service that identifies root causes of storage system issues using relationships. The monitoring service can use thresholds associated with the relationships to detect the root causes. Relationships can be based on correlations between the different levels of the storage system. In various embodiments, relationships can also be based on events that affect multiple storage volumes or on short-term events. Once a relationship is identified, a threshold for that relationship is generated or updated. The monitoring service can make that threshold accessible to other components of the monitoring service or an operator of the storage system to be used in detecting root causes.
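The "generated or updated" threshold in this abstract can be illustrated with a small registry keyed by relationship. The abstract does not say how a threshold is computed, so the exponential smoothing used below is purely an assumption chosen for brevity, and the `ThresholdRegistry` class, its method names, and the relationship keys are hypothetical.

```python
from typing import Dict, Optional


class ThresholdRegistry:
    """Keeps one threshold per identified relationship (illustrative only)."""

    def __init__(self, smoothing: float = 0.5) -> None:
        if not 0.0 < smoothing <= 1.0:
            raise ValueError("smoothing must be in (0, 1]")
        self.smoothing = smoothing
        self._thresholds: Dict[str, float] = {}

    def observe(self, relationship: str, faulty_count: int) -> float:
        """Generate the threshold on first sight, otherwise update it.

        Assumption: the threshold tracks an exponentially smoothed count
        of faulty volumes seen for this relationship.
        """
        current = self._thresholds.get(relationship)
        if current is None:
            updated = float(faulty_count)
        else:
            updated = self.smoothing * faulty_count + (1.0 - self.smoothing) * current
        self._thresholds[relationship] = updated
        return updated

    def get(self, relationship: str) -> Optional[float]:
        """Expose the threshold to other components or to an operator."""
        return self._thresholds.get(relationship)

    def exceeds(self, relationship: str, faulty_count: int) -> bool:
        """True if the count is above the stored threshold, if any."""
        threshold = self._thresholds.get(relationship)
        return threshold is not None and faulty_count > threshold
```

Keeping the thresholds behind `get` and `exceeds` mirrors the abstract's point that a threshold is made accessible to other components, without committing to any particular storage or transport.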
-