-
Publication No.: US11894972B2
Publication date: 2024-02-06
Application No.: US17811519
Application date: 2022-07-08
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath , Jakub Kulesza , David Alan Lutz
IPC: G06F11/00 , H04L41/0668 , G06F11/20 , G06F11/14 , G06F11/16 , H04L67/51 , G06F3/06 , H04L67/1097
CPC classification number: H04L41/0668 , G06F3/0617 , G06F3/0653 , G06F3/0659 , G06F3/0683 , G06F11/1425 , G06F11/1662 , G06F11/2028 , G06F11/2041 , G06F11/2094 , G06F11/2097 , H04L67/1097 , H04L67/51 , G06F11/2048 , G06F2201/825
Abstract: A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
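The failover attempt described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the `ReplicaState` shape, the `try_become_master` name, the use of a log sequence number to compare recency, and the simple-majority threshold are all assumptions, since the abstract says only that the quorum must include "enough replicas".

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    """State reported by a peer replica (hypothetical shape)."""
    name: str
    supports: bool   # True if the peer supports the candidate's attempt
    latest_seq: int  # most recent log sequence number the peer holds (assumed)


def try_become_master(candidate_seq, peers, lock_acquired, group_size):
    """Return (became_master, names_of_peers_with_newer_data).

    Assumption: a simple majority of the replica group counts as
    "enough replicas"; the abstract does not state the threshold.
    """
    if not lock_acquired:
        # Without the replica-group lock the attempt cannot proceed.
        return False, []
    # Peers holding newer data than the candidate may require synchronization.
    ahead = [p.name for p in peers if p.latest_seq > candidate_seq]
    # The candidate counts itself plus every peer that supports the attempt.
    quorum = 1 + sum(1 for p in peers if p.supports)
    return quorum > group_size // 2, ahead
```

In a three-replica group, one supporting peer plus the candidate itself forms a majority under this assumed threshold, while any peer reporting a higher sequence number is flagged for synchronization.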
-
Publication No.: US11475038B2
Publication date: 2022-10-18
Application No.: US15893496
Application date: 2018-02-09
Applicant: Amazon Technologies, Inc.
Inventor: Deepak Agarwal , Anurag Windlass Gupta , Jakub Kulesza
IPC: G06F16/00 , G06F16/27 , G06F16/245 , G06F16/23 , G06F11/14
Abstract: A distributed data warehouse system maintains data blocks on behalf of clients, and stores primary and secondary copies of data blocks on different disks or nodes in a cluster. The data warehouse system may back up data blocks in a key-value backup storage system. In response to a query targeting a data block previously stored in the cluster, the data warehouse system may determine whether a consistent, uncorrupted copy of the data block is available in the cluster (e.g., by applying a consistency check). If not (e.g., if a disk or node failed), the data warehouse system may automatically initiate an operation to restore the data block from the backup storage system, using a unique identifier of the data block to access a backup copy. The target data may be returned in a query response prior to restoring primary and secondary copies of the data block in the cluster.
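The read path described in this abstract might look something like the sketch below. It is a minimal illustration under stated assumptions: plain dictionaries stand in for the cluster and the key-value backup store, a SHA-256 checksum stands in for the unspecified consistency check, and the names `checksum` and `read_block` are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in consistency check (assumed; the abstract does not name one)."""
    return hashlib.sha256(data).hexdigest()


def read_block(block_id, cluster, backup, expected):
    """Return (data, needs_restore) for a queried data block.

    cluster:  dict of block_id -> bytes held in the cluster (may be missing or corrupt)
    backup:   dict of block_id -> bytes in the key-value backup store
    expected: dict of block_id -> checksum of the correct contents
    """
    data = cluster.get(block_id)
    if data is not None and checksum(data) == expected[block_id]:
        return data, False
    # No consistent copy in the cluster: fetch the backup copy by its unique id.
    # The caller can answer the query first and restore cluster copies afterwards.
    return backup[block_id], True
```

Returning a `needs_restore` flag rather than repairing the cluster inline mirrors the abstract's point that the query response may be returned before the primary and secondary copies are restored.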
-
Publication No.: US11422982B2
Publication date: 2022-08-23
Application No.: US16283510
Application date: 2019-02-22
Applicant: Amazon Technologies, Inc.
Inventor: Jakub Kulesza , Srividhya Srinivasan , Deepak Agarwal , Anurag Windlass Gupta
IPC: G06F16/21 , G06F16/182 , G06F9/50
Abstract: A stateful cluster may implement scaling of the stateful cluster while maintaining access to the state of the stateful cluster. A scaling event for a stateful cluster may be detected, and in response the stateful cluster may be adjusted to include a different number of nodes. The state of the cluster may then be logically distributed among the different number of nodes according to a monotone distribution scheme. The adjusted node may then service access requests according to the monotone distribution scheme. Prior to making the adjusted storage cluster available for servicing access requests, the nodes from the original cluster may still service access requests for state.
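The abstract does not say which monotone distribution scheme is used. As one illustration, rendezvous (highest-random-weight) hashing has the monotone property: when nodes are added, a key either stays on its current node or moves to a new node, and never moves between the original nodes. The sketch below assumes that scheme; the `owner` function and the node names are hypothetical.

```python
import hashlib


def owner(key, nodes):
    """Pick the node with the highest hash score for this key (rendezvous hashing)."""
    def score(node):
        return hashlib.sha256(f"{node}:{key}".encode()).digest()
    return max(nodes, key=score)


# Scaling from three nodes to four: each key keeps its owner or moves to "n4".
old_nodes = ["n1", "n2", "n3"]
new_nodes = old_nodes + ["n4"]
moved = [k for k in map(str, range(200)) if owner(k, old_nodes) != owner(k, new_nodes)]
```

Because only the keys in `moved` change owner, the original nodes could plausibly keep serving requests for state while that subset is transferred to the added node.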
-
Publication No.: US11388043B2
Publication date: 2022-07-12
Application No.: US16833334
Application date: 2020-03-27
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath , Jakub Kulesza , David Alan Lutz
IPC: G06F11/00 , H04L41/0668 , G06F11/20 , G06F11/14 , G06F11/16 , G06F3/06 , H04L67/1097 , H04L67/51
Abstract: A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
-
Publication No.: US10061834B1
Publication date: 2018-08-28
Application No.: US14530495
Application date: 2014-10-31
Applicant: Amazon Technologies, Inc.
Inventor: Jakub Kulesza , Bharath Kumar Chelepalli , Deepak Agarwal , Anurag Windlass Gupta
IPC: G06F17/30
CPC classification number: G06F16/283 , G06F16/27
Abstract: A data store may implement incremental out-of-place updates to a dataset. A dataset may maintain data across different storage locations linked together according to an ordering schema for servicing queries. As updates to the dataset are received, the updates may be persisted but not maintained in-place. In order to update the data store and maintain the ordering schema, incremental updates to the dataset may be performed without blocking queries directed toward the dataset. The dataset may be divided into multiple data chunks that correspond to different storage locations and an updated version of the data chunk may be generated in new storage locations. The new storage locations may then replace the storage locations of the prior version of the data chunk in order to link the new storage locations to the other linked storage locations in the dataset for servicing queries.
-
Publication No.: US09886348B2
Publication date: 2018-02-06
Application No.: US14754564
Application date: 2015-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath , Jakub Kulesza , David Alan Lutz
CPC classification number: G06F11/1451 , G06F11/1425 , G06F11/2094 , G06F11/2097 , G06F17/30557 , G06F17/30575 , G06F17/30578
Abstract: A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of partitions that are stored on respective computing nodes in the system. A master replica for a replica group may increment a membership version indicator for the group, and may propagate metadata (including the membership version indicator) indicating a membership change for the group to other members of the group. Propagating the metadata may include sending a log record containing the metadata to the other replicas to be appended to their respective logs. Once the membership change becomes durable, it may be committed. A replica attempting to become the master of a replica group may determine that another replica in the group has observed a more recent membership version, in which case logs may be synchronized or snipped, or the attempt may be abandoned.
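The membership-change flow in this abstract can be sketched as below. This is a simplified illustration, not the patented protocol: logs are plain Python lists, an unreachable replica is modeled as `None`, and the durability threshold `acks_needed` and both function names are assumptions.

```python
def propose_membership_change(master_log, peer_logs, version, members, acks_needed):
    """Append a membership-change record to every reachable log.

    Returns (new_version, committed). The change is treated as durable,
    and therefore committed, once acks_needed logs hold the record
    (an assumed rule; the abstract does not define durability).
    """
    record = {"type": "membership", "version": version + 1, "members": sorted(members)}
    master_log.append(record)
    acks = 1  # the master's own append
    for log in peer_logs:
        if log is None:  # unreachable replica in this sketch
            continue
        log.append(record)
        acks += 1
    return record["version"], acks >= acks_needed


def newer_membership_seen(candidate_version, peer_versions):
    """True if any peer has observed a more recent membership version."""
    return any(v > candidate_version for v in peer_versions.values())
```

A replica attempting to become master would call something like `newer_membership_seen` first; a `True` result corresponds to the case in which logs are synchronized or snipped, or the attempt is abandoned.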
-
Publication No.: US11507480B2
Publication date: 2022-11-22
Application No.: US16185423
Application date: 2018-11-09
Applicant: Amazon Technologies, Inc.
Inventor: Michael T. Helmick , Jakub Kulesza , Timothy Andrew Rath , Stefano Stefani , David Alan Lutz
Abstract: Disclosed are various embodiments for distributing data items within a plurality of nodes. A data item that is subject to a data item update request is updated from a master node to a plurality of slave nodes. The update of the data item is determined to be locality-based durable based at least in part on acknowledgements received from the slave nodes. Upon detection that the master node has failed, a new master candidate is determined via an election among the plurality of slave nodes.
-
Publication No.: US11442824B2
Publication date: 2022-09-13
Application No.: US15650054
Application date: 2017-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Michael T. Helmick , Jakub Kulesza , Stefano Stefani , David A. Lutz
IPC: G06F16/27 , G06F16/182 , G06F11/20 , G06F11/14
Abstract: Disclosed are various embodiments for distributing data items. A plurality of nodes forms a distributed data store. A new master candidate is determined through an election among the plurality of nodes. Before performing a failover from a failed master to the new master candidate, a consensus is reached among a locality-based failover quorum of the nodes. The quorum excludes any of the nodes that are in a failover quorum ineligibility mode.
-
Publication No.: US11068501B2
Publication date: 2021-07-20
Application No.: US15464272
Application date: 2017-03-20
Applicant: Amazon Technologies, Inc.
Inventor: Anurag Windlass Gupta , Jakub Kulesza , Don Johnson , Deepak Agarwal , Tushar Jain
Abstract: A distributed database system may perform a single phase commit for transactions involving updates to multiple databases of the distributed database system. A client request may be received that involves updates to multiple databases of the distributed database system. The updates may be performed at a front-end database and a back-end database. Log records indicating the updates to the front-end database may be sent to the back-end database. The log records and the updates performed at the back-end database may be committed together as a single phase commit at the back-end database. In the event of a system failure of the front-end database, log records may be requested and received from the back-end database. A restoration of the front-end database may be performed based, at least in part, on the received log records.
-
Publication No.: US20200341657A1
Publication date: 2020-10-29
Application No.: US16926519
Application date: 2020-07-10
Applicant: Amazon Technologies, Inc.
Inventor: Stefano Stefani , Timothy Andrew Rath , Chiranjeeb Buragahain , Yan Valerie Leshinsky , David Alan Lutz , Jakub Kulesza , Wei Xiao , Jai Vasanth
Abstract: A system that implements a scalable data storage service may maintain tables in a data store on behalf of storage service clients. The service may maintain table data in multiple replicas of partitions that are stored on respective computing nodes in the system. In response to detecting an anomaly in the system, detecting a change in data volume on a partition or service request traffic directed to a partition, or receiving a service request from a client to split a partition, the data storage service may create additional copies of a partition replica using a physical copy mechanism. The data storage service may issue a split command defined in an API for the data store to divide the original and additional replicas into multiple replica groups, and to configure each replica group to maintain a respective portion of the table data that was stored in the partition before the split.