Distributing data on distributed storage systems

    Publication number: US11620187B2

    Publication date: 2023-04-04

    Application number: US17445401

    Filing date: 2021-08-18

    Applicant: Google LLC

    Abstract: A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or more maintenance units. Each maintenance unit has an active state and an inactive state. Moreover, each storage device is associated with a maintenance unit. The determining of the distribution of the chunks includes identifying a random selection of the storage devices matching a number of chunks of the file and being capable of maintaining accessibility of the file when one or more maintenance units are in an inactive state. The method also includes distributing the chunks to storage devices of the distributed storage system according to the determined distribution.
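The placement step described in this abstract can be illustrated with a minimal sketch. It assumes a single maintenance level, one maintenance unit per storage device, and a file that stays accessible as long as no more than `tolerated_losses` of its chunks are unavailable. All names (`split_into_chunks`, `choose_devices`, `unit_of`, `tolerated_losses`) are hypothetical and do not come from the patent.

```python
import random
from collections import Counter


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Divide the received file into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def survives_single_unit_outage(selection, unit_of, tolerated_losses):
    """True if no single inactive maintenance unit removes too many chunks."""
    chunks_per_unit = Counter(unit_of[device] for device in selection)
    return max(chunks_per_unit.values(), default=0) <= tolerated_losses


def choose_devices(devices, unit_of, num_chunks, tolerated_losses, rng,
                   max_attempts=1000):
    """Draw random device selections until one keeps the file accessible."""
    for _ in range(max_attempts):
        selection = rng.sample(devices, num_chunks)
        if survives_single_unit_outage(selection, unit_of, tolerated_losses):
            return selection
    raise RuntimeError("no valid placement found")


# Assumed topology: 8 devices spread over 4 maintenance units (e.g. racks).
devices = [f"d{i}" for i in range(8)]
unit_of = {device: f"rack{i // 2}" for i, device in enumerate(devices)}

chunks = split_into_chunks(b"x" * 4000, chunk_size=1000)
placement = choose_devices(devices, unit_of, len(chunks),
                           tolerated_losses=1, rng=random.Random(0))
```

Rejection sampling is used here only because it is short. A multi-level maintenance hierarchy would need the survivability check repeated at every level.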

    Quota-based resource scheduling
    Document type: Granted invention patent

    Publication number: US10931592B1

    Publication date: 2021-02-23

    Application number: US16377607

    Filing date: 2019-04-08

    Applicant: Google LLC

    Abstract: The present disclosure relates to dynamically scheduling resource requests in a distributed system based on usage quotas. One example method includes identifying usage information for a distributed system including atoms, each atom representing a distinct item used by users of the distributed system; determining that a usage quota associated with the distributed system has been exceeded based on the usage information, the usage quota representing an upper limit for a particular type of usage of the distributed system; receiving a first request for a particular atom requiring invocation of the particular type of usage represented by the usage quota; determining that a second request for a different type of usage of the particular atom is waiting to be processed; and processing the second request for the particular atom before processing the first request.
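The reordering described in this abstract can be sketched as follows. The sketch assumes requests wait in a simple FIFO list and that usage is tracked per usage type. `Request`, `over_quota`, and `pick_next` are hypothetical names chosen for illustration, not terms from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    atom: str        # the item being accessed
    usage_type: str  # e.g. "read" or "write"


def over_quota(usage: dict[str, int], quotas: dict[str, int]) -> set[str]:
    """Return the usage types whose recorded usage exceeds their quota."""
    return {t for t, limit in quotas.items() if usage.get(t, 0) > limit}


def pick_next(pending: list[Request], exceeded: set[str]) -> Request:
    """Pop the next request to process.

    If the head request needs an over-quota usage type, a waiting request
    for a different usage type of the same atom is served first.
    """
    head = pending[0]
    if head.usage_type in exceeded:
        for i, other in enumerate(pending[1:], start=1):
            if other.atom == head.atom and other.usage_type not in exceeded:
                return pending.pop(i)
    return pending.pop(0)


usage = {"write": 120, "read": 10}
quotas = {"write": 100, "read": 50}
pending = [
    Request("table_a", "write"),  # first request, over-quota usage type
    Request("table_b", "write"),
    Request("table_a", "read"),   # second request, same atom, within quota
]
exceeded = over_quota(usage, quotas)
first_served = pick_next(pending, exceeded)
second_served = pick_next(pending, exceeded)
```

With these inputs the read on `table_a` is served before the write on `table_a`, and the write is served next because no within-quota alternative remains for that atom.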

    Quota-based resource scheduling
    Document type: Granted invention patent

    Publication number: US10257111B1

    Publication date: 2019-04-09

    Application number: US15689640

    Filing date: 2017-08-29

    Applicant: Google LLC

    Abstract: The present disclosure relates to dynamically scheduling resource requests in a distributed system based on usage quotas. One example method includes identifying usage information for a distributed system including atoms, each atom representing a distinct item used by users of the distributed system; determining that a usage quota associated with the distributed system has been exceeded based on the usage information, the usage quota representing an upper limit for a particular type of usage of the distributed system; receiving a first request for a particular atom requiring invocation of the particular type of usage represented by the usage quota; determining that a second request for a different type of usage of the particular atom is waiting to be processed; and processing the second request for the particular atom before processing the first request.

    Ensuring globally consistent transactions

    Publication number: US10042881B1

    Publication date: 2018-08-07

    Application number: US15358428

    Filing date: 2016-11-22

    Applicant: Google LLC

    Abstract: The present technology proposes techniques for ensuring globally consistent transactions. This technology may allow distributed systems to ensure the causal order of read and write transactions across different partitions of a distributed database. By assigning causally generated timestamps to the transactions based on one or more globally coherent time services, the timestamps can be used to preserve and represent the causal order of the transactions in the distributed system. In this regard, certain transactions may wait for a period of time after choosing a timestamp in order to delay the start of any second transaction that might depend on it. The wait may ensure that the effects of the first transaction are not made visible until its timestamp is guaranteed to be in the past. This may ensure that a consistent snapshot of the distributed database can be determined for any past timestamp.
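The wait described in this abstract can be sketched under one assumption: the time service reports an interval that is guaranteed to contain the true time. `TimeService`, `Interval`, and `commit` are hypothetical names, and the fixed `uncertainty_s` stands in for whatever bound a real service would provide.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest: float
    latest: float


class TimeService:
    """Hypothetical service: true time is assumed to lie in [earliest, latest]."""

    def __init__(self, uncertainty_s: float, clock=time.time):
        self.uncertainty_s = uncertainty_s
        self._clock = clock

    def now(self) -> Interval:
        t = self._clock()
        return Interval(t - self.uncertainty_s, t + self.uncertainty_s)


def commit(apply_write, time_service: TimeService, sleep=time.sleep) -> float:
    """Choose a timestamp, wait until it is surely in the past, then publish."""
    timestamp = time_service.now().latest
    # Commit wait: hold back visibility until every clock agrees that the
    # chosen timestamp has already passed.
    while time_service.now().earliest <= timestamp:
        sleep(0.001)
    apply_write(timestamp)
    return timestamp


# Deterministic demonstration with a fake clock that advances on each sleep.
fake_now = [100.0]
service = TimeService(uncertainty_s=0.5, clock=lambda: fake_now[0])


def fake_sleep(_seconds: float) -> None:
    fake_now[0] += 0.25


visible = []
commit_ts = commit(visible.append, service, sleep=fake_sleep)
```

The wait lasts roughly twice the uncertainty bound, so a tighter bound from the time service translates directly into lower commit latency.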

    Distributing Data on Distributed Storage Systems

    Publication number: US20240338279A1

    Publication date: 2024-10-10

    Application number: US18746351

    Filing date: 2024-06-18

    Applicant: Google LLC

    CPC classification number: G06F11/1435 G06F16/1748 G06F16/182 G06F16/278

    Abstract: A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or more maintenance units. Each maintenance unit has an active state and an inactive state. Moreover, each storage device is associated with a maintenance unit. The determining of the distribution of the chunks includes identifying a random selection of the storage devices matching a number of chunks of the file and being capable of maintaining accessibility of the file when one or more maintenance units are in an inactive state. The method also includes distributing the chunks to storage devices of the distributed storage system according to the determined distribution.

    Online Migration From An Eventually Consistent System To A Strongly Consistent System

    Publication number: US20230325378A1

    Publication date: 2023-10-12

    Application number: US17716093

    Filing date: 2022-04-08

    Applicant: Google LLC

    CPC classification number: G06F16/2365 G06F16/2322 G06F16/273

    Abstract: Generally disclosed herein is an approach to migrate data from a first type of distributed system to a second type of distributed system without locking data, where transactional dual writes are not available across the two systems. The approach starts by setting up a bi-directional replication between the first system and the second system. The first system will initially operate as a primary system, where the primary system receives and serves write requests from clients or other devices. For each write to the first system, the second system is updated with an asynchronous write. When the second system is caught up to the first system, such that both the first and second systems reflect approximately the same data, the second system can be switched over to serve as the primary system. The second system can now directly receive and serve all future read and write requests.
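The switchover sequence in this abstract can be sketched with two in-memory dictionaries standing in for the two systems. This is a simplification: it models only the forward replication path (first system to second system), omits the reverse direction of the bi-directional replication, and treats "caught up" as an empty backlog. `Migrator` and its methods are hypothetical names.

```python
from collections import deque


class Migrator:
    def __init__(self):
        self.old = {}            # first (eventually consistent) system
        self.new = {}            # second (strongly consistent) system
        self.primary = "old"
        self._backlog = deque()  # writes not yet applied to the second system

    def write(self, key, value):
        """Serve a write on the current primary without locking data."""
        if self.primary == "old":
            self.old[key] = value
            self._backlog.append((key, value))  # replicated asynchronously
        else:
            self.new[key] = value

    def replicate_step(self, max_items=1):
        """Apply up to max_items queued writes to the second system."""
        for _ in range(min(max_items, len(self._backlog))):
            key, value = self._backlog.popleft()
            self.new[key] = value

    def try_switch(self) -> bool:
        """Promote the second system once it has caught up."""
        if self.primary == "old" and not self._backlog:
            self.primary = "new"
        return self.primary == "new"

    def read(self, key):
        store = self.old if self.primary == "old" else self.new
        return store.get(key)


m = Migrator()
m.write("a", 1)
m.write("b", 2)
switched_early = m.try_switch()   # backlog not yet drained
m.replicate_step(max_items=2)
switched_late = m.try_switch()    # second system has caught up
m.write("c", 3)                   # now served by the second system
```

A real migration would also have to handle writes that arrive between the caught-up check and the switch, which this single-threaded sketch avoids by construction.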

    System And Method For Analyzing Data Records

    Publication number: US20220171781A1

    Publication date: 2022-06-02

    Application number: US17673049

    Filing date: 2022-02-16

    Applicant: Google LLC

    Abstract: Systems and methods for analyzing input data records are provided in which a master process initiates a plurality of concurrent first processes each of which comprises, for each data record in at least a subset of a plurality of input data records, creating a parsed representation of the data record and independently applying a procedural language query to the parsed representation to extract one or more values. A respective emit operator is applied to at least one of the extracted one or more values thereby adding corresponding information to a respective intermediate data structure. The respective emit operator implements one of a predefined set of statistical information processing functions. The master process also initiates a plurality of second processes each of which aggregates information from a corresponding subset of intermediate data structures to produce aggregated data that is, in turn, combined to produce output data.
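The two-phase flow in this abstract can be sketched as follows. The sketch assumes a made-up record format (`country,bytes`), uses threads rather than separate processes, and uses a `Counter` to play the role of a summing emit operator. The final loop stands in for the second-phase aggregators. `parse`, `first_process`, and `analyze` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def parse(record: str) -> dict:
    """Create a parsed representation of one record (assumed 'country,bytes')."""
    country, size = record.split(",")
    return {"country": country, "bytes": int(size)}


def first_process(records: list[str]) -> Counter:
    """Apply the query to each record and emit into an intermediate table."""
    intermediate = Counter()
    for record in records:
        parsed = parse(record)
        # "Emit" the extracted value into a sum table keyed by country.
        intermediate[parsed["country"]] += parsed["bytes"]
    return intermediate


def analyze(records: list[str], num_workers: int = 2) -> dict[str, int]:
    """Run the first phase concurrently, then aggregate the intermediates."""
    shards = [records[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        intermediates = list(pool.map(first_process, shards))
    total = Counter()
    for part in intermediates:
        total.update(part)  # Counter.update adds counts key by key
    return dict(total)


records = ["us,10", "de,5", "us,7", "fr,1"]
totals = analyze(records)
```

Because each record is processed independently and summation is commutative and associative, the result does not depend on how records are sharded across workers.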

    Distributing Data on Distributed Storage Systems

    Publication number: US20210382790A1

    Publication date: 2021-12-09

    Application number: US17445401

    Filing date: 2021-08-18

    Applicant: Google LLC

    Abstract: A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or more maintenance units. Each maintenance unit has an active state and an inactive state. Moreover, each storage device is associated with a maintenance unit. The determining of the distribution of the chunks includes identifying a random selection of the storage devices matching a number of chunks of the file and being capable of maintaining accessibility of the file when one or more maintenance units are in an inactive state. The method also includes distributing the chunks to storage devices of the distributed storage system according to the determined distribution.

    System and Method For Analyzing Data Records

    Publication number: US20180052890A1

    Publication date: 2018-02-22

    Application number: US15799939

    Filing date: 2017-10-31

    Applicant: Google LLC

    Abstract: Systems and methods for analyzing input data records are provided in which a master process initiates a plurality of concurrent first processes each of which comprises, for each data record in at least a subset of a plurality of input data records, creating a parsed representation of the data record and independently applying a procedural language query to the parsed representation to extract one or more values. A respective emit operator is applied to at least one of the extracted one or more values thereby adding corresponding information to a respective intermediate data structure. The respective emit operator implements one of a predefined set of statistical information processing functions. The master process also initiates a plurality of second processes each of which aggregates information from a corresponding subset of intermediate data structures to produce aggregated data that is, in turn, combined to produce output data.

    Distributing data on distributed storage systems

    Publication number: US12019519B2

    Publication date: 2024-06-25

    Application number: US18191371

    Filing date: 2023-03-28

    Applicant: Google LLC

    CPC classification number: G06F11/1435 G06F16/1748 G06F16/182 G06F16/278

    Abstract: A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or more maintenance units. Each maintenance unit has an active state and an inactive state. Moreover, each storage device is associated with a maintenance unit. The determining of the distribution of the chunks includes identifying a random selection of the storage devices matching a number of chunks of the file and being capable of maintaining accessibility of the file when one or more maintenance units are in an inactive state. The method also includes distributing the chunks to storage devices of the distributed storage system according to the determined distribution.
