Batch data ingestion in database systems

    Publication number: US11294890B2

    Publication date: 2022-04-05

    Application number: US16365219

    Filing date: 2019-03-26

    Applicant: Snowflake Inc.

    Abstract: Systems, methods, and devices for batch ingestion of data into a table of a database. A method includes determining a notification indicating a presence of a user file received from a client account to be ingested into a database. The method includes identifying data in the user file and identifying a target table of the database to receive the data in the user file. The method includes generating an ingest task indicating the data and the target table. The method includes assigning the ingest task to an execution node of an execution platform, wherein the execution platform comprises a plurality of execution nodes operating independent of a plurality of shared storage devices collectively storing database data. The method includes registering metadata concerning the target table in a metadata store after the data has been fully committed to the target table by the execution node.
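The flow in this abstract (notification, ingest task, assignment to an execution node, metadata registration after commit) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `IngestTask`, `ExecutionNode`, `IngestCoordinator`, the in-memory dictionaries standing in for shared storage and the metadata store, and the round-robin assignment are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class IngestTask:
    """Hypothetical record pairing the data in a user file with its target table."""
    file_name: str
    rows: list
    target_table: str


@dataclass
class ExecutionNode:
    """Hypothetical stand-in for one node of the execution platform."""
    name: str

    def run(self, task, storage):
        # Commit every row to shared storage before reporting success.
        storage.setdefault(task.target_table, []).extend(task.rows)
        return len(task.rows)


@dataclass
class IngestCoordinator:
    nodes: list
    storage: dict = field(default_factory=dict)   # shared storage, separate from the nodes
    metadata: dict = field(default_factory=dict)  # stand-in for the metadata store

    def __post_init__(self):
        # Round-robin assignment is an assumption made for this sketch.
        self._next_node = cycle(self.nodes)

    def on_notification(self, file_name, rows, target_table):
        task = IngestTask(file_name, rows, target_table)
        node = next(self._next_node)
        committed = node.run(task, self.storage)
        # Metadata is registered only after the commit has completed.
        self.metadata.setdefault(target_table, []).append(
            {"file": file_name, "rows": committed, "node": node.name}
        )
        return node.name
```

Keeping `storage` outside the node objects mirrors the abstract's point that execution nodes operate independently of the shared storage devices.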

    Data ingestion using file queues
    Granted invention patent

    Publication number: US10997163B2

    Publication date: 2021-05-04

    Application number: US16943251

    Filing date: 2020-07-30

    Applicant: Snowflake Inc.

    Abstract: The subject technology obtains, at a data system, an ingest request to ingest one or more files into a table. The subject technology, after obtaining the ingest request and prior to the ingesting of the one or more files, persists the one or more files in a first file queue that corresponds to the table, the first file queue further corresponding to a client account, and the data system further comprising a second file queue that corresponds to both a second client account and a second table. The subject technology ingests, by one or more execution nodes, the one or more files into one or more micro-partitions of the table, each of the one or more micro-partitions comprising contiguous units of storage of a storage device.
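One way to picture the per-account, per-table file queues described here is a registry keyed by the (client account, table) pair. The sketch below is illustrative only: `FileQueues`, `persist`, `drain`, `ingest`, and the fixed `partition_size` are hypothetical names and simplifications, and the returned batches merely stand in for micro-partitions.

```python
from collections import defaultdict, deque


class FileQueues:
    """Hypothetical registry holding one file queue per (client account, table)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def persist(self, account, table, files):
        # Files are queued before any ingestion starts.
        self._queues[(account, table)].extend(files)

    def drain(self, account, table):
        # Hand over everything queued for this account and table, then empty the queue.
        pending = self._queues[(account, table)]
        files = list(pending)
        pending.clear()
        return files


def ingest(files, partition_size=2):
    """Group files into fixed-size batches standing in for micro-partitions."""
    return [files[i:i + partition_size] for i in range(0, len(files), partition_size)]
```

Keying the queues on both values keeps one account's backlog from delaying another account's table, which appears to be the motivation for the second file queue in the abstract.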

    Batch data ingestion
    Granted invention patent

    Publication number: US10977245B2

    Publication date: 2021-04-13

    Application number: US16942421

    Filing date: 2020-07-29

    Applicant: Snowflake Inc.

    Abstract: The subject technology obtains, at a database system, an ingest request to ingest one or more files into a table of a database. The subject technology, after obtaining the ingest request and prior to the ingesting of the one or more files, persists the one or more files in a file queue that corresponds to the table. The subject technology assigns the one or more files to one or more execution nodes to be ingested into the table. The subject technology operates an ingest puller to poll the file queue. The subject technology ingests, by the one or more execution nodes, the one or more files into one or more micro-partitions of the table via one or more pipes.
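The polling and assignment steps in this abstract might look like the following sketch. It assumes a standard-library `queue.Queue` as the file queue and a round-robin spread across nodes; `poll_once` and `assign` are hypothetical helpers, and the pipes mentioned in the abstract are not modeled.

```python
import queue


def poll_once(file_queue, max_files):
    """Hypothetical ingest puller step: take up to max_files without blocking."""
    batch = []
    while len(batch) < max_files:
        try:
            batch.append(file_queue.get_nowait())
        except queue.Empty:
            break
    return batch


def assign(batch, node_names):
    """Spread a polled batch across execution nodes (round-robin is an assumption)."""
    assignment = {name: [] for name in node_names}
    for i, file_name in enumerate(batch):
        assignment[node_names[i % len(node_names)]].append(file_name)
    return assignment
```

A non-blocking poll lets the puller return an empty batch when nothing is queued, so a scheduler could call it on a timer without stalling.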

    Load history calculation in internal stage replication

    Publication number: US11983165B1

    Publication date: 2024-05-14

    Application number: US18128212

    Filing date: 2023-03-29

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/2365 G06F16/1748 G06F16/27

    Abstract: Embodiments of the present disclosure provide techniques for deduplicating files during internal stage replication using a directory table of the replicated internal stage that is modified as a cache for storing and retrieving original file-level metadata for the replicated files. An initial list of candidate files for loading from the internal stage to a table of the target deployment is prepared based on the files listed in the internal stage, and refined using a directory table lookup. If there is any inconsistency between the files registered in the directory table and the files listed in the internal stage, the target deployment will inspect the user-defined file-level metadata to obtain original file-level metadata for each file that is present in the internal stage but not in the directory table. This information may be used during deduplication to ensure that no duplicate files are loaded.
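The deduplication logic described here can be approximated as a lookup with a fallback. In this sketch the data shapes are assumptions: the directory table and the user-defined metadata are plain dictionaries mapping a file name to its original file-level metadata, the load history is a set of those metadata values, and writing the fallback result back into the directory table is a design choice of the example rather than something the abstract states.

```python
def candidate_files(stage_listing, directory_table, user_metadata, load_history):
    """Return staged files whose original identity has not been loaded yet.

    stage_listing:   file names currently listed in the replicated internal stage
    directory_table: file name -> original file-level metadata (used as a cache)
    user_metadata:   file name -> original metadata kept in user-defined metadata
    load_history:    set of original metadata values already loaded
    """
    to_load = []
    for name in stage_listing:
        original = directory_table.get(name)
        if original is None:
            # Stage and directory table disagree: fall back to the
            # user-defined file-level metadata, then cache the result.
            original = user_metadata[name]
            directory_table[name] = original
        if original not in load_history:
            to_load.append(name)
    return to_load
```

For example, with `a.csv` registered in the directory table and already loaded, and `b.csv` present only in the stage, only `b.csv` would be returned as a load candidate.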
