Snapshot-based data corruption detection

    Publication No.: US11249866B1

    Publication date: 2022-02-15

    Application No.: US17237707

    Filing date: 2021-04-22

    IPC classification: G06F11/07 G06F11/14 G06F11/00

    Abstract: Embodiments described herein detect data corruption in a distributed data set system. For example, a system comprises node(s) for processing queries with respect to a distributed data set comprising a plurality of storage segments. A write transaction resulting from a query with respect to a particular storage segment is logged in a log record that describes a modification to the storage segment. A log service provides the log record to a data server managing the portion of the distributed data set in which the storage segment is included, and the data server performs the write transaction with respect to the storage segment. For redundancy purposes, the data server has replica(s) that manage respective replicas of the portion of the distributed data set that it manages. For backup purposes, snapshots of the replica(s) are periodically generated. To detect data corruption, a snapshot of one replica is cross-validated with a snapshot of another replica.
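
The cross-validation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how snapshots are compared; the per-segment SHA-256 digests, the `Snapshot` mapping, and the function names below are assumptions made for illustration, and the sketch assumes both snapshots were taken at the same log position.

```python
import hashlib
from typing import Dict, List

# Hypothetical representation: storage segment id -> raw segment bytes.
Snapshot = Dict[str, bytes]


def segment_digests(snapshot: Snapshot) -> Dict[str, str]:
    """Compute a SHA-256 digest for every storage segment in a snapshot."""
    return {seg: hashlib.sha256(data).hexdigest() for seg, data in snapshot.items()}


def cross_validate(snap_a: Snapshot, snap_b: Snapshot) -> List[str]:
    """Return the ids of segments whose contents differ between two replica
    snapshots, including segments present in only one of them."""
    digests_a = segment_digests(snap_a)
    digests_b = segment_digests(snap_b)
    all_segments = digests_a.keys() | digests_b.keys()
    return sorted(s for s in all_segments if digests_a.get(s) != digests_b.get(s))


replica_1 = {"seg-1": b"alpha", "seg-2": b"beta"}
replica_2 = {"seg-1": b"alpha", "seg-2": b"bet4"}  # simulated corruption
suspect_segments = cross_validate(replica_1, replica_2)
```

A mismatch only shows that the two replicas disagree; deciding which copy is corrupt would need a further step, such as comparing against a third replica or replaying the log.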

    VERSIONED METADATA USING VIRTUAL DATABASES

    Publication No.: US20220398232A1

    Publication date: 2022-12-15

    Application No.: US17346619

    Filing date: 2021-06-14

    Abstract: Distributed database systems including a plurality of SQL compute nodes are described herein that enable such nodes to operate with versioned metadata despite the fact that SQL is only single-version aware. The distributed database system further includes a global logical metadata server that stores and manages versions of metadata, determines which of those versions should be visible at any given point in time, and enables creation of a virtual database that includes the proper versions of metadata. In an aspect, a central transaction manager manages global transaction identifiers and their associated start times, abort times and/or commit times, which enables determination of transaction and metadata version visibility for any point in time. In an aspect, the visible metadata is included in a virtual database that logically overlays a physical database and provides the correct version of metadata in lieu of the current metadata version stored in the physical database.
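
One way to picture the visibility determination is the sketch below. It assumes a simple rule that the abstract does not spell out: a metadata version is visible at a point in time if the transaction that created it committed at or before that time, and the most recently committed visible version wins. The `TxnRecord` and `MetadataVersion` types and the `visible_versions` function are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TxnRecord:
    """Times tracked per global transaction id (hypothetical layout)."""
    start_time: int
    commit_time: Optional[int] = None
    abort_time: Optional[int] = None


@dataclass
class MetadataVersion:
    object_name: str
    definition: str
    created_by: int  # global transaction id


def visible_versions(versions: List[MetadataVersion],
                     txns: Dict[int, TxnRecord],
                     as_of: int) -> Dict[str, str]:
    """Build the overlay of metadata visible at time `as_of`: for each object,
    the version whose creating transaction committed latest at or before `as_of`."""
    overlay: Dict[str, str] = {}
    latest_commit: Dict[str, int] = {}
    for v in versions:
        txn = txns.get(v.created_by)
        # Skip versions from unknown, uncommitted, aborted, or later transactions.
        if txn is None or txn.commit_time is None or txn.commit_time > as_of:
            continue
        if txn.commit_time > latest_commit.get(v.object_name, -1):
            latest_commit[v.object_name] = txn.commit_time
            overlay[v.object_name] = v.definition
    return overlay


txns = {
    1: TxnRecord(start_time=0, commit_time=5),
    2: TxnRecord(start_time=6, commit_time=10),
    3: TxnRecord(start_time=7, abort_time=8),
}
versions = [
    MetadataVersion("orders", "v1: (id INT)", created_by=1),
    MetadataVersion("orders", "v2: (id INT, total INT)", created_by=2),
    MetadataVersion("orders", "v3: aborted change", created_by=3),
]
overlay_at_7 = visible_versions(versions, txns, as_of=7)
overlay_at_10 = visible_versions(versions, txns, as_of=10)
```

The returned dictionary plays the role of the virtual database in this sketch: a single-version view handed to a compute node in place of whatever version the physical database currently holds.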

    Application service-level configuration of dataloss failover

    Publication No.: US10922202B2

    Publication date: 2021-02-16

    Application No.: US16428164

    Filing date: 2019-05-31

    IPC classification: G06F11/00 G06F11/20 G06F11/14

    Abstract: Application service configuration of a timeframe for performing dataloss failover (failover that does not attempt full data replication to the secondary data store) from a primary data store to the secondary data store. A data-tier service, such as a database as a service (DBaaS), could receive that configuration from the application service and automatically perform the dataloss failover as configured by the application service. This relieves the application service from having to manage the failover workflow while still allowing it to balance the timing of dataloss failover appropriately, which depends on a highly application-specific trade-off between the negative effects of operational latency and those of dataloss.
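
A minimal sketch of how a data-tier service might act on such a configuration follows. The abstract states only that the application service configures a timeframe; the `FailoverConfig` type, the field name `dataloss_failover_after_s`, and the decision rules in `choose_action` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverConfig:
    """Hypothetical configuration supplied by the application service."""
    dataloss_failover_after_s: float  # how long to wait before accepting dataloss


def choose_action(config: FailoverConfig,
                  outage_seconds: float,
                  replication_caught_up: bool) -> str:
    """Decide what the data-tier service does during a primary outage."""
    if outage_seconds <= 0:
        return "none"  # primary is healthy
    if replication_caught_up:
        return "failover"  # secondary is complete, so no data is lost
    if outage_seconds >= config.dataloss_failover_after_s:
        return "dataloss_failover"  # configured timeframe elapsed
    return "wait"  # keep trying to recover or replicate


# A latency-sensitive application accepts dataloss quickly; a ledger-style one waits longer.
latency_sensitive = FailoverConfig(dataloss_failover_after_s=30)
loss_sensitive = FailoverConfig(dataloss_failover_after_s=3600)
action_fast = choose_action(latency_sensitive, outage_seconds=60, replication_caught_up=False)
action_slow = choose_action(loss_sensitive, outage_seconds=60, replication_caught_up=False)
```

The two example configurations show the trade-off the abstract describes: the same 60-second outage triggers a dataloss failover for one application and continued waiting for the other.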

    Snapshot-based data corruption detection

    Publication No.: US11567839B2

    Publication date: 2023-01-31

    Application No.: US17512337

    Filing date: 2021-10-27

    IPC classification: G06F11/07 G06F11/14 G06F11/00

    Abstract: Embodiments described herein detect data corruption in a distributed data set system. For example, a system comprises node(s) for processing queries with respect to a distributed data set comprising a plurality of storage segments. A write transaction resulting from a query with respect to a particular storage segment is logged in a log record that describes a modification to the storage segment. A log service provides the log record to a data server managing the portion of the distributed data set in which the storage segment is included, and the data server performs the write transaction with respect to the storage segment. For redundancy purposes, the data server has replica(s) that manage respective replicas of the portion of the distributed data set that it manages. For backup purposes, snapshots of the replica(s) are periodically generated. To detect data corruption, a snapshot of one replica is cross-validated with a snapshot of another replica.

    Snapshot isolation query transactions in distributed systems

    Publication No.: US11625389B2

    Publication date: 2023-04-11

    Application No.: US17207219

    Filing date: 2021-03-19

    Abstract: Methods for snapshot isolation query transactions in distributed systems are performed by systems and devices. Distributed executions of queries are performed in a processing system according to an isolation level protocol for data management and data versioning across one or more data sets, one or more compute pools, etc., within a logical server via a single transaction manager that oversees the isolation semantics and data versioning. Read transactions of queries are performed lock-free via the isolation semantics, and instant rollbacks, point-in-time queries, and single-phase commits in the distributed systems are also provided. Abort and cleanup operations are performed based on a distributed abort protocol and a determined oldest active transaction for the system, in which the single transaction manager does not track read-only transactions and client nodes do not maintain commit tables for transactions.

    Rowgroup consolidation with global delta accumulation and versioning in distributed systems

    Publication No.: US11567921B2

    Publication date: 2023-01-31

    Application No.: US17358886

    Filing date: 2021-06-25

    Abstract: Methods for rowgroup consolidation with delta accumulation and versioning in distributed systems are performed. The systems provide performant methods of row storage that enable versioned modifications of data while keeping and allowing access to older versions of the data for point-in-time transactions. The accumulation of valid rows, deletes, and modifications is maintained in blobs for rowgroups until a size threshold is reached, at which point the rows are moved into a columnar compressed form. Changes to data and associated metadata are stored locally and globally via appends, maintaining logical consistency. Metadata is stored in footers of files allowing faster access to the metadata and its associated data for transactions and instant rollback via metadata version flipping for aborted transactions, as well as lock-free reads of data.
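
The accumulate-then-consolidate pattern in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the threshold is counted in rows rather than bytes, all rows share one schema, deletes and versioning are not modeled, and JSON plus zlib stands in for a real columnar compressed format. The class and attribute names are hypothetical.

```python
import json
import zlib
from typing import Any, Dict, List


class RowGroupAccumulator:
    """Append rows to a delta buffer and move them into a columnar,
    compressed blob once the buffer reaches a threshold."""

    def __init__(self, threshold_rows: int) -> None:
        self.threshold_rows = threshold_rows
        self.delta: List[Dict[str, Any]] = []  # row-oriented, append-only
        self.compressed: List[bytes] = []      # consolidated rowgroups

    def append(self, row: Dict[str, Any]) -> None:
        self.delta.append(row)
        if len(self.delta) >= self.threshold_rows:
            self._consolidate()

    def _consolidate(self) -> None:
        # Pivot rows into columns (assumes every row has the first row's keys).
        columns = {key: [row[key] for row in self.delta] for key in self.delta[0]}
        self.compressed.append(zlib.compress(json.dumps(columns).encode("utf-8")))
        self.delta = []

    def read_rowgroup(self, index: int) -> Dict[str, List[Any]]:
        """Decompress one consolidated rowgroup back into its columns."""
        return json.loads(zlib.decompress(self.compressed[index]).decode("utf-8"))


acc = RowGroupAccumulator(threshold_rows=2)
acc.append({"id": 1, "value": "a"})
pending_after_first = len(acc.delta)
acc.append({"id": 2, "value": "b"})  # reaches the threshold and consolidates
first_rowgroup = acc.read_rowgroup(0)
```

Keeping recent changes in a small row-oriented buffer makes each write a cheap append, while the columnar form serves larger scans once enough rows have accumulated.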

    APPLICATION SERVICE-LEVEL CONFIGURATION OF DATALOSS FAILOVER

    Publication No.: US20190286536A1

    Publication date: 2019-09-19

    Application No.: US16428164

    Filing date: 2019-05-31

    IPC classification: G06F11/20 G06F11/14

    Abstract: Application service configuration of a timeframe for performing dataloss failover (failover that does not attempt full data replication to the secondary data store) from a primary data store to the secondary data store. A data-tier service, such as a database as a service (DBaaS), could receive that configuration from the application service and automatically perform the dataloss failover as configured by the application service. This relieves the application service from having to manage the failover workflow while still allowing it to balance the timing of dataloss failover appropriately, which depends on a highly application-specific trade-off between the negative effects of operational latency and those of dataloss.