Snapshot-based data corruption detection

    公开(公告)号:US11567839B2

    公开(公告)日:2023-01-31

    申请号:US17512337

    申请日:2021-10-27

    IPC分类号: G06F11/07 G06F11/14 G06F11/00

    摘要: Embodiments described herein detect data corruption in a distributed data set system. For example, a system comprises node(s) for processing queries with respect to a distributed data set comprising a plurality of storage segments. A write transaction resulting from a query with respect to a particular storage segment is logged in a log record that describes a modification to the storage segment. A log service provides the log record to a data server managing a portion of the distributed data set in which the storage segment is included, which performs the write transaction with respect to the storage segment. For redundancy purposes, the data server has replica(s) that manage respective replicas of the portion of the distributed data set managed thereby. For backup purposes, snapshots of the replica(s) are periodically generated. To determine a data corruption, a snapshot of one replica is cross-validated with a snapshot of another replica.

    Snapshot-based data corruption detection

    公开(公告)号:US11249866B1

    公开(公告)日:2022-02-15

    申请号:US17237707

    申请日:2021-04-22

    IPC分类号: G06F11/07 G06F11/14 G06F11/00

    摘要: Embodiments described herein detect data corruption in a distributed data set system. For example, a system comprises node(s) for processing queries with respect to a distributed data set comprising a plurality of storage segments. A write transaction resulting from a query with respect to a particular storage segment is logged in a log record that describes a modification to the storage segment. A log service provides the log record to a data server managing a portion of the distributed data set in which the storage segment is included, which performs the write transaction with respect to the storage segment. For redundancy purposes, the data server has replica(s) that manage respective replicas of the portion of the distributed data set managed thereby. For backup purposes, snapshots of the replica(s) are periodically generated. To determine a data corruption, a snapshot of one replica is cross-validated with a snapshot of another replica.