SYSTEMS AND METHODS FOR DATA SEGMENT PROCESSING

    Publication No.: US20210191640A1

    Publication date: 2021-06-24

    Application No.: US16718703

    Filing date: 2019-12-18

    Applicant: Ndata, Inc.

    IPC classification: G06F3/06 G06F21/62

    Abstract: A method for data processing may comprise: (a) receiving one or more input data streams from one or more client applications; (b) generating at least a first segment and a second segment from the one or more input data streams, wherein the first segment may comprise a first set of chunks and the second segment may comprise a second set of chunks; (c) computing (i) a first set of fingerprints of the first set of chunks and (ii) a second set of fingerprints of the second set of chunks; (d) processing the first set of fingerprints and the second set of fingerprints to determine that the first set of chunks and the second set of chunks meet a similarity threshold; and (e) processing the first set of chunks and the second set of chunks to determine one or more differences between the first segment and the second segment.
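
Steps (b) through (e) of the abstract above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: fixed-size chunking with a tiny `CHUNK_SIZE`, SHA-256 as the fingerprint function, Jaccard overlap as the similarity measure, the `SIMILARITY_THRESHOLD` value, and all function names.

```python
import hashlib

CHUNK_SIZE = 8              # hypothetical; tiny so the example is easy to follow
SIMILARITY_THRESHOLD = 0.5  # hypothetical threshold

def chunk(segment: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a segment into fixed-size chunks (an assumed chunking scheme)."""
    return [segment[i:i + size] for i in range(0, len(segment), size)]

def fingerprint(c: bytes) -> str:
    """Fingerprint one chunk; SHA-256 is an assumed choice."""
    return hashlib.sha256(c).hexdigest()

def fingerprints(chunks: list[bytes]) -> set[str]:
    return {fingerprint(c) for c in chunks}

def jaccard(a: set[str], b: set[str]) -> float:
    """Share of fingerprints common to both sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def differing_chunks(first: list[bytes], second: list[bytes]) -> list[tuple[int, bytes]]:
    """Chunks of the second segment whose fingerprint is absent from the first."""
    known = fingerprints(first)
    return [(i, c) for i, c in enumerate(second) if fingerprint(c) not in known]

first = chunk(b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD")
second = chunk(b"AAAAAAAABBBBBBBBXXXXXXXXDDDDDDDD")

similarity = jaccard(fingerprints(first), fingerprints(second))
# Only compute differences when the two segments are similar enough.
differences = differing_chunks(first, second) if similarity >= SIMILARITY_THRESHOLD else None
```

In this toy input three of the five distinct fingerprints are shared, so the segments pass the assumed threshold and only the third chunk of the second segment is reported as a difference.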

    SYSTEMS AND METHODS FOR SKETCH COMPUTATION

    Publication No.: US20220156233A1

    Publication date: 2022-05-19

    Application No.: US17387895

    Filing date: 2021-07-28

    Applicant: Ndata, Inc.

    Abstract: A method for sketch computation is provided. The method may comprise receiving an input data stream from one or more client applications. The method may also comprise generating at least one segment from the input data stream. The at least one segment may comprise a plurality of chunks. The method may further comprise computing a sketch of the at least one segment. The sketch may comprise a set of features that are representative of or unique to the at least one segment, such that the set of features corresponds to the at least one segment. The sketch may be useable for inline deduplication of at least one other input data stream received from the one or more client applications without (i) generation of a full index of the plurality of chunks or (ii) comparison of the at least one other input data stream to the full index.
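
One way to picture a sketch as "a set of features" is a bottom-k selection over chunk hashes, with a small index keyed by feature instead of by every chunk. The Python below is a sketch under stated assumptions only: the abstract does not say how features are chosen, so the bottom-k rule, the truncated SHA-256 hash, the value of `k`, and the hypothetical `SketchIndex` class are all illustrative choices.

```python
import hashlib
from collections import Counter
from typing import Optional

def chunk_hash(c: bytes) -> int:
    """64-bit integer hash of a chunk (truncated SHA-256; an assumed choice)."""
    return int.from_bytes(hashlib.sha256(c).digest()[:8], "big")

def sketch(chunks: list[bytes], k: int = 4) -> tuple[int, ...]:
    """Assumed feature rule: keep the k smallest distinct chunk hashes."""
    return tuple(sorted({chunk_hash(c) for c in chunks})[:k])

class SketchIndex:
    """Hypothetical index: maps each sketch feature to segment ids.

    It stores k entries per segment rather than one entry per chunk.
    """

    def __init__(self) -> None:
        self._by_feature: dict[int, list[str]] = {}

    def add(self, segment_id: str, sk: tuple[int, ...]) -> None:
        for feature in sk:
            self._by_feature.setdefault(feature, []).append(segment_id)

    def best_match(self, sk: tuple[int, ...]) -> Optional[str]:
        """Return the stored segment sharing the most features, if any."""
        votes = Counter(sid for f in sk for sid in self._by_feature.get(f, []))
        return votes.most_common(1)[0][0] if votes else None

# A segment of 16 distinct 8-byte chunks, a near-copy, and an unrelated segment.
segment = [bytes([i]) * 8 for i in range(16)]
modified = list(segment)
modified[5] = b"changed!"
unrelated = [bytes([i]) * 8 for i in range(100, 116)]

index = SketchIndex()
index.add("seg-1", sketch(segment))
```

Because the features are drawn from a set of hashes, the sketch does not depend on chunk order, and a near-copy that differs in one chunk still shares most of its features with the stored segment.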

    SYSTEMS AND METHODS FOR DATA DEDUPLICATION BY GENERATING SIMILARITY METRICS USING SKETCH COMPUTATION

    Publication No.: US20240022648A1

    Publication date: 2024-01-18

    Application No.: US18180441

    Filing date: 2023-03-08

    Applicant: Ndata, Inc.

    IPC classification: H04L69/04 G06F16/174

    Abstract: A method for data reduction may comprise computing (i) a first sketch of a first segment and (ii) a second sketch of a second segment. The first sketch and the second sketch may each comprise a set of features that are representative of or unique to the corresponding first and second segments. The method may also comprise processing the first sketch and the second sketch to generate a similarity metric indicative of whether the second segment is similar to the first segment. The method may further comprise (1) performing a differencing operation on the second segment relative to the first segment when the similarity metric is greater than or equal to a similarity threshold, or (2) storing the first segment and the second segment in a database without performing the differencing operation when the similarity metric is less than the similarity threshold.
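
The threshold decision in the abstract above can be sketched in Python as follows. All specifics are assumptions for illustration, not the patented method: `toy_sketch` hashes fixed-size chunks, the similarity metric is Jaccard overlap of sketch features, `SIMILARITY_THRESHOLD` is arbitrary, the differencing operation is built on the standard library's `difflib.SequenceMatcher`, and a plain dict stands in for the database.

```python
import hashlib
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.5  # hypothetical value

def toy_sketch(segment: bytes, chunk_size: int = 4, k: int = 8) -> frozenset:
    """Assumed sketch: the k smallest hashes of fixed-size chunks."""
    hashes = {hashlib.sha256(segment[i:i + chunk_size]).digest()[:8]
              for i in range(0, len(segment), chunk_size)}
    return frozenset(sorted(hashes)[:k])

def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap of two feature sets (an assumed metric)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def delta(base: bytes, target: bytes) -> list[tuple]:
    """Encode target as copy ranges from base plus literal data."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": carry the new bytes literally
            ops.append(("data", target[j1:j2]))
    return ops

def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the target segment from the base and a delta."""
    return b"".join(base[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops)

def reduce_pair(first: bytes, second: bytes) -> dict:
    """Store a delta when the sketches are similar, else both segments whole."""
    metric = similarity(toy_sketch(first), toy_sketch(second))
    if metric >= SIMILARITY_THRESHOLD:
        return {"first": first, "second_delta": delta(first, second)}
    return {"first": first, "second": second}

first = b"AAAABBBBCCCCDDDD"
near_copy = b"AAAABBBBCCCCEEEE"
unrelated = b"WWWWXXXXYYYYZZZZ"

stored_similar = reduce_pair(first, near_copy)
stored_different = reduce_pair(first, unrelated)
```

With these inputs the near-copy shares three of five distinct features with the first segment, so it is stored as a delta that `apply_delta` can expand again, while the unrelated segment shares none and is stored whole.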

    Systems and methods for data deduplication by generating similarity metrics using sketch computation

    Publication No.: US11627207B2

    Publication date: 2023-04-11

    Application No.: US17162687

    Filing date: 2021-01-29

    Applicant: Ndata, Inc.

    IPC classification: H04L69/04 G06F16/174

    Abstract: A method for data reduction may comprise computing (i) a first sketch of a first segment and (ii) a second sketch of a second segment. The first sketch and the second sketch may each comprise a set of features that are representative of or unique to the corresponding first and second segments. The method may also comprise processing the first sketch and the second sketch to generate a similarity metric indicative of whether the second segment is similar to the first segment. The method may further comprise (1) performing a differencing operation on the second segment relative to the first segment when the similarity metric is greater than or equal to a similarity threshold, or (2) storing the first segment and the second segment in a database without performing the differencing operation when the similarity metric is less than the similarity threshold.

    SYSTEMS AND METHODS FOR DATA DEDUPLICATION BY GENERATING SIMILARITY METRICS USING SKETCH COMPUTATION

    Publication No.: US20210360088A1

    Publication date: 2021-11-18

    Application No.: US17162687

    Filing date: 2021-01-29

    Applicant: Ndata, Inc.

    IPC classification: H04L29/06 G06F16/174

    Abstract: A method for data reduction may comprise computing (i) a first sketch of a first segment and (ii) a second sketch of a second segment. The first sketch and the second sketch may each comprise a set of features that are representative of or unique to the corresponding first and second segments. The method may also comprise processing the first sketch and the second sketch to generate a similarity metric indicative of whether the second segment is similar to the first segment. The method may further comprise (1) performing a differencing operation on the second segment relative to the first segment when the similarity metric is greater than or equal to a similarity threshold, or (2) storing the first segment and the second segment in a database without performing the differencing operation when the similarity metric is less than the similarity threshold.

    Systems and methods for sketch computation

    Publication No.: US11119995B2

    Publication date: 2021-09-14

    Application No.: US16718686

    Filing date: 2019-12-18

    Applicant: Ndata, Inc.

    Abstract: A method for sketch computation is provided. The method may comprise receiving an input data stream from one or more client applications. The method may also comprise generating at least one segment from the input data stream. The at least one segment may comprise a plurality of chunks. The method may further comprise computing a sketch of the at least one segment. The sketch may comprise a set of features that are representative of or unique to the at least one segment, such that the set of features corresponds to the at least one segment. The sketch may be useable for inline deduplication of at least one other input data stream received from the one or more client applications without (i) generation of a full index of the plurality of chunks or (ii) comparison of the at least one other input data stream to the full index.

    SYSTEMS AND METHODS FOR SKETCH COMPUTATION

    Publication No.: US20210191911A1

    Publication date: 2021-06-24

    Application No.: US16718686

    Filing date: 2019-12-18

    Applicant: Ndata, Inc.

    Abstract: A method for sketch computation is provided. The method may comprise receiving an input data stream from one or more client applications. The method may also comprise generating at least one segment from the input data stream. The at least one segment may comprise a plurality of chunks. The method may further comprise computing a sketch of the at least one segment. The sketch may comprise a set of features that are representative of or unique to the at least one segment, such that the set of features corresponds to the at least one segment. The sketch may be useable for inline deduplication of at least one other input data stream received from the one or more client applications without (i) generation of a full index of the plurality of chunks or (ii) comparison of the at least one other input data stream to the full index.

    Systems and methods for data deduplication by generating similarity metrics using sketch computation

    Publication No.: US10938961B1

    Publication date: 2021-03-02

    Application No.: US16718714

    Filing date: 2019-12-18

    Applicant: Ndata, Inc.

    IPC classification: H04L29/06 G06F16/174

    Abstract: A method for data reduction may comprise computing (i) a first sketch of a first segment and (ii) a second sketch of a second segment. The first sketch and the second sketch may each comprise a set of features that are representative of or unique to the corresponding first and second segments. The method may also comprise processing the first sketch and the second sketch to generate a similarity metric indicative of whether the second segment is similar to the first segment. The method may further comprise (1) performing a differencing operation on the second segment relative to the first segment when the similarity metric is greater than or equal to a similarity threshold, or (2) storing the first segment and the second segment in a database without performing the differencing operation when the similarity metric is less than the similarity threshold.