Systems and methods for data deduplication by generating similarity metrics using sketch computation
摘要:
A method for data reduction may comprise computing (i) a first sketch of a first segment and (ii) a second sketch of a second segment. The first sketch and the second sketch may each comprise a set of features that are representative of or unique to the corresponding first and second segments. The method also comprise processing the first sketch and the second sketch to generate a similarity metric indicative of whether the second segment is similar to the first segment. The method may further comprise (1) performing a differencing operation on the second segment relative to the first segment when the similarity metric is greater than or equal to a similarity threshold, or (2) storing the first segment and the second segment in a database without performing the differencing operation when the similarity metric is less than the similarity threshold.
信息查询
0/0