SYSTEM AND METHOD FOR CREATING A DE-DUPLICATED DATA SET
    1.
    Patent application
    SYSTEM AND METHOD FOR CREATING A DE-DUPLICATED DATA SET (legal status: in force)

    Publication No.: US20110178996A1

    Publication date: 2011-07-21

    Application No.: US12970881

    Filing date: 2010-12-16

    IPC classification: G06F17/30

    Abstract: The present invention is directed to a system and method for creating a non-redundant data set from a plurality of data sources. Generally, the system and method operate by creating unique hash keys corresponding to unique data files; compiling the hash keys along with seeking information for the corresponding data files; de-duplicating the hash keys; and retrieving/storing the data files corresponding to the de-duplicated hash keys. Thus, in accordance with the system and method of the present invention, a non-redundant data set can be created from a plurality of data sources. The system of the present invention can operate independently or in conjunction with any de-duplicating methods and systems. For example, a de-duplicating method and system can be used to read and obtain data from a variety of media, regardless of the application used to generate the backup media. The component parts of a file may be read from a medium, including content and metadata pertaining to a file. These pieces of content and metadata may then be stored and associated. To avoid duplication of data, pieces of content and metadata may be compared to previously stored content and metadata. Furthermore, using these same methods and systems the content and metadata of a file may be associated with a location where the file resided. A database which stores these components and allows linking between the various stored components may be particularly useful in implementing embodiments of these methods and systems.

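The four steps named in the abstract (create hash keys, compile them with seeking information, de-duplicate the keys, retrieve the corresponding files) can be illustrated with a minimal Python sketch. This is not the patented implementation: the choice of SHA-256, the in-memory `sources` dictionary standing in for real data sources, and the names `SeekInfo`, `hash_key`, `compile_keys`, `deduplicate`, and `build_data_set` are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SeekInfo:
    """Hypothetical seeking information: where a file can be found again."""
    source: str  # name of the data source (assumed label)
    path: str    # location of the file within that source


def hash_key(content: bytes) -> str:
    # Assumption: SHA-256 stands in for the unspecified hash function.
    return hashlib.sha256(content).hexdigest()


def compile_keys(sources):
    """Step 1 and 2: hash every file and pair each key with its seek info."""
    compiled = []
    for source_name, files in sources.items():
        for path, content in files.items():
            compiled.append((hash_key(content), SeekInfo(source_name, path)))
    return compiled


def deduplicate(compiled):
    """Step 3: keep the first seek info seen for each distinct hash key."""
    unique = {}
    for key, seek in compiled:
        unique.setdefault(key, seek)
    return unique


def build_data_set(sources):
    """Step 4: retrieve only the files behind the de-duplicated keys."""
    unique = deduplicate(compile_keys(sources))
    return {key: sources[seek.source][seek.path] for key, seek in unique.items()}


if __name__ == "__main__":
    demo = {
        "tape_a": {"a.txt": b"hello", "b.txt": b"world"},
        "tape_b": {"copy.txt": b"hello"},
    }
    print(len(build_data_set(demo)), "unique files")
```

De-duplicating the keys before retrieval means each distinct file is read only once, however many sources contain a copy. A cryptographic hash does not strictly guarantee unique keys; the sketch treats collisions as negligible.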

    System and method for creating a de-duplicated data set
    2.
    Granted patent
    System and method for creating a de-duplicated data set (legal status: in force)

    Publication No.: US08738668B2

    Publication date: 2014-05-27

    Application No.: US12970881

    Filing date: 2010-12-16

    IPC classification: G06F17/30

    Abstract: The present invention is directed to a system and method for creating a non-redundant data set from a plurality of data sources. Generally, the system and method operate by creating unique hash keys corresponding to unique data files; compiling the hash keys along with seeking information for the corresponding data files; de-duplicating the hash keys; and retrieving/storing the data files corresponding to the de-duplicated hash keys. Thus, in accordance with the system and method of the present invention, a non-redundant data set can be created from a plurality of data sources. The system of the present invention can operate independently or in conjunction with any de-duplicating methods and systems. For example, a de-duplicating method and system can be used to read and obtain data from a variety of media, regardless of the application used to generate the backup media. The component parts of a file may be read from a medium, including content and metadata pertaining to a file. These pieces of content and metadata may then be stored and associated. To avoid duplication of data, pieces of content and metadata may be compared to previously stored content and metadata. Furthermore, using these same methods and systems the content and metadata of a file may be associated with a location where the file resided. A database which stores these components and allows linking between the various stored components may be particularly useful in implementing embodiments of these methods and systems.

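The abstract's closing sentences describe a database that stores content and metadata as separate components, compares new pieces with those already stored, and links both to the location where the file resided. A rough sketch of one way such linking could look follows, using Python's standard `sqlite3` module. The schema, the table and column names, the use of name plus modification time as metadata, and the helpers `open_store` and `store_file` are hypothetical and are not taken from the patent.

```python
import hashlib
import sqlite3


def open_store():
    """Create an in-memory database with hypothetical linked tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE content (
            id INTEGER PRIMARY KEY, digest TEXT UNIQUE, data BLOB);
        CREATE TABLE metadata (
            id INTEGER PRIMARY KEY, name TEXT, mtime INTEGER,
            UNIQUE (name, mtime));
        CREATE TABLE location (
            id INTEGER PRIMARY KEY, path TEXT,
            content_id INTEGER REFERENCES content(id),
            metadata_id INTEGER REFERENCES metadata(id),
            UNIQUE (path, content_id, metadata_id));
    """)
    return db


def _get_or_insert(db, select_sql, select_params, insert_sql, insert_params):
    # Compare against previously stored rows; insert only when none matches.
    row = db.execute(select_sql, select_params).fetchone()
    if row is not None:
        return row[0]
    return db.execute(insert_sql, insert_params).lastrowid


def store_file(db, path, name, mtime, data):
    """Store content and metadata once each, then link both to a location."""
    digest = hashlib.sha256(data).hexdigest()  # assumed hash function
    content_id = _get_or_insert(
        db,
        "SELECT id FROM content WHERE digest = ?", (digest,),
        "INSERT INTO content (digest, data) VALUES (?, ?)", (digest, data))
    metadata_id = _get_or_insert(
        db,
        "SELECT id FROM metadata WHERE name = ? AND mtime = ?", (name, mtime),
        "INSERT INTO metadata (name, mtime) VALUES (?, ?)", (name, mtime))
    db.execute(
        "INSERT OR IGNORE INTO location (path, content_id, metadata_id) "
        "VALUES (?, ?, ?)", (path, content_id, metadata_id))
    return content_id, metadata_id
```

Storing the same file from two backup media under this sketch yields one `content` row, one `metadata` row, and two `location` rows, so the record of where each copy resided survives even though the bytes are kept only once.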