Abstract:
The present invention is directed to a system and method for creating a non-redundant data set from a plurality of data sources. Generally, the system and method operate by creating unique hash keys corresponding to unique data files; compiling the hash keys along with seeking information for the corresponding data files; de-duplicating the hash keys; and retrieving and storing the data files corresponding to the de-duplicated hash keys. In this way, a non-redundant data set can be created from a plurality of data sources. The system of the present invention can operate independently or in conjunction with other de-duplicating methods and systems. For example, a de-duplicating method and system can be used to read and obtain data from a variety of media, regardless of the application used to generate the backup media. The component parts of a file, including its content and the metadata pertaining to it, may be read from a medium. These pieces of content and metadata may then be stored and associated with one another. To avoid duplication of data, pieces of content and metadata may be compared to previously stored content and metadata. Furthermore, using these same methods and systems, the content and metadata of a file may be associated with a location where the file resided. A database that stores these components and allows linking between the various stored components may be particularly useful in implementing embodiments of these methods and systems.
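The four steps named in this abstract (creating hash keys, compiling them with seeking information, de-duplicating the keys, and retrieving the corresponding files) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not name a hash function, so SHA-256 is assumed here, and the record layout, the function names, and the in-memory `store` standing in for the backup media are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (source name, location within source, file bytes).
Record = Tuple[str, str, bytes]
Seek = Tuple[str, str]


def hash_key(content: bytes) -> str:
    """Return a hex digest used as the hash key (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def compile_keys(records: Iterable[Record]) -> List[Tuple[str, str, str]]:
    """Pair each hash key with seeking information (source, location)."""
    return [(hash_key(data), source, location) for source, location, data in records]


def deduplicate(keys: Iterable[Tuple[str, str, str]]) -> Dict[str, Seek]:
    """Keep only the first seek entry seen for each distinct hash key."""
    unique: Dict[str, Seek] = {}
    for key, source, location in keys:
        unique.setdefault(key, (source, location))
    return unique


def retrieve(unique: Dict[str, Seek], store: Dict[Seek, bytes]) -> Dict[str, bytes]:
    """Fetch one copy of each distinct file via its seeking information."""
    return {key: store[seek] for key, seek in unique.items()}


# Illustrative data: two sources holding one duplicated file and one distinct file.
records: List[Record] = [
    ("tape_a", "/docs/report.txt", b"quarterly report"),
    ("tape_b", "/backup/report_copy.txt", b"quarterly report"),
    ("tape_b", "/backup/notes.txt", b"meeting notes"),
]
store = {(source, location): data for source, location, data in records}

unique = deduplicate(compile_keys(records))
result = retrieve(unique, store)
```

In this sketch only the hash keys and seek entries are compared during de-duplication, so file contents need to be fetched just once per distinct key. A cryptographic digest makes collisions between different files extremely unlikely, though not strictly impossible.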
Abstract:
The present invention provides a system and method for de-duplicating a large heterogeneous stock of data and collecting metadata associated with that data. Additionally, the system and method provide a means for retrieving data items based on specific criteria that can be identified in the collected metadata.
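As a rough illustration of de-duplicating content while still collecting the metadata of every occurrence, and then retrieving items by criteria found in that metadata, consider the following Python sketch. The `Catalog` class, the `Metadata` fields, and the use of SHA-256 are assumptions made for the example; the abstract does not specify any of them.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Metadata:
    """Hypothetical metadata fields collected for each occurrence of a file."""
    name: str
    owner: str
    size: int


class Catalog:
    """Stores each distinct content once and links all metadata records to it."""

    def __init__(self) -> None:
        self.content: Dict[str, bytes] = {}
        self.metadata: Dict[str, List[Metadata]] = {}

    def add(self, data: bytes, name: str, owner: str) -> str:
        key = hashlib.sha256(data).hexdigest()  # hash choice is an assumption
        self.content.setdefault(key, data)      # content is stored only once
        self.metadata.setdefault(key, []).append(Metadata(name, owner, len(data)))
        return key

    def find(self, predicate: Callable[[Metadata], bool]) -> List[bytes]:
        """Return distinct contents having at least one matching metadata record."""
        return [
            self.content[key]
            for key, metas in self.metadata.items()
            if any(predicate(m) for m in metas)
        ]


catalog = Catalog()
report_key = catalog.add(b"quarterly report", "report.txt", "alice")
catalog.add(b"quarterly report", "report_copy.txt", "bob")
catalog.add(b"meeting notes", "notes.txt", "alice")

bobs_items = catalog.find(lambda m: m.owner == "bob")
```

Keeping metadata in a list per hash key means duplicate content is discarded while the information about where and under what name each copy existed is preserved.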