Abstract:
Techniques and systems for reducing data stored on a block processing storage system are described. A losslessly reduced representation of a data block can include references to one or more prime data element blocks, and optionally a description of a reconstitution program which, when applied to the one or more prime data element blocks, results in the data block.
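As a rough illustration of the idea above, the following minimal sketch models a reduced block as references to prime blocks plus a small "reconstitution program". The names `ReducedBlock`, `reduce_block`, and `reconstitute`, the byte-patch program format, and the 4-patch threshold are all assumptions made for this example; the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ReducedBlock:
    prime_refs: List[int]  # indices of prime data element blocks
    # Hypothetical program format: (offset, replacement bytes) patches.
    program: List[Tuple[int, bytes]] = field(default_factory=list)


def reduce_block(block: bytes, prime_store: List[bytes]) -> ReducedBlock:
    """Reduce a block against a similar prime block, or install it as a new one."""
    for idx, prime in enumerate(prime_store):
        if len(prime) != len(block):
            continue
        diffs = [(i, block[i:i + 1]) for i in range(len(block)) if block[i] != prime[i]]
        if len(diffs) <= 4:  # arbitrary similarity threshold (assumption)
            return ReducedBlock([idx], diffs)
    prime_store.append(block)
    return ReducedBlock([len(prime_store) - 1])


def reconstitute(reduced: ReducedBlock, prime_store: List[bytes]) -> bytes:
    """Apply the program to the referenced prime blocks to recover the block."""
    data = bytearray(b"".join(prime_store[i] for i in reduced.prime_refs))
    for offset, patch in reduced.program:
        data[offset:offset + len(patch)] = patch
    return bytes(data)
```

In this sketch a block that differs from a stored prime block in only a few bytes is kept as one reference plus a short patch list, and reconstitution always returns the original bytes.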
Abstract:
This disclosure relates to multidimensional search and retrieval on losslessly reduced data. Prime data elements are organized using components of a structure of an input dataset, so that searches can be performed on the losslessly reduced data based on one or more of those components. Some embodiments can retrieve, from the data structure that organizes the prime data elements (called the prime data sieve), references to metadata for prime data elements, the metadata itself, or the prime data elements, in a content-associative manner, based on the values of certain fields or dimensions in an input query presented to the data structure. For every prime data element, the prime data sieve can retain a reverse reference to the losslessly reduced representation of each chunk that refers to the prime data element.
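A minimal sketch of such a sieve might look as follows. It assumes prime data elements are small records (dicts) and that the searchable dimensions are named fields; the class `PrimeDataSieve` and its methods are hypothetical names chosen for this example, not an interface given in the disclosure.

```python
from collections import defaultdict


class PrimeDataSieve:
    """Toy sieve: indexes prime data elements by selected fields (dimensions)."""

    def __init__(self, dimensions):
        self.dimensions = list(dimensions)    # field names usable in queries
        self.elements = {}                    # element id -> record
        self.by_dimension = defaultdict(set)  # (field, value) -> element ids
        self.reverse_refs = defaultdict(set)  # element id -> referring chunk ids

    def install(self, element_id, record):
        self.elements[element_id] = record
        for dim in self.dimensions:
            if dim in record:
                self.by_dimension[(dim, record[dim])].add(element_id)

    def add_reference(self, element_id, chunk_id):
        # Reverse reference from a prime data element to a reduced chunk.
        self.reverse_refs[element_id].add(chunk_id)

    def query(self, **criteria):
        """Ids of prime data elements matching every given field value."""
        sets = [self.by_dimension.get(item, set()) for item in criteria.items()]
        return set.intersection(*sets) if sets else set(self.elements)

    def chunks_for(self, **criteria):
        """Ids of reduced chunks that refer to any matching prime data element."""
        result = set()
        for element_id in self.query(**criteria):
            result |= self.reverse_refs.get(element_id, set())
        return result
```

Here the reverse references let a query on field values reach the reduced chunks directly, without reconstituting chunks that cannot match.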
Abstract:
Some embodiments can factorize input data into a sequence of elements. Next, for at least one element in the sequence of elements, the embodiments can identify one or more prime data elements and determine a reconstitution program based on the element and the one or more prime data elements. The embodiments can then use the one or more prime data elements and the reconstitution program to generate a losslessly reduced representation of the element, and store the losslessly reduced representation of the element at a storage location. Next, in response to determining that a keyword is present in the element, the embodiments can (i) create a link that associates the keyword with the storage location, and (ii) store the link in a keyword index. The link can additionally associate the keyword with an offset where the keyword occurs in the element.
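The keyword-indexing step can be sketched as below. The reduction itself is elided; `location` simply stands for wherever the reduced representation of the element was stored. The function name `index_element` and the `(location, offset)` link format are assumptions made for illustration.

```python
from collections import defaultdict


def index_element(element: bytes, location: int, keywords, keyword_index):
    """Record a (storage location, offset) link for each keyword occurrence."""
    for keyword in keywords:
        start = element.find(keyword)
        while start != -1:
            keyword_index[keyword].append((location, start))
            start = element.find(keyword, start + 1)


# Hypothetical usage: two elements stored at locations 0 and 1.
keyword_index = defaultdict(list)
wanted = [b"error", b"disk"]
index_element(b"error: disk full; error logged", 0, wanted, keyword_index)
index_element(b"all ok", 1, wanted, keyword_index)
```

A later keyword search can then consult `keyword_index` and reconstitute only the elements at the linked storage locations, using the offsets to jump to each occurrence.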
Abstract:
Input data can be losslessly reduced by using a data structure that organizes prime data elements based on their contents. Alternatively, the data structure can organize prime data elements based on the contents of a name that is derived from the prime data elements. Specifically, video data can be losslessly reduced by (1) using the data structure to identify a set of prime data elements, and (2) using the set of prime data elements to losslessly reduce intra-frames. The input data can be dynamically partitioned based on the memory usage of components of the data structure. Parcels can be created based on the partitions to facilitate archiving and movement of the data. The losslessly reduced data can be stored using a set of distilled files and a set of prime data element files.
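One way to picture the dynamic partitioning is the sketch below, which starts a new partition whenever the memory attributed to distinct elements would exceed a budget. The function `partition_by_memory`, the use of element length as the memory cost, and the reset-per-partition policy are simplifying assumptions for this example rather than details from the disclosure.

```python
def partition_by_memory(elements, budget_bytes):
    """Split a sequence of elements so each partition's distinct elements fit a budget."""
    partitions, current, seen, used = [], [], set(), 0
    for element in elements:
        cost = 0 if element in seen else len(element)
        if current and used + cost > budget_bytes:
            # Close the partition and start over with an empty structure.
            partitions.append(current)
            current, seen, used = [], set(), 0
            cost = len(element)
        current.append(element)
        if element not in seen:
            seen.add(element)
            used += cost
    if current:
        partitions.append(current)
    return partitions
```

Each resulting partition could then be packaged as a self-contained parcel, since it depends only on the distinct elements tracked within it.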
Abstract:
An amount of memory needed to hold prime data elements during reconstitution may be determined by examining the creation and usage of prime data elements and their spatial and temporal characteristics during data distillation.
Abstract:
Techniques and systems for reconstituting a sequence of losslessly reduced data chunks are described. Some embodiments can collect metadata while losslessly reducing a sequence of data chunks by using prime data elements to obtain the sequence of losslessly reduced data chunks, wherein the metadata includes an indicator corresponding to each prime data element that indicates whether or not the prime data element is referenced in multiple losslessly reduced data chunks, and optionally includes a memory size of each prime data element. Some embodiments can retrieve the metadata and reconstitute the sequence of losslessly reduced data chunks, wherein during reconstitution, the metadata can be used to retain only those prime data elements in memory that are referenced in multiple losslessly reduced data chunks. Some embodiments can, prior to performing reconstitution, use the metadata to optionally allocate sufficient memory to store the prime data elements that are referenced in multiple losslessly reduced data chunks.
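The metadata-driven approach can be sketched as follows. For simplicity, each reduced chunk is modeled as a list of prime data element ids and reconstitution as plain concatenation; the function names and the metadata layout (`multi`, `size`) are assumptions for this example.

```python
from collections import Counter


def collect_metadata(reduced_chunks, prime_sizes):
    """Per prime data element: is it referenced by multiple chunks, and its size."""
    counts = Counter()
    for refs in reduced_chunks:
        counts.update(set(refs))  # count each chunk at most once per element
    return {pid: {"multi": counts[pid] > 1, "size": prime_sizes[pid]} for pid in counts}


def memory_needed(metadata):
    """Memory to preallocate: only elements referenced by multiple chunks."""
    return sum(m["size"] for m in metadata.values() if m["multi"])


def reconstitute_all(reduced_chunks, fetch_prime, metadata):
    """Rebuild every chunk, retaining only multiply referenced elements in memory."""
    cache, output = {}, []
    for refs in reduced_chunks:
        parts = []
        for pid in refs:
            data = cache[pid] if pid in cache else fetch_prime(pid)
            if metadata[pid]["multi"]:
                cache[pid] = data
            parts.append(data)
        output.append(b"".join(parts))
    return output, cache
```

In this sketch, elements used by a single chunk are fetched, used, and dropped, so the cache never grows beyond the size reported by `memory_needed`.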
Abstract:
Systems and techniques are described for losslessly reducing input data using a distributed system comprising multiple computers that maintain portions of a data structure that organizes prime data elements based on names of the prime data elements. During operation, a first computer can determine a first name for an element, and send the element to a second computer based on the first name. The second computer can losslessly reduce the element by determining a second name for the element, and using the second name to navigate through a portion of the data structure maintained at the second computer.
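The two-level naming can be illustrated with the sketch below. It assumes, purely for the example, that the first name is the element's leading byte (used to pick a computer) and the second name is its first four bytes (used to pick a bucket on that computer); it also handles only exact duplicates, which is a simplification. `Node`, `first_name`, `second_name`, and `route` are hypothetical names.

```python
def first_name(element: bytes) -> bytes:
    return element[:1]  # assumption: short name used only for routing


def second_name(element: bytes) -> bytes:
    return element[:4]  # assumption: longer name used on the receiving node


class Node:
    """One computer holding a portion of the name-organized structure."""

    def __init__(self):
        self.buckets = {}  # second name -> list of prime data elements

    def reduce(self, element: bytes):
        name = second_name(element)
        bucket = self.buckets.setdefault(name, [])
        for i, prime in enumerate(bucket):
            if prime == element:
                return ("ref", name, i)  # reduced to a reference
        bucket.append(element)           # installed as a new prime data element
        return ("new", name, len(bucket) - 1)


def route(element: bytes, nodes):
    """First computer: choose the node from the first name, then forward."""
    name = first_name(element)
    index = name[0] % len(nodes) if name else 0
    return index, nodes[index].reduce(element)
```

Because routing depends only on the first name, elements with the same leading content always arrive at the same node, where the second name narrows the search to one bucket.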