Abstract:
A system and method for protecting queryable data. Specifically, the method is implemented in a system for targeted data delivery. The method includes collecting user information about a user and generating a user profile based on the user information. The user profile is divided into at least one part. Each part of the user profile is concealed such that each part of the user profile is only accessible using a corresponding tool controlled by a third party.
Abstract:
A method for creating an embedding node. The method includes creating a first hash-based directed acyclic graph (“HDAG”) having a first node, which includes data, and creating a second HDAG having a second node that includes one or more data fields that store the first node.
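A minimal sketch of the embedding idea, assuming nodes are plain dictionaries hashed with SHA-256 over a canonical JSON encoding. The names `node_hash` and `make_node` and the node layout are hypothetical and are not taken from the abstract.

```python
import hashlib
import json

def node_hash(node: dict) -> str:
    """Hash a node's canonical JSON encoding (payload plus child hashes)."""
    encoded = json.dumps(node, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def make_node(data, children=()):
    """Build an HDAG node: a payload plus the hashes of its children."""
    return {"data": data, "children": [node_hash(c) for c in children]}

# First HDAG: a single node that holds data.
first_node = make_node("user record")

# Second HDAG: a node whose data field stores the first node itself
# (embedding) instead of pointing to it by hash.
embedding_node = make_node({"embedded": first_node})

# The embedded node can be read back out of the embedding node and re-hashed.
recovered = embedding_node["data"]["embedded"]
```

Because the first node travels inside the second node's data field, the second HDAG carries no child pointer to it; the embedded node's hash can still be recomputed from the stored copy.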
Abstract:
Methods and systems for targeted data delivery are described. A user profile that includes information about a user is accessed. A root hash of a hash-based directed acyclic graph (HDAG) is computed. The HDAG includes hashed values of items of information in the user profile. The root hash is used in proving that the user profile satisfies selection criteria associated with an offer to deliver data. The user is eligible to be presented with the offer of data provided the user profile satisfies the selection criteria. The data is targeted to the user based on the user profile without requiring a release of any of the information in the user profile.
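The role of the root hash can be illustrated with a simplified binary hash tree over profile items. This is only a sketch under assumptions: the abstract does not specify the HDAG shape, the hash function, or how the proof avoids releasing profile information, and the sketch below checks one disclosed item against the root rather than reproducing that privacy property. The function names and the sample profile items are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(leaves):
    """Return every tree level, from the leaf hashes up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def inclusion_proof(levels, index):
    """Collect sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        sib_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sib_hash, index % 2 == 0))  # True: sibling is on the right
        index //= 2
    return proof

def verify(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the root from one item and its sibling hashes."""
    acc = h(leaf)
    for sib_hash, sibling_on_right in proof:
        acc = h(acc + sib_hash) if sibling_on_right else h(sib_hash + acc)
    return acc == root

profile = [b"age:34", b"zip:94304", b"interest:cycling"]  # hypothetical items
levels = merkle_levels(profile)
root = levels[-1][0]
```

A verifier holding only `root` can confirm that a given item belongs to the committed profile, while the sibling hashes in the proof do not expose the other items in the clear.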
Abstract:
One embodiment is a data processing apparatus that has a chunk store containing specimen data chunks, a manifest store containing a plurality of manifests, each of which represents at least a part of previously processed data and includes at least one reference to at least one of the specimen data chunks, and a sparse chunk index containing information on only some specimen data chunks. Input data is processed into a plurality of input data segments, each made up of input data chunks. A first set of manifests is identified, each of which has at least one reference to one of said specimen data chunks that corresponds to one of the input data chunks of a first input data segment. Specimen data chunks corresponding to other input data chunks of the first input data segment are identified by using the identified first set of manifests and at least one manifest identified when processing previous data.
Abstract:
A method of limiting redundant storage of data comprises receiving a data stream and partitioning the data stream into a series of data chunks. At least one content hash value for a set of data chunks is generated based on data content of the set of data chunks. One or more data chunks are grouped into a segment with at least one boundary of the segment defined based on an evaluation of content hash values of data chunks. Content hash values of data chunks of the segment are compared to content hash values of data chunks of segments stored on a backup mass storage device. A pointer to a stored data chunk of an existing segment is stored on the backup mass storage device if a content hash value of a data chunk of the segment matches the content hash value of the stored data chunk.
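The steps above can be sketched as follows. The sketch makes several assumptions that are not in the abstract: chunks are fixed-size, the hash is SHA-256, a segment boundary is placed after any chunk whose hash value is divisible by a small constant, and the "backup mass storage device" is a dictionary keyed by content hash. All function names are hypothetical.

```python
import hashlib

def chunk_hashes(stream: bytes, chunk_size: int = 4):
    """Partition the stream into chunks and pair each with its content hash."""
    chunks = [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]

def segment(hashed_chunks, divisor: int = 4):
    """Close a segment after any chunk whose hash is divisible by `divisor`."""
    segments, current = [], []
    for digest, chunk in hashed_chunks:
        current.append((digest, chunk))
        if int(digest, 16) % divisor == 0:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

def store(segments, chunk_store: dict):
    """Store unseen chunks; for known hashes record only a pointer (the digest)."""
    manifest = []
    for seg in segments:
        for digest, chunk in seg:
            chunk_store.setdefault(digest, chunk)  # no second copy is written
            manifest.append(digest)
    return manifest

data = b"abcdabcdefghabcd"
backup = {}
manifest = store(segment(chunk_hashes(data)), backup)
restored = b"".join(backup[d] for d in manifest)
```

Here the stream contains the chunk `abcd` three times, yet the store keeps a single copy, and the manifest of pointers is enough to rebuild the original stream.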
Abstract:
Deduplication of input data is performed at a first level, where the deduplication at the first level avoids storing an additional copy of at least one of the chunks in a data store. Additional deduplication of the deduplicated input data is performed, wherein the additional deduplication further reduces duplication.
Abstract:
Data processing apparatus comprising: a chunk store containing specimen data chunks, a manifest store containing at least one manifest that represents at least a part of a data set and that comprises at least one reference to at least one of said specimen data chunks, a sparse chunk index containing information on only those specimen data chunks having a predetermined characteristic, the processing apparatus being operable to process input data into input data chunks and to use the sparse chunk index to identify at least one of said at least one manifest that includes at least one reference to one of said specimen data chunks that corresponds to one of said input data chunks having the predetermined characteristic.
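One way to picture a sparse chunk index is shown below. It is a sketch under assumptions: the abstract does not say what the predetermined characteristic is, so the example assumes it means "the lowest bits of the chunk hash are zero", and it represents manifests as lists of chunk identifiers. The names `chunk_id`, `has_characteristic`, `build_sparse_index`, and `candidate_manifests` are hypothetical.

```python
import hashlib
from collections import Counter

def chunk_id(chunk: bytes) -> int:
    """Identify a chunk by the first 8 bytes of its SHA-256 hash."""
    return int.from_bytes(hashlib.sha256(chunk).digest()[:8], "big")

def has_characteristic(cid: int, sample_bits: int = 1) -> bool:
    """Assumed characteristic: the lowest `sample_bits` bits are all zero."""
    return cid & ((1 << sample_bits) - 1) == 0

def build_sparse_index(manifests: dict) -> dict:
    """Map only the sampled chunk ids to the manifests that reference them."""
    index = {}
    for name, chunk_ids in manifests.items():
        for cid in chunk_ids:
            if has_characteristic(cid):
                index.setdefault(cid, set()).add(name)
    return index

def candidate_manifests(input_chunks, index) -> Counter:
    """Count sparse-index hits per manifest for the incoming chunks."""
    hits = Counter()
    for chunk in input_chunks:
        cid = chunk_id(chunk)
        if has_characteristic(cid):
            for name in index.get(cid, ()):
                hits[name] += 1
    return hits

chunks = [b"chunk-%d" % i for i in range(64)]
manifests = {"m1": [chunk_id(c) for c in chunks]}
index = build_sparse_index(manifests)
hits = candidate_manifests(chunks, index)
```

The index holds entries for only a fraction of the stored chunks, but a hit on any sampled chunk is enough to point the apparatus at a manifest that is likely to reference the neighbouring chunks as well.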
Abstract:
The disclosed embodiments relate to a system and method of committing to a data set, comprising forming a directed acyclic graph adapted to encode the data set, the directed acyclic graph having a plurality of pointers and a plurality of nodes wherein at least one node has multiple parents, the directed acyclic graph having at least one root node and a plurality of leaf nodes. Further, disclosed embodiments comprise committing to the directed acyclic graph to produce a committed-to data set and producing a plurality of proofs about the committed-to data set such that a combination of the plurality of proofs does not reveal information about which nodes have multiple parents, each proof comprising a trace from one of the plurality of nodes to at least one different node, the trace comprising the identities of the nodes and pointers traversed.
Abstract:
Data objects are selectively stored across a plurality of differential data stores, where selection of the differential data stores for storing respective data objects is according to a criterion relating to compression of the data objects in each of the data stores, and where the differential data stores are stored in persistent storage media. Plural requests for accessing the differential data stores are batched, and one of the differential data stores is selected to page into temporary storage from the persistent storage media. The batched plural requests for accessing the selected differential data store that has been paged into the temporary storage are executed.
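The batching and paging behaviour can be sketched with an in-memory model. This is an illustration under assumptions: persistent storage is modelled as a dictionary of dictionaries, and the store paged in is the one with the most queued requests, a selection policy the abstract does not specify. The class `PagedStores` and its methods are hypothetical.

```python
from collections import defaultdict

class PagedStores:
    def __init__(self, persistent: dict):
        self.persistent = persistent      # store id -> dict of stored objects
        self.pending = defaultdict(list)  # store id -> queued request keys
        self.page_ins = 0                 # how many stores were paged in

    def submit(self, store_id, key):
        """Queue a request instead of touching persistent storage right away."""
        self.pending[store_id].append(key)

    def run_batch(self):
        """Page in the store with the most queued requests and serve them all."""
        if not self.pending:
            return []
        store_id = max(self.pending, key=lambda s: len(self.pending[s]))
        in_memory = dict(self.persistent[store_id])  # simulated page-in
        self.page_ins += 1
        keys = self.pending.pop(store_id)
        return [(store_id, k, in_memory.get(k)) for k in keys]

stores = PagedStores({"A": {"x": 1, "y": 2}, "B": {"z": 3}})
stores.submit("A", "x")
stores.submit("B", "z")
stores.submit("A", "y")
first = stores.run_batch()
second = stores.run_batch()
```

Both requests for store `A` are served from a single page-in, even though a request for store `B` arrived between them.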
Abstract:
To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing.
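The client-side comparison might look like the sketch below. The abstract does not define the signature or the comparison metric, so this example assumes a signature is the set of hashes of every 4-byte window of a file and that similarity is Jaccard overlap against a threshold. The names `signature`, `jaccard`, and `handle_request`, and the sample file contents, are hypothetical.

```python
import hashlib

def signature(data: bytes, width: int = 4) -> frozenset:
    """Assumed signature: hashes of every `width`-byte window of the file."""
    return frozenset(
        hashlib.sha256(data[i:i + width]).hexdigest()
        for i in range(max(1, len(data) - width + 1))
    )

def jaccard(a, b) -> float:
    """Assumed comparison metric: shared window hashes over all window hashes."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def handle_request(local_files: dict, comparison_sig, threshold: float = 0.5):
    """Client side: return names of local files similar to the comparison file."""
    return sorted(
        name for name, data in local_files.items()
        if jaccard(signature(data), comparison_sig) >= threshold
    )

local_files = {
    "a.txt": b"the quick brown fox jumps",
    "b.txt": b"completely different text!",
}
comparison_sig = signature(b"the quick brown fox jumped")
response = handle_request(local_files, comparison_sig)
```

Only the comparison file's signature needs to reach each client, and each client returns just the names of its matching files, so file contents stay where they are.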