Abstract:
A method for backing up data on a plurality of computers connected via a network. The method includes forming partnerships among the computers such that each computer in a partnership commits, under agreements, to help back up the data of its partners. The method further includes periodically verifying that previously backed-up data is being retained by the computers committed to act as backup partners in accordance with the agreements. In another embodiment, the method provides distributed cooperative backup of data in a system that includes a loose confederation of computers connected via a network. In this embodiment, the method includes selecting computers from the loose confederation as potential backup partners based on predetermined criteria, and negotiating a reciprocal backup partnership agreement between the computers based on predetermined requirements, including backup requirements. Once the negotiations are complete and the agreements are made, the method forms partnerships between the computers. The computers become backup partners by agreeing to provide backup services to each other, so that distributed cooperative backup of data can be administered without central control. The method further includes periodically backing up encoded data at the backup partners and periodically verifying that previously backed-up data is retained by them. Another aspect of the invention is a distributed cooperative backup system that includes a network and a loose confederation of computers connected via the network. A plurality of computers from the loose confederation is configured to perform distributed cooperative backup of data and to function as backup partners. Each of these computers has storage that can be used for providing reciprocal backup services, and a computer-readable medium embodying computer program code configured to cause the computer to perform functions comparable to the method steps described above.
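The periodic retention check can be illustrated with a precomputed challenge-response exchange. The following is a minimal Python sketch under stated assumptions, not the claimed method: `BackupPartner`, `precompute_challenges`, and `verify` are hypothetical names, and hashing a fresh random nonce together with the stored block using SHA-256 is an assumed verification technique. Because the owner derives its challenges before uploading, it can check retention later without keeping its own copy of the data.

```python
import hashlib
import os


class BackupPartner:
    """Hypothetical partner that stores encoded blocks on behalf of others."""

    def __init__(self):
        self._blocks = {}  # block_id -> bytes

    def store(self, block_id, data):
        self._blocks[block_id] = data

    def respond(self, block_id, nonce):
        # Prove possession by hashing a fresh nonce together with the stored block.
        data = self._blocks.get(block_id)
        if data is None:
            return None
        return hashlib.sha256(nonce + data).hexdigest()


def precompute_challenges(data, count=4):
    """Owner side, before upload: derive (nonce, expected digest) pairs."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        challenges.append((nonce, hashlib.sha256(nonce + data).hexdigest()))
    return challenges


def verify(partner, block_id, challenges):
    """Spend one unused challenge; True if the partner still holds the block."""
    if not challenges:
        raise RuntimeError("no challenges left for this block")
    nonce, expected = challenges.pop()
    return partner.respond(block_id, nonce) == expected


# Usage: verification succeeds while the partner keeps the block, fails after corruption.
partner = BackupPartner()
block = b"encoded backup block"
pending = precompute_challenges(block)
partner.store("blk-1", block)
print(verify(partner, "blk-1", pending))  # True
partner.store("blk-1", b"corrupted")
print(verify(partner, "blk-1", pending))  # False
```

Each challenge is used once, so an owner following this scheme would need to precompute enough challenges for the expected number of verification rounds, or re-derive them when it next has the data locally.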
Abstract:
A plurality of differential data stores are stored in persistent storage media. In response to receiving a first request to store a particular data object, one of the differential data stores in the persistent storage media is selected according to a criterion relating to compression of data objects in the differential data stores. The selected differential data store is copied into temporary storage media; the copying is not delayed after receipt of the first request to await further requests. The particular data object is inserted into the copy of the selected differential data store in the temporary storage media without retrieving further data from the selected differential data store in the persistent storage media. The selected differential data store in the persistent storage media is then replaced with the modified copy from the temporary storage media.
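The store-selection and copy-insert-replace steps can be sketched as follows. This is an illustrative Python sketch, not the claimed implementation: the on-disk format (a pickled list of byte strings), the function names, and the use of zlib's extra compressed size as the compression criterion are all assumptions. The temporary storage medium is modeled as process memory, and the replacement uses an atomic rename via `os.replace`.

```python
import os
import pickle
import tempfile
import zlib


def load_store(path):
    # Each differential data store is assumed to be a pickled list of byte strings.
    with open(path, "rb") as f:
        return pickle.load(f)


def compression_cost(objects, new_obj):
    """Extra compressed bytes needed to add new_obj to a store (lower is better)."""
    base = b"".join(objects)
    return len(zlib.compress(base + new_obj)) - len(zlib.compress(base))


def store_object(store_paths, new_obj):
    # Select the store in which the new object compresses best (assumed criterion).
    chosen = min(store_paths, key=lambda p: compression_cost(load_store(p), new_obj))
    # Copy the chosen store into temporary (in-memory) storage immediately,
    # without waiting for further requests to batch.
    copy = list(load_store(chosen))
    # Insert into the copy; no further reads from the persistent store are needed.
    copy.append(new_obj)
    # Replace the persistent store with the modified copy via an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(chosen))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(copy, f)
    os.replace(tmp_path, chosen)
    return chosen


# Usage: two stores on disk; English text should land in the text-like store.
workdir = tempfile.mkdtemp()
paths = []
for name, seed in [("text", [b"the quick brown fox " * 20]), ("digits", [b"0123456789" * 40])]:
    path = os.path.join(workdir, name + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(seed, f)
    paths.append(path)
print(os.path.basename(store_object(paths, b"the quick brown fox jumps " * 5)))
```

Writing the modified copy to a sibling temporary file and renaming it over the original means a crash mid-write leaves the previous version of the store intact.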
Abstract:
Chunks are stored in a container of a data store, where the chunks are produced by dividing input data as part of a deduplication process. In response to determining that the size of the container has reached a predefined size threshold, at least one of the chunks in the container is moved to another container.
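A minimal sketch of the size-triggered move, assuming fixed-size chunking in place of a real deduplication chunker and a policy of moving the second half of an over-full container's chunks into a new container. `ContainerStore`, `chunk_data`, and the 100-byte threshold are hypothetical; the abstract only requires that at least one chunk be moved once the threshold is reached.

```python
from dataclasses import dataclass, field


def chunk_data(data, size=32):
    # Fixed-size chunking stands in for the deduplication chunker (assumption).
    return [data[i:i + size] for i in range(0, len(data), size)]


@dataclass
class Container:
    chunks: list = field(default_factory=list)

    def size(self):
        return sum(len(c) for c in self.chunks)


class ContainerStore:
    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical predefined size threshold, in bytes
        self.containers = [Container()]

    def add_chunk(self, index, chunk):
        container = self.containers[index]
        container.chunks.append(chunk)
        if container.size() >= self.threshold:
            self._split(index)

    def _split(self, index):
        container = self.containers[index]
        if len(container.chunks) < 2:
            return  # a single oversized chunk has nothing to move away from
        # Assumed policy: move the second half of the chunks to a new container.
        half = len(container.chunks) // 2
        moved = container.chunks[half:]
        container.chunks = container.chunks[:half]
        self.containers.insert(index + 1, Container(moved))


# Usage: 160 bytes of input in 32-byte chunks with a 100-byte threshold.
store = ContainerStore(threshold=100)
for chunk in chunk_data(bytes(160)):
    store.add_chunk(len(store.containers) - 1, chunk)
print([c.size() for c in store.containers])  # [64, 96]
```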
Abstract:
Data processing apparatus comprising: a chunk store containing specimen data chunks; a manifest store containing a plurality of manifests, each of which represents at least part of a data set and comprises at least one reference to at least one of the specimen data chunks; a sparse chunk index containing information on only some of the specimen data chunks; and a processor operable to: process input data into input data chunks; identify manifests having at least one reference to a specimen data chunk that corresponds to one of the input data chunks and for which the sparse chunk index contains information; and prioritize the identified manifests for subsequent operation.
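The sparse index and manifest prioritization might look like the following Python sketch. The sampling rule (index a chunk only when the low bits of its SHA-1 hash are zero), the class `DedupStore`, and ranking manifests by the number of sparse-index hits are assumptions made for illustration, not details taken from the abstract.

```python
import hashlib
from collections import Counter

SAMPLE_BITS = 2  # hypothetical: roughly 1 in 4 chunks enters the sparse index


def chunk_hash(chunk):
    return hashlib.sha1(chunk).digest()


def is_sampled(digest):
    # Index a chunk only if the low SAMPLE_BITS bits of its hash are zero.
    return (digest[-1] & ((1 << SAMPLE_BITS) - 1)) == 0


class DedupStore:
    def __init__(self):
        self.chunk_store = {}   # hash -> specimen data chunk
        self.manifests = {}     # manifest id -> ordered list of chunk hashes
        self.sparse_index = {}  # sampled hash -> set of manifest ids

    def add_manifest(self, manifest_id, chunks):
        hashes = []
        for chunk in chunks:
            h = chunk_hash(chunk)
            self.chunk_store.setdefault(h, chunk)
            hashes.append(h)
            if is_sampled(h):
                self.sparse_index.setdefault(h, set()).add(manifest_id)
        self.manifests[manifest_id] = hashes

    def prioritize(self, input_chunks):
        """Return manifest ids ordered by sparse-index hits, most hits first."""
        hits = Counter()
        for chunk in input_chunks:
            h = chunk_hash(chunk)
            if not is_sampled(h):
                continue  # unsampled chunks cannot be looked up
            for manifest_id in self.sparse_index.get(h, ()):
                hits[manifest_id] += 1
        return [manifest_id for manifest_id, _ in hits.most_common()]


# Usage: a full earlier backup should rank ahead of a small overlapping one.
dedup = DedupStore()
dedup.add_manifest("backup-mon", [f"block-{i}".encode() for i in range(100)])
dedup.add_manifest("backup-tue", [f"block-{i}".encode() for i in range(50, 60)])
print(dedup.prioritize([f"block-{i}".encode() for i in range(100)]))
```

Because only sampled chunks are indexed, the index stays small, while a manifest that shares many chunks with the input still collects many hits and is ranked near the front.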
Abstract:
Information relating to monitored communications between user machines and a resource of a particular machine is received. Group information that identifies groups of the users is received. Based on the monitored communications and the group information, a summary of a subset of users that have accessed the resource is generated.
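A small sketch of generating the summary, assuming monitored communications arrive as (source machine, destination resource) records and that a separate table maps each machine to its user. `summarize_access`, the record shape, and the "(ungrouped)" bucket are hypothetical choices for illustration.

```python
def summarize_access(flows, machine_owner, groups, resource):
    """flows: iterable of (source_machine, destination_resource) records.
    machine_owner: source machine -> user name.
    groups: group name -> set of user names.
    Returns group -> sorted users of that group who accessed `resource`;
    users that belong to no group are listed under "(ungrouped)"."""
    accessed = {machine_owner[src] for src, dst in flows
                if dst == resource and src in machine_owner}
    summary = {}
    grouped = set()
    for group, members in groups.items():
        grouped |= members
        hits = sorted(accessed & members)
        if hits:
            summary[group] = hits
    leftover = sorted(accessed - grouped)
    if leftover:
        summary["(ungrouped)"] = leftover
    return summary


# Usage: who in each group touched the finance share?
flows = [("pc-1", "fileserver:/finance"), ("pc-2", "fileserver:/finance"),
         ("pc-3", "fileserver:/hr"), ("pc-4", "fileserver:/finance")]
owners = {"pc-1": "alice", "pc-2": "bob", "pc-3": "carol", "pc-4": "dave"}
groups = {"finance": {"alice", "erin"}, "it": {"bob", "carol"}}
print(summarize_access(flows, owners, groups, "fileserver:/finance"))
# {'finance': ['alice'], 'it': ['bob'], '(ungrouped)': ['dave']}
```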
Abstract:
Provided are, among other things, systems, methods, and techniques for determining the applicability of a policy defined by reference to a source document. A first sketch, generated from the content of the source document, is obtained, and a matching criterion is defined based on the first sketch. A second sketch, generated from the content of a potential target document, is also obtained. Whether the policy applies to the potential target document is determined by whether the second sketch satisfies the matching criterion. If the policy applies, a notification regarding its applicability is automatically provided and/or an action is automatically blocked to prevent a violation of the policy.
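One way to realize sketch-based policy matching is MinHash over word shingles, with the matching criterion being a minimum fraction of agreeing sketch positions (an estimate of Jaccard similarity). This technique, the sketch size `NUM_HASHES`, the 0.5 threshold, and the functions below are assumptions for illustration; the abstract does not specify the sketch type.

```python
import hashlib

NUM_HASHES = 64  # hypothetical sketch size


def shingles(text, k=3):
    # Overlapping k-word shingles of the lower-cased text.
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_sketch(text):
    # For each seeded hash function, keep the minimum hash over all shingles.
    grams = shingles(text)
    sketch = []
    for seed in range(NUM_HASHES):
        sketch.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{g}".encode(), digest_size=8).digest(), "big")
            for g in grams
        ))
    return sketch


def make_criterion(source_sketch, threshold=0.5):
    # Matching criterion derived from the source document's sketch.
    def matches(target_sketch):
        agree = sum(a == b for a, b in zip(source_sketch, target_sketch))
        return agree / len(source_sketch) >= threshold
    return matches


def check_policy(criterion, target_text, action="block"):
    if criterion(minhash_sketch(target_text)):
        return f"policy applies: {action}"
    return "policy does not apply"


# Usage: a lightly edited copy of the source should match; unrelated text should not.
source = "Project Falcon launch date and pricing are confidential until the public announcement"
applies = make_criterion(minhash_sketch(source), threshold=0.5)
print(check_policy(applies, source + " next quarter"))
print(check_policy(applies, "Lunch menu for Friday includes soup and salad"))
```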
Abstract:
As part of a deduplication process, chunks are produced from data. The chunks are assigned to locations in a data store such that the number of locations referenced is capped according to at least one predefined parameter.
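A sketch of one capping policy, assuming locations are containers and that a segment may reference at most `cap` previously written containers, chosen as those holding the most of the segment's duplicate chunks. Chunks whose existing copy lies elsewhere are assigned to the open container instead, as are new chunks. `assign_locations` and the "open" label are hypothetical.

```python
from collections import Counter


def assign_locations(segment_hashes, index, cap, new_location="open"):
    """segment_hashes: chunk hashes of one segment, in order.
    index: hash -> existing location (container id) of already-stored chunks.
    cap: maximum number of existing locations the segment may reference.
    Returns one location per chunk."""
    # Count how many of the segment's duplicates each existing container holds.
    counts = Counter(index[h] for h in segment_hashes if h in index)
    allowed = {loc for loc, _ in counts.most_common(cap)}
    # Duplicates outside the allowed containers are (re)written to the open one.
    return [index[h] if h in index and index[h] in allowed else new_location
            for h in segment_hashes]


# Usage: three existing containers, but only two may be referenced.
index = {"h1": "C7", "h2": "C7", "h3": "C7", "h4": "C2", "h5": "C9", "h6": "C9"}
segment = ["h1", "h2", "h3", "h4", "h5", "h6", "h7"]
print(assign_locations(segment, index, cap=2))
# ['C7', 'C7', 'C7', 'open', 'C9', 'C9', 'open']
```

The trade-off in this sketch is deliberate duplication: a few already-stored chunks are written again so that restoring the segment touches a bounded number of containers.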
Abstract:
To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file; the coordinator computer has also sent the request to other client computers so that they, too, find files similar to the at least one comparison file. In response to the request, the first client computer compares signatures of its local files with a signature of the at least one comparison file to identify, according to a comparison metric, at least a subset of those files that are similar to the at least one comparison file. The first client computer then sends the coordinator computer a response relating to the comparison.
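The coordinator/client exchange can be sketched in Python as below. The signature (the set of 4-byte shingles of a file's content), the comparison metric (Jaccard similarity against a threshold), and the `Coordinator` and `Client` classes are assumptions for illustration; network transport is replaced by direct method calls.

```python
def signature(content, k=4):
    """Set of overlapping k-byte shingles; a stand-in for the real signature."""
    return {content[i:i + k] for i in range(max(1, len(content) - k + 1))}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class Client:
    def __init__(self, name, files):
        self.name = name
        self.files = files  # path -> file content (bytes)

    def handle_request(self, comparison_sig, threshold):
        # Compare each local file's signature with the comparison signature.
        similar = [path for path, data in self.files.items()
                   if jaccard(signature(data), comparison_sig) >= threshold]
        return {"client": self.name, "similar": sorted(similar)}


class Coordinator:
    def __init__(self, clients):
        self.clients = clients

    def find_similar(self, comparison_file, threshold=0.5):
        sig = signature(comparison_file)
        # The same request goes to every client; each one compares locally.
        return [client.handle_request(sig, threshold) for client in self.clients]


# Usage: one comparison file fanned out to two clients.
coordinator = Coordinator([
    Client("laptop", {"/docs/plan.txt": b"quarterly plan for the storage team",
                      "/docs/menu.txt": b"lunch options near the office"}),
    Client("desktop", {"/tmp/plan-copy.txt": b"quarterly plan for the storage team v2"}),
])
for reply in coordinator.find_similar(b"quarterly plan for the storage team"):
    print(reply)
```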
Abstract:
In response to a restore request, at least a portion of a recipe that refers to chunks is read. Based on the recipe portion, a container having plural chunks is retrieved. Using the recipe portion, the chunks of the container to save are identified, where some of the identified chunks need not be communicated to the requester at the time of identification. The identified chunks are stored in a memory area from which chunks are read for the restore operation.
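The restore step can be illustrated with a look-ahead cache. In this Python sketch, which assumes an in-memory dictionary stands in for container storage, each container retrieval saves every chunk that the current recipe portion will need, including chunks not yet due to be sent to the requester. The `window` parameter and the eviction rule (drop a chunk once the next recipe portion no longer references it) are hypothetical.

```python
def restore(recipe, containers, window=8):
    """Rebuild data from a recipe while caching chunks that are needed soon.

    recipe:     list of (chunk_id, container_id) pairs in output order.
    containers: container_id -> {chunk_id: bytes}; stands in for disk reads.
    window:     number of recipe entries examined ahead (hypothetical knob).
    Returns (restored bytes, number of container retrievals).
    """
    cache = {}  # memory area the restore reads chunks from
    out = []
    reads = 0
    for pos, (chunk_id, container_id) in enumerate(recipe):
        if chunk_id not in cache:
            container = containers[container_id]  # retrieve the whole container
            reads += 1
            needed = {cid for cid, _ in recipe[pos:pos + window]}
            # Save every chunk of this container that the recipe portion needs,
            # including chunks that are not yet due to be sent to the requester.
            for cid, data in container.items():
                if cid in needed:
                    cache[cid] = data
        out.append(cache[chunk_id])
        # Drop the chunk if the next recipe portion does not reference it again.
        if chunk_id not in {cid for cid, _ in recipe[pos + 1:pos + 1 + window]}:
            del cache[chunk_id]
    return b"".join(out), reads


# Usage: a larger look-ahead window avoids re-reading container K1.
containers = {"K1": {"a": b"A", "b": b"B"}, "K2": {"c": b"C"}}
recipe = [("a", "K1"), ("c", "K2"), ("b", "K1"), ("a", "K1")]
print(restore(recipe, containers, window=4))  # (b'ACBA', 2)
print(restore(recipe, containers, window=1))  # (b'ACBA', 4)
```

With `window=4`, container K1 is retrieved once instead of three times, because chunk b is saved when chunk a is first fetched and chunk a is kept until its second use.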