Abstract:
This storage system is designed to: divide data into a plurality of chunk data (pieces of data) in a deduplication process; select one or more chunk data from among the plurality of chunk data in accordance with a sampling period, which indicates that, on average, one chunk data is selected out of every N chunk data; calculate a fingerprint, such as a hash value, for each of one or more characteristic chunk data, which are the selected one or more chunk data; and determine whether data including the one or more characteristic chunk data is a duplicate. The storage system changes the sampling period on the basis of the results of past deduplication processes.
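The sampling idea in the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: fixed-size chunking, the CRC-based selection rule in `select_characteristic`, the `SampledDeduplicator` class, and the thresholds used to adjust the period.

```python
import hashlib
import zlib


def split_into_chunks(data: bytes, chunk_size: int = 4096) -> list[bytes]:
    """Fixed-size chunking (assumption: the abstract does not say how data is divided)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def select_characteristic(chunks: list[bytes], period: int) -> list[bytes]:
    """Select roughly one chunk in `period` using a cheap checksum (hypothetical rule)."""
    return [c for c in chunks if zlib.crc32(c) % period == 0]


class SampledDeduplicator:
    """Hypothetical helper: fingerprints only the sampled chunks."""

    def __init__(self, period: int = 4) -> None:
        self.period = period
        self.index: set[str] = set()    # fingerprints seen so far
        self.history: list[bool] = []   # outcome of each past duplicate check

    def write(self, data: bytes, chunk_size: int = 4096) -> bool:
        chunks = split_into_chunks(data, chunk_size)
        sampled = select_characteristic(chunks, self.period)
        fps = {hashlib.sha256(c).hexdigest() for c in sampled}
        # Treat the data as a duplicate when every sampled fingerprint is known.
        is_dup = bool(fps) and fps <= self.index
        self.index |= fps
        self.history.append(is_dup)
        return is_dup

    def adjust_period(self, window: int = 10, low: float = 0.2, high: float = 0.8) -> int:
        """Assumed policy: sample more sparsely when duplicates are common, more densely when rare."""
        recent = self.history[-window:]
        if not recent:
            return self.period
        ratio = sum(recent) / len(recent)
        if ratio >= high:
            self.period *= 2
        elif ratio <= low and self.period > 1:
            self.period //= 2
        return self.period
```

With `period=1` every chunk is sampled, so writing the same payload twice reports a duplicate on the second write. Larger periods reduce the fingerprinting cost at the price of a less certain duplicate decision.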
Abstract:
A storage control apparatus performs, for each virtual area to which a physical area is allocated, any one of coarse-grained management for managing a correspondence relationship between a virtual area and a physical area in a first size unit, and fine-grained management for managing a correspondence relationship between a virtual area and a physical area in a second size unit smaller than the first size unit. The storage control apparatus manages mapping information that expresses a correspondence relationship between a virtual area and a physical area. The storage control apparatus performs at least one of change of any of fine-grained virtual areas to a coarse-grained virtual area and change of any of coarse-grained virtual areas to a fine-grained virtual area, based on the number of duplication areas of each virtual area and a size of the mapping information.
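One way to picture the switch between coarse-grained and fine-grained management is the sketch below. It assumes a simple cost model that the abstract does not state: a coarse-grained area costs one mapping entry, a fine-grained area costs `FINE_UNITS_PER_AREA` entries, and `rebalance` is a hypothetical policy that keeps the total number of entries within a budget.

```python
from dataclasses import dataclass

FINE_UNITS_PER_AREA = 8  # assumption: one coarse unit spans eight fine units


@dataclass
class VirtualArea:
    name: str
    fine: bool             # True if managed in the smaller (second) size unit
    duplicate_count: int   # number of duplicated sub-areas observed in this area

    def mapping_entries(self) -> int:
        return FINE_UNITS_PER_AREA if self.fine else 1


def mapping_size(areas: list[VirtualArea]) -> int:
    return sum(a.mapping_entries() for a in areas)


def rebalance(areas: list[VirtualArea], max_entries: int) -> list[VirtualArea]:
    """Hypothetical policy driven by duplicate counts and mapping size."""
    # Demote fine-grained areas with the fewest duplicates until the mapping fits.
    for area in sorted((a for a in areas if a.fine), key=lambda a: a.duplicate_count):
        if mapping_size(areas) <= max_entries:
            break
        area.fine = False
    # Promote coarse-grained areas with the most duplicates while room remains.
    for area in sorted((a for a in areas if not a.fine), key=lambda a: -a.duplicate_count):
        if area.duplicate_count == 0:
            break
        if mapping_size(areas) + FINE_UNITS_PER_AREA - 1 > max_entries:
            break
        area.fine = True
    return areas
```

The intuition behind this assumed policy is that fine-grained mapping only pays for itself where deduplication actually finds duplicates, so areas with few duplicates are the first to fall back to coarse-grained management.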
Abstract:
A controller receives new data, which is an update of old data, and stores the received new data in a memory. The controller reads the old data from a first storage medium group into the memory and, on the basis of a difference between the old data and the new data in the memory, generates transfer data that carries less information than the new data and is used to replicate the new data in the subsidiary storage system; it then transmits the transfer data to the subsidiary storage system. The controller also reads the old parity into the memory, generates new parity, which is parity updated from the old parity, on the basis of the old parity in the memory and XOR data, which is the exclusive OR of the new data and the old data in the memory, and stores the new parity in the first storage medium group.
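The parity update described above follows from the XOR identity: the new parity equals the old parity XOR (old data XOR new data). The sketch below shows that step, plus one possible form of the transfer data. Compressing the XOR difference with `zlib` is an assumption chosen for illustration; the abstract only says the transfer data is derived from the difference and carries less information than the new data. The function names are hypothetical.

```python
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive OR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def update_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> tuple[bytes, bytes]:
    """Return (xor_data, new_parity) without reading the other data blocks of the stripe."""
    xor_data = xor_bytes(old_data, new_data)
    new_parity = xor_bytes(old_parity, xor_data)
    return xor_data, new_parity


def make_transfer_data(xor_data: bytes) -> bytes:
    """Assumed encoding: a small update leaves mostly zero bytes, which compress well."""
    return zlib.compress(xor_data)


def apply_transfer_data(old_copy: bytes, transfer: bytes) -> bytes:
    """On the subsidiary side, rebuild the new data from its copy of the old data."""
    return xor_bytes(old_copy, zlib.decompress(transfer))
```

Because the same XOR data feeds both the parity update and the transfer data, the old data needs to be read from the storage media only once in this sketch.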
Abstract:
In a distributed storage system including a plurality of nodes, a first node among the plurality of nodes judges whether the same data as first data, which is written to a first virtual partial area managed by the first node from among a plurality of virtual partial areas, exists in a virtual partial area managed by another node among the plurality of nodes. When the same data as the first data exists in the other node, the first node executes inter-node deduplication, which changes the allocation of the logical partial area for either the first virtual partial area or the virtual partial area of the other node to which the same data is written, to the other logical partial area. When a predicted value of the I/O load on the first node after execution of the inter-node deduplication of the first virtual partial area is less than a first threshold, the first node executes the inter-node deduplication of a second virtual partial area managed by the first node from among the plurality of virtual partial areas obtained by dividing the virtual storage area.
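The threshold check on predicted I/O load can be sketched as a loop that keeps deduplicating areas only while the prediction stays below the threshold. The load model is an assumption introduced here, not something the abstract specifies: all I/O directed at a deduplicated area is assumed to become inter-node traffic and is added to the node's load. The `Area` type and `dedup_across_nodes` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    fingerprint: str   # identifies the data content of the area
    io_rate: float     # I/O per second directed at this area (hypothetical metric)


def dedup_across_nodes(
    local_areas: list[Area],
    remote_fingerprints: set[str],
    current_load: float,
    threshold: float,
) -> tuple[list[str], float]:
    """Return the names of deduplicated areas and the predicted load afterwards."""
    deduped: list[str] = []
    load = current_load
    # Try the least-accessed areas first so that each step adds little load.
    for area in sorted(local_areas, key=lambda a: a.io_rate):
        if area.fingerprint not in remote_fingerprints:
            continue  # no identical data on another node
        predicted = load + area.io_rate  # assumed load model, see lead-in
        if predicted >= threshold:
            break     # stop before the predicted load reaches the threshold
        deduped.append(area.name)
        load = predicted
    return deduped, load
```

Ordering candidates by access rate is a design choice of this sketch: rarely accessed areas free capacity at the lowest cost in additional inter-node I/O.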
Abstract:
A computer configured to extract an attribute, which is a character string indicating a feature of a paper-based document, stores template information and dictionary information. The computer is configured to: execute character recognition processing on image data on the paper-based document; extract an attribute corresponding to each of the at least one type of attribute, which is defined in each of the plurality of templates, through use of a result of the character recognition processing and the plurality of templates; calculate a score regarding the extracted attribute for each of the plurality of templates; select one of the plurality of templates that has the highest extraction accuracy of the attribute based on the score; and generate output information through use of the selected template.
Abstract:
A storage system according to an aspect of the present invention includes one or more storage devices for storing write data to which a write request from a host computer is directed, and a storage controller that provides one or more volumes to the host computer. Further, the storage system manages, for each partition within the volume, the time when a write request was last received from the host computer. The storage controller performs a deduplication process upon detecting a partition that has not received a write request for a predetermined time or more since the write request was last received.
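The idle-partition trigger can be sketched with a small tracker that records the last write time per partition. The class name `IdleTracker`, the partition identifiers, and the use of plain numeric timestamps are assumptions for illustration; the deduplication process itself is outside the scope of this sketch.

```python
class IdleTracker:
    """Hypothetical helper that finds partitions with no recent writes."""

    def __init__(self, idle_seconds: float) -> None:
        self.idle_seconds = idle_seconds
        self.last_write: dict[str, float] = {}  # partition id -> time of last write request

    def record_write(self, partition: str, now: float) -> None:
        """Call on every write request directed at the partition."""
        self.last_write[partition] = now

    def idle_partitions(self, now: float) -> list[str]:
        """Partitions that have not been written for at least `idle_seconds`."""
        return sorted(
            p for p, t in self.last_write.items()
            if now - t >= self.idle_seconds
        )
```

Deferring deduplication until a partition has gone quiet avoids spending effort on data that is likely to be overwritten again shortly.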
Abstract:
Availability of an information system including a storage apparatus and a host computer is improved. A host system includes a first storage apparatus provided with a first volume for storing data, and a second storage apparatus for storing the data sent from the first storage apparatus. If a failure occurs in the first storage apparatus, the host computer sends the data that was to be sent to the first storage apparatus to the second storage apparatus instead. The same identification number is used by the host computer for accessing data stored in the first volume via a first virtual volume and for accessing data stored in a second volume of the second storage apparatus via a second virtual volume.