Statistical and neural network approach for data characterization to reduce storage space requirements

    Publication No.: US11609695B2

    Publication date: 2023-03-21

    Application No.: US17009822

    Filing date: 2020-09-02

    Abstract: A data model is trained to determine whether data is raw, compressed, and/or encrypted. The data model may also be trained to recognize which compression algorithm was used to compress data and predict compression ratios for the data using different compression algorithms. A storage system uses the data model to independently identify raw data. The raw data is grouped based on similarity of statistical features and group members are compressed with the same compression algorithm and may be encrypted after compression with the same encryption algorithm. The data model may also be used to identify sub-optimally compressed data, which may be uncompressed and grouped for compression using a different compression algorithm.
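
The abstract above does not say which statistical features the data model uses. As a rough illustration only, the sketch below swaps the trained model for a single hand-picked feature, byte-level Shannon entropy, and a fixed threshold. The names `byte_entropy` and `looks_raw` and the 7.0 bits-per-byte cutoff are assumptions made for this example, not details from the patent.

```python
import math
import os
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_raw(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic stand-in for the trained model (assumption): compressed and
    encrypted data tend to have near-uniform byte histograms, so low entropy
    is taken as a hint that the data is still raw."""
    return byte_entropy(data) < threshold


if __name__ == "__main__":
    # Build a text-like sample from a small, hypothetical vocabulary.
    rng = random.Random(0)
    words = ["storage", "block", "volume", "array", "cache", "track", "host", "data"]
    raw = " ".join(rng.choices(words, k=20000)).encode("ascii")

    compressed = zlib.compress(raw)
    random_like = os.urandom(65536)  # stand-in for encrypted data

    for label, sample in [("raw", raw), ("compressed", compressed), ("random", random_like)]:
        print(label, round(byte_entropy(sample), 3), looks_raw(sample))
```

A single entropy value cannot separate compressed data from encrypted data, since both sit close to 8 bits per byte, which is presumably one reason the patent describes a trained model rather than a fixed threshold.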

    Deduplication using fingerprint tries

    Publication No.: US10963177B2

    Publication date: 2021-03-30

    Application No.: US15966138

    Filing date: 2018-04-30

    Inventor: Sweetesh Singh

    Abstract: A fingerprint trie is used to store fingerprints for data portions stored on a storage system for use in implementing data deduplication on a storage system. The fingerprint trie may be used to compare fingerprint values to determine duplicate data portions, for example, in response to I/O operations. Leaf nodes of the fingerprint trie may be keyed by fingerprints, and a value of each leaf node may be a reference to the physical storage location of the data portion from which the fingerprint was generated. When an I/O operation is received, a fingerprint may be generated for each of one or more data portions included in the I/O operation. A fingerprint trie may be searched, for example by traversing multiple nodes of the trie according to pointers provided by the nodes, to determine whether there is any matching fingerprint specified in the fingerprint trie.
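
The lookup-then-store flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the fingerprint is taken to be a SHA-256 hex digest, the trie branches on one hex character per level, and the "physical storage location" is simply an index into a Python list. The names `FingerprintTrie`, `fingerprint_of`, and `write_portions` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Set only on leaf nodes: reference to where the data portion is stored.
        self.location: Optional[int] = None


class FingerprintTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def lookup(self, fingerprint: str) -> Optional[int]:
        """Follow child pointers one symbol at a time; None means no match."""
        node = self.root
        for symbol in fingerprint:
            node = node.children.get(symbol)
            if node is None:
                return None
        return node.location

    def insert(self, fingerprint: str, location: int) -> None:
        """Create the path for a fingerprint and record the location at its leaf."""
        node = self.root
        for symbol in fingerprint:
            node = node.children.setdefault(symbol, TrieNode())
        node.location = location


def fingerprint_of(portion: bytes) -> str:
    # Assumption: SHA-256 as the fingerprint function; the abstract names none.
    return hashlib.sha256(portion).hexdigest()


def write_portions(trie: FingerprintTrie, store: List[bytes],
                   portions: List[bytes]) -> List[int]:
    """Handle a write I/O: store each new portion once, reuse duplicates."""
    refs = []
    for portion in portions:
        fp = fingerprint_of(portion)
        location = trie.lookup(fp)
        if location is None:
            store.append(portion)
            location = len(store) - 1
            trie.insert(fp, location)
        refs.append(location)
    return refs


if __name__ == "__main__":
    trie, store = FingerprintTrie(), []
    refs = write_portions(trie, store, [b"A" * 512, b"B" * 512, b"A" * 512])
    print(refs, len(store))
```

In this sketch the third portion resolves to the location of the first, so only two portions are physically stored. A production design would likely compress long single-child paths and verify the data on a fingerprint match; both are omitted here.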

    DEDUPLICATION USING FINGERPRINT TRIES
    Invention application

    Publication No.: US20190332300A1

    Publication date: 2019-10-31

    Application No.: US15966138

    Filing date: 2018-04-30

    Inventor: Sweetesh Singh

    Abstract: A fingerprint trie is used to store fingerprints for data portions stored on a storage system for use in implementing data deduplication on a storage system. The fingerprint trie may be used to compare fingerprint values to determine duplicate data portions, for example, in response to I/O operations. Leaf nodes of the fingerprint trie may be keyed by fingerprints, and a value of each leaf node may be a reference to the physical storage location of the data portion from which the fingerprint was generated. When an I/O operation is received, a fingerprint may be generated for each of one or more data portions included in the I/O operation. A fingerprint trie may be searched, for example by traversing multiple nodes of the trie according to pointers provided by the nodes, to determine whether there is any matching fingerprint specified in the fingerprint trie.

    USING MACHINE LEARNING TO SELECT COMPRESSION ALGORITHMS FOR COMPRESSING BINARY DATASETS

    Publication No.: US20220179829A1

    Publication date: 2022-06-09

    Application No.: US17113237

    Filing date: 2020-12-07

    Abstract: A data model is trained to predict compressibility of binary data structures based on component entropy and predict relative compression efficiency for various compression algorithms based on component size. A recommendation engine in a storage system uses the data model to predict compressibility of binary data and determines whether to compress the binary data based on predicted compressibility. If the recommendation engine determines that compression of the binary data is justified, then a compression algorithm is recommended based on predicted relative compression efficiency. For example, the compression algorithm predicted to yield the greatest compression ratio or shortest compression/decompression time may be recommended.
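
The two-stage decision in the abstract (first decide whether compression is justified, then pick an algorithm) might look like the sketch below. Two substitutions are made and should be read as assumptions: a fixed entropy threshold stands in for the predicted compressibility, and the compression ratio is measured by actually running each codec rather than predicted by a model. The codec set (`zlib`, `bz2`, `lzma` from the Python standard library), the function name `recommend`, and the 7.5 bits-per-byte cutoff are illustrative choices only.

```python
import bz2
import lzma
import math
import os
import zlib
from collections import Counter
from typing import Optional

# Candidate algorithms (assumption: the abstract does not list any).
CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def recommend(data: bytes, max_entropy: float = 7.5) -> Optional[str]:
    """Return the name of the codec with the best ratio, or None to skip compression."""
    # Stage 1: high-entropy data is unlikely to shrink, so do not compress it.
    if not data or byte_entropy(data) > max_entropy:
        return None
    # Stage 2: measure (rather than predict) the ratio for each candidate.
    ratios = {name: len(data) / len(fn(data)) for name, fn in CODECS.items()}
    best = max(ratios, key=ratios.get)
    return best if ratios[best] > 1.0 else None


if __name__ == "__main__":
    print(recommend(b"abc" * 10000))      # repetitive data: some codec is chosen
    print(recommend(os.urandom(65536)))   # random data: compression is skipped
```

Ranking by shortest compression or decompression time, the other criterion the abstract mentions, would only require timing each call and changing the key passed to `max`.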

    POWER DISRUPTION PROTECTION
    Invention application

    Publication No.: US20220026970A1

    Publication date: 2022-01-27

    Application No.: US16939133

    Filing date: 2020-07-27

    Abstract: One or more aspects of the present disclosure relate to data protection techniques in response to power disruptions. A power supply from a continuous power source for a storage device can be monitored. A power disruption event interrupting the power supply from the continuous power source can further be identified. In response to detecting an event, a storage system can be switched to a backup power supply, and power consumption of one or more components of the storage device can be controlled based on information associated with each component and an amount of power available in the backup power supply. Further, one or more power interruption operations can be performed while the backup power supply includes sufficient power for performing the power interruption operations.
