System and method for efficient multi-stage querying of archived data

    Publication No.: US12117963B2

    Publication Date: 2024-10-15

    Application No.: US17497697

    Filing Date: 2021-10-08

    CPC classification number: G06F16/113 G06F16/148 G06F16/172

    Abstract: A method for searching indexed packages includes generating indexed packages for records of data based on a parameter, each indexed package being characterized by a package key; generating metadata for the indexed packages, the metadata comprising the package key and a reference to the packaged records of data based on a value of the parameter; storing the indexed packages; and querying the records of data based on a query defining a search value of the parameter. Querying the records comprises searching the metadata based on the search value to identify the package key whose metadata references that search value; loading the indexed package with the identified package key from a file-based cache when the package is stored in the cache; and otherwise loading the indexed package from the data repository, which is an archive storage.
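
    The three-stage lookup in this abstract (metadata search, then file-based cache, then archive) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: in-memory dicts stand in for the file-based cache and the archive storage, the names `PackageMetadata`, `MultiStageQuery`, and `load_package` are hypothetical, and writing archive hits back into the cache is an assumption the abstract does not state.

```python
from dataclasses import dataclass


@dataclass
class PackageMetadata:
    package_key: str
    values: frozenset  # parameter values referenced by the records in this package


class MultiStageQuery:
    """Sketch: metadata search, then file-based cache, then archive storage."""

    def __init__(self, metadata, cache, archive):
        self.metadata = metadata  # list of PackageMetadata entries
        self.cache = cache        # dict package_key -> records; stand-in for the file-based cache
        self.archive = archive    # dict package_key -> records; stand-in for archive storage
        self.archive_loads = 0    # counts slow-path loads, for illustration only

    def load_package(self, package_key):
        # Serve the package from the cache when it is already there.
        if package_key in self.cache:
            return self.cache[package_key]
        # Otherwise fall back to the (slow) archive storage.
        self.archive_loads += 1
        package = self.archive[package_key]
        self.cache[package_key] = package  # assumption: populate the cache for later queries
        return package

    def query(self, param, search_value):
        results = []
        # Search only the metadata to find packages that can contain the value.
        for meta in self.metadata:
            if search_value in meta.values:
                package = self.load_package(meta.package_key)
                results.extend(r for r in package if r.get(param) == search_value)
        return results
```

    Only packages whose metadata references the search value are ever loaded, so a repeated query for the same value is served from the cache without touching the archive.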

    Cloud-native global file system with constant-time rekeying

    Publication No.: US12081664B2

    Publication Date: 2024-09-03

    Application No.: US17335564

    Filing Date: 2021-06-01

    Inventor: Daphne M. Shaw

    Abstract: A cloud-native global file system, in which a local filer creates objects and forwards them to a cloud-based object store, is augmented to include constant-time rekeying (CTR). At volume creation time on the filer, a random Intermediate Key (IK) is generated. The IK is encrypted using one or more public keys for the volume in question and then stored in encrypted form in a volume metadata file (e.g., cloudvolume.xml) alongside the other volume information. Once created, the IK is treated like any other volume metadata. During startup of a volume manager on the filer, the per-volume IK blob(s), if present, are decrypted using the appropriate secret key and then cached in memory. All objects sent to the cloud are then symmetrically encrypted with the current IK for that volume, and all objects read from the cloud are decrypted using the locally cached IK.
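
    Why the rekeying is constant-time can be shown with a small Python sketch: objects are encrypted only with the IK, so changing the volume key re-wraps one small IK blob and leaves every stored object untouched. Two loud assumptions: the public-key wrapping described in the abstract is replaced by a symmetric wrapping key, and `toy_cipher` (XOR with a SHA-256 counter keystream) is an insecure placeholder for a real cipher. The names `Volume`, `start`, and `rekey` are hypothetical.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """TOY stand-in for a symmetric cipher: XOR with a SHA-256 counter keystream.

    NOT secure (no nonce, no authentication). XOR makes it its own inverse.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class Volume:
    def __init__(self, volume_key: bytes):
        # Volume creation: generate a random IK and persist it only in wrapped form.
        ik = secrets.token_bytes(32)
        self.wrapped_ik = toy_cipher(volume_key, ik)  # stand-in for the IK blob in volume metadata
        self.cloud = {}         # object name -> ciphertext; stand-in for the cloud object store
        self._cached_ik = None  # plaintext IK, held only in memory after start()

    def start(self, volume_key: bytes):
        # Volume manager startup: unwrap the IK once and cache it in memory.
        self._cached_ik = toy_cipher(volume_key, self.wrapped_ik)

    def put(self, name: str, data: bytes):
        self.cloud[name] = toy_cipher(self._cached_ik, data)

    def get(self, name: str) -> bytes:
        return toy_cipher(self._cached_ik, self.cloud[name])

    def rekey(self, old_key: bytes, new_key: bytes):
        # Only the IK blob is re-wrapped; no stored object is re-encrypted,
        # so the cost does not depend on how many objects the volume holds.
        ik = toy_cipher(old_key, self.wrapped_ik)
        self.wrapped_ik = toy_cipher(new_key, ik)
```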

    Method, electronic device and computer program product for flushing metadata

    Publication No.: US12066937B2

    Publication Date: 2024-08-20

    Application No.: US17746367

    Filing Date: 2022-05-17

    Abstract: Techniques for flushing metadata involve: receiving a flushing request instructing that metadata in at least one cache region be flushed to a persistent storage device; acquiring a plurality of target indicators, each indicating at least a type of cache region and a block in that cache region, where the target indicators are classified by the types of cache region they indicate; determining, from the plurality of target indicators, at least one target indicator of the same type as the at least one cache region; and flushing the metadata in the block indicated by the at least one target indicator. Such techniques avoid flushing cache regions that do not need to be flushed, shorten the response time to the flushing request, and reduce the occupancy of system resources.
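
    A minimal Python sketch of type-selective flushing: target indicators are grouped by cache-region type, and a flush request touches only the blocks whose indicators match the requested types. The names `TargetIndicator`, `classify`, and `flush`, the region-type strings, and the dict stand-ins for the cache and persistent storage are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetIndicator:
    region_type: str  # type of cache region (hypothetical labels such as "inode")
    block: int        # block within that cache region


def classify(indicators):
    # Group the target indicators by the cache-region type they indicate.
    by_type = defaultdict(list)
    for ind in indicators:
        by_type[ind.region_type].append(ind)
    return by_type


def flush(request_types, indicators, cache, storage):
    """Flush only blocks whose indicator type matches a requested region type."""
    by_type = classify(indicators)
    flushed = []
    for region_type in request_types:
        for ind in by_type.get(region_type, []):
            key = (region_type, ind.block)
            storage[key] = cache[key]  # write the block's metadata to persistent storage
            flushed.append(ind)
    return flushed
```

    Regions of other types are never scanned, which is where the abstract's savings in response time and resource use come from.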

    OPTIMIZING STORAGE FILE SIZE IN DISTRIBUTED DATA LAKES

    Publication No.: US20240248879A1

    Publication Date: 2024-07-25

    Application No.: US18159677

    Filing Date: 2023-01-25

    Applicant: VMware, Inc.

    CPC classification number: G06F16/172 G06F16/122 G06F16/1724

    Abstract: Storage file size in distributed data lakes is optimized. At a first ingestion node of a plurality of ingestion nodes, a merge advisory indicating a transaction identifier (ID) is received from a coordinator. Received data associated with the transaction ID is persisted, which includes: determining whether the received data, if persisted together in a single file, would exceed a maximum desired file size; if it would not, persisting the received data in a single file; and if it would, persisting the received data in a plurality of files, none of which exceeds the maximum desired file size. The first ingestion node then identifies to the coordinator the location of the persisted data in permanent storage.
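
    The single-file versus multi-file decision can be sketched in Python as below. The abstract does not say how data is divided across files, so greedy packing of whole records is an assumption here, as is rejecting any single record larger than the limit. The function names, the `txn-<id>/part-NNNN` path scheme, and the dict stand-in for permanent storage are hypothetical.

```python
def plan_files(records: list[bytes], max_file_size: int) -> list[list[bytes]]:
    # If everything fits, persist it together in a single file.
    if sum(len(r) for r in records) <= max_file_size:
        return [list(records)]
    # Otherwise greedily pack whole records into files that stay under the limit.
    files, current, current_size = [], [], 0
    for r in records:
        if len(r) > max_file_size:
            raise ValueError("a single record exceeds the maximum file size")
        if current and current_size + len(r) > max_file_size:
            files.append(current)
            current, current_size = [], 0
        current.append(r)
        current_size += len(r)
    if current:
        files.append(current)
    return files


def persist(txn_id: int, records: list[bytes], max_file_size: int, storage: dict) -> list[str]:
    """Write the planned files and return their locations for the coordinator."""
    locations = []
    for i, chunk in enumerate(plan_files(records, max_file_size)):
        path = f"txn-{txn_id}/part-{i:04d}"
        storage[path] = b"".join(chunk)
        locations.append(path)
    return locations
```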

    Lightweight filesystem for remote storage caching

    Publication No.: US12045199B1

    Publication Date: 2024-07-23

    Application No.: US18189909

    Filing Date: 2023-03-24

    CPC classification number: G06F16/172 G06F16/183 G06F16/185

    Abstract: A lightweight filesystem may be provided for remote storage caching. The filesystem may maintain a persistent cache for a data set stored as data files in immutable data objects in a remote data store. Filesystem metadata may be evaluated to determine whether the portion of a data file specified by the offset and length in a request is stored in the persistent cache. If it is, data obtained from a data block in the persistent cache may be returned. If it is not, the remote data store may be accessed and the data file in the immutable data object read to obtain that portion of the data file.
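
    A block-granular Python sketch of the offset/length read path: the requested byte range is mapped to block indices, each block is served from the cache when present and read from the remote object otherwise. The dict key lookup plays the role of the filesystem metadata check. The class name `CachingReader`, the tiny default block size, the dict stand-ins, and caching blocks after a remote read are all assumptions, not details from the abstract.

```python
class CachingReader:
    def __init__(self, remote: dict, block_size: int = 4):
        self.remote = remote          # file name -> bytes; stand-in for immutable remote objects
        self.block_size = block_size  # hypothetical block size, tiny for illustration
        self.cache = {}               # (file name, block index) -> bytes; stand-in for the persistent cache
        self.remote_reads = 0

    def _block(self, name: str, index: int) -> bytes:
        key = (name, index)
        if key in self.cache:  # metadata check: is this block already cached?
            return self.cache[key]
        self.remote_reads += 1
        start = index * self.block_size
        data = self.remote[name][start:start + self.block_size]
        self.cache[key] = data  # assumption: keep the block for later reads
        return data

    def read(self, name: str, offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        buf = b"".join(self._block(name, i) for i in range(first, last + 1))
        start = offset - first * self.block_size
        return buf[start:start + length]
```

    Because the remote objects are immutable, a cached block can never go stale, which is what lets this read path skip any invalidation logic.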

    FILE CONNECTION METHOD AND APPARATUS, TERMINAL DEVICE, AND STORAGE MEDIUM

    Publication No.: US20240104058A1

    Publication Date: 2024-03-28

    Application No.: US18264867

    Filing Date: 2022-01-04

    Inventor: Haoran Li

    CPC classification number: G06F16/176 G06F16/164 G06F16/172 G06F16/178

    Abstract: A file connection method and apparatus, a terminal device, and a storage medium are provided. The file connection method is applied to a file system that includes a plurality of devices, and metadata of a target file is stored separately in the metabases of the plurality of devices. In the method, a current device determines, based on its stored metadata of the target file, the device that last accessed the target file, the device on which the file is located, and a connection record; the current device obtains file data of the target file from at least one of the device that last accessed the file and the device on which the file is located; and the current device displays content based on at least one of the connection record and the file data of the target file.
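
    A minimal Python sketch of the lookup step: the current device reads only its own metabase entry for the target file, then tries the device that last accessed the file and the device holding it. The order of the two attempts, the `FileMetadata` and `fetch_file` names, the free-form connection record, and the dict stand-ins for devices are assumptions; the display step is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class FileMetadata:
    last_accessed_by: str   # device that last accessed the target file
    located_on: str         # device on which the file is located
    connection_record: str  # hypothetical free-form record of the last connection


def fetch_file(metabase: dict, devices: dict, file_id: str):
    """Return (connection_record, file_data or None) using the local metabase only."""
    meta = metabase[file_id]
    # Assumption: try the last-accessing device first, then the device holding the file.
    for device_id in (meta.last_accessed_by, meta.located_on):
        data = devices.get(device_id, {}).get(file_id)
        if data is not None:
            return meta.connection_record, data
    return meta.connection_record, None
```

    Even when neither device returns data, the caller still has the connection record to display.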
