Automated information lifecycle management of indexes

    Publication No.: US11379410B2

    Publication date: 2022-07-05

    Application No.: US16926425

    Application date: 2020-07-10

    IPC classification: G06F16/11 G06F16/901

    Abstract: Techniques are provided for a DBMS to automate information lifecycle management (ILM) on indexes, based on index composition, to efficiently reduce index storage footprints. According to an embodiment, a user sets an index-specific ILM (ISILM) policy, which comprises one or both of an index-test requirement and a time requirement. When the ISILM policy is met, or when analysis is initiated in some other way, the DBMS automatically analyzes the data blocks storing the index to determine an index condition metric (e.g., percentage of free space). This analysis is performed on a sample of the data blocks storing the index, without blocking other operations on the index during the analysis. The condition metric for the entire index is estimated from the analysis of the sampled data blocks. Using the determined condition metric for an index, the DBMS automatically selects an option for optimally managing the index (e.g., coalesce, shrink space, index rebuild, or no action).
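
The sample-then-decide flow in this abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `IndexBlock` type, the function names, the thresholds, and the order in which thresholds map to actions are assumptions for illustration, not details taken from the patent.

```python
import random
from dataclasses import dataclass


@dataclass
class IndexBlock:
    """Hypothetical stand-in for one data block that stores index entries."""
    size_bytes: int
    used_bytes: int


def estimate_free_pct(blocks, sample_size, seed=0):
    """Estimate the index-wide free-space percentage from a random block sample."""
    rng = random.Random(seed)
    sample = rng.sample(blocks, min(sample_size, len(blocks)))
    total = sum(b.size_bytes for b in sample)
    free = sum(b.size_bytes - b.used_bytes for b in sample)
    return 100.0 * free / total if total else 0.0


def choose_action(free_pct, coalesce_at=20.0, shrink_at=40.0, rebuild_at=60.0):
    """Map the estimated metric to a maintenance option (thresholds are assumed)."""
    if free_pct >= rebuild_at:
        return "rebuild"
    if free_pct >= shrink_at:
        return "shrink space"
    if free_pct >= coalesce_at:
        return "coalesce"
    return "no action"
```

Only the sampled blocks are read, so the cost of the analysis depends on `sample_size` rather than on the size of the index.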

    PARALLEL AND EFFICIENT TECHNIQUE FOR BUILDING AND MAINTAINING A MAIN MEMORY CSR BASED GRAPH INDEX IN A RDBMS

    Publication No.: US20210334249A1

    Publication date: 2021-10-28

    Application No.: US17370418

    Application date: 2021-07-08

    Abstract: Herein are techniques that concurrently populate entries in a compressed sparse row (CSR) encoding of a type of edge of a heterogeneous graph. In an embodiment, a computer obtains a mapping of a relational schema to a graph data model. The relational schema defines vertex tables that correspond to vertex types in the graph data model, and edge tables that correspond to edge types in the graph data model. Each edge type is associated with a source vertex type and a target vertex type. For each vertex type, a sequence of persistent identifiers of vertices is obtained. Based on the mapping and for a CSR representation of each edge type, a source array is populated that, for a same vertex ordering as the sequence of persistent identifiers for the source vertex type, is based on counts of edges of the edge type that originate from vertices of the source vertex type. For the CSR, the computer populates, in parallel and based on said mapping, a destination array that contains canonical offsets as sequence positions within the sequence of persistent identifiers of the vertices.
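
As a rough illustration of the CSR layout described above, the following Python sketch builds the source and destination arrays for a single edge type. It runs sequentially and takes plain lists as input; the function name `build_csr` and its parameters are assumptions, and the relational mapping and the parallel population are not modeled.

```python
def build_csr(src_ids, dst_ids, edges):
    """Build CSR arrays for one edge type.

    src_ids, dst_ids: ordered persistent identifiers of the source and
    target vertex types. edges: iterable of (source_id, target_id) pairs.
    """
    src_pos = {vid: i for i, vid in enumerate(src_ids)}
    dst_pos = {vid: i for i, vid in enumerate(dst_ids)}
    edges = list(edges)

    # Count outgoing edges per source vertex, in src_ids order.
    counts = [0] * len(src_ids)
    for s, _ in edges:
        counts[src_pos[s]] += 1

    # Source array: running sum of the counts, so that
    # source[i]..source[i + 1] bounds the edges of source vertex i.
    source = [0] * (len(src_ids) + 1)
    for i, c in enumerate(counts):
        source[i + 1] = source[i] + c

    # Destination array: position of each target vertex within dst_ids,
    # written into the slice that belongs to its source vertex.
    destination = [0] * len(edges)
    cursor = source[:-1]  # next free slot per source vertex (slicing copies)
    for s, d in edges:
        i = src_pos[s]
        destination[cursor[i]] = dst_pos[d]
        cursor[i] += 1
    return source, destination
```

Because the source array gives every source vertex its own disjoint slice of the destination array, the slices could in principle be filled by independent workers.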

    OZIP compression and decompression

    Publication No.: US10437781B2

    Publication date: 2019-10-08

    Application No.: US15640286

    Application date: 2017-06-30

    Abstract: A method, apparatus, and system are provided for OZIP, a data compression and decompression codec. OZIP utilizes a fixed-size static dictionary, which may be generated from a random sampling of the input data to be compressed. Compression by direct token encoding to the static dictionary streamlines the encoding and avoids expensive conditional branching, facilitating hardware implementation and high parallelism. By bounding token definition sizes and static dictionary sizes to hardware architecture constraints such as word size or processor cache size, hardware implementation can be made fast and cost-effective. For example, decompression may be accelerated by using SIMD instruction processor extensions. A highly granular block mapping in optional stored metadata allows compressed data to be accessed quickly at random, bypassing the processing overhead of dynamic dictionaries. Thus, OZIP can support low-latency random data access for highly random workloads, such as for OLTP systems.
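
The general idea of a bounded static dictionary built from sampled input can be shown with a toy Python codec. This is not the OZIP format: the constants `MAX_TOKEN_LEN` and `DICT_SIZE`, the 16-bit code width, and the greedy longest-match encoder are all assumptions chosen to keep the sketch short.

```python
import random
import struct
from collections import Counter

MAX_TOKEN_LEN = 4   # assumed bound on token length
DICT_SIZE = 1024    # assumed fixed dictionary size (fits a 16-bit code)


def build_dictionary(data, samples=200, seed=0):
    """Build a static dictionary from randomly sampled substrings of the input."""
    rng = random.Random(seed)
    freq = Counter()
    if len(data) >= 2:
        for _ in range(samples):
            start = rng.randrange(len(data) - 1)
            for n in range(2, MAX_TOKEN_LEN + 1):
                tok = data[start:start + n]
                if len(tok) == n:
                    freq[tok] += 1
    # All 256 single bytes come first so that any input stays encodable.
    tokens = [bytes([b]) for b in range(256)]
    tokens += [t for t, _ in freq.most_common(DICT_SIZE - 256)]
    return tokens


def compress(data, tokens):
    """Encode data as a sequence of fixed-width dictionary indices."""
    index = {t: i for i, t in enumerate(tokens)}
    out = []
    pos = 0
    while pos < len(data):
        # Greedy longest match; a single byte always matches.
        for n in range(MAX_TOKEN_LEN, 0, -1):
            chunk = data[pos:pos + n]
            code = index.get(chunk)
            if code is not None:
                out.append(code)
                pos += len(chunk)
                break
    return struct.pack(f"<{len(out)}H", *out)


def decompress(blob, tokens):
    """Decode by plain table lookup, one fixed-width code at a time."""
    codes = struct.unpack(f"<{len(blob) // 2}H", blob)
    return b"".join(tokens[c] for c in codes)
```

Decompression here is a table lookup per fixed-width code with no dictionary updates, which is the property that makes a static dictionary attractive for random access.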

    HYBRID BIT-SLICED DICTIONARY ENCODING FOR FAST INDEX-BASED OPERATIONS

    Type: Invention patent application

    Status: Under examination (published)

    Publication No.: US20150277917A1

    Publication date: 2015-10-01

    Application No.: US14242778

    Application date: 2014-04-01

    IPC classification: G06F9/38 G06F9/30

    Abstract: Techniques are described herein for storing and processing codes included in dictionary-encoded data. In an embodiment, for each respective code of a plurality of codes in the dictionary-encoded data: a plurality of bits from a first portion of the respective code is contiguously stored. One or more bits from a second portion of the respective code are stored in one or more slices. Each respective slice of the one or more slices stores a bit from the one or more bits with a corresponding bit position in the respective code. In another embodiment, a bit-vector is generated based on at least one slice by loading each respective bit of the plurality of bits into different respective partitions in a register at a bit position corresponding to the at least one slice. A plurality of codes may be reconstructed by combining the bit-vector with one or more other bit-vectors.
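
A small Python sketch may help picture the hybrid layout: part of each code is kept contiguously and the remaining bits are spread across slices. It assumes that the contiguous "first portion" is the high bits and the sliced "second portion" is the low bits, and it uses Python integers as bit-vectors; the function names are hypothetical and register partitioning is not modeled.

```python
def encode_hybrid(codes, sliced_bits):
    """Split each code into a contiguous high part and bit-sliced low bits."""
    # First portion: the high bits of each code, stored per code.
    high = [c >> sliced_bits for c in codes]
    slices = []
    for bit in range(sliced_bits):
        # Slice `bit` packs bit `bit` of every code into one integer:
        # position i of the bit-vector belongs to codes[i].
        vec = 0
        for i, c in enumerate(codes):
            vec |= ((c >> bit) & 1) << i
        slices.append(vec)
    return high, slices


def decode_hybrid(high, slices):
    """Reconstruct the codes by recombining the high parts with the slices."""
    sliced_bits = len(slices)
    codes = []
    for i, h in enumerate(high):
        low = 0
        for bit, vec in enumerate(slices):
            low |= ((vec >> i) & 1) << bit
        codes.append((h << sliced_bits) | low)
    return codes
```

With this layout, a test on one sliced bit across all codes touches a single integer, while the contiguous part can still be read per code.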

    OPTIMIZE WORKLOAD PERFORMANCE BY AUTOMATICALLY DISCOVERING AND IMPLEMENTING IN-MEMORY PERFORMANCE FEATURES

    Publication No.: US20240111772A1

    Publication date: 2024-04-04

    Application No.: US18374852

    Application date: 2023-09-29

    IPC classification: G06F16/2455 G06F11/34

    Abstract: Techniques are provided for optimizing workload performance by automatically discovering and implementing performance optimizations for in-memory units (IMUs). A system maintains a set of IMUs for processing database operations in a database. The system obtains database workload information for the database and filters it to identify database operations that may benefit from performance optimizations. The system analyzes the database operations to identify a set of performance optimizations and ranks the performance optimizations based on their potential benefit. The system selects a subset of the performance optimizations, based on their ranking, and generates new versions of IMUs that reflect the performance optimizations. The system performs verification tests on the new versions of IMUs and analyzes the tests to determine whether the new versions of IMUs yield the expected performance benefits. The system then categorizes the new set of IMUs into a first set of IMUs to be retained and a second set of IMUs to be discarded. The system then makes the first set of IMUs available to the current workload and discards the second set of IMUs.
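
The rank, select, verify, and categorize steps of this abstract can be outlined in a few lines of Python. The `Candidate` type, its fields, and the `top_k` and `min_gain` parameters are hypothetical placeholders; the abstract does not say how benefit is estimated or how verification is scored.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate performance optimization."""
    name: str
    expected_gain: float   # estimated benefit, used for ranking
    measured_gain: float   # outcome of the verification test


def select_and_verify(candidates, top_k, min_gain):
    """Rank candidates, keep the top_k, then split them by verification result."""
    ranked = sorted(candidates, key=lambda c: c.expected_gain, reverse=True)
    selected = ranked[:top_k]
    retain = [c.name for c in selected if c.measured_gain >= min_gain]
    discard = [c.name for c in selected if c.measured_gain < min_gain]
    return retain, discard
```

Candidates outside the top `top_k` are never built or tested in this sketch, which mirrors the abstract's selection of a subset before new IMU versions are generated.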