摘要:
Described are improved systems, computer program products, and methods for adaptively provisioning an ordered sequence in a clustered database environment. The approach includes identifying a cached list of sequence numbers. A request for one or more sequence numbers in a database environment may be received. A determination may be made to decide whether the request pertains to an ordered sequence. The one or more sequence numbers may be adaptively provisioned to fulfill the request based at least in part upon an independent cache implementation or upon a shared cache implementation.
摘要:
Techniques herein use in-memory column vectors to process data that is external to a database management system (DBMS) and logically join the external data with data that is native to the DBMS. In an embodiment, a computer maintains a data dictionary for native data that is durably stored in an DBMS and external data that is not durably stored in the DBMS. From a client through a connection to the DBMS, the computer receives a query. The computer loads the external data into an in-memory column vector that resides in random access memory of the DBMS. Based on the query and the data dictionary, the DBMS executes a data join of the in-memory column vector with the native data. To the client through said connection, the computer returns results of the query based on the data join.
摘要:
Herein are techniques for dynamic aggregation of results of a database request, including concurrent grouping of result items in memory based on quasi-dense keys. Each of many computational threads concurrently performs as follows. A hash code is calculated that represents a particular natural grouping key (NGK) for an aggregate result of a database request. Based on the hash code, the thread detects that a set of distinct NGKs that are already stored in the aggregate result does not contain the particular NGK. A distinct dense grouping key for the particular NGK is statefully generated. The dense grouping key is bound to the particular NGK. Based on said binding, the particular NGK is added to the set of distinct NGKs in the aggregate result.
摘要:
Techniques are provided for a DBMS automating ILM on indexes, based on index composition, to efficiently reduce index storage footprints. According to an embodiment, a user sets an index-specific ILM (ISILM) policy, which comprises one or both of an index-test requirement and a time requirement. Based on the ISILM policy being met, or on some other way of initiating analysis, the DBMS automatically analyzes the data blocks storing the index to determine an index condition metric (e.g., percentage of free space). This analysis is performed on a sample of data blocks storing the index without blocking the index from other operations during the analysis. The condition metric for the entire index is estimated based on analysis of the sample data blocks. Using the determined condition metric for an index, the DBMS automatically selects an option for optimally managing the index (e.g., coalesce, shrink space, index rebuild, no action, etc.).
摘要:
Herein are techniques that concurrently populate entries in a compressed sparse row (CSR) encoding, of a type of edge of a heterogenous graph. In an embodiment, a computer obtains a mapping of a relational schema to a graph data model. The relational schema defines vertex tables that correspond to vertex types in the graph data model, and edge tables that correspond to edge types in the graph data model. Each edge type is associated with a source vertex type and a target vertex type. For each vertex type, a sequence of persistent identifiers of vertices is obtained. Based on the mapping and for a CSR representation of each edge type, a source array is populated that, for a same vertex ordering as the sequence of persistent identifiers for the source vertex type, is based on counts of edges of the edge type that originate from vertices of the source vertex type. For the CSR, the computer populates, in parallel and based on said mapping, a destination array that contains canonical offsets as sequence positions within the sequence of persistent identifiers of the vertices.
摘要:
A method, apparatus, and system for OZIP, a data compression and decompression codec, is provided. OZIP utilizes a fixed size static dictionary, which may be generated from a random sampling of input data to be compressed. Compression by direct token encoding to the static dictionary streamlines the encoding and avoids expensive conditional branching, facilitating hardware implementation and high parallelism. By bounding token definition sizes and static dictionary sizes to hardware architecture constraints such as word size or processor cache size, hardware implementation can be made fast and cost effective. For example, decompression may be accelerated by using SIMD instruction processor extensions. A highly granular block mapping in optional stored metadata allows compressed data to be accessed quickly at random, bypassing the processing overhead of dynamic dictionaries. Thus, OZIP can support low latency random data access for highly random workloads, such as for OLTP systems.
摘要:
Techniques are described herein for storing and processing codes included in dictionary-encoded data. In an embodiment, for each respective code of a plurality of codes in the dictionary-encoded data: a plurality of bits from a first portion of the respective code is contiguously stored. One or more bits from a second portion of the respective code is stored in one or more slices. Each respective slice of the one or more slices stores a bit from the one or more bits with a corresponding bit position in the respective code. In another embodiment, a bit-vector is generated based on at least one slice by loading each respective bit of the plurality of bits into different respective partitions in a register at a bit position corresponding to the at least one slice. A plurality of codes may be reconstructed by combining the bit-vector with one or more other bit-vectors
摘要:
Techniques are provided for optimizing workload performance by automatically discovering and implementing performance optimizations for in-memory units (IMUs). A system maintains a set of IMUs for processing database operations in a database. The system obtains a database workload information for the database system and filters the database workload information to identify database operations in the database workload information that may benefit from performance optimizations. The system analyzes the database operations to identify a set of performance optimizations and ranks the performance optimizations based on their potential benefit. The system selects a subset of the performance optimizations, based on their ranking, and generates new versions of IMUs that reflect the performance optimizations. The system performs verification tests on the new versions of IMUs and analyzes the tests to determine whether the new versions of IMUs yield expected performance benefits. The system then categorizes the new set of IMUs into a first set of IMUs to be retained and a second set of IMUs to be discarded. The system then makes the first set of IMUs available to the current workload and discards the second set of IMUs.
摘要:
The present invention relates to join acceleration. In an embodiment, a computer receives a request for a relational join of build data rows with probe data rows. Based on the request for the relational join, a particular kind of data map from many kinds of data map that can implement the relational join is dynamically selected. Based on the build data rows, an instance of the particular kind of data map is populated. A response is sent for the request for the relational join that is based on the probe data rows and the instance of the particular kind of data map.
摘要:
Herein are techniques that concurrently populate entries in a compressed sparse row (CSR) encoding, of a type of edge of a heterogenous graph. In an embodiment, a computer obtains a mapping of a relational schema to a graph data model. The relational schema defines vertex tables that correspond to vertex types in the graph data model, and edge tables that correspond to edge types in the graph data model. Each edge type is associated with a source vertex type and a target vertex type. For each vertex type, a sequence of persistent identifiers of vertices is obtained. Based on the mapping and for a CSR representation of each edge type, a source array is populated that, for a same vertex ordering as the sequence of persistent identifiers for the source vertex type, is based on counts of edges of the edge type that originate from vertices of the source vertex type. For the CSR, the computer populates, in parallel and based on said mapping, a destination array that contains canonical offsets as sequence positions within the sequence of persistent identifiers of the vertices.