Abstract:
An in-memory graph query runtime is integrated inside a database management system and is capable of performing simple patter-matching queries against homogeneous graphs. The runtime efficiently combines breadth-first (BFS) and depth-first (DFS) neighbor traversal algorithms to achieve a hybrid runtime that takes the best from both sides. As a result, the hybrid runtime is able to process arbitrarily large queries with a fixed amount of memory, optimizing for memory locality.
Abstract:
An in-memory graph query runtime is integrated inside a database management system and is capable of performing simple patter-matching queries against homogeneous graphs. The runtime efficiently combines breadth-first (BFS) and depth-first (DFS) neighbor traversal algorithms to achieve a hybrid runtime that takes the best from both sides. As a result, the hybrid runtime is able to process arbitrarily large queries with a fixed amount of memory, optimizing for memory locality.
Abstract:
Techniques are described for encoding join columns that belong to the same domain with a common dictionary. The tables are encoded with dictionary indexes that make the comparison operation of a join query a quick equality check of two integers and there is no need to compute any hashes during execution. Additionally, the techniques described herein minimize the bloom filter creation and evaluation cost as well because the dictionary indexes serve as hash values into the bloom filter. If the bloom filter is as large as the range of dictionary indexes, then the filter is no longer a probabilistic structure and can be used to filter rows in the probe phase with full certainty without any significant overhead.
Abstract:
Techniques are provided for storing in in-memory unit (IMU) in a lower-storage tier and copying the IMU to DRAM when needed for query processing. Techniques are also provided for copying IMUs to lower tiers of storage when evicted from the cache of higher tiers of storage. Techniques are provided for implementing functionality of IMUs within a storage system, to enable database servers to push tasks, such as filtering, to the storage system where the storage system may access IMUs within its own memory to perform the tasks. Metadata associated with a set of data may be used to indicate whether an IMU for the data should be created by the database server machine or within the storage system.
Abstract:
Techniques are described for maintaining an expression statistics store that stores and updates metadata values for query expressions based on the occurrence of those query expressions within queries. In an embodiment, a database server instance receives a database query. In response, the database server instance identifies expressions within the database queries. The database server instance then determines whether an expression statistics store includes an entry for the particular expression. Responsive to determining that the expression statistics store includes an entry for the particular expression, the database server instance updates at least one metadata value in the entry based on the occurrence of the particular expression. Responsive to determining that the expression statistics store does not include an entry for the particular expression, the database server instance adds an entry for the particular expression.
Abstract:
Techniques are provided for maintaining data persistently in one format, but making that data available to a database server in more than one format. For example, one of the formats in which the data is made available for query processing is based on the on-disk format, while another of the formats in which the data is made available for query processing is independent of the on-disk format. Data that is in the format that is independent of the disk format may be maintained exclusively in volatile memory to reduce the overhead associated with keeping the data in sync with the on-disk format copies of the data.
Abstract:
Techniques related to efficient evaluation of aggregate functions are disclosed. Computing device(s) may perform a method for aggregating results of performing a multiplication on a first column and a second column of a database table. A first vector stores a subset of values of the first column. A second vector stores a corresponding subset of values of the second column. When it is determined that the first vector has a lower cardinality than the second vector, a third vector stores at least a first distinct value and a second distinct value of the first vector. A first set of one or more values of the second vector is determined, wherein each value of the first set of one or more values corresponds to the first distinct value in the first vector. A first multiplicand is generated based on performing a summation over the first set of one or more values.
Abstract:
Techniques are provided for storing in in-memory unit (IMU) in a lower-storage tier and copying the IMU to DRAM when needed for query processing. Techniques are also provided for copying IMUs to lower tiers of storage when evicted from the cache of higher tiers of storage. Techniques are provided for implementing functionality of IMUs within a storage system, to enable database servers to push tasks, such as filtering, to the storage system where the storage system may access IMUs within its own memory to perform the tasks. Metadata associated with a set of data may be used to indicate whether an IMU for the data should be created by the database server machine or within the storage system.
Abstract:
Techniques are described for maintaining an expression statistics store that stores and updates metadata values for query expressions based on the occurrence of those query expressions within queries. In an embodiment, a database server instance receives a database query. In response, the database server instance identifies expressions within the database queries. The database server instance then determines whether an expression statistics store includes an entry for the particular expression. Responsive to determining that the expression statistics store includes an entry for the particular expression, the database server instance updates at least one metadata value in the entry based on the occurrence of the particular expression. Responsive to determining that the expression statistics store does not include an entry for the particular expression, the database server instance adds an entry for the particular expression.
Abstract:
A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, a method to unpack a fixed-width bit values in a bit stream to a fixed width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.