Abstract:
An in-memory graph query runtime is integrated inside a database management system and is capable of performing simple patter-matching queries against homogeneous graphs. The runtime efficiently combines breadth-first (BFS) and depth-first (DFS) neighbor traversal algorithms to achieve a hybrid runtime that takes the best from both sides. As a result, the hybrid runtime is able to process arbitrarily large queries with a fixed amount of memory, optimizing for memory locality.
Abstract:
Techniques related to prefix compression are disclosed. Each value in a set of values has a prefix component and a suffix component. The set of values is divided into at least a first plurality of values and a second plurality of values. Compression of each plurality of values involves determining a prefix component that is shared by each value of the plurality of values. Compression further involves determining a plurality of suffix components belonging to the plurality of values. The prefix component is stored separately from the plurality of suffix components but contiguously with a suffix component of the plurality of suffix components. This enables the prefix component and the suffix component to be read as a value of the plurality of values.
Abstract:
Techniques are described for materializing pre-computed results of expressions. In an embodiment, a set of one or more column units are stored in volatile or non-volatile memory. Each column unit corresponds to a column that belongs to an on-disk table within a database managed by a database server instance and includes data items from the corresponding column. A set of one or more virtual column units, and data that associates the set of one or more column units with the set of one or more virtual column units, are also stored in memory. The set of one or more virtual column units includes a particular virtual column unit storing results that are derived by evaluating an expression on at least one column of the on-disk table.
Abstract:
Methods and apparatuses for determining set-membership using Single Instruction Multiple Data (“SIMD”) architecture are presented herein. Specifically, methods and apparatuses are discussed for determining, in parallel, whether multiple values in a first set of values are members of a second set of values. Many of the methods and systems discussed herein are applied to determining whether one or more rows in a dictionary-encoded column of a database table satisfy one or more conditions based on the dictionary-encoded column. However, the methods and systems discussed herein may apply to many applications executed on a SIMD processor using set-membership tests.
Abstract:
Techniques are described herein for reducing the number of redundant evaluations that occur when an expression is evaluated against an encoded column vector by caching results of expression evaluations. When executing a query that includes an expression that references columns for which dictionary-encoded column vectors exist, the database server performs a cost-based analysis to determine which expressions (or sub-expressions) would benefit from caching the expression's evaluation result. For each such expression, the database server performs the necessary computations and caches the results for each of the possible distinct input values. When evaluating an expression for a row with a particular set of input codes, a look-up is performed based on the input code combination to retrieve the pre-computed results of that evaluation from the cache.
Abstract:
Techniques are described for materializing computations in memory. In an embodiment, responsive to a database server instance receiving a query, the database server instance identifies a set of computations for evaluation during execution of the query. Responsive to identifying the set of computations, the database server instance evaluates at least one computation in the set of computations to obtain a first set of computation results for a first computation in the set of computations. After evaluating the at least one computation, the database server instance stores, within an in-memory unit, the first set of computation results. The database server also stores mapping data that maps a set of metadata values associated with the first computation to the first set of computation results.
Abstract:
A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, a method to unpack a fixed-width bit values in a bit stream to a fixed width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.
Abstract:
Techniques are provided for more efficiently using the bandwidth of the I/O path between a CPU and volatile memory during the performance of database operation. Relational data from a relational table is stored in volatile memory as column vectors, where each column vector contains values for a particular column of the table. A binary-comparable format may be used to represent each value within a column vector, regardless of the data type associated with the column. The column vectors may be compressed and/or encoded while in volatile memory, and decompressed/decoded on-the-fly within the CPU. Alternatively, the CPU may be designed to perform operations directly on the compressed and/or encoded column vector data. In addition, techniques are described that enable the CPU to perform vector processing operations on the column vector values.
Abstract:
Techniques related to automatic cache management are disclosed. In some embodiments, one or more non-transitory storage media store instructions which, when executed by one or more computing devices, cause performance of an automatic cache management method when a determination is made to store a first set of data in a cache. The method involves determining whether an amount of available space in the cache is less than a predetermined threshold. When the amount of available space in the cache is less than the predetermined threshold, a determination is made as to whether a second set of data has a lower ranking than the first set of data by at least a predetermined amount. When the second set of data has a lower ranking than the first set of data by at least the predetermined amount, the second set of data is evicted. Thereafter, the first set of data is cached.
Abstract:
A “hybrid derived cache” stores semi-structured data or unstructured text data in an in-memory mirrored form and columns in another form, such as column-major format. The hybrid derived cache may cache scalar type columns in column-major format. The structure of the in-memory mirrored form of semi-structured data or unstructured text data enables and/or enhances access to perform path-based and/or text based query operations. A hybrid derived cache improves cache containment for executing query operations. The in-memory mirrored form is used to compute queries in a transactionally consistent manner through the use of an invalid vector that used to determine when to retrieve the transactionally consistent persistent form of semi-structured data or unstructured text data in lieu of the in-memory form.