Abstract:
Techniques are described herein for generating and using in-memory data structures to represent columns in data block sets. In an embodiment, a database management system (DBMS) receives a query for a target data set managed by the DBMS. The query may specify a predicate for a column of the target data set. The predicate may include a filtering value to be compared with row values of the column of the target data set. In an embodiment, before accessing in persistent storage the data block sets that store the target data set, the DBMS identifies an in-memory summary that corresponds to a data block set. The in-memory summary may include in-memory data structures, each representing a column stored in the data block set. The DBMS determines that the in-memory summary contains a particular in-memory data structure that represents a portion of the values of the column indicated in the predicate of the query. Based on the particular in-memory data structure, the DBMS determines whether the data block set can possibly contain the filtering value in the column of the target data set. Based on this determination, the DBMS either skips the data block set or retrieves it from the persistent storage as part of the query evaluation.
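The skip-or-retrieve decision described above can be sketched as follows. This is a minimal illustration, not the described system: it assumes the per-column in-memory data structure is a simple minimum/maximum range (the abstract does not say which structure is used), and the names `ColumnSummary`, `BlockSetSummary`, and `block_sets_to_read` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSummary:
    """Hypothetical per-column structure: min and max of the values in one block set."""
    min_value: int
    max_value: int

    def may_contain(self, filtering_value: int) -> bool:
        # A value outside [min, max] cannot be stored in the block set.
        return self.min_value <= filtering_value <= self.max_value


@dataclass
class BlockSetSummary:
    """Hypothetical in-memory summary: one structure per summarized column."""
    block_set_id: int
    columns: dict = field(default_factory=dict)


def block_sets_to_read(summaries, column, filtering_value):
    """Return ids of block sets that must still be read for `column = filtering_value`."""
    to_read = []
    for summary in summaries:
        structure = summary.columns.get(column)
        # With no structure for the column, nothing can be ruled out, so read the set.
        if structure is None or structure.may_contain(filtering_value):
            to_read.append(summary.block_set_id)
    return to_read


summaries = [
    BlockSetSummary(0, {"order_id": ColumnSummary(1, 100)}),
    BlockSetSummary(1, {"order_id": ColumnSummary(101, 200)}),
    BlockSetSummary(2, {}),  # no summary for order_id in this block set
]
result = block_sets_to_read(summaries, "order_id", 150)
print(result)
```

In this sketch block set 0 is skipped because 150 lies outside its range, while block set 2 is read because no structure exists for the column and so nothing can be concluded about it.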
Abstract:
Techniques are described herein for distributing data from one or more partitioned tables across the volatile memories of a cluster. In-memory copies of data from partitioned tables are grouped based on the data falling within the same partition criteria. These groups are used to assign data from corresponding partitions to the same node when distributing data from partitioned tables across the volatile memories of a multi-node cluster. When a query requires a join between rows of partitioned tables, the work for the join query is divided into work granules that correspond to partition-wise join operations. A query coordinator assigns those partition-wise join operations to nodes based on the partition-to-node mapping located in the node of the query coordinator.
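A rough sketch of how a coordinator might turn a partition-to-node mapping into work granules is shown below. The round-robin assignment policy, the function names, and the sample tables are all assumptions made for illustration; the abstract does not specify how the mapping is built.

```python
def build_partition_to_node(partition_keys, nodes):
    """Map each partition key to a node (round-robin is an assumed policy)."""
    ordered = sorted(partition_keys)
    return {key: nodes[i % len(nodes)] for i, key in enumerate(ordered)}


def plan_partition_wise_join(left_partitions, right_partitions, partition_to_node):
    """Create one work granule per partition key present in both tables."""
    granules = []
    for key in sorted(left_partitions.keys() & right_partitions.keys()):
        # Corresponding partitions live on the same node, so the join is local.
        granules.append({"partition": key, "node": partition_to_node[key]})
    return granules


# Hypothetical tables, both partitioned on the same criteria (a quarter label).
orders = {"q1": [(1, "a")], "q2": [(2, "b")], "q3": [(3, "c")]}
items = {"q1": [(1, "x")], "q2": [(2, "y")], "q3": [(3, "z")]}

mapping = build_partition_to_node(orders.keys() | items.keys(), ["node_a", "node_b"])
granules = plan_partition_wise_join(orders, items, mapping)
for granule in granules:
    print(granule)
```

Because both tables use the same mapping, each granule joins only the two partitions that share a key, and no rows need to move between nodes for that granule.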
Abstract:
A method, apparatus, and system for automatically determining an optimal database subsection are provided. A database subsection is selected to optimize certain benefits when the database subsection is translated, transferred, and cached on an alternative database system, which may utilize a different technology or database engine that provides certain performance benefits compared to the original database system. Algorithms such as multi-path greedy selection and/or dynamic programming may provide optimal or near-optimal results. A host for the alternative database server may be shared with, or otherwise located in close physical proximity to, a database application or client layer to improve latency. Once the database subsection analysis is completed, a report may be generated and presented to the user, and an implementation script may also be created to automatically configure a client host to function as a cache or replacement system according to various cache size configurations described in the report.
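One way to picture the dynamic programming option is as a 0/1 knapsack over database objects. That formulation is an assumption, as are the object names, sizes, and benefit scores below; the abstract does not define how benefit is measured.

```python
def select_subsection(objects, cache_size):
    """Pick objects maximizing total benefit within an integer cache size.

    objects: list of (name, size, benefit) tuples with integer sizes.
    Returns (total_benefit, tuple_of_chosen_names).
    """
    # best[c] holds the best (benefit, names) achievable with capacity c.
    best = [(0, ())] * (cache_size + 1)
    for name, size, benefit in objects:
        # Walk capacities downward so each object is chosen at most once.
        for capacity in range(cache_size, size - 1, -1):
            candidate = best[capacity - size][0] + benefit
            if candidate > best[capacity][0]:
                best[capacity] = (candidate, best[capacity - size][1] + (name,))
    return best[cache_size]


# Hypothetical workload: (object name, size in GB, estimated benefit score).
objects = [
    ("orders", 4, 40),
    ("customers", 3, 30),
    ("audit_log", 5, 35),
    ("products", 2, 25),
]

# A small report over several cache size configurations.
report = {size: select_subsection(objects, size) for size in (3, 5, 7)}
for size, (benefit, names) in report.items():
    print(size, benefit, names)
```

Running the selection for several capacities mirrors the idea of a report that compares cache size configurations; a greedy heuristic could replace the table when the number of objects makes exact search too costly.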
Abstract:
Techniques related to query execution against an in-memory standby database are disclosed. A first database includes PF data stored on persistent storage in a persistent format. The first database is accessible to a first database server that converts the PF data to a mirror format to produce MF data that is stored within volatile memory. The first database server receives, from a second database server, one or more change records indicating one or more transactions performed against a second database. The one or more change records are applied to the PF data, and a reference timestamp is advanced from a first to a second timestamp. The first database server invalidates any MF data that is changed by a subset of the one or more transactions that committed between the first and second timestamps.
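The invalidation step can be illustrated with a small model. The sketch below assumes row-level validity flags and treats the interval as exclusive of the first timestamp and inclusive of the second; both choices, along with the class and field names, are assumptions rather than details taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical summary of one transaction carried by change records."""
    commit_ts: int
    changed_rows: set


class StandbyMirror:
    """Tracks which rows of the mirror-format (MF) copy are still valid."""

    def __init__(self, row_ids, reference_ts):
        self.reference_ts = reference_ts
        self.valid = {row_id: True for row_id in row_ids}

    def advance(self, transactions, new_ts):
        """Advance the reference timestamp and invalidate changed MF rows."""
        for txn in transactions:
            # Only transactions committed in (reference_ts, new_ts] matter here.
            if self.reference_ts < txn.commit_ts <= new_ts:
                for row_id in txn.changed_rows:
                    if row_id in self.valid:
                        self.valid[row_id] = False
        self.reference_ts = new_ts


mirror = StandbyMirror([1, 2, 3, 4], reference_ts=10)
transactions = [
    Transaction(commit_ts=12, changed_rows={1}),
    Transaction(commit_ts=15, changed_rows={2, 3}),
    Transaction(commit_ts=25, changed_rows={4}),  # commits after the new timestamp
]
mirror.advance(transactions, new_ts=20)
invalid_rows = sorted(r for r, ok in mirror.valid.items() if not ok)
print(invalid_rows, mirror.reference_ts)
```

A query running at the new reference timestamp would then read invalidated rows from the persistent-format data and the remaining rows from the in-memory copy.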
Abstract:
Techniques are provided for maintaining data persistently in one format, but making that data available to a database server in more than one format. For example, one of the formats in which the data is made available for query processing is based on the on-disk format, while another of the formats in which the data is made available for query processing is independent of the on-disk format. Data that is in the format that is independent of the disk format may be maintained exclusively in volatile memory to reduce the overhead associated with keeping the data in sync with the on-disk format copies of the data.
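As a loose illustration of serving one persistent copy in two formats, the sketch below keeps rows in a row-based layout standing in for the on-disk format and derives a columnar copy that exists only in memory. The columnar layout, the sample rows, and the function names are assumptions chosen for the example.

```python
# Rows as they might be persisted (a stand-in for the on-disk format).
persistent_rows = [
    {"id": 1, "region": "east", "amount": 10},
    {"id": 2, "region": "west", "amount": 25},
    {"id": 3, "region": "east", "amount": 5},
]


def build_columnar_copy(rows):
    """Derive a format independent of the on-disk layout, held only in memory."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(rows, region):
    """Answer the query from the format based on the on-disk layout."""
    return sum(row["amount"] for row in rows if row["region"] == region)


def total_from_columns(columns, region):
    """Answer the same query from the in-memory columnar copy."""
    pairs = zip(columns["region"], columns["amount"])
    return sum(amount for r, amount in pairs if r == region)


columnar_copy = build_columnar_copy(persistent_rows)
print(total_from_rows(persistent_rows, "east"))
print(total_from_columns(columnar_copy, "east"))
```

Since the columnar copy is never written to disk in this sketch, it can be rebuilt from the persistent rows whenever needed instead of being kept in sync on storage.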
Abstract:
Columns of a table are stored in either row-major format or column-major format in an in-memory DBMS. For a given table, one set of columns is stored in column-major format and another set of columns is stored in row-major format. This way of storing columns of a table is referred to herein as dual-major format. In addition, a row in a dual-major table is updated “in-place”, that is, updates are made directly to column-major columns without creating an interim row-major form of the column-major columns of the row. Users may submit data definition language (“DDL”) commands that declare the row-major columns and column-major columns of a table.
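A toy model of a dual-major table might look like the following. The class `DualMajorTable` and its constructor arguments stand in for what a DDL declaration would specify; they are hypothetical and do not reflect any actual DDL syntax.

```python
class DualMajorTable:
    """Toy table with some columns stored row-major and others column-major."""

    def __init__(self, row_major_columns, column_major_columns):
        self.row_major_columns = list(row_major_columns)
        self.column_major_columns = list(column_major_columns)
        self.rows = []  # one list per row, holding only the row-major columns
        self.vectors = {name: [] for name in self.column_major_columns}

    def insert(self, record):
        """Append a record and return its row index."""
        self.rows.append([record[name] for name in self.row_major_columns])
        for name in self.column_major_columns:
            self.vectors[name].append(record[name])
        return len(self.rows) - 1

    def update(self, row_index, changes):
        """Apply changes in place, without building an interim row-major row."""
        for name, value in changes.items():
            if name in self.vectors:
                # Column-major column: overwrite the slot in its vector directly.
                self.vectors[name][row_index] = value
            else:
                position = self.row_major_columns.index(name)
                self.rows[row_index][position] = value

    def get(self, row_index):
        """Reassemble a full record from both storage layouts."""
        record = dict(zip(self.row_major_columns, self.rows[row_index]))
        for name in self.column_major_columns:
            record[name] = self.vectors[name][row_index]
        return record


table = DualMajorTable(row_major_columns=["id", "name"],
                       column_major_columns=["price", "qty"])
table.insert({"id": 1, "name": "bolt", "price": 10, "qty": 100})
second = table.insert({"id": 2, "name": "nut", "price": 3, "qty": 250})
table.update(second, {"price": 4, "name": "hex nut"})
print(table.get(second))
print(table.vectors["price"])
```

The update touches only the affected slot of each column vector, which is the in-place behavior the abstract contrasts with first materializing the row in row-major form.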