Abstract:
A method, apparatus, and system for policy driven data placement and information lifecycle management in a database management system are provided. A user or database application can specify declarative policies that define the movement and transformation of stored database objects. The policies are associated with a database object and may also be inherited. A policy defines, for a database object, an archiving action to be taken, a scope, and a condition before the archiving action is triggered. Archiving actions may include compression, data movement, table clustering, and other actions to place the database object into an appropriate storage tier for a lifecycle phase of the database object. Conditions based on access statistics can be specified at the row level and may use segment or block level heatmaps. Policy evaluation occurs periodically in the background, with actions queued as tasks for a task scheduler.
Abstract:
A database server stores compressed units in data blocks of a database. A table (or data from a plurality of rows thereof) is first compressed into a “compression unit” using any of a wide variety of compression techniques. The compression unit is then stored in one or more data block rows across one or more data blocks. As a result, a single data block row may comprise compressed data for a plurality of table rows, as encoded within the compression unit. Storage of compression units in data blocks maintains compatibility with existing data block-based databases, thus allowing the use of compression units in preexisting databases without modification to the underlying format of the database. The compression units may, for example, co-exist with uncompressed tables. Various techniques allow a database server to optimize access to data in the compression unit, so that the compression is virtually transparent to the user.
Abstract:
A method, apparatus, and system for tracking row and object database activity into block level heatmaps is provided. Database activity including reads, writes, and creates can be tracked by a database management system at the finest possible level of granularity, or the row and object level. To efficiently record the tracked database activity, a two-part structure is described for writing the activity into heatmaps. A hierarchical in-memory component may use a dynamically allocated sparse pool of bitmap blocks. Periodically, the in-memory component is persisted to a stored representation component, sharable with multiple database instances, which may include consolidated last access times and/or a history of heatmap snapshots to reflect access over time. The heatmaps may then be externalized to database users and applications to provide and support a variety of features.
Abstract:
Techniques are described herein for scalable clustering of target resources by parameter set. In some embodiments, a plurality of parameter sets of varying length are received, where a parameter set identifies attributes of a target resource. A plurality of signature vectors are generated based on the plurality of parameter sets such that the signature vectors have equal lengths. A signature vector may map to one or more parameter sets of the plurality of parameter sets. A plurality of clusters are generated based on the similarity between signature vectors. Operations may be performed on a target resource based on one or more nodes in the plurality of clusters.
Abstract:
Techniques are described for characterizing and summarizing seasonal patterns detected within a time series. A set of time series data is analyzed to identify a plurality of instances of a season, where each instance corresponds to a respective sub-period within the season. A first set of instances from the plurality of instances are associated with a particular class of seasonal pattern. After classifying the first set of instances, a second set of instances may remain unclassified or otherwise may not be associated with the particular class of seasonal pattern. Based on the first and second set of instances, a summary may be generated that identifies one or more stretches of time that are associated with the particular class of seasonal pattern. The one or more stretches of time may span at least one sub-period corresponding to at least one instance in the second set of instances.
Abstract:
A method for accelerating queries using dynamically generated columnar data in a flash cache is provided. In an embodiment, a method comprises a storage device receiving a first request for data that is stored in the storage device in a base major format in one or more primary storage devices. The storage device comprises a cache. The base major format is any one of: a row-major format, a column-major format and a hybrid-columnar format. Based on first one or more criteria, it is determined whether to rewrite the data into rewritten data in a rewritten major format. In response to determining to rewrite the data into rewritten data in a rewritten major format, the storage device rewrites at least a portion of the data into particular rewritten data in the rewritten major format. The rewritten data is stored in the cache.
Abstract:
Techniques are described for materializing pre-computed results of expressions. In an embodiment, a set of one or more column units are stored in volatile or non-volatile memory. Each column unit corresponds to a column that belongs to an on-disk table within a database managed by a database server instance and includes data items from the corresponding column. A set of one or more virtual column units, and data that associates the set of one or more column units with the set of one or more virtual column units, are also stored in memory. The set of one or more virtual column units includes a particular virtual column unit storing results that are derived by evaluating an expression on at least one column of the on-disk table.
Abstract:
A method, apparatus, and system for automated information lifecycle management using low access patterns in a database management system are provided. A user or the database can store policy data that defines an archiving action when meeting an activity-level condition on one or more database objects. The archiving actions may include compression, data movement, and other actions to place the database object in an appropriate storage tier for a lifecycle phase of the database object. The activity-level condition may specify the database object meeting a low access pattern, optionally for a minimum time period. Various criteria including access statistics for the database object and cost characteristics of current and target compression levels or storage tiers may be considered to determine the meeting of the activity-level condition. The policies may be evaluated on an adjustable periodic basis and may utilize a task scheduler for minimal performance impact.
Abstract:
Techniques are provided for sharding objects across different compute nodes. In one embodiment, a database server instance generates, for an object, a plurality of in-memory chunks including a first in-memory chunk and a second in-memory chunk, where each in-memory chunk includes a different portion of the object. The database server instance assigns each in-memory chunk to one of a plurality of computer nodes including the first in-memory chunk to a first compute node and a second in-memory chunk to a second local memory of a second compute node. The database server instance stores an in-memory map that indicates a memory location for each in-memory chunk. The in-memory map indicates that the first in-memory chunk is located in the first local memory of the first compute node and that the second in-memory chunk is located in the second local memory of the second compute node.
Abstract:
Techniques are described for materializing computations in memory. In an embodiment, responsive to a database server instance receiving a query, the database server instance identifies a set of computations for evaluation during execution of the query. Responsive to identifying the set of computations, the database server instance evaluates at least one computation in the set of computations to obtain a first set of computation results for a first computation in the set of computations. After evaluating the at least one computation, the database server instance stores, within an in-memory unit, the first set of computation results. The database server also stores mapping data that maps a set of metadata values associated with the first computation to the first set of computation results.