Abstract:
Application program data stored in system memory may be selectively persisted. An indication may be provided to an application program that an application data object or a range of application data stored in system memory may be treated as persistent. Data backup may be enabled for the application data object or range of application data in the event of a system failure, copying the application data object or range of application data from system memory to non-volatile data storage. Upon recovery from a system failure, further data backup for the application data object or the range of application data may be disabled. In some embodiments, at least some of the application data object or range of application data may be recovered for the application program to access. Data backup for the application data object or the range of application data may also be re-enabled.
Abstract:
A database system may receive a write request that specifies a modification to be made to a particular data record stored by the database system. A log record representing the modification to be made to the particular data record may be sent to a storage service of the database system. An indication (e.g., log record or other indication) that indicates a cached version of the particular data record stored in a read replica's cache is stale may be sent to a read replica. For a subsequent read of the particular data record received by the read replica, the read replica may request the particular data record from the storage service.
Abstract:
A database system may include a database service and a separate distributed storage service. The database service (or a database engine head node thereof) may be responsible for query parsing, optimization, and execution, transactionality, and consistency, while the storage service may be responsible for generating data pages from redo log records and for durability of those data pages. For example, in response to a write request directed to a particular data page, the database engine head node may generate a redo log record and send it, but not the data page, to a storage service node. The storage service node may store the redo log record and return a write acknowledgement to the database service prior to applying the redo log record. The server node may apply the redo log record and other redo log records to a previously stored version of the data page to create a current version.
Abstract:
A distributed data warehouse system maintains data blocks on behalf of clients, and stores primary and secondary copies of data blocks on different disks or nodes in a cluster. The data warehouse system may back up data blocks in a key-value backup storage system. In response to a query targeting a data block previously stored in the cluster, the data warehouse system may determine whether a consistent, uncorrupted copy of the data block is available in the cluster (e.g., by applying a consistency check). If not (e.g., if a disk or node failed), the data warehouse system may automatically initiate an operation to restore the data block from the backup storage system, using a unique identifier of the data block to access a backup copy. The target data may be returned in a query response prior to restoring primary and secondary copies of the data block in the cluster.
Abstract:
Extract, Transform, Load (ETL) processing may be initiated by detected events. A trigger event may be associated with an ETL process apply one or more transformations to a source data object. The trigger event may be detected for the ETL process and evaluated with respect to one or more execution conditions for the ETL process. If the execution conditions for the ETL process are satisfied, then the ETL process may be executed. At least some of the source data object may be obtained, the one or more transformations of the ETL process may be applied, and one or more transformed data objects may be stored.
Abstract:
A probabilistic data structure is generated for efficient query processing using a histogram for unsorted data in a column of a columnar database. A bucket range size is determined for multiples buckets of a histogram of a column in a columnar database table. In at least some embodiments, the histogram may be a height-balanced histogram. A probabilistic data structure is generated to indicate for which particular buckets in the histogram there is a data value stored in the data block. When an indication of a query directed to the column for select data is received, the probabilistic data structure for each of the data blocks storing data for the column may be examined to determine particular ones of the data blocks which do not need to be read in order to service the query for the select data.
Abstract:
A database system may include a database service and a separate distributed storage service. The database service (or a database engine head node thereof) may be responsible for query parsing, optimization, and execution, transactionality, and consistency, while the storage service may be responsible for generating data pages from redo log records and for durability of those data pages. For example, in response to a write request directed to a particular data page, the database engine head node may generate a redo log record and send it, but not the data page, to a storage service node. The storage service node may store the redo log record and return a write acknowledgement to the database service prior to applying the redo log record. The server node may apply the redo log record and other redo log records to a previously stored version of the data page to create a current version.
Abstract:
Self-describing data blocks of a minimum atomic write size may be stored for a data store. Data may be received for storage in a data block of a plurality of data blocks at a persistent storage device that are equivalent to a minimum atomic write size for the persistent storage device. Metadata may be generated for the data that includes an error detection code which is generated for the data and the metadata together. The data and the metadata are sent to the persistent storage to device to store together in the data block. An individual atomic write operation may write together the data and the metadata in the data block. When accessed, the error detection code is applicable to detect errors. The metadata may also be applicable to determine whether the data is stored for a currently assigned purpose or a previously assigned purpose of the data block.
Abstract:
Nodes of a database service may receive a read request to perform a read of a record stored by the database service and a transaction request to perform a transaction to the record. First and second indications of time may be associated with the read and transaction, respectively. A potential read anomaly (e.g., fuzzy read, read skew, etc.) may be detected based, at least in part, on a determination that the first indication of time is within a threshold value of the second indication of time. In response to detecting the potential read anomaly, the read may be performed after the transaction specified by the transaction request, regardless of whether the first indication of time is indicative of an earlier point in time than the second indication of time.
Abstract:
A database system may maintain a plurality of log records at a distributed storage system. Each of the plurality of log records may be associated with a respective change to a data page. A snapshot may be generated that is usable to read the data as of a state corresponding to the snapshot. Generating the snapshot may include generating metadata that is indicative of a particular log identifier of a particular one of the log records. Generating the snapshot may be performed without additional reading, copying, or writing of the data.