Abstract:
Data deduplication for data storage tapes includes intercepting tape control commands for a single data storage tape. The intercepted tape control commands are modified for adding processing logic and parameters for placement of deduplicated file data on the single data storage tape. Deduplication metadata is written to a metadata portion of the single data storage tape. The deduplicated file data is written to a data portion of the single data storage tape based on the placement to increase read throughput for a deduplicated set of individual files and to reduce an average number of per-file gaps on the single data storage tape without re-duplicating deduplicated data for meeting optimization of individual file accesses.
Abstract:
A method and an apparatus for processing a query are disclosed. When a query is input and partitions are present in a data table, a partition corresponding to the query is selected. If one or more partition column sets are present in the selected partition, the partition column sets corresponding to the query are selected, and the query is processed against the selected partition column sets.
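The two-stage selection described above can be illustrated with a minimal sketch. The key-range partitioning, the `Partition` class, and the representation of a column set as a dict of column name to values are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    # Hypothetical layout: a partition covers the key range [low, high)
    # and stores its data as a list of column sets, each a dict that
    # maps a column name to that column's values.
    low: int
    high: int
    column_sets: list = field(default_factory=list)


def select_partitions(partitions, key_low, key_high):
    """Keep only partitions whose key range intersects [key_low, key_high)."""
    return [p for p in partitions if p.low < key_high and key_low < p.high]


def select_column_sets(partition, wanted):
    """Keep only column sets that hold at least one requested column."""
    return [cs for cs in partition.column_sets if not wanted.isdisjoint(cs)]


def process_query(partitions, key_low, key_high, wanted_columns):
    """Project the requested columns from the selected column sets only."""
    wanted = set(wanted_columns)
    result = []
    for part in select_partitions(partitions, key_low, key_high):
        for cs in select_column_sets(part, wanted):
            result.append({c: v for c, v in cs.items() if c in wanted})
    return result


partitions = [
    Partition(0, 100, [{"id": [1, 2], "name": ["a", "b"]}, {"price": [9, 8]}]),
    Partition(100, 200, [{"id": [150], "name": ["c"]}, {"price": [7]}]),
]
selected = process_query(partitions, 120, 130, ["price"])
```

In this sketch only the second partition and only its `price` column set are touched; the other partition and the `id`/`name` column set are never read.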
Abstract:
A system for managing a plurality of storage devices that are configured to store a database is disclosed. The system includes an access instruction acquiring unit configured for acquiring an access instruction to access the database. The system also includes a predicting unit configured for predicting a table to be accessed in response to the acquired access instruction. The system further includes a relocation unit configured for mirroring the table predicted by the predicting unit between the plurality of storage devices.
Abstract:
A first query and a second query are received. The two queries are evaluated and, based on the evaluation, first time series data required to fulfill the first query and second time series data required to fulfill the second query are identified. An extent of overlap between the first time series data and the second time series data is determined. When the extent of overlap exceeds a predetermined threshold, the overlapping data is retrieved from a plurality of data storage devices in parallel, with the data retrieved across all of the storage devices via a single read operation.
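A minimal sketch of the overlap test and the parallel retrieval follows. It assumes each query's required data can be summarized as a half-open time range, that the extent of overlap is measured as a fraction of the shorter range, and that each storage device is an in-memory list of `(timestamp, value)` pairs; none of these details come from the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def overlap_extent(range_a, range_b):
    """Return the shared span and its fraction of the shorter range."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    if end <= start:
        return None, 0.0
    shorter = min(range_a[1] - range_a[0], range_b[1] - range_b[0])
    return (start, end), (end - start) / shorter


def read_shared(devices, span):
    """Read the span from every device concurrently, one read per device."""
    def read_one(device):
        return [(t, v) for t, v in device if span[0] <= t < span[1]]

    with ThreadPoolExecutor(max_workers=max(1, len(devices))) as pool:
        chunks = list(pool.map(read_one, devices))
    # Merge the per-device chunks into one timestamp-ordered result.
    return sorted(point for chunk in chunks for point in chunk)


def plan_reads(range_a, range_b, devices, threshold=0.5):
    """Fetch the overlapping data once if the overlap exceeds the threshold."""
    span, extent = overlap_extent(range_a, range_b)
    if span is not None and extent > threshold:
        return read_shared(devices, span)
    return None  # caller falls back to serving each query separately


devices = [[(0, 1.0), (10, 2.0)], [(5, 3.0), (15, 4.0)]]
shared = plan_reads((0, 12), (4, 20), devices)
```

The threshold of 0.5 is an arbitrary placeholder; the abstract only says that the threshold is predetermined.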
Abstract:
A system, method, and computer-readable medium that facilitate classification of database requests as problematic based on estimated processing characteristics of the request are provided. Estimated processing characteristics may include estimated skew, including central processing unit skew and input/output operation skew, central processing unit duration per input/output operation, and estimated memory usage. The processing characteristics are estimated on a per-request-step basis. The request is classified as problematic in response to determining that one or more of the estimated characteristics of a request step exceed a corresponding threshold. In this manner, mechanisms for predicting bad query behavior are provided. Workload management of those requests may then be more successfully provided through workload throttles, filters, or even a more confident exception detection that correlates with the estimated bad behavior.
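The per-step threshold check can be sketched as below. The `StepEstimate` fields mirror the characteristics named in the abstract, but the field names, units, and limit values are illustrative assumptions rather than values from the source.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StepEstimate:
    # Estimated characteristics of one request step (hypothetical units).
    cpu_skew: float
    io_skew: float
    cpu_ms_per_io: float
    memory_mb: float


# Assumed thresholds, one per characteristic; real limits would be tuned.
DEFAULT_LIMITS = StepEstimate(
    cpu_skew=4.0, io_skew=4.0, cpu_ms_per_io=50.0, memory_mb=2048.0
)


def step_violations(step, limits=DEFAULT_LIMITS):
    """Names of the characteristics of this step that exceed their limit."""
    return [
        f.name
        for f in fields(step)
        if getattr(step, f.name) > getattr(limits, f.name)
    ]


def is_problematic(steps, limits=DEFAULT_LIMITS):
    """A request is problematic if any single step exceeds any limit."""
    return any(step_violations(step, limits) for step in steps)


steps = [
    StepEstimate(cpu_skew=1.2, io_skew=1.1, cpu_ms_per_io=10.0, memory_mb=256.0),
    StepEstimate(cpu_skew=6.0, io_skew=1.0, cpu_ms_per_io=10.0, memory_mb=256.0),
]
flagged = is_problematic(steps)
```

A workload manager could use the list returned by `step_violations` to decide between throttling, filtering, or simply watching the request more closely.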
Abstract:
A system and method for searching a database for multiple entries that contain similar data. Some embodiments of the method include collating data on physical sites from at least one database source to form a collation of site data; assigning a unique entry identifier to each entry of the site data in the collation; performing a lexical analysis of the site data and assigning one or more similarity metrics to each entry of the site data; and sorting the site data into at least one group with similar lexical content based on a metric threshold difference analysis of the similarity metrics. Each group has at least one site data entry, and where a group has two or more site data entries, they preferably refer to the same site or to sites having a similar physical address.
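One way to picture the grouping step is the sketch below. It swaps in the standard library's `difflib.SequenceMatcher` ratio as the similarity metric and a greedy comparison against each group's first entry; the abstract does not name a metric or a grouping strategy, so both are assumptions, as is the 0.85 threshold.

```python
import re
from difflib import SequenceMatcher


def normalize(address):
    """Crude lexical clean-up: lowercase, drop punctuation, squeeze spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", address.lower())
    return " ".join(cleaned.split())


def group_sites(addresses, threshold=0.85):
    """Assign entry ids 1..n and group entries with similar address text.

    Returns a list of groups, each a list of entry ids.
    """
    groups = []  # each item: (representative text, [entry ids])
    for entry_id, address in enumerate(addresses, start=1):
        text = normalize(address)
        for representative, ids in groups:
            if SequenceMatcher(None, representative, text).ratio() >= threshold:
                ids.append(entry_id)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append((text, [entry_id]))
    return [ids for _, ids in groups]


site_groups = group_sites(["12 High St.", "12  high st", "99 Ocean Avenue"])
```

Here the first two entries normalize to the same text and land in one group, while the third starts its own group.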
Abstract:
Techniques are provided for using an intermediate cache to provide some of the items involved in a scan operation, while other items involved in the scan operation are provided from primary storage. Techniques are also provided for determining whether to service an I/O request for an item with a copy of the item that resides in the intermediate cache based on factors such as a) an identity of the user for whom the I/O request was submitted, b) an identity of a service that submitted the I/O request, c) an indication of a consumer group to which the I/O request maps, d) whether the I/O request is associated with an offloaded filter provided by the database server to the storage system, or e) whether the intermediate cache is overloaded. Techniques are also provided for determining whether to store items in an intermediate cache in response to the items being retrieved, based on logical characteristics associated with the requests that retrieve the items.
Abstract:
An apparatus and method to decouple large object (“LOB”) data processing from main-line data processing in a shared-nothing architecture. The method may include relocating rows in a database table from a source partition to a target partition, where each row stores a source descriptor identifying a LOB associated with the row. The source descriptors may be read, and space sufficient to store each LOB in a target repository may be allocated accordingly. Source descriptors may be extracted from the rows, and sorted according to the location of the LOBs in the source repository to provide an ordered retrieval sequence. Each LOB may be retrieved from the source repository according to the retrieval sequence, and stored in its allocated space. The source descriptor stored in each row in the target partition may then be replaced with a target descriptor to identify the location of the respective LOB in the target repository.
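The allocate, sort, retrieve, and replace sequence can be sketched as follows. The repositories are modeled as flat byte strings and a descriptor as an `(offset, length)` pair; these representations and all names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    offset: int  # position of the LOB inside its repository
    length: int  # size of the LOB in bytes


def relocate_lobs(rows, source_repo):
    """Copy each row's LOB into a new target repository.

    rows: list of dicts whose "lob" key holds a source Descriptor.
    Returns the target repository bytes; rows are updated in place so
    that "lob" holds the target Descriptor.
    """
    # 1. Read the source descriptors and allocate target space per LOB.
    allocations = []
    next_free = 0
    for index, row in enumerate(rows):
        src = row["lob"]
        allocations.append((src, Descriptor(next_free, src.length), index))
        next_free += src.length
    target = bytearray(next_free)

    # 2. Retrieve in source-location order so source reads are sequential.
    for src, dst, _ in sorted(allocations, key=lambda a: a[0].offset):
        data = source_repo[src.offset:src.offset + src.length]
        target[dst.offset:dst.offset + dst.length] = data

    # 3. Replace each source descriptor with its target descriptor.
    for _, dst, index in allocations:
        rows[index]["lob"] = dst
    return bytes(target)


source_repo = b"AAAABBCCC"
rows = [{"id": 1, "lob": Descriptor(6, 3)}, {"id": 2, "lob": Descriptor(0, 4)}]
target_repo = relocate_lobs(rows, source_repo)
```

Sorting by source offset is what turns scattered LOB reads into a single forward pass over the source repository, independent of the order in which the rows were relocated.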
Abstract:
Systems and methods for reconstructing the state of a market are provided. Orders are arranged as a non-indexed collection of orders and may be stored in the cache memory of a processor. The physical locations of orders stored in the memory may correspond to the order in which they were received at a match engine. A computer device simulates the processing of orders between any time periods to reconstruct the activity state of an entity across a trading platform and one or more order books.
Abstract:
A query processing system has a query processor and a data manager. The query processor calls the data manager to carry out data access for a query including a filtering operation. The data manager accesses the data in a set of data and before returning the data, initiates a callback to the query processor to determine if the located data meets the filtering criteria. Where the data does not satisfy the filtering criteria, the data manager seeks additional data in the set of data, without having to return the first located data to the query processor.
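The callback arrangement can be sketched in a few lines. The class and method names are hypothetical, and the set of data is modeled as an in-memory list of dict rows; the point is only that the data manager keeps seeking until the query processor's callback accepts a row.

```python
class DataManager:
    """Owns the stored rows and performs the data access."""

    def __init__(self, rows):
        self._rows = rows
        self.rows_returned = 0  # rows actually handed back to the caller

    def next_match(self, position, qualifies):
        """Return (row, next_position) for the first qualifying row.

        `qualifies` is the callback into the query processor. Rows that
        fail it are skipped here instead of being returned and discarded.
        Returns (None, position) when the data is exhausted.
        """
        while position < len(self._rows):
            row = self._rows[position]
            position += 1
            if qualifies(row):
                self.rows_returned += 1
                return row, position
        return None, position


class QueryProcessor:
    """Drives the query and supplies the filtering callback."""

    def __init__(self, data_manager):
        self._dm = data_manager

    def run(self, predicate):
        results, position = [], 0
        while True:
            row, position = self._dm.next_match(position, predicate)
            if row is None:
                return results
            results.append(row)


dm = DataManager([{"qty": 1}, {"qty": 7}, {"qty": 3}, {"qty": 9}])
matches = QueryProcessor(dm).run(lambda row: row["qty"] > 5)
```

Of the four stored rows, only the two that pass the filter cross the boundary between the data manager and the query processor.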