Abstract:
Techniques are provided for denormalizing semi-structured hierarchical data into a virtual table. In an embodiment, at least a portion of a semi-structured data document collection is denormalized to improve the execution of queries that involve a traversal of the collection's semi-structured data hierarchy. Based on the extracted schema of the semi-structured data, a denormalized arrangement is generated, in which the hierarchical relationship of the semi-structured data is converted into a set of columns. The denormalized arrangement is materialized by applying it to the semi-structured data. The materialized arrangement, the virtual table, may be stored on persistent storage or kept in volatile memory. The virtual table may be stored in one format on the persistent storage and in another format in the volatile memory. In an embodiment, a received query that involves a traversal of the semi-structured data hierarchy is converted to a relational query that can be executed on the virtual table. Executing the relational query on the virtual table improves performance in generating the resulting data set.
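The denormalization step can be illustrated with a minimal Python sketch. It assumes JSON-like documents held as Python dicts and lists, uses path strings such as `$.items[*].sku` as column names, and fans arrays out into one row per element. The function name `flatten` and the path notation are illustrative assumptions, not taken from the abstract.

```python
from itertools import product

def flatten(node, prefix="$"):
    """Return a list of rows (dicts mapping path -> scalar) for one document."""
    if isinstance(node, dict):
        # Each key yields its own row fragments; combine them across keys.
        parts = [flatten(value, f"{prefix}.{key}") for key, value in node.items()]
        rows = []
        for combo in product(*parts):
            row = {}
            for fragment in combo:
                row.update(fragment)
            rows.append(row)
        return rows
    if isinstance(node, list):
        # An array fans out into one row fragment per element.
        rows = [r for item in node for r in flatten(item, prefix + "[*]")]
        return rows or [{}]
    return [{prefix: node}]

# Hypothetical two-document collection.
docs = [
    {"id": 1, "items": [{"sku": "a"}, {"sku": "b"}]},
    {"id": 2, "items": [{"sku": "c"}]},
]

# Materialize the "virtual table" as a flat list of rows.
virtual_table = [row for doc in docs for row in flatten(doc)]

# A hierarchical lookup (the skus of document 1) becomes a flat filter.
skus = [r["$.items[*].sku"] for r in virtual_table if r["$.id"] == 1]
```

With the rows materialized once, the traversal of the hierarchy is replaced by a filter over columns, which is the effect the abstract attributes to the relational query.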
Abstract:
A data guide is dynamically generated. The data guide describes the structures of hierarchical data objects added to a collection of hierarchical data objects. Examples of hierarchical data objects are documents that conform to XML (Extensible Markup Language) or data objects that conform to JSON (JavaScript Object Notation). The data guide may be created and/or updated as hierarchical data objects are added to the collection.
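A data guide that is updated as objects arrive might be sketched as follows. This is a simplified illustration that assumes JSON objects already parsed into Python values and records, for each path, the set of value types seen so far. The class name `DataGuide` and the path notation are hypothetical.

```python
def json_type(value):
    """Map a parsed JSON value to a JSON type name."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "null"

class DataGuide:
    def __init__(self):
        self.paths = {}  # path -> set of type names observed at that path

    def add(self, doc):
        """Update the guide with the structure of one newly added object."""
        self._walk(doc, "$")

    def _walk(self, node, path):
        self.paths.setdefault(path, set()).add(json_type(node))
        if isinstance(node, dict):
            for key, value in node.items():
                self._walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for item in node:
                self._walk(item, path + "[*]")

guide = DataGuide()
guide.add({"a": 1, "b": {"c": "x"}})
guide.add({"a": "s", "d": [1, 2]})
```

After the second call, the guide reports that `$.a` has held both numbers and strings and that a new array path `$.d` has appeared, without rescanning the first object.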
Abstract:
Data can be categorized into facts, information, hypotheses, and directives. Activities generate certain categories of data based on other categories of data through the application of knowledge, which can be categorized into classifications, assessments, resolutions, and enactments. Activities can be driven by a Classification-Assessment-Resolution-Enactment (CARE) control engine. The CARE control and these categorizations can be used to enhance a multitude of systems, for example diagnostic systems, such as through historical record keeping, machine learning, and automation. Such a diagnostic system can include a system that forecasts computing system failures based on the application of knowledge to system vital signs such as thread or stack segment intensity and memory heap usage. These vital signs are facts that can be classified to produce information such as memory leaks, convoy effects, or other problems. Classification can involve the automatic generation of classes, states, observations, predictions, norms, objectives, and the processing of sample intervals having irregular durations.
Abstract:
Embodiments of the invention provide systems and methods for managing and processing large amounts of complex and high-velocity data by capturing and extracting high-value data from low-value data using big data and related technologies. Illustrative database systems described herein may collect and process data while extracting or generating high-value data. The high-value data may be handled by databases providing functions such as multi-temporality, provenance, flashback, and registered queries. In some examples, computing models and systems may be implemented to combine knowledge and process management aspects with the near real-time data processing frameworks in a data-driven, situation-aware computing system.
Abstract:
A query may be rewritten to leverage information stored in a structured XML index. An operator in the query may be analyzed to determine an input source database object for the operator by traversing an operator tree rooted at the operator. The path expressions associated with the operator tree may be fused together to form an effective path expression for the operator. If the effective path expression directly matches a path expression derived from the index, the query may be rewritten using references to the index. Operators in a query that have effective paths that refer to data in the same index table may be grouped together. A single subquery may be written for a group of operators. Also, a structured XML index may be used as an implied schema for indexed XML data. This implied schema may be used to optimize queries that refer to the indexed XML data.
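The path-fusion step can be sketched in a few lines of Python. The sketch assumes each operator carries a relative path and at most one input operator, and that the index exposes a map from absolute paths to columns. The names `Operator`, `INDEX_PATHS`, and `ITEMS_IDX` are hypothetical and stand in for whatever metadata a real structured XML index provides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operator:
    path: str                           # path step(s) applied by this operator
    child: Optional["Operator"] = None  # input operator; None marks the source

def effective_path(op):
    """Fuse the paths along the operator tree into one absolute path."""
    steps = []
    while op is not None:
        steps.append(op.path.strip("/"))
        op = op.child
    return "/" + "/".join(s for s in reversed(steps) if s)

# Hypothetical index metadata: indexed path -> index table column.
INDEX_PATHS = {
    "/po/items/item/price": "ITEMS_IDX.PRICE",
    "/po/items/item/qty": "ITEMS_IDX.QTY",
}

def rewrite(op):
    """Return an index column reference on a direct match, else None."""
    return INDEX_PATHS.get(effective_path(op))

price = Operator("price", Operator("items/item", Operator("/po")))
note = Operator("note", Operator("/po"))
```

Here `rewrite(price)` resolves to a column of the index table, while `rewrite(note)` returns `None` because no indexed path matches, so that operator would be left unrewritten.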
Abstract:
Processes, machines, and stored instructions are provided for storing posting lists for tokens in XML documents and using the posting lists to process queries. For each occurrence of a token in the XML documents, a document processor adds an entry to a list for the token. The entry for the token maps the token to documents or nodes within the documents where the token can be found. The document processor may also detect tags in the XML documents and, for each occurrence of a tag, add an entry to a list for the tag. The entry for the tag specifies a range of locations covered by the tag. A query processor may then receive a full text query for evaluation against XML documents, and the query processor may determine a result set for the query using the lists for the tokens and/or the lists for the tags.
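A toy version of the two kinds of lists can be sketched as follows. The sketch assumes well-formed XML with no attributes, comments, or entities, uses word positions as locations, and answers one kind of query: which documents contain a token inside a given tag. The function names and the tokenizing regular expression are illustrative assumptions.

```python
import re
from collections import defaultdict

# Matches an opening tag, a closing tag, or a word (attribute-free XML assumed).
TOKEN_RE = re.compile(r"<(/?)(\w+)>|(\w+)")

def index_document(doc_id, xml, token_lists, tag_lists):
    pos = 0          # position of the next word in the document
    open_tags = []   # stack of (tag name, position where the tag started)
    for match in TOKEN_RE.finditer(xml):
        closing, tag, word = match.groups()
        if word:
            token_lists[word.lower()].append((doc_id, pos))
            pos += 1
        elif closing:
            name, start = open_tags.pop()
            # The tag covers the inclusive word range start..pos-1.
            tag_lists[name].append((doc_id, start, pos - 1))
        else:
            open_tags.append((tag, pos))

def search(token, tag, token_lists, tag_lists):
    """Return ids of documents where the token occurs inside the tag."""
    hits = set()
    for doc_id, pos in token_lists.get(token.lower(), []):
        for tag_doc, start, end in tag_lists.get(tag, []):
            if tag_doc == doc_id and start <= pos <= end:
                hits.add(doc_id)
    return sorted(hits)

token_lists, tag_lists = defaultdict(list), defaultdict(list)
index_document(1, "<doc><title>XML index</title><body>posting lists</body></doc>",
               token_lists, tag_lists)
index_document(2, "<doc><title>posting</title><body>XML</body></doc>",
               token_lists, tag_lists)
```

The query is answered from the lists alone: a token entry matches when its position falls within a range recorded for the tag in the same document, so the documents themselves are not reparsed.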
Abstract:
Data structures and methods are described for converting a text-format data-interchange file into size-efficient binary representations. A method comprises receiving a request to convert a data-interchange file, comprising a hierarchy of nodes, into a binary file. The method further comprises generating a tree representation of the nodes that reference a plurality of leaf values. The method further comprises, in response to determining that the binary file is to be compressed, embedding relative node jump offsets when generating the tree representation. The method further comprises, in response to determining that the data-interchange file is immutable, deduplicating the plurality of leaf values in a space-optimized manner. The method further comprises, in response to determining that the data-interchange file is mutable, deduplicating the plurality of leaf values in a stream-optimized manner. The method further comprises storing the deduplicated plurality of leaf values in the binary file.
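The core idea of a tree whose nodes reference a deduplicated pool of leaf values can be sketched as below. This is only an in-memory illustration that assumes JSON input. It does not model the binary layout, the relative jump offsets, or the difference between the space-optimized and stream-optimized strategies, none of which the abstract specifies. The helper names are hypothetical.

```python
import json

def intern(value, leaves, index):
    """Return the slot of value in the leaf pool, adding it if it is new."""
    key = (type(value).__name__, value)  # keep 1, 1.0 and True distinct
    if key not in index:
        index[key] = len(leaves)
        leaves.append(value)
    return index[key]

def build_tree(node, leaves, index):
    """Build a tree whose field names and scalars are leaf-pool references."""
    if isinstance(node, dict):
        return ("object", [(intern(k, leaves, index), build_tree(v, leaves, index))
                           for k, v in node.items()])
    if isinstance(node, list):
        return ("array", [build_tree(v, leaves, index) for v in node])
    return ("leaf", intern(node, leaves, index))

leaves, index = [], {}
tree = build_tree(json.loads('{"a": "x", "b": ["x", "a", 1]}'), leaves, index)
```

The strings `"x"` and `"a"` each occur twice in the input but are stored once in `leaves`; the tree holds only small integer references to them.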
Abstract:
Techniques are described for applying topological graph changes and traversing the modified graph. In an implementation, a set of compile processes schedules the graph changes caused by a DML (Data Manipulation Language) statement. Based on the graph operation requested in a received query for the graph, a set of graph operation processes generates extensions to the graph that capture the changes made to the graph by the DML statement. The received graph operation(s) are then performed by traversing both the existing graph and the generated extensions.
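Traversing an existing graph together with its extensions can be sketched as an overlay: the base adjacency lists stay untouched, and inserted or deleted edges live in separate structures that the traversal consults. The class below is a single-process illustration with hypothetical names; it does not model the compile processes or their scheduling.

```python
from collections import deque

class GraphWithExtensions:
    def __init__(self, base):
        self.base = base      # vertex -> list of neighbors, never modified
        self.added = {}       # extension: vertex -> list of inserted neighbors
        self.removed = set()  # extension: set of deleted (src, dst) base edges

    def insert_edge(self, u, v):
        if (u, v) in self.removed:
            self.removed.discard((u, v))  # undo an earlier delete of a base edge
        elif v not in self.base.get(u, []) and v not in self.added.get(u, []):
            self.added.setdefault(u, []).append(v)

    def delete_edge(self, u, v):
        if v in self.added.get(u, []):
            self.added[u].remove(v)       # edge existed only in the extension
        elif v in self.base.get(u, []):
            self.removed.add((u, v))

    def neighbors(self, u):
        """Yield neighbors from the base graph, then from the extensions."""
        for v in self.base.get(u, []):
            if (u, v) not in self.removed:
                yield v
        yield from self.added.get(u, [])

    def reachable(self, start):
        """Breadth-first traversal over base graph plus extensions."""
        seen, queue = {start}, deque([start])
        while queue:
            for v in self.neighbors(queue.popleft()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen

base = {"a": ["b"], "b": ["c"], "c": []}
g = GraphWithExtensions(base)
g.delete_edge("b", "c")
g.insert_edge("b", "d")
```

After these two changes, a traversal from `"a"` reaches `"d"` but no longer reaches `"c"`, while `base` itself is unchanged.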
Abstract:
Techniques support graph pattern matching queries inside a relational database management system (RDBMS) that supports SQL execution. The techniques compile a graph pattern matching query into a SQL query that can then be executed by the relational engine. As a result, the techniques enable execution of graph pattern matching queries on top of the relational engine without any change to the existing SQL engine.
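Compiling a pattern into SQL can be illustrated with a deliberately small case: a chain pattern such as (v0)-[knows]->(v1)-[works_at]->(v2) compiled into a self-join over an edge table. The sketch assumes a single hypothetical table `edges(src, dst, label)` and uses Python's built-in `sqlite3` module as the unmodified relational engine; real pattern languages and schemas are far richer.

```python
import sqlite3

def compile_path_pattern(edge_labels):
    """Compile a chain of labeled edges into one SQL query plus parameters."""
    if not edge_labels:
        raise ValueError("pattern needs at least one edge")
    n = len(edge_labels)
    columns = ["e0.src"] + [f"e{i}.dst" for i in range(n)]
    sql = f"SELECT {', '.join(columns)} FROM edges e0"
    for i in range(1, n):
        # Each further hop joins where the previous edge ended.
        sql += f" JOIN edges e{i} ON e{i}.src = e{i - 1}.dst"
    sql += " WHERE " + " AND ".join(f"e{i}.label = ?" for i in range(n))
    return sql, list(edge_labels)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, "knows"), (2, 3, "works_at"), (1, 3, "knows")],
)

sql, params = compile_path_pattern(["knows", "works_at"])
matches = conn.execute(sql, params).fetchall()
```

The engine sees only an ordinary join query, so the pattern is evaluated by the existing SQL execution path.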
Abstract:
Herein is database query acceleration from dynamic discovery of whether contents of a persistent column can be stored in an accelerated representation in storage-side memory. In an embodiment, based on data type discovery, a storage server detects that column values in a persistent column have a particular data type. Based on storage-side metadata including a frequency of access of the persistent column as an offload input column for offload computation requests on a certain range of memory addresses, the storage server autonomously decides to generate and store, in storage-side memory in the storage server, an accelerated representation of the persistent column that is based on the particular data type. The storage server receives a request to perform an offload computation for the offload input column. Based on the accelerated representation of the persistent column, execution of the offload computation is accelerated.
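The decision logic might be sketched as follows. The sketch assumes a persistent column stored as strings, an integer-only type discovery step, a simple access-count threshold as the storage-side metadata, and a packed integer array as the accelerated representation. All of these choices, and the names `StorageServer` and `offload_sum`, are illustrative assumptions; the memory-address range mentioned above is not modeled.

```python
from array import array

class StorageServer:
    def __init__(self, columns, threshold=3):
        self.columns = columns      # persistent columns: name -> list of strings
        self.threshold = threshold  # hypothetical access-frequency cutoff
        self.access_counts = {}     # storage-side metadata: name -> request count
        self.accelerated = {}       # storage-side memory: name -> packed integers

    def _discover_integers(self, values):
        """Return a packed array if every value parses as an integer, else None."""
        try:
            return array("q", (int(v) for v in values))
        except (ValueError, OverflowError):
            return None

    def offload_sum(self, name):
        """Serve an offloaded SUM over a column assumed to hold integers."""
        self.access_counts[name] = self.access_counts.get(name, 0) + 1
        if name in self.accelerated:
            return sum(self.accelerated[name])  # fast path: no string parsing
        total = sum(int(v) for v in self.columns[name])
        if self.access_counts[name] >= self.threshold:
            typed = self._discover_integers(self.columns[name])
            if typed is not None:
                self.accelerated[name] = typed
        return total

server = StorageServer({"qty": ["1", "2", "3"]}, threshold=2)
first = server.offload_sum("qty")
built_after_first = "qty" in server.accelerated
second = server.offload_sum("qty")
third = server.offload_sum("qty")
```

The first request is served from the string column; the second reaches the threshold and triggers construction of the typed array, and the third is answered from that array.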