Columnar data arrangement for semi-structured data

    Publication No.: US10191944B2

    Publication date: 2019-01-29

    Application No.: US15078713

    Application date: 2016-03-23

    Abstract: Techniques are provided for de-normalizing semi-structured hierarchical data into a virtual table. In an embodiment, at least a portion of a semi-structured data document collection is de-normalized to improve the execution of queries that involve a traversal of the collection's semi-structured data hierarchy. Based on the extracted schema of the semi-structured data, a de-normalized arrangement is generated, in which the hierarchical relationship of the semi-structured data is converted into a set of columns. The de-normalized arrangement is materialized by applying it to the semi-structured data. The materialized arrangement, the virtual table, may be stored on persistent storage or kept in volatile memory. The virtual table may be stored in one format on the persistent storage and in another format in the volatile memory. In an embodiment, a received query that involves a traversal of the semi-structured data hierarchy is converted to a relational query that can be executed on the virtual table. Executing the relational query on the virtual table improves performance in generating the resulting data set.
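The de-normalization described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function name `flatten`, the dotted column names, and the sample document are assumptions made for illustration. Nested objects become column-name prefixes, and each array is unnested into one row per element, so a hierarchy traversal turns into a plain filter over flat rows.

```python
from itertools import product


def flatten(node, prefix=""):
    """Return a list of flat rows (dicts keyed by dotted path) for one node."""
    if isinstance(node, dict):
        # Each key yields its own partial rows; combine them by cross product.
        parts = [
            flatten(value, f"{prefix}.{key}" if prefix else key)
            for key, value in node.items()
        ]
        rows = []
        for combo in product(*parts):
            row = {}
            for part in combo:
                row.update(part)
            rows.append(row)
        return rows
    if isinstance(node, list):
        # An array unnests into one partial row per element;
        # an empty array keeps the parent row (no columns added).
        return [row for item in node for row in flatten(item, prefix)] or [{}]
    return [{prefix: node}]


# Hypothetical sample document.
doc = {
    "id": 1,
    "customer": {"name": "Ann"},
    "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}

rows = flatten(doc)  # the "virtual table": two rows, four columns

# A traversal such as "skus of items with qty > 1" becomes a flat filter.
skus = [row["items.sku"] for row in rows if row["items.qty"] > 1]
print(skus)  # ['A']
```

The cross product is a deliberate simplification: two sibling arrays in the same object would multiply rows, which a real system would have to handle explicitly.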

    73. Knowledge intensive data management system for business process and case management
    Granted patent (in force)

    Publication No.: US09330119B2

    Publication date: 2016-05-03

    Application No.: US14109651

    Application date: 2013-12-17

    Abstract: Data can be categorized into facts, information, hypotheses, and directives. Activities generate certain categories of data based on other categories of data through the application of knowledge, which can be categorized into classifications, assessments, resolutions, and enactments. Activities can be driven by a Classification-Assessment-Resolution-Enactment (CARE) control engine. The CARE control and these categorizations can be used to enhance a multitude of systems, for example diagnostic systems, such as through historical record keeping, machine learning, and automation. Such a diagnostic system can include a system that forecasts computing system failures based on the application of knowledge to system vital signs such as thread or stack segment intensity and memory heap usage. These vital signs are facts that can be classified to produce information such as memory leaks, convoy effects, or other problems. Classification can involve the automatic generation of classes, states, observations, predictions, norms, and objectives, and the processing of sample intervals having irregular durations.

    74. KNOWLEDGE-INTENSIVE DATA PROCESSING SYSTEM
    Patent application (under examination, published)

    Publication No.: US20150254330A1

    Publication date: 2015-09-10

    Application No.: US14665171

    Application date: 2015-03-23

    Abstract: Embodiments of the invention provide systems and methods for managing and processing large amounts of complex and high-velocity data by capturing and extracting high-value data from low-value data using big data and related technologies. Illustrative database systems described herein may collect and process data while extracting or generating high-value data. The high-value data may be handled by databases providing functions such as multi-temporality, provenance, flashback, and registered queries. In some examples, computing models and systems may be implemented to combine knowledge and process management aspects with the near real-time data processing frameworks in a data-driven, situation-aware computing system.

    75. Leveraging Structured XML Index Data For Evaluating Database Queries
    Patent application (under examination, published)

    Publication No.: US20150039642A1

    Publication date: 2015-02-05

    Application No.: US14513176

    Application date: 2014-10-13

    CPC classification number: G06F17/30932 G06F17/30935

    Abstract: A query may be rewritten to leverage information stored in a structured XML index. An operator in the query may be analyzed to determine an input source database object for the operator by traversing an operator tree rooted at the operator. The path expressions associated with the operator tree may be fused together to form an effective path expression for the operator. If the effective path expression directly matches a path expression derived from the index, the query may be rewritten using references to the index. Operators in a query that have effective paths that refer to data in the same index table may be grouped together. A single subquery may be written for a group of operators. Also, a structured XML index may be used as an implied schema for indexed XML data. This implied schema may be used to optimize queries that refer to the indexed XML data.
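The path-fusion step in this abstract can be sketched roughly as follows. This is only an illustration under assumptions: the `Operator` class, the table name `PO_TAB`, the index column `PO_IDX.PRICE`, and the sample paths are all hypothetical and do not come from the patent. Each operator carries a relative path and points at the operator that feeds it; walking that chain down to the base object and joining the paths gives the effective path, which is then looked up among the paths the index covers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operator:
    path: str                             # relative path applied by this operator
    source: Optional["Operator"] = None   # operator whose output feeds this one
    table: Optional[str] = None           # base database object (leaf only)


def effective_path(op):
    """Walk from op to its input source and fuse the relative paths."""
    steps = []
    node = op
    while node.source is not None:
        steps.append(node.path.strip("/"))
        node = node.source
    steps.append(node.path.strip("/"))
    fused = "/" + "/".join(step for step in reversed(steps) if step)
    return node.table, fused


def rewrite(op, index_paths):
    """Return the index column for op, or None if the index has no direct match."""
    return index_paths.get(effective_path(op))


# Hypothetical operator chain: base table -> line items -> price.
base = Operator(path="/purchaseOrder", table="PO_TAB")
items = Operator(path="lineItems/item", source=base)
price = Operator(path="price", source=items)

# Hypothetical paths covered by a structured index on PO_TAB.
index_paths = {("PO_TAB", "/purchaseOrder/lineItems/item/price"): "PO_IDX.PRICE"}

print(effective_path(price))       # ('PO_TAB', '/purchaseOrder/lineItems/item/price')
print(rewrite(price, index_paths)) # PO_IDX.PRICE
```

Operators whose effective paths resolve to columns of the same index table could then be grouped and served by a single subquery, as the abstract describes; that grouping is left out of this sketch.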

    76. Evaluating XML Full Text Search
    Patent application (under examination, published)

    Publication No.: US20140095519A1

    Publication date: 2014-04-03

    Application No.: US13783141

    Application date: 2013-03-01

    CPC classification number: G06F16/8373

    Abstract: Processes, machines, and stored instructions are provided for storing posting lists for tokens in XML documents and using the posting lists to process queries. For each occurrence of a token in the XML documents, a document processor adds an entry to a list for the token. The entry for the token maps the token to documents or nodes within the documents where the tokens can be found. The document processor may also detect tags in the XML documents and, for each occurrence of a tag, add an entry to a list for the tag. The entry for the tag specifies a range of locations covered by the tag. A query processor may then receive a full text query for evaluation against XML documents, and the query processor may determine a result set for the query using the lists for the tokens and/or the lists for the tags.
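The two kinds of lists in this abstract can be sketched in a few lines. This is a simplified illustration, not the patented design: the function names `index_document` and `contains`, the word-position numbering, and the sample documents are assumptions, and the tiny regex handles only well-nested tags without attributes. Token lists record where each token occurs; tag lists record the half-open range of token positions each tag covers; a "token inside tag" query is answered from the lists alone.

```python
import re
from collections import defaultdict

# Matches an opening/closing tag without attributes, or a word.
TOKEN_OR_TAG = re.compile(r"<(/?)(\w+)>|(\w+)")


def index_document(doc_id, xml, token_lists, tag_lists):
    """Add posting entries for one document to the token and tag lists."""
    position = 0
    open_tags = []  # stack of (tag name, position of the first token covered)
    for closing, tag, word in TOKEN_OR_TAG.findall(xml):
        if word:
            token_lists[word.lower()].append((doc_id, position))
            position += 1
        elif closing:
            name, start = open_tags.pop()
            tag_lists[name].append((doc_id, start, position))  # [start, end)
        else:
            open_tags.append((tag, position))


def contains(token, tag, token_lists, tag_lists):
    """Return ids of documents where `token` occurs inside a `tag` element."""
    ranges = tag_lists.get(tag, [])
    return sorted({
        doc
        for doc, pos in token_lists.get(token.lower(), [])
        for range_doc, start, end in ranges
        if range_doc == doc and start <= pos < end
    })


token_lists = defaultdict(list)
tag_lists = defaultdict(list)
index_document(1, "<book><title>Query Processing</title><body>Index structures</body></book>",
               token_lists, tag_lists)
index_document(2, "<book><title>Index Design</title><body>Query plans</body></book>",
               token_lists, tag_lists)

print(contains("index", "title", token_lists, tag_lists))  # [2]
print(contains("query", "book", token_lists, tag_lists))   # [1, 2]
```

The nested loop in `contains` is kept for readability; with both lists sorted by document and position, the same check could be done as a merge.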

    Technique of reducing size of binary JSON with/without compression

    Publication No.: US12298940B1

    Publication date: 2025-05-13

    Application No.: US18622660

    Application date: 2024-03-29

    Abstract: Data structures and methods are described for converting a text-format data-interchange file into size-efficient binary representations. A method comprises receiving a request to convert a data-interchange file, comprising a hierarchy of nodes, into a binary file. The method further comprises generating a tree representation of the nodes that reference a plurality of leaf values. The method further comprises, in response to determining that the binary file is to be compressed, embedding relative node jump offsets when generating the tree representation. The method further comprises, in response to determining that the data-interchange file is immutable, deduplicating the plurality of leaf values in a space-optimized manner. The method further comprises, in response to determining that the data-interchange file is mutable, deduplicating the plurality of leaf values in a stream-optimized manner. The method further comprises storing the deduplicated plurality of leaf values in the binary file.
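The idea of a tree whose nodes reference a deduplicated set of leaf values can be shown with a minimal sketch. It does not reproduce the binary layout, the jump offsets, or the space-optimized and stream-optimized variants named in the abstract; the functions `build` and `decode`, the `("leaf", i)` reference form, and the sample JSON are assumptions for illustration only.

```python
import json


def build(node, leaves, index):
    """Return a tree in which every scalar is replaced by a reference into `leaves`."""
    if isinstance(node, dict):
        return {key: build(value, leaves, index) for key, value in node.items()}
    if isinstance(node, list):
        return [build(item, leaves, index) for item in node]
    # Key on type as well as value so that 1, 1.0 and True stay distinct.
    key = (type(node).__name__, node)
    if key not in index:
        index[key] = len(leaves)
        leaves.append(node)
    return ("leaf", index[key])


def decode(tree, leaves):
    """Rebuild the original hierarchy from the tree and the leaf table."""
    if isinstance(tree, dict):
        return {key: decode(value, leaves) for key, value in tree.items()}
    if isinstance(tree, list):
        return [decode(item, leaves) for item in tree]
    return leaves[tree[1]]


# Hypothetical input with repeated leaf values.
text = '{"a": "x", "b": ["x", "y", "x"], "c": {"d": "y", "e": 1}}'
document = json.loads(text)

leaves, index = [], {}
tree = build(document, leaves, index)

print(leaves)                            # ['x', 'y', 1]
print(decode(tree, leaves) == document)  # True
```

Six leaf occurrences in the input collapse to three stored values here; how the references and values are laid out in bytes is exactly the part this sketch leaves open.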
