Abstract:
A method for performing queries on a distributed time series data storage system is presented. The time series data storage system has a time series database that stores data blocks containing time-stamped data across a plurality of computing devices. The system also includes an index database that stores an index associated with the time-stamped data in each data block. The method includes the steps of sending a query, requesting indices, returning the indices, preparing a sub-query, forwarding the sub-query to an evaluator, evaluating the sub-query, performing a logical operation on each sub-query's result, receiving the sub-results at an output handler, and combining the sub-results.
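The query steps in the abstract above can be illustrated with a minimal in-memory sketch. Everything in it is hypothetical: the `BLOCKS` and `INDEX` dictionaries stand in for the time series database and the index database, the index is assumed to hold a time range per block, and a value predicate stands in for the logical operation applied to each sub-query's result. The abstract does not specify any of these details.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp, value)

# Hypothetical stand-in for the time series database: block id -> points.
BLOCKS: Dict[str, List[Point]] = {
    "b1": [(1, 10.0), (2, 12.5), (3, 9.0)],
    "b2": [(4, 20.0), (5, 7.5)],
    "b3": [(6, 30.0), (7, 31.0)],
}

# Hypothetical stand-in for the index database: block id -> (min_ts, max_ts).
INDEX: Dict[str, Tuple[int, int]] = {
    block_id: (points[0][0], points[-1][0]) for block_id, points in BLOCKS.items()
}


def request_indices(start: int, end: int) -> List[str]:
    """Return the ids of blocks whose time range overlaps [start, end]."""
    return [b for b, (lo, hi) in INDEX.items() if lo <= end and hi >= start]


def evaluate_sub_query(
    block_id: str, start: int, end: int, predicate: Callable[[float], bool]
) -> List[Point]:
    """Evaluate one sub-query against a single block (the evaluator step)."""
    return [
        (t, v) for t, v in BLOCKS[block_id] if start <= t <= end and predicate(v)
    ]


def run_query(
    start: int, end: int, predicate: Callable[[float], bool]
) -> List[Point]:
    """Prepare one sub-query per relevant block, then combine the sub-results."""
    sub_results = [
        evaluate_sub_query(block_id, start, end, predicate)
        for block_id in request_indices(start, end)
    ]
    # Output handler: flatten the sub-results and order them by timestamp.
    return sorted(point for sub in sub_results for point in sub)
```

In this sketch, a query over timestamps 2 to 5 never touches block `b3`, because the index lookup excludes it before any sub-query is prepared.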
Abstract:
A system for querying a federated data store includes a metadata knowledge graph describing the contents and relationships among one or more underlying data stores, an interactive user interface receiving requests from a data consumer, a predefined constrainable query (‘nodegroup’) store containing predefined constrainable queries that define data subsets of interest across one or more of the underlying data repositories, a knowledge-driven querying layer generating and executing queries against the federated data store and merging responsive results, a scalable analytic execution layer receiving the search results from the federated data store and applying machine learning/artificial intelligence techniques to analyze the results, and a user interface presenting visualizations of raw or analyzed results to the consumer. A method and a non-transitory computer-readable medium are also disclosed.
Abstract:
A system to generate and run federated queries against a plurality of data stores storing disparate data types, the system including a user interface receiving query details from a data consumer, a metadata knowledge graph containing metadata for links and relationships of the data stores, a knowledge-driven querying layer accessing the graph and selecting predefined constrainable queries from a nodegroup store and applying the metadata links/relationships to the predefined constrainable queries to assemble subqueries, a query and analysis platform providing the subqueries to some of the data stores for execution, a scalable analytic execution layer receiving and aggregating search results from the data stores into a merged search result and/or obtaining analytic results by applying machine learning and artificial intelligence techniques to the distributed data, the user interface presenting visualizations generated from the merged search results, and/or the analytic results. A system and a non-transitory computer-readable medium are also disclosed.
Abstract:
According to some embodiments, systems and methods for building a model are provided, comprising a display; a memory storing processor-executable process steps; and a processor to execute the processor-executable process steps to cause the system to: present a user interface on the display, the user interface including one or more user-entry fields to build a model, wherein the user-entry fields are associated with a selection of big data or small data for use with the model; receive at least one data source in a user-entry field associated with the model; determine whether data in the data source includes big data or small data; and, in response to the determination, execute the model with data from the data source in a big data or small data execution environment. Numerous other aspects are provided.
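The determine-then-dispatch step in the abstract above might look like the following sketch. It rests on assumptions the abstract does not state: that "big data" is decided by a row-count threshold (`BIG_DATA_ROW_THRESHOLD` is a made-up value), that a model is a per-row function, and that the big data environment can be imitated by processing the rows in chunks. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical cutoff; the abstract does not say how big data is detected.
BIG_DATA_ROW_THRESHOLD = 1_000


def is_big_data(rows: Sequence[float], threshold: int = BIG_DATA_ROW_THRESHOLD) -> bool:
    """Classify a data source as big data when it exceeds the row threshold."""
    return len(rows) > threshold


def run_small(model: Callable[[float], float], rows: Sequence[float]) -> List[float]:
    """Small data environment: apply the model to every row in one pass."""
    return [model(row) for row in rows]


def run_big(
    model: Callable[[float], float], rows: Sequence[float], chunk_size: int = 250
) -> List[float]:
    """Stand-in for a big data environment: apply the model chunk by chunk."""
    results: List[float] = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        results.extend(model(row) for row in chunk)
    return results


def execute_model(
    model: Callable[[float], float], rows: Sequence[float]
) -> Tuple[str, List[float]]:
    """Pick the execution environment from the data, then run the model."""
    if is_big_data(rows):
        return "big", run_big(model, rows)
    return "small", run_small(model, rows)
```

Both paths return the same results for the same input; only the execution strategy differs, which is the property the dispatch is meant to preserve.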
Abstract:
According to some embodiments, systems and methods for representing nodes and data flows in a network are provided, comprising providing a hierarchical taxonomy for one or more concepts; providing one or more hierarchical taxonomies for one or more boundary types, wherein one or more values from each boundary type are combined to form a definition of a boundary; and associating at least one concept and at least one boundary with a transmission of data between a first node and a second node, wherein the transmitted data is a data flow. Numerous other aspects are provided.
Abstract:
Methods and systems are presented for optimizing the configuration and parameters of a workflow using an evolutionary approach augmented with intelligent learning capabilities using a Big Data infrastructure. In an embodiment, a Big Data infrastructure receives workflow input parameters, an objective function, a pool of initial configuration parameters, and completion criteria from a client computer, and then runs multiple instances of a workflow based on the pool of initial configuration parameters, resulting in corresponding output results. The process includes storing the workflow input parameters and the corresponding output results, modeling the relationship between changes in the workflow input parameters and the corresponding output results, determining that optimal output results have been achieved, and then transmitting the optimal output and the input-output variable relationship results to the client computer.
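The evolutionary loop in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed method: configurations are single floats, mutation is Gaussian noise, the completion criteria are a generation limit plus a target objective value, and the "modeling the relationship" step is reduced to recording every (configuration, objective) pair in `history`. The `optimize` function and its parameters are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Record = Tuple[float, float]  # (configuration parameter, objective value)


def optimize(
    workflow: Callable[[float], float],
    objective: Callable[[float], float],
    initial_pool: Sequence[float],
    max_generations: int = 30,
    target: float = 1e-3,
    seed: int = 0,
) -> Tuple[Record, List[Record]]:
    """Minimize objective(workflow(config)) with a simple evolutionary loop."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    pool = list(initial_pool)
    history: List[Record] = []  # stored inputs and corresponding outputs
    best: Record = (pool[0], objective(workflow(pool[0])))

    for _ in range(max_generations):
        # Run one workflow instance per configuration in the pool.
        scored = [(p, objective(workflow(p))) for p in pool]
        history.extend(scored)
        scored.sort(key=lambda record: record[1])
        if scored[0][1] < best[1]:
            best = scored[0]
        if best[1] <= target:  # completion criterion reached
            break
        # Keep the better half and add one mutated child per survivor.
        survivors = [p for p, _ in scored[: max(1, len(scored) // 2)]]
        children = [p + rng.gauss(0.0, 0.5) for p in survivors]
        pool = survivors + children

    return best, history
```

Because survivors are carried into the next generation unchanged, the best objective value never gets worse from one generation to the next. A fuller version would fit a model to `history` and use it to propose new configurations instead of mutating at random.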
Abstract:
A system for storing time series data includes an ingester that prepares metadata indices associated with blocks of incoming time series data and stores the blocks of data in a time series database and the indices in a separate index database. The time series database distributes storage of the data blocks among multiple data nodes. A query layer receives queries and uses the index database to determine which data blocks are needed to process the query, and then requests only those data blocks from the time series database. Processing of the query is performed within the time series database only on those data nodes that contain relevant data, and partial results are passed to an output layer for formation into a final query result.
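The ingest path and the index-driven node selection in the abstract above can be sketched with a small in-memory class. The details are assumptions made for illustration: blocks hold a fixed number of points, the metadata index records only each block's time range, and blocks are placed on data nodes round-robin. `Store`, `IndexEntry`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


@dataclass
class IndexEntry:
    """Metadata kept in the index database for one data block."""
    block_id: int
    node: int
    min_ts: int
    max_ts: int


@dataclass
class Store:
    num_nodes: int
    # Time series database: node id -> {block id -> points}.
    nodes: Dict[int, Dict[int, List[Point]]] = field(default_factory=dict)
    # Index database, kept separately from the data blocks.
    index: List[IndexEntry] = field(default_factory=list)

    def ingest(self, points: List[Point], block_size: int = 3) -> None:
        """Cut incoming points into blocks, store them, and index each block."""
        ordered = sorted(points)
        for start in range(0, len(ordered), block_size):
            block = ordered[start:start + block_size]
            block_id = len(self.index)
            node = block_id % self.num_nodes  # assumed round-robin placement
            self.nodes.setdefault(node, {})[block_id] = block
            self.index.append(IndexEntry(block_id, node, block[0][0], block[-1][0]))

    def nodes_for(self, start: int, end: int) -> List[int]:
        """Use only the index to find nodes holding data in [start, end]."""
        return sorted(
            {e.node for e in self.index if e.min_ts <= end and e.max_ts >= start}
        )
```

For example, after ingesting nine points with timestamps 1 through 9 into a three-node store, `nodes_for(5, 6)` names a single node, so a query over that range would be processed only there.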