Abstract:
A method and system, the system including a plurality of machines each having a processor and a main memory component; a shared distributed storage facility storing a set of data and accessible by the plurality of machines over a communication network; a controller to select, in response to a state of a query execution plan comprising a plurality of executable jobs for the set of data, which one of a set of scheduling algorithms to execute; an execution engine to execute the selected scheduling algorithm to determine, for each job in the plurality of jobs, which server to schedule to execute the respective job; and providing an indication of the servers determined to be scheduled for the execution of the jobs.
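The controller and execution engine described above can be illustrated with a minimal sketch. The abstract does not name the scheduling algorithms or the plan state that drives the choice, so everything here is a hypothetical stand-in: the two algorithms (`round_robin`, `pack_on_first`), the rule in `select_algorithm`, and the use of the pending-job count as the "state" of the plan.

```python
from typing import Callable, Dict, List

Schedule = Dict[str, str]  # job name -> machine name
Algorithm = Callable[[List[str], List[str]], Schedule]


def round_robin(jobs: List[str], machines: List[str]) -> Schedule:
    """Spread jobs evenly across machines (hypothetical algorithm)."""
    return {job: machines[i % len(machines)] for i, job in enumerate(jobs)}


def pack_on_first(jobs: List[str], machines: List[str]) -> Schedule:
    """Keep all jobs on one machine to avoid network traffic (hypothetical)."""
    return {job: machines[0] for job in jobs}


def select_algorithm(pending_jobs: List[str]) -> Algorithm:
    """Controller: pick an algorithm from the plan state.

    Assumption: the only state inspected is how many jobs are pending.
    """
    return pack_on_first if len(pending_jobs) <= 1 else round_robin


def schedule(pending_jobs: List[str], machines: List[str]) -> Schedule:
    """Execution engine: run the selected algorithm and report the result."""
    algorithm = select_algorithm(pending_jobs)
    return algorithm(pending_jobs, machines)
```

Calling `schedule(["j1", "j2", "j3"], ["m1", "m2"])` returns a mapping from each job to a machine, which plays the role of the "indication of the scheduling" in the abstract.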
Abstract:
Cracking page-loadable columns for in-memory data management is described herein. An embodiment operates by accessing a column according to a received query, determining that the received query requires a non-critical data structure associated with the column, and rebuilding the non-critical data structure from data associated with the column.
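As a rough illustration of rebuilding a non-critical structure on demand, the sketch below treats a value-to-positions index as the non-critical data structure. The abstract does not say which structure is meant, so the `Column` class, its index, and the `evict_index` method are assumptions made for the example.

```python
from typing import Dict, List, Optional


class Column:
    """A column whose index is non-critical: it can be dropped and rebuilt."""

    def __init__(self, values: List[str]) -> None:
        self.values = list(values)  # the column data itself (critical)
        self._index: Optional[Dict[str, List[int]]] = None  # non-critical

    def evict_index(self) -> None:
        """Simulate the index being unloaded to save memory."""
        self._index = None

    def _rebuild_index(self) -> None:
        """Rebuild the index purely from the column data."""
        index: Dict[str, List[int]] = {}
        for position, value in enumerate(self.values):
            index.setdefault(value, []).append(position)
        self._index = index

    def positions_of(self, value: str) -> List[int]:
        """Answer a query that needs the index, rebuilding it if missing."""
        if self._index is None:
            self._rebuild_index()
        return self._index.get(value, [])
```

The design point is that the query path checks for the structure and reconstructs it from the column data, rather than requiring it to be persisted or loaded eagerly.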
Abstract:
Disclosed herein are system, method, and computer program product embodiments for accessing and managing storage class memory (SCM) enabled main-memory database structures. An embodiment operates by traversing a first node to find a location of a second node corresponding to a search key, calculating a hash value for the search key, comparing the calculated hash value with at least one fingerprint value stored in the second node, wherein the fingerprint value is determined by hashing a stored key, accessing at least one key-value pair having a matching hash value, and returning a value associated with the matching key-value pair, wherein at least one of the traversing, calculating, comparing, accessing, and returning is performed by one or more computers.
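The lookup steps above (traverse, hash, compare fingerprints, access, return) can be sketched as follows. This is a simplified in-memory model, not the SCM-resident layout: the two-level tree, the one-byte CRC32-based `fingerprint`, and the class names `Inner` and `Leaf` are all assumptions chosen for illustration.

```python
import bisect
import zlib
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


def fingerprint(key: str) -> int:
    """One-byte hash of a key (the hash function is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) & 0xFF


@dataclass
class Leaf:
    """Second node: unsorted entries plus one fingerprint per entry."""
    fingerprints: List[int] = field(default_factory=list)
    entries: List[Tuple[str, Any]] = field(default_factory=list)

    def insert(self, key: str, value: Any) -> None:
        self.fingerprints.append(fingerprint(key))
        self.entries.append((key, value))


@dataclass
class Inner:
    """First node: sorted separators route a key to one child leaf."""
    separators: List[str]
    children: List[Leaf]  # len(children) == len(separators) + 1

    def find_leaf(self, key: str) -> Leaf:
        return self.children[bisect.bisect_right(self.separators, key)]


def lookup(root: Inner, key: str) -> Optional[Any]:
    leaf = root.find_leaf(key)            # traverse the first node
    fp = fingerprint(key)                 # hash the search key
    for slot, stored_fp in enumerate(leaf.fingerprints):
        if stored_fp != fp:               # cheap filter on fingerprints
            continue
        stored_key, value = leaf.entries[slot]
        if stored_key == key:             # confirm: fingerprints can collide
            return value
    return None
```

The fingerprint comparison lets most non-matching entries be skipped without reading the full key; the final key comparison is still needed because distinct keys can share a fingerprint.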
Abstract:
Disclosed herein are system, method, and computer program product embodiments for using a data statistic as a dynamic data integrity constraint. An embodiment operates by defining a data statistic for a column or a set of columns of a partition of a plurality of partitions of a database table. The embodiment creates a constraint data statistics object based on the data statistic. The embodiment receives a query for the database table. The embodiment determines the constraint data statistics object is consistent with a data state of the partition. The embodiment derives an implied constraint based on the constraint data statistics object. The embodiment processes the query for the partition based on the implied constraint.
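One way to picture a statistic acting as a dynamic constraint is a per-partition min/max summary that is trusted only while it matches the partition's current state. The sketch below is an assumption-laden illustration: the version counter used for the consistency check, the `MinMaxStatistic` class, and the "greater than" query are hypothetical and not taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Partition:
    rows: List[int] = field(default_factory=list)
    version: int = 0  # bumped on every change (assumed consistency marker)

    def append(self, value: int) -> None:
        self.rows.append(value)
        self.version += 1


@dataclass
class MinMaxStatistic:
    """Constraint data statistics object built from a min/max statistic."""
    low: int
    high: int
    built_at_version: int

    @classmethod
    def build(cls, partition: Partition) -> "MinMaxStatistic":
        # Assumes the partition is non-empty.
        return cls(min(partition.rows), max(partition.rows), partition.version)

    def is_consistent_with(self, partition: Partition) -> bool:
        return self.built_at_version == partition.version


def scan_greater_than(
    partition: Partition, stat: Optional[MinMaxStatistic], threshold: int
) -> Tuple[List[int], bool]:
    """Return (matching rows, whether the partition was pruned)."""
    if stat is not None and stat.is_consistent_with(partition):
        # Implied constraint: low <= value <= high for every row.
        if stat.high <= threshold:
            return [], True  # no row can match; skip the scan
    return [v for v in partition.rows if v > threshold], False
```

Once the partition changes, the stale statistic is no longer treated as a constraint and the query falls back to a full scan of that partition.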
Abstract:
Disclosed herein are system, method, and computer program product embodiments for generating join histograms. An embodiment operates by a computer implemented method that includes determining, by at least one processor, a first interval associated with a first histogram of a first data structure and a first estimate frequency associated with the first interval. The method further includes determining, by the at least one processor, a second interval associated with a second histogram of a second data structure and a second estimate frequency associated with the second interval. The method further includes determining, by the at least one processor, a join interval based on the first and second intervals by calculating an intersection of the first and second intervals. The method further includes calculating, by the at least one processor, a join estimate frequency based on the first and second estimate frequencies.
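The interval intersection and frequency combination can be sketched in a few lines. The abstract does not give the formula for the join estimate frequency, so the one used here is an assumption: values are taken to be integers uniformly distributed over half-open intervals `[low, high)`, so an overlap of width `w` contains `w` distinct values and the estimated join size is `f1 * f2 / w`, where `f1` and `f2` are the bucket frequencies scaled down to the overlap.

```python
from typing import List, Optional, Tuple

Bucket = Tuple[float, float, float]  # (low, high, estimated frequency)


def intersect(a: Bucket, b: Bucket) -> Optional[Tuple[float, float]]:
    """Intersection of two half-open intervals, or None if they are disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low < high else None


def join_histogram(h1: List[Bucket], h2: List[Bucket]) -> List[Bucket]:
    """Build a join histogram from two single-column histograms."""
    result: List[Bucket] = []
    for b1 in h1:
        for b2 in h2:
            overlap = intersect(b1, b2)
            if overlap is None:
                continue
            low, high = overlap
            width = high - low
            # Scale each bucket's frequency to the overlapping part
            # (uniformity assumption).
            f1 = b1[2] * width / (b1[1] - b1[0])
            f2 = b2[2] * width / (b2[1] - b2[0])
            # Assumed estimate: `width` distinct values, each matching
            # f1/width rows on one side with f2/width rows on the other.
            result.append((low, high, f1 * f2 / width))
    return result
```

For example, joining a bucket `(0, 10, 100)` with a bucket `(5, 15, 50)` yields a single join interval from 5 to 10; under the stated assumptions its estimated frequency is 50 * 25 / 5 = 250.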
Abstract:
The present disclosure involves systems, software, and computer implemented methods for optimizing continuous queries for hybrid execution over a stream processing engine and an in-memory database. In one example, a method may include identifying a continuous query executed at a stream processing engine, the continuous query including a plurality of operators. An optimal plan for execution of the identified continuous query at the stream processing engine is determined. For each of the plurality of operators in the determined optimal plan, an optimized placement decision for executing a particular operator in the stream processing engine or at a database system is determined. An optimized continuous query is generated from the identified continuous query based on the determined optimal placement decisions for each of the plurality of operators in the determined optimal plan. The optimized continuous query is then executed at the stream processing engine and the database system.
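The per-operator placement decision can be illustrated with a small dynamic program over a linear operator pipeline. The cost model is entirely hypothetical: each operator carries an assumed cost for each engine, a fixed `transfer_cost` is charged whenever consecutive operators run on different engines, and input is assumed to arrive at the stream processing engine ("SPE"). The abstract does not describe how placement decisions are actually computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

ENGINES = ("SPE", "DB")


@dataclass(frozen=True)
class Operator:
    name: str
    spe_cost: float  # assumed cost of running on the stream engine
    db_cost: float   # assumed cost of running in the database


def place_operators(plan: List[Operator], transfer_cost: float) -> List[str]:
    """Choose an engine for each operator of a linear plan, minimizing cost."""
    # best[e] = (cheapest total cost, placements) with the last operator on e.
    # The stream starts at the SPE, so starting "on the DB" is impossible.
    best: Dict[str, Tuple[float, List[str]]] = {
        "SPE": (0.0, []),
        "DB": (float("inf"), []),
    }
    for op in plan:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for engine in ENGINES:
            run = op.spe_cost if engine == "SPE" else op.db_cost
            candidates = []
            for prev in ENGINES:
                cost, placed = best[prev]
                hop = 0.0 if prev == engine else transfer_cost
                candidates.append((cost + hop + run, placed + [engine]))
            nxt[engine] = min(candidates, key=lambda c: c[0])
        best = nxt
    return min(best.values(), key=lambda c: c[0])[1]
```

With cheap transfers the sketch pushes an expensive join to the database; with expensive transfers it keeps the whole pipeline in the stream engine. The cost of shipping final results back to the stream engine is ignored here for brevity.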