Abstract:
To forecast data, an initial collection of data having a first length is received. In response to determining that the first length of the initial collection of data is insufficient for performing forecasting using a forecasting algorithm, an order of the initial collection of data is reversed to provide a reversed collection of data. Forecasting is applied on the reversed collection of data to estimate additional data values to combine with the initial collection of data to provide a second collection of data having a second length greater than the first length. The forecasting algorithm is applied on the second collection of data.
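The reverse-then-forecast idea can be illustrated with a minimal sketch. It assumes a naive drift forecaster and reuses that same forecaster for both the backward extension and the final forecast; the abstract does not specify either choice, and the names `drift_forecast`, `extend_backward`, and `forecast_with_padding` are hypothetical.

```python
from typing import Callable, List, Sequence

Forecaster = Callable[[Sequence[float], int], List[float]]


def drift_forecast(series: Sequence[float], steps: int) -> List[float]:
    """Naive drift forecast (an assumed stand-in for a real algorithm):
    continue the straight line through the first and last observations."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + slope * (i + 1) for i in range(steps)]


def extend_backward(initial: Sequence[float], min_length: int,
                    forecaster: Forecaster = drift_forecast) -> List[float]:
    """If the series is too short, reverse it, forecast the reversed series,
    and prepend the (re-reversed) estimates to the original data."""
    if len(initial) >= min_length:
        return list(initial)
    missing = min_length - len(initial)
    reversed_series = list(reversed(initial))
    # Forecasting the reversed series estimates values *before* the first
    # observation, ordered from most recent to oldest.
    backcast = forecaster(reversed_series, missing)
    return list(reversed(backcast)) + list(initial)


def forecast_with_padding(initial: Sequence[float], min_length: int, steps: int,
                          forecaster: Forecaster = drift_forecast) -> List[float]:
    """Apply the forecaster to the lengthened (second) collection."""
    extended = extend_backward(initial, min_length, forecaster)
    return forecaster(extended, steps)
```

For example, `extend_backward([10, 12, 14], 5)` pads the series with two estimated earlier values before any forward forecast is attempted.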
Abstract:
Clustering of nominal attributes using a nominal population metric enables comparisons of entities which are not easily comparable. In some embodiments, nominal population metrics are determined using a similarity matrix and a nominal population matrix using comparisons. In some embodiments, nominal population metrics are determined using a nominal population matrix using distributions. A computing device is able to determine the nominal population metrics with the appropriate hardware and applications configured for computing the nominal population metrics.
Abstract:
Disclosed herein are a method and system for integrating an enterprise's structured and unstructured data to provide users and enterprise applications with efficient and intelligent access to that data. Queries can be directed toward both an enterprise's structured and unstructured data using standardized database query formats such as SQL commands. A coprocessor can be used to hardware-accelerate data processing tasks (such as full-text searching) on unstructured data as necessary to handle a query. Furthermore, traditional relational database techniques can be used to access structured data stored by a relational database to determine which portions of the enterprise's unstructured data should be delivered to the coprocessor for hardware-accelerated data processing.
Abstract:
A method is provided for evaluating a user query on a relational database that has records stored therein, a workload made up of a set of queries that have been executed on the database, and a query optimizer that generates a query execution plan for the user query. Each query plan includes a plurality of intermediate query plan components that verify a subset of records from the database meeting query criteria. The method accesses the query plan and a set of stored intermediate statistics for records verified by query components, such as histograms that summarize the cardinality of the records that verify the query component. The method forms a transformed query plan based on the selected intermediate statistics (possibly by rewriting the query plan) and estimates the cardinality of the transformed query plan to arrive at a more accurate cardinality estimate for the query. If additional intermediate statistics are necessary, a pool of intermediate statistics may be generated based on the queries in the workload by evaluating the benefit of a given statistic over the workload and adding to the pool those intermediate statistics that provide relatively great benefit.
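A simplified sketch of why intermediate statistics can improve an estimate follows. It assumes that each stored intermediate statistic is just a known row count for a set of predicates (rather than a full histogram) and that any predicates not covered by a stored statistic are treated as independent; the function and parameter names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable


def estimate_cardinality(table_rows: float,
                         predicates: Iterable[str],
                         base_selectivity: Dict[str, float],
                         intermediate_stats: Dict[FrozenSet[str], float]) -> float:
    """Estimate how many rows satisfy all predicates.

    If a stored intermediate statistic covers a subset of the predicates,
    start from its row count instead of multiplying per-predicate
    selectivities (the independence assumption) for that subset.
    """
    remaining = set(predicates)
    # Pick the stored statistic that covers the most predicates of this query.
    usable = [key for key in intermediate_stats if key <= remaining]
    best = max(usable, key=len, default=None)
    if best is not None:
        estimate = intermediate_stats[best]
        remaining -= best
    else:
        estimate = table_rows
    # Fall back to the independence assumption for uncovered predicates.
    for predicate in remaining:
        estimate *= base_selectivity[predicate]
    return estimate
```

With correlated predicates such as a city and its country, the independence assumption underestimates the result, while a stored count for the combined predicates corrects it.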
Abstract:
A system, method, and computer storage medium are provided for computing analytics on structured data. The method comprises providing at least one data source; providing a statistics object for computing statistical estimates; providing software capable of performing any of the pass, stream, and merge data processing methods; and performing at least one statistical calculation on data from the data source, using the statistics object to compute statistical estimates by at least one of the provided data processing methods.
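One possible reading of a statistics object with pass, stream, and merge methods is sketched below. The abstract does not define these methods, so the interpretation here is an assumption: "stream" consumes one value at a time, "pass" consumes a whole data source, and "merge" combines two partial results. The class name `RunningStats` is hypothetical, and `pass_over` is used because `pass` is a reserved word in Python.

```python
from typing import Iterable


class RunningStats:
    """Tracks count, mean, and variance without storing the data."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the mean

    def stream(self, x: float) -> None:
        """Update the estimates with a single value (Welford's method)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def pass_over(self, data: Iterable[float]) -> "RunningStats":
        """Make one pass over an entire data source."""
        for x in data:
            self.stream(x)
        return self

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Fold another partial result into this one (pairwise combination)."""
        if other.n == 0:
            return self
        total = self.n + other.n
        delta = other.mean - self.mean
        self._m2 += other._m2 + delta * delta * self.n * other.n / total
        self.mean += delta * other.n / total
        self.n = total
        return self

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 when fewer than two values were seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```

Merging lets two partitions of a data source be summarized separately and combined afterwards, giving the same result as a single pass over all the data.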
Abstract:
Disclosed are a system, method, and computer readable medium for collecting statistics associated with data in a database. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated.
Abstract:
A system for automatic statistics creation comprises a query optimizer which automatically generates statistics derived from data in a database and selects an executable procedure from a plurality of procedures that operate on data in a database using the automatically generated statistics. A counter is maintained of updates made to each statistic that has been automatically generated. If the counter breaches a threshold, the automatically generated statistic is removed from the database.
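The counter-and-threshold lifecycle can be sketched as follows. This is a toy in-memory model, not a database implementation: the statistic is assumed to be a distinct-value count, "breaches" is assumed to mean "exceeds", and the names `AutoStatisticsStore`, `get_or_create`, and `record_update` are hypothetical.

```python
from typing import Dict, Iterable, Hashable


class AutoStatisticsStore:
    """Holds automatically generated statistics and drops stale ones."""

    def __init__(self, update_threshold: int) -> None:
        self.update_threshold = update_threshold
        self._values: Dict[str, int] = {}   # column -> distinct-value count
        self._updates: Dict[str, int] = {}  # column -> updates since creation

    def get_or_create(self, column: str, rows: Iterable[Hashable]) -> int:
        """Return the statistic for a column, generating it on first use."""
        if column not in self._values:
            self._values[column] = len(set(rows))
            self._updates[column] = 0
        return self._values[column]

    def record_update(self, column: str) -> None:
        """Count an update; remove the statistic once the counter exceeds
        the threshold (assumed meaning of "breaches")."""
        if column not in self._values:
            return
        self._updates[column] += 1
        if self._updates[column] > self.update_threshold:
            del self._values[column]
            del self._updates[column]

    def has_statistic(self, column: str) -> bool:
        return column in self._values
```

Removing the statistic rather than refreshing it in place means it is regenerated only if a later query actually needs it.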
Abstract:
A method of distributed approximate query tracking relies on tracking general-purpose randomized sketch summaries of local streams at remote sites along with concise prediction models of local site behavior in order to produce highly communication-efficient and space/time-efficient solutions. A powerful approximate query tracking framework readily incorporates several complex analysis queries, including distributed join and multi-join aggregates and approximate wavelet representations, thus giving the first known low-overhead tracking solution for such queries in the distributed-streams model.
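A heavily simplified sketch of the idea follows. It assumes an AMS-style sketch with hash-derived signs (a stand-in for the 4-wise independent hash families used in the literature) and the simplest possible prediction model, namely that the local sketch has not changed since it was last sent. The threshold rule and all names are illustrative assumptions, not the method claimed in the abstract.

```python
import hashlib
import math
from typing import List


class AMSSketch:
    """A small randomized linear summary of a stream of items."""

    def __init__(self, width: int = 64) -> None:
        self.counters: List[int] = [0] * width

    @staticmethod
    def _sign(row: int, item: str) -> int:
        # Pseudo-random +1/-1 per (row, item), derived from a hash.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=1).digest()
        return 1 if digest[0] & 1 else -1

    def update(self, item: str, count: int = 1) -> None:
        for row in range(len(self.counters)):
            self.counters[row] += count * self._sign(row, item)

    def self_join_estimate(self) -> float:
        """Estimate the self-join size (sum of squared item frequencies)."""
        return sum(c * c for c in self.counters) / len(self.counters)


def l2_norm(vector: List[int]) -> float:
    return math.sqrt(sum(v * v for v in vector))


class RemoteSite:
    """Sends its sketch to the coordinator only when it has drifted too far
    from the last sketch sent (the assumed "no change" prediction)."""

    def __init__(self, theta: float, width: int = 64) -> None:
        self.theta = theta
        self.sketch = AMSSketch(width)
        self.last_sent: List[int] = [0] * width
        self.messages = 0

    def observe(self, item: str) -> bool:
        self.sketch.update(item)
        current = self.sketch.counters
        drift = l2_norm([c - s for c, s in zip(current, self.last_sent)])
        if drift > self.theta * l2_norm(current):
            self.last_sent = list(current)
            self.messages += 1
            return True  # a message to the coordinator would be sent here
        return False
```

Because most updates stay within the drift bound, the site communicates far less often than once per stream element, at the cost of a bounded approximation error at the coordinator.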
Abstract:
A method of selecting and presenting content based on context-sensitive learned user preferences is provided. The method includes providing a set of content items having descriptive terms. The method includes receiving user input for identifying items and, in response thereto, presenting a subset of items. The method includes receiving user selections of said items and analyzing the descriptive terms of those items to learn the user's content preferences. The method includes determining the context in which the user performed the selections and associating those contexts with the user content preferences learned from the corresponding user selections. The method includes, in response to subsequent user input, determining a context of said subsequent input and selecting and ordering a collection of items based on comparing those items' descriptive terms with the user's learned content preferences associated with the determined context in which the user entered the subsequent input.
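The association between contexts and learned preferences can be sketched with per-context term counts. This is a minimal illustration under stated assumptions: a context is a plain string label such as a time of day, preferences are simple term frequencies, and the class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


class ContextualPreferences:
    """Learns which descriptive terms a user favors in each context."""

    def __init__(self) -> None:
        self._weights: Dict[str, Counter] = defaultdict(Counter)

    def learn(self, context: str, descriptive_terms: Iterable[str]) -> None:
        """Record the terms of an item the user selected in this context."""
        self._weights[context].update(descriptive_terms)

    def rank(self, context: str, items: Dict[str, List[str]]) -> List[str]:
        """Order item names by how well their terms match the preferences
        learned for the given context (ties keep their input order)."""
        weights = self._weights.get(context, Counter())

        def score(name: str) -> int:
            return sum(weights[term] for term in items[name])

        return sorted(items, key=score, reverse=True)
```

The same catalog is ordered differently depending on the context of the request, for example favoring news in the morning and comedy in the evening if that is what past selections in those contexts showed.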
Abstract:
A computer-implemented method and system are operable to: receive a tracking event from a client, recognize tracking-specific parameters in the tracking event, generate a tracking entry corresponding to the tracking event, use a tracking service API to send the tracking entry to a second server, and redirect the client to an intended target corresponding to the tracking event.
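The sequence of steps can be sketched as a single handler. Everything concrete here is an assumption made for illustration: the tracking event is modeled as a URL, tracking-specific parameters are those with a hypothetical `trk_` prefix, the intended target is carried in a hypothetical `target` parameter, and the tracking service API is represented by a caller-supplied `send_entry` function.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qs, urlsplit

TRACKING_PREFIX = "trk_"  # hypothetical naming convention
TARGET_PARAM = "target"   # hypothetical parameter holding the destination


def handle_tracking_event(
    url: str,
    send_entry: Callable[[Dict[str, str]], None],
) -> Tuple[int, Dict[str, str]]:
    """Process one tracking event and return an HTTP redirect response."""
    query = parse_qs(urlsplit(url).query)

    # Recognize the tracking-specific parameters and build the entry.
    entry = {
        key[len(TRACKING_PREFIX):]: values[0]
        for key, values in query.items()
        if key.startswith(TRACKING_PREFIX)
    }

    targets = query.get(TARGET_PARAM)
    if not targets:
        raise ValueError("tracking event has no redirect target")

    # Stand-in for the tracking service API call to the second server.
    send_entry(entry)

    # Redirect the client to the intended target.
    return 302, {"Location": targets[0]}
```

A production handler would also validate the target against an allow-list before redirecting, since an unchecked `target` parameter is an open redirect.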