Abstract:
Word phrases are stored in a phrase structure. Each word is stored as a keyword in a keyword structure. Each keyword is associated with usage attributes identifying use of a word in a word phrase. Any preceding words associated with a keyword, and a mapping from any preceding words to a word phrase, are stored for each word. A word string is input. Match attributes are updated in a match structure if a word in the word string matches any keyword and if the preceding words associated with any matching keyword include a preceding word that precedes the word in the word string. The match attributes indicate use of the matching word in the word string and in a word phrase. Whether a word phrase is present in the word string is determined based on the usage attributes and the match attributes associated with multiple matching words.
Abstract:
A system and method for matching phrases having arbitrary text. A first data structure stores a list of common phrases having multiple words. Each unique word is indexed in a hash table and mapped to one or more values that describe attributes of using the word in one or more of the common phrases. Using the hash table and the list of common phrases, a temporary array is defined to keep track of possible matches between words in an input string and the list of common phrases.
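The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the names `build_index` and `find_phrases` are hypothetical, the "attributes" stored for each word are reduced to (phrase id, position) pairs, and matching is case-sensitive on whitespace-separated words.

```python
from collections import defaultdict


def build_index(phrases):
    """Hash table: each unique word maps to (phrase_id, position) pairs.

    The pairs stand in for the 'attributes of using the word' in a phrase
    (an assumption made for this sketch).
    """
    index = defaultdict(list)
    for pid, phrase in enumerate(phrases):
        for pos, word in enumerate(phrase.split()):
            index[word].append((pid, pos))
    return index


def find_phrases(text, phrases):
    """Return every phrase whose words occur consecutively in text."""
    index = build_index(phrases)
    lengths = [len(p.split()) for p in phrases]
    found = []
    # Temporary structure of possible matches: (phrase_id, next position).
    active = set()
    for word in text.split():
        next_active = set()
        for pid, pos in index.get(word, ()):
            # A word extends a match if it starts a phrase or is the
            # position that an in-progress match expects next.
            if pos == 0 or (pid, pos) in active:
                if pos + 1 == lengths[pid]:
                    found.append(phrases[pid])
                else:
                    next_active.add((pid, pos + 1))
        active = next_active
    return found


phrases = ["new york", "york city hall", "city hall"]
matches = find_phrases("visit new york city hall today", phrases)
```

Because partial matches are discarded as soon as the next word fails to extend them, the input string is scanned once, with one hash lookup per word.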
Abstract:
The technology disclosed relates to automatic generation of tuples from a record set for outlier analysis. Applying this new technology, a user need not specify which 1-tuples to combine into n-tuples. The tuples are generated from structured records organized into features (which could also be fields, objects, or attributes). Tuples are generated from combinations of feature values in the records. Thresholding is applied to manage the number of tuples generated. The technology disclosed further relates to indexing and searching high dimensional tuple spaces in a computer-implemented system.
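As a rough illustration of tuple generation with thresholding, the sketch below counts every combination of up to `max_n` feature values per record and keeps only those seen at least `min_count` times. The abstract does not say how the threshold is applied, so the count-based cutoff, the function name `generate_tuples`, and both parameters are assumptions.

```python
from collections import Counter
from itertools import combinations


def generate_tuples(records, min_count=2, max_n=2):
    """Count n-tuples of (feature, value) pairs, for n = 1..max_n.

    Only tuples occurring in at least min_count records are returned
    (an assumed, count-based form of thresholding).
    """
    counts = Counter()
    for rec in records:
        # Sort by feature name so a combination has one canonical order.
        items = sorted(rec.items())
        for n in range(1, max_n + 1):
            for combo in combinations(items, n):
                counts[combo] += 1
    return {t: c for t, c in counts.items() if c >= min_count}


records = [
    {"country": "US", "browser": "Chrome"},
    {"country": "US", "browser": "Chrome"},
    {"country": "DE", "browser": "Chrome"},
]
frequent = generate_tuples(records)
```

Here the caller never names which 1-tuples to combine; the combinations are enumerated from the records themselves, and the threshold keeps the result from growing with every rare value.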
Abstract:
A system creates a graph of nodes connected by arcs, and identifies a first compound attribute associated with contacts purchased by a current user. The first compound attribute includes a first attribute associated with a first value and a second attribute associated with a second value. The system identifies a directed arc from a first node to a second node. The directed arc is associated with a probability that previous users who purchased a first contact associated with the first compound attribute also purchased a second contact associated with a second compound attribute. The second compound attribute includes the first attribute, associated with a third value which matches the first value, and the second attribute, associated with a fourth value, which lacks a match with the second value. The system outputs a recommendation for the current user to purchase contacts associated with the second compound attribute if the probability exceeds a threshold.
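The recommendation rule can be sketched as below, assuming each node is a compound attribute written as two (attribute, value) pairs and each directed arc carries a precomputed purchase probability. The function name `recommend`, the dictionary layout of `arcs`, and the sample values are all hypothetical.

```python
def recommend(current, arcs, threshold=0.5):
    """Return compound attributes worth recommending to the current user.

    current: ((attr1, value1), (attr2, value2)), bought by the current user.
    arcs: {(source_node, target_node): probability} for directed arcs.
    A target qualifies when it shares the first attribute's value with
    current, differs on the second attribute's value, and the arc's
    probability exceeds the threshold.
    """
    (a1, v1), (a2, v2) = current
    out = []
    for (src, dst), prob in arcs.items():
        if src != current or prob <= threshold:
            continue
        (b1, w1), (b2, w2) = dst
        if (b1, w1) == (a1, v1) and b2 == a2 and w2 != v2:
            out.append(dst)
    return out


ca = (("industry", "software"), ("state", "CA"))
ny = (("industry", "software"), ("state", "NY"))
tx = (("industry", "software"), ("state", "TX"))
arcs = {(ca, ny): 0.7, (ca, tx): 0.2, (ny, ca): 0.9}
suggestions = recommend(ca, arcs)
```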
Abstract:
Methods and systems are provided for evaluating standing queries against updated contact entries configured as a stream of facts. The method includes resolving the standing queries into an array of rules, each rule having a first and a second condition; sorting one of the facts into a first property and a second property; comparing the first property of the fact to the first condition of each rule in the array of rules to produce a first subset of matching rules; comparing the second property of the fact to the second condition of each rule in the first subset of rules to produce a second subset of matching rules; and reporting at least one of the second subset of rules to an author of the matching rule. The method further includes populating a first hash with indicia of the first subset, and populating a second hash with the second subset.
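The two-stage comparison can be sketched as follows. It is one possible reading of the abstract, not a definitive implementation: rules are plain dictionaries with hypothetical keys `first`, `second`, and `author`, a fact is a (first property, second property) pair, and the two "hashes" are dictionaries keyed by the rule's position in the array.

```python
def match_fact(fact, rules):
    """Evaluate one fact against an array of two-condition rules.

    Returns (author, rule_index) pairs for every rule whose first and
    second conditions both match the fact.
    """
    first_prop, second_prop = fact
    # Stage 1: the first hash holds the indices of rules whose first
    # condition matches the fact's first property.
    first_hash = {i: r for i, r in enumerate(rules) if r["first"] == first_prop}
    # Stage 2: only the survivors are compared on the second condition.
    second_hash = {
        i: r for i, r in first_hash.items() if r["second"] == second_prop
    }
    return [(r["author"], i) for i, r in second_hash.items()]


rules = [
    {"first": "title=CEO", "second": "state=CA", "author": "alice"},
    {"first": "title=CEO", "second": "state=NY", "author": "bob"},
    {"first": "title=CTO", "second": "state=CA", "author": "carol"},
]
to_notify = match_fact(("title=CEO", "state=CA"), rules)
```

Filtering on the first condition before the second means the second comparison runs only over the first subset, which is typically much smaller than the full rule array.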
Abstract:
The technology disclosed relates to improving parallel functional processing using abstractions and methods defined based on category theory. In particular, the technology disclosed provides a range of useful categorical functions for processing large data sets in parallel. These categorical functions manage all phases of distributed computing, including dividing a data set into subsets of approximately equal size and combining the results of the subset calculations into a final result, while hiding many of the low-level programming details. These categorical functions are well-ordered and have a sophisticated type system with type inference, which allows maps to be generated and reduced using concise, expressive programs that can make the whole software development process significantly more efficient.
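The divide, compute, and combine phases can be illustrated with a monoid-style fold: an associative `combine` function with an `identity` element lets partial results be merged regardless of how the data was split. This is a sketch under assumptions, not the disclosed library; `chunks` and `parallel_fold` are hypothetical names, and a thread pool stands in for a distributed runtime.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunks(data, n):
    """Split data into n contiguous subsets of approximately equal size."""
    k, r = divmod(len(data), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(data[start:end])
        start = end
    return out


def parallel_fold(data, f, combine, identity, n=4):
    """Map f over data and fold the results with an associative combine."""

    def fold_chunk(chunk):
        return reduce(combine, map(f, chunk), identity)

    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(fold_chunk, chunks(data, n)))
    # Associativity of combine makes this final merge independent of
    # where the chunk boundaries fell.
    return reduce(combine, partials, identity)


total = parallel_fold(list(range(10)), lambda x: x * x, lambda a, b: a + b, 0)
```

The caller supplies only `f`, `combine`, and `identity`; the splitting and merging details stay hidden inside `parallel_fold`.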
Abstract:
The technology disclosed relates to methods for partitioning sets of features for a Bayesian classifier, finding a data partition that makes the classification process faster and more accurate, while discovering and taking into account feature dependence among sets of features in the data set. It also relates to computing class entropy scores for a class label across all tuples that share the feature-subset, arranging the tuples in order of non-decreasing entropy scores for the class label, and constructing a data partition that offers the highest improvement in predictive accuracy for the data set. Also disclosed are a method for partitioning a complete set of records of features in a batch computation, computing increasing predictive power, and a method that starts with singleton partitions and uses an iterative process to construct a data partition that offers the highest improvement in predictive accuracy for the data set.
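The entropy-scoring step can be sketched as below: records are grouped by their values on a chosen feature subset, the Shannon entropy of the class label is computed within each group, and the groups are returned in non-decreasing order of entropy. The function name `class_entropy_by_tuple` and the record layout are assumptions; the partition-construction step itself is not shown.

```python
import math
from collections import Counter, defaultdict


def class_entropy_by_tuple(records, features, label):
    """Score each feature-value tuple by the entropy of its class labels.

    Returns (tuple, entropy_in_bits) pairs sorted by non-decreasing entropy.
    """
    groups = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[f] for f in features)
        groups[key][rec[label]] += 1
    scores = []
    for key, counts in groups.items():
        total = sum(counts.values())
        # Shannon entropy: sum of p * log2(1 / p) over the class labels.
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        scores.append((key, h))
    return sorted(scores, key=lambda kv: kv[1])


records = [
    {"sender": "a", "label": "spam"},
    {"sender": "a", "label": "spam"},
    {"sender": "b", "label": "spam"},
    {"sender": "b", "label": "ham"},
]
ranked = class_entropy_by_tuple(records, ["sender"], "label")
```

A tuple with entropy near zero predicts the class almost by itself, which is why low-entropy tuples come first in the ordering.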