Abstract:
Identifying entities in email signature blocks is described. A system scores each token, in a sequence of tokens from an email signature block, based on entity types, wherein each token is a word, a punctuation symbol, or an end-of-line character. The system identifies each entity sequence which includes a number of entities that matches the number of tokens in the sequence of tokens. The system identifies an entity sequence with a highest score based on applying scores for each token in the sequence of tokens to each identified entity sequence. The system outputs the sequence of tokens as an identified set of entities based on the entity sequence with the highest score.
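The scoring-and-selection steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the label set `ENTITY_TYPES`, the toy rules in `score_token`, and the function `best_entity_sequence` are all hypothetical. It assumes per-token scores simply add up across a sequence.

```python
from itertools import product

ENTITY_TYPES = ("NAME", "TITLE", "PHONE", "OTHER")  # hypothetical label set


def score_token(token: str, entity: str) -> float:
    """Toy per-token score (assumption); a real system would use learned scores."""
    if entity == "PHONE":
        return 1.0 if any(ch.isdigit() for ch in token) else -1.0
    if entity == "NAME":
        return 0.8 if token.istitle() else -0.5
    if entity == "TITLE":
        return 0.9 if token.lower() in {"engineer", "manager", "director"} else -0.5
    return 0.1  # OTHER: weak default for punctuation and end-of-line tokens


def best_entity_sequence(tokens):
    """Enumerate every entity sequence as long as the token sequence,
    total the per-token scores for each, and keep the highest-scoring one."""
    best_seq, best_score = None, float("-inf")
    for seq in product(ENTITY_TYPES, repeat=len(tokens)):
        total = sum(score_token(t, e) for t, e in zip(tokens, seq))
        if total > best_score:
            best_seq, best_score = seq, total
    return list(zip(tokens, best_seq)), best_score


tokens = ["Jane", "Doe", ",", "Engineer", "\n", "555-0100"]
tagged, total_score = best_entity_sequence(tokens)
```

Because the toy scores are independent per token, exhaustive enumeration here gives the same answer as choosing the best label for each token separately. Enumeration only matters once the score depends on neighbouring entities, and its cost grows exponentially with the number of tokens.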
Abstract:
Contact recommendations based on purchase history are described. A system creates a directed graph of nodes in which at least some of the nodes are connected by directed arcs, wherein a directed arc from a first node to a second node represents a conditional probability that previous users who purchased a first contact also purchased a second contact. The system identifies a set of contacts purchased by a current user. For each candidate contact, the system estimates a prospective purchase probability based on a historical probability that previous users purchased that contact and a related probability that previous users who purchased that contact also purchased a contact in the set of contacts. The system outputs a recommendation for the current user to purchase a recommended candidate contact based on a corresponding prospective purchase probability.
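The estimate described above can be sketched as a product of a historical probability and the arc probabilities toward the contacts the user already owns. This is only an illustration under assumptions: the abstract does not say how the two probabilities are combined, so the naive-Bayes-style product, the `default` value for missing arcs, and all names and numbers below are hypothetical.

```python
from math import prod

# Hypothetical historical probabilities that a previous user bought each contact.
prior = {"alice": 0.30, "bob": 0.10, "carol": 0.20}

# Directed arcs: arcs[a][b] = P(previous user bought b | they bought a).
arcs = {
    "bob": {"alice": 0.9},
    "carol": {"alice": 0.2},
}


def prospective_scores(owned, prior, arcs, default=0.01):
    """Score each candidate not yet owned by the current user.

    score(c) = P(c) * product over owned contacts o of P(o | c),
    with a small default (assumption) when no arc c -> o exists.
    """
    scores = {}
    for cand, p in prior.items():
        if cand in owned:
            continue
        scores[cand] = p * prod(arcs.get(cand, {}).get(o, default) for o in owned)
    return scores


def recommend(owned, prior, arcs):
    """Return the candidate with the highest prospective score, if any."""
    scores = prospective_scores(owned, prior, arcs)
    return max(scores, key=scores.get) if scores else None


best = recommend({"alice"}, prior, arcs)
```

With these numbers, `bob` scores 0.10 × 0.9 and `carol` scores 0.20 × 0.2, so the less popular contact is recommended because it co-occurs more strongly with what the user already bought.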
Abstract:
The technology disclosed relates to methods for partitioning sets of features for a Bayesian classifier: finding a data partition that makes classification faster and more accurate while discovering and accounting for dependence among sets of features in the data set. It relates to computing class entropy scores for a class label across all tuples that share a feature subset, arranging the tuples in order of non-decreasing entropy score for the class label, and constructing a data partition that offers the highest improvement in predictive accuracy for the data set. Also disclosed is a method for partitioning a complete set of records of features in a batch computation, computing increasing predictive power. The technology further relates to starting with singleton partitions and using an iterative process to construct a data partition that offers the highest improvement in predictive accuracy for the data set.
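The entropy-scoring step can be sketched as below, assuming records are `(feature_dict, class_label)` pairs and that "class entropy" means the Shannon entropy of the label distribution within each group of records sharing the same values on a feature subset. The function names and record layout are hypothetical, and the partition-construction step is not shown.

```python
from collections import Counter, defaultdict
from math import log2


def class_entropy(labels):
    """Shannon entropy (in bits) of the class-label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def tuples_by_entropy(records, feature_subset):
    """Group records by their values on feature_subset, score each value
    tuple by class entropy, and return (entropy, tuple) pairs in
    non-decreasing order of entropy."""
    groups = defaultdict(list)
    for features, label in records:
        key = tuple(features[f] for f in feature_subset)
        groups[key].append(label)
    return sorted((class_entropy(labels), key) for key, labels in groups.items())


records = [
    ({"a": 1, "b": 0}, "x"),
    ({"a": 1, "b": 0}, "x"),
    ({"a": 0, "b": 1}, "x"),
    ({"a": 0, "b": 1}, "y"),
]
ranked = tuples_by_entropy(records, ("a", "b"))
```

Tuples at the front of the list have low entropy, meaning the class label is nearly determined by those feature values; here `(1, 0)` always maps to `"x"`, while `(0, 1)` is split evenly.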
Abstract:
Matching objects using keys based on match rules is described. A system generates a match rule key based on a match rule, wherein the match rule specifies whether two objects match. The system creates candidate keys by applying the match rule key to data objects. The system creates a probe key by applying the match rule key to a probe object. The system determines whether the probe key matches a candidate key. The system determines whether the probe object matches a candidate object based on applying the match rule to the probe object and the candidate object if the probe key matches the candidate key corresponding to the candidate object. The system identifies the probe object and the candidate object as matching based on the match rule if the probe object matches the candidate object.
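The key-then-rule flow above resembles blocking in record matching, and can be sketched as follows. The match rule, the key derivation, and the record fields (`last`, `email`) are hypothetical; the only property assumed is that two objects satisfying the rule always produce the same key.

```python
from collections import defaultdict


def _domain(obj):
    return obj["email"].split("@")[1].lower()


def match_rule(a, b):
    """Hypothetical rule: same email domain and same last name, ignoring case."""
    return _domain(a) == _domain(b) and a["last"].lower() == b["last"].lower()


def match_rule_key(obj):
    """Key derived from the rule: objects that match always share this key."""
    return (_domain(obj), obj["last"][:1].lower())


def build_index(objects):
    """Create candidate keys by applying the key function to each data object."""
    index = defaultdict(list)
    for obj in objects:
        index[match_rule_key(obj)].append(obj)
    return index


def find_matches(probe, index):
    """Apply the full rule only to candidates whose key equals the probe key."""
    candidates = index.get(match_rule_key(probe), [])
    return [c for c in candidates if match_rule(probe, c)]


data = [
    {"last": "Smith", "email": "a@x.com"},
    {"last": "Stone", "email": "b@x.com"},
    {"last": "Smith", "email": "c@y.com"},
]
index = build_index(data)
matches = find_matches({"last": "SMITH", "email": "z@X.com"}, index)
```

The key is deliberately coarser than the rule: it narrows the probe to two candidates, and the rule then rejects `Stone`. The third record is never compared at all, which is where the saving comes from.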
Abstract:
User trust scores based on registration features are described. A system identifies registration features associated with a user registered to interact with a database. The system calculates a registration trust score for the user based on a comparison of multiple registration features associated with the user to corresponding registration features associated with previous users who are restricted from interacting with the database and/or corresponding registration features associated with previous users who are enabled to interact with the database. The system restricts the user from interacting with the database if the registration trust score is above a registration threshold.
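One way to picture the comparison is sketched below. The abstract does not give a formula, so the scoring here is an assumption: for each registration feature, the share of restricted users with the same value minus the share of enabled users with the same value, summed over features. Following the abstract's wording, a score above the threshold leads to restriction. Feature names, data, and the threshold value are hypothetical.

```python
def feature_rate(users, feature, value):
    """Fraction of users whose registration feature equals the given value."""
    if not users:
        return 0.0
    return sum(1 for u in users if u.get(feature) == value) / len(users)


def registration_trust_score(user, restricted, enabled):
    """Sum, over the user's features, of how much more common each value is
    among restricted users than among enabled users (assumed formula)."""
    return sum(
        feature_rate(restricted, f, v) - feature_rate(enabled, f, v)
        for f, v in user.items()
    )


def should_restrict(user, restricted, enabled, threshold=0.5):
    """Restrict when the score is above the registration threshold."""
    return registration_trust_score(user, restricted, enabled) > threshold


restricted = [
    {"domain": "tempmail.test", "country": "ZZ"},
    {"domain": "tempmail.test", "country": "US"},
]
enabled = [
    {"domain": "corp.test", "country": "US"},
    {"domain": "corp.test", "country": "US"},
]
new_user = {"domain": "tempmail.test", "country": "ZZ"}
flagged = should_restrict(new_user, restricted, enabled)
```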
Abstract:
A system and method for building a profile record for a person from business contacts stored in a database. Contacts having similar name signatures are collected together, then pairs of such contacts are compared using defined criteria.
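The collect-then-compare step can be sketched as follows. The abstract does not define a "name signature" or the comparison criteria, so both are assumptions here: the signature is first initial plus last name, and two contacts are treated as the same person when they share an email address or a company. Field names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def name_signature(contact):
    """Hypothetical signature: (first initial, last name), lowercased."""
    first = contact["first"].strip().lower()
    last = contact["last"].strip().lower()
    return (first[:1], last)


def same_person(a, b):
    """Hypothetical criteria: same email or same company, ignoring case."""
    return (a["email"].lower() == b["email"].lower()
            or a["company"].lower() == b["company"].lower())


def matching_pairs(contacts):
    """Collect contacts by name signature, then compare pairs within each group."""
    buckets = defaultdict(list)
    for i, contact in enumerate(contacts):
        buckets[name_signature(contact)].append(i)
    pairs = []
    for ids in buckets.values():
        for i, j in combinations(ids, 2):
            if same_person(contacts[i], contacts[j]):
                pairs.append((i, j))
    return pairs


contacts = [
    {"first": "Jon", "last": "Smith", "email": "jon@a.test", "company": "Acme"},
    {"first": "Jonathan", "last": "Smith", "email": "js@b.test", "company": "ACME"},
    {"first": "Jane", "last": "Smith", "email": "jane@c.test", "company": "Other"},
    {"first": "Bob", "last": "Lee", "email": "bob@d.test", "company": "Acme"},
]
pairs = matching_pairs(contacts)
```

Grouping by signature keeps the pairwise comparison inside small buckets: `Bob Lee` works at the same company as `Jon Smith` but is never compared with him, because the signatures differ.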
Abstract:
A system creates a graph of nodes connected by arcs, and identifies a first compound attribute associated with contacts purchased by a current user. The first compound attribute includes a first attribute associated with a first value and a second attribute associated with a second value. The system identifies a directed arc from a first node to a second node. The directed arc is associated with a probability that previous users who purchased a first contact associated with the first compound attribute also purchased a second contact associated with a second compound attribute. The second compound attribute includes the first attribute, associated with a third value which matches the first value, and the second attribute, associated with a fourth value, which lacks a match with the second value. The system outputs a recommendation for the current user to purchase contacts associated with the second compound attribute if the probability exceeds a threshold.
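The arc test above can be sketched with compound attributes modelled as `(first_value, second_value)` tuples, for example `(job title, industry)`. The attribute meanings, the arc probabilities, and the threshold are hypothetical; the sketch only shows the filter the abstract describes, where the target shares the first value, differs on the second, and has an arc probability above the threshold.

```python
# arcs[(src, dst)] = P(previous user bought a dst-type contact | bought a src-type contact)
arcs = {
    (("VP", "Software"), ("VP", "Hardware")): 0.6,
    (("VP", "Software"), ("VP", "Retail")): 0.2,
    (("VP", "Software"), ("CEO", "Software")): 0.7,
}


def recommend_compound(current, arcs, threshold=0.5):
    """Return compound attributes reachable from `current` whose first value
    matches, whose second value differs, and whose probability exceeds
    the threshold."""
    recs = []
    for (src, dst), p in arcs.items():
        if src != current:
            continue
        same_first = dst[0] == src[0]
        different_second = dst[1] != src[1]
        if same_first and different_second and p > threshold:
            recs.append(dst)
    return recs


recs = recommend_compound(("VP", "Software"), arcs)
```

The `("CEO", "Software")` arc has the highest probability but is skipped because its first value differs, and `("VP", "Retail")` is skipped because its probability is below the threshold.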