Abstract:
The present teaching relates to recommending content by analyzing streamed data. A request for one or more recommendations from a set of items is received from a user. A first distribution indicative of an interest distribution of the user in a plurality of topics is obtained. For each item, a second distribution indicative of a classification distribution of the item with respect to the plurality of topics is obtained. A score is estimated based on the first distribution and the second distribution, wherein the score indicates a likelihood that the user is interested in the item. The scores associated with the set of items are ranked. The one or more recommendations are presented based on the ranked scores.
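The scoring and ranking steps described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two distributions are combined, so the dot product used here is an assumption, and the names `interest_score` and `recommend` are hypothetical.

```python
from typing import Dict, List, Tuple


def interest_score(user_topics: Dict[str, float], item_topics: Dict[str, float]) -> float:
    """Score an item as the dot product of the user's interest distribution
    and the item's topic distribution (an assumed scoring rule)."""
    return sum(p * item_topics.get(topic, 0.0) for topic, p in user_topics.items())


def recommend(
    user_topics: Dict[str, float],
    items: Dict[str, Dict[str, float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every item, rank by descending score, and return the top k."""
    scored = [(item_id, interest_score(user_topics, dist)) for item_id, dist in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Illustrative data only: topic names and probabilities are made up.
user = {"sports": 0.7, "tech": 0.3}
items = {
    "a": {"sports": 0.9, "tech": 0.1},
    "b": {"sports": 0.2, "tech": 0.8},
}
top = recommend(user, items, k=1)
```

With this data, item "a" ranks first because its topic mix is closer to the user's interests.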
Abstract:
A location prediction framework is described for applying location labels or tags to target documents and/or identifying location-sensitive queries. Terms in content and queries are represented by corresponding term location vectors (TLVs), in which each term is represented as a weighted distribution across locations. Each element of a TLV represents a probability that the term corresponding to the TLV relates to a particular location. Predicted locations may be introduced as features to a ranking framework to improve the identification and ranking of search results for a given query.
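As a rough illustration of how TLVs might be combined into a location prediction for a document or query, the sketch below sums the TLVs of the known terms and renormalizes. The abstract does not specify the aggregation rule, so this averaging is an assumption, and `predict_location` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable


def predict_location(
    terms: Iterable[str],
    tlvs: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Combine per-term location distributions into one distribution.

    Terms without a TLV are ignored. Returns an empty dict when no term
    carries any location signal.
    """
    totals: Dict[str, float] = defaultdict(float)
    for term in terms:
        for location, prob in tlvs.get(term, {}).items():
            totals[location] += prob
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {location: value / norm for location, value in totals.items()}


# Illustrative TLVs only: the probabilities are made up.
tlvs = {
    "golden": {"San Francisco": 0.6, "New York": 0.4},
    "gate": {"San Francisco": 0.8, "New York": 0.2},
}
prediction = predict_location(["golden", "gate", "photos"], tlvs)
best_location = max(prediction, key=prediction.get)
```

The resulting distribution, or its most probable location, could then be passed to a ranker as a feature.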
Abstract:
Disclosed are systems and methods for improving interactions with and between computers in content searching, generating, hosting and/or providing systems supported by or configured with personal computing devices, servers and/or platforms. The systems interact to identify and retrieve data within or across platforms, which can be used to improve the quality of data used in processing interactions between or among processors in such systems. The disclosed systems and methods determine and suggest query auto-completions (QACs). In some embodiments, when a user is inputting a search query, the disclosed systems and methods can provide a QAC suggestion based on the inputted text as well as on the application programs installed and/or executing on the user's device.
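One simple way to picture app-aware auto-completion is to filter candidates by the typed prefix and then promote candidates that mention an installed application. The abstract does not describe the actual ranking signal, so this promotion rule is an assumption, and `suggest_completions` is a hypothetical name.

```python
from typing import List


def suggest_completions(
    prefix: str,
    candidates: List[str],
    installed_apps: List[str],
    limit: int = 3,
) -> List[str]:
    """Return up to `limit` completions for `prefix`, listing candidates that
    mention an installed app first (an assumed heuristic)."""
    prefix = prefix.lower()
    apps = {app.lower() for app in installed_apps}
    matches = [c for c in candidates if c.lower().startswith(prefix)]

    def mentions_app(candidate: str) -> bool:
        return any(app in candidate.lower() for app in apps)

    # False sorts before True, so app-related candidates come first;
    # ties are broken alphabetically.
    matches.sort(key=lambda c: (not mentions_app(c), c.lower()))
    return matches[:limit]


# Illustrative data only.
candidates = ["spotify login", "sports scores", "spotify premium", "spa near me"]
suggestions = suggest_completions("sp", candidates, installed_apps=["Spotify"])
```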
Abstract:
The present teaching relates to ranking search content. In one example, a plurality of documents is received to be ranked with respect to a query. Features are extracted from the query and the plurality of documents. The plurality of documents is ranked based on a ranking model and the extracted features. The ranking model is derived to remove one or more documents from the plurality of documents that are less relevant to the query and order remaining documents based on their relevance to the query. The ordered remaining documents are provided as a search result with respect to the query.
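The filter-then-order behavior described in this abstract can be sketched with a linear scoring model and a relevance threshold. Both are assumptions made for illustration, since the abstract does not state the form of the ranking model; the feature names and `rank_documents` are hypothetical.

```python
from typing import Dict, List


def rank_documents(
    features: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Score each document with a weighted sum of its features, drop documents
    scoring below `threshold`, and return the rest ordered by relevance."""
    scored = []
    for doc_id, feats in features.items():
        score = sum(weights.get(name, 0.0) * value for name, value in feats.items())
        if score >= threshold:
            scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored]


# Illustrative query-document features only.
features = {
    "d1": {"text_match": 2.0, "click_rate": 0.5},
    "d2": {"text_match": 0.1, "click_rate": 0.0},
    "d3": {"text_match": 3.0, "click_rate": 0.1},
}
weights = {"text_match": 1.0, "click_rate": 2.0}
result = rank_documents(features, weights, threshold=1.0)
```

Here "d2" falls below the threshold and is removed, and the remaining documents are returned in descending score order.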
Abstract:
Techniques are described herein for enhancing the ranking of products using purchase day based time windows. A purchase day based time window is a time window that is defined to include purchase days selected from a series of consecutive days. A purchase day is a day on which a product associated with the time window is purchased. The series of consecutive days includes the purchase days intermixed with non-purchase day(s). A non-purchase day is a day on which the product associated with the time window is not purchased. The purchase day based time window is further defined to not include the non-purchase day(s).
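A purchase day based time window can be sketched as follows. Taking the most recent purchase days is an assumption, since the abstract only says that purchase days are selected from the series; `purchase_day_window` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import List, Tuple


def purchase_day_window(
    daily_purchases: List[Tuple[date, int]],
    window_size: int,
) -> List[date]:
    """Return the last `window_size` purchase days from a series of
    consecutive days (oldest first), skipping non-purchase days."""
    if window_size <= 0:
        return []
    purchase_days = [day for day, count in daily_purchases if count > 0]
    return purchase_days[-window_size:]


# Illustrative data only: six consecutive days with made-up purchase counts.
start = date(2024, 1, 1)
counts = [2, 0, 0, 1, 0, 3]
series = [(start + timedelta(days=i), count) for i, count in enumerate(counts)]
window = purchase_day_window(series, window_size=2)
```

The window spans two purchase days (January 4 and January 6) even though a non-purchase day lies between them.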
Abstract:
In one embodiment, a set of training data consisting of inliers may be obtained. A supervised classification model may be trained using the set of training data to identify outliers. The supervised classification model may be applied to generate an anomaly score for a data point. It may be determined whether the data point is an outlier based, at least in part, upon the anomaly score.
Abstract:
A system and method are described for large-scale, automated classification of products. The system and method receive information about products, wherein such information includes one or more text metadata fields associated with each product, receive a set of categories, and automatically select one or more categories from the set of categories to which each product belongs based upon at least one of the one or more text metadata fields associated with each product. A machine learning classifier may be used to automatically select the one or more categories to which each product belongs by operating upon a feature vector for each product derived from text metadata fields of the product description. The machine learning classifier may be trained using a set of pre-categorized product descriptions. The product-category associations generated by the system and method can be used to improve search engine results or product recommendations to consumers.
Abstract:
Disclosed are systems and methods for improving interactions with and between computers in content searching, generating, hosting and/or providing systems supported by or configured with personal computing devices, servers and/or platforms. The systems interact to identify and retrieve data within or across platforms, which can be used to improve the quality of data used in processing interactions between or among processors in such systems. The disclosed systems and methods provide a unified digital content discovery framework that implements a combination of a logistic loss function and a pair-wise loss function for information retrieval. The logistic loss function reduces the appearance of non-relevant images in the retrieved results, while the pair-wise loss function ensures that the highest-quality content is included in such results. The combination of such functions provides a search information retrieval system with the novel functionality of quantifying a search result's relevance and quality in accordance with the searcher's intent.
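The combination of a logistic loss and a pair-wise loss might look like the sketch below. The abstract does not give the exact form of either loss or how they are weighted, so the pair-wise logistic form, the mixing weight `alpha`, and all function names are assumptions.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic_loss(score: float, relevant: bool) -> float:
    """Pointwise loss: penalizes low scores for relevant items and
    high scores for non-relevant items."""
    return softplus(-score if relevant else score)


def pairwise_loss(score_higher_quality: float, score_lower_quality: float) -> float:
    """Pair-wise loss: penalizes scoring a lower-quality item above a
    higher-quality one (an assumed logistic form)."""
    return softplus(score_lower_quality - score_higher_quality)


def combined_loss(
    scores: List[float],
    relevant: List[bool],
    quality: List[float],
    alpha: float = 0.5,
) -> float:
    """Weighted sum of the mean pointwise loss and the mean pair-wise loss."""
    n = len(scores)
    if n == 0:
        return 0.0
    pointwise = sum(logistic_loss(s, r) for s, r in zip(scores, relevant)) / n
    pairs = [(i, j) for i in range(n) for j in range(n) if quality[i] > quality[j]]
    pairwise = (
        sum(pairwise_loss(scores[i], scores[j]) for i, j in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return alpha * pointwise + (1.0 - alpha) * pairwise


# Illustrative data only: one relevant, higher-quality item and one
# non-relevant, lower-quality item.
good = combined_loss([2.0, -2.0], [True, False], [2.0, 1.0])
bad = combined_loss([-2.0, 2.0], [True, False], [2.0, 1.0])
```

In this example `good` is smaller than `bad`, since the first set of scores agrees with both the relevance labels and the quality ordering.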
Abstract:
One particular embodiment clusters a plurality of documents using one or more clustering algorithms to obtain one or more first sets of clusters, wherein: each first set of clusters results from clustering the documents using one of the clustering algorithms; and with respect to each first set of clusters, each of the documents belongs to one of the clusters from the first set of clusters; accesses a search query; identifies a search result in response to the search query, wherein the search result comprises two or more of the documents; and clusters the search result to obtain a second set of clusters, wherein each document of the search result belongs to one of the clusters from the second set of clusters.