SCALABLE AND EFFECTIVE DOCUMENT SUMMARIZATION FRAMEWORK

    Publication No.: US20170228457A1

    Publication date: 2017-08-10

    Application No.: US15019646

    Filing date: 2016-02-09

    Applicant: Yahoo! Inc.

    CPC classification number: G06F17/30719 G06F17/271 G06F17/277 G06F17/279

    Abstract: Systems, methods, and apparatuses are disclosed for adaptively generating a summary of web-based content based on an attribute of a mobile communication device having transmitted a request for the web-based content. By adaptively generating the summary based on an attribute of the mobile communication device such as an amount of visual space available or a number of characters permitted in the interface, a display of the web-based content may be controlled on the mobile communication device in a way that was not previously available. This enables control over the display of web-based content that has been adaptively generated for limited display screens based on a learned attribute of the mobile communication device requesting the web-based content.
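
    The abstract does not specify how the summary is built, only that its size follows a device attribute such as the number of characters the interface permits. A minimal sketch, assuming that attribute arrives as a character budget `max_chars` and using a simple word-frequency sentence score chosen for illustration (not the patented method): keep the highest-scoring sentences that fit, then emit them in document order. `summarize_for_device` and `DEVICE_BUDGETS` are hypothetical names.

```python
import re
from collections import Counter


def summarize_for_device(text: str, max_chars: int) -> str:
    """Greedy extractive summary that fits a character budget.

    max_chars stands in for the device attribute (e.g. characters the
    interface permits). The scoring rule is an illustrative heuristic.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Document-level word frequencies (Counter returns 0 for unseen words).
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Take the highest-scoring sentences that still fit the budget.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = []
    used = 0
    for i in ranked:
        cost = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_chars:
            chosen.append(i)
            used += cost

    # Emit the kept sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))


# Hypothetical per-device budgets; real values would come from the device attribute.
DEVICE_BUDGETS = {"smartwatch": 40, "phone": 160}
```

    For example, `summarize_for_device("Cats purr. Dogs bark loudly at night. Cats sleep.", 25)` keeps the two short, high-scoring sentences and returns `"Cats purr. Cats sleep."`.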

    Entity disambiguation
    Type: Granted invention patent

    Publication No.: US11907858B2

    Publication date: 2024-02-20

    Application No.: US15425978

    Filing date: 2017-02-06

    Applicant: Yahoo!, Inc.

    CPC classification number: G06N5/04 G06F16/36

    Abstract: One or more computing devices, systems, and/or methods for entity disambiguation are provided. For example, a document may be analyzed to identify a first mention and a second mention. One or more techniques may be used to select and link a candidate entity, from a first set of candidate entities, to the first mention and select and link a candidate entity, from a second set of candidate entities, to the second mention.
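
    The abstract leaves the "one or more techniques" open. One common way to link two mentions jointly is to score every pair of candidates by each candidate's prior for its mention plus how related the two entities are, and keep the best pair. The sketch below assumes hypothetical prior scores and a caller-supplied `coherence` function; it illustrates joint linking in general and is not the claimed technique.

```python
from itertools import product
from typing import Callable, Dict, Tuple


def link_two_mentions(
    cands1: Dict[str, float],
    cands2: Dict[str, float],
    coherence: Callable[[str, str], float],
) -> Tuple[str, str]:
    """Pick one candidate entity per mention by maximizing
    prior(mention 1) + prior(mention 2) + coherence(entity 1, entity 2)."""
    best_pair, best_score = None, float("-inf")
    # Exhaustively score every combination (fine for small candidate sets).
    for e1, e2 in product(cands1, cands2):
        score = cands1[e1] + cands2[e2] + coherence(e1, e2)
        if score > best_score:
            best_pair, best_score = (e1, e2), score
    return best_pair


# Hypothetical priors for the mentions "Jaguar" and "Amazon".
priors_jaguar = {"Jaguar_(car)": 0.6, "Jaguar_(animal)": 0.4}
priors_amazon = {"Amazon_(company)": 0.7, "Amazon_River": 0.3}
related = {("Jaguar_(animal)", "Amazon_River"): 1.0}

pair = link_two_mentions(priors_jaguar, priors_amazon, lambda a, b: related.get((a, b), 0.0))
# pair == ("Jaguar_(animal)", "Amazon_River")
```

    Linking each mention independently would pick the higher-prior car and company; the coherence term is what lets the two mentions disambiguate each other.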

    Systems and Methods for Multiobjective Optimization

    Publication No.: US20180285473A1

    Publication date: 2018-10-04

    Application No.: US15472019

    Filing date: 2017-03-28

    Applicant: Yahoo! Inc.

    Abstract: Methods and systems for ranking a plurality of articles for rendering on a website for a user account include receiving a request to access the website. Features are identified for the plurality of articles selected for rendering on the website. Each feature is associated with a value parameter having a value in a multi-dimensional vector space. A pair of solutions is identified for an article of the plurality of articles, wherein the pair of solutions identifies a portion of the multi-dimensional vector space that satisfies multiple objectives. A vector point defining the optimal solution is selected for the article from within the portion of the multi-dimensional vector space. The selected vector point is used in computing an article score for the article. The article scores for the plurality of articles are used to identify and present a subset of the articles on the website for the user.
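
    The abstract does not say how the pair of solutions bounds the region or how the point is chosen. A minimal sketch, assuming the two solutions are weight vectors that each favor one objective, that the "portion" is the line segment between them, and that a trade-off parameter `t` selects the point; the article score is then the dot product of that point with the article's feature values. All names here are hypothetical.

```python
from typing import Dict, List, Sequence


def point_between(sol_a: Sequence[float], sol_b: Sequence[float], t: float) -> List[float]:
    """Convex combination (1 - t) * sol_a + t * sol_b, with 0 <= t <= 1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1 - t) * a + t * b for a, b in zip(sol_a, sol_b)]


def top_articles(
    articles: Dict[str, Sequence[float]],
    sol_a: Sequence[float],
    sol_b: Sequence[float],
    t: float = 0.5,
    k: int = 2,
) -> List[str]:
    """Score each article with the chosen point and return the top k names."""
    point = point_between(sol_a, sol_b, t)
    # Article score: dot product of the chosen point with the feature values.
    scores = {name: sum(p * f for p, f in zip(point, feats)) for name, feats in articles.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical features: [click-oriented value, dwell-time-oriented value].
articles = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
```

    Moving `t` from 0 to 1 shifts the ranking from the first objective toward the second: `top_articles(articles, [1, 0], [0, 1], t=0.0)` favors `"a"`, while `t=1.0` favors `"b"`.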

    ENTITY DISAMBIGUATION
    Type: Invention patent application

    Publication No.: US20180225576A1

    Publication date: 2018-08-09

    Application No.: US15425978

    Filing date: 2017-02-06

    Applicant: Yahoo!, Inc.

    CPC classification number: G06N5/04 G06F16/36

    Abstract: One or more computing devices, systems, and/or methods for entity disambiguation are provided. For example, a document may be analyzed to identify a first mention and a second mention. One or more techniques may be used to select and link a candidate entity, from a first set of candidate entities, to the first mention and select and link a candidate entity, from a second set of candidate entities, to the second mention.

    User Profile Expansion For Personalization and Recommendation

    Publication No.: US20180089311A1

    Publication date: 2018-03-29

    Application No.: US15280976

    Filing date: 2016-09-29

    Applicant: Yahoo! Inc.

    Abstract: Software for a website hosting a content-aggregation service generates a first representation of interests for a user. The first representation includes a plurality of entities including pivot entities and extended entities, where the extended entities result from a nearest-neighbor search of word embeddings. Each of the extended entities is associated with a nearness score that is weighted using a distance of the extended entity from one of the pivot entities. For each of a plurality of articles ingested by the content-aggregation service, the software generates a second representation that associates an aboutness score with each of the plurality of entities. The software uses the first representation, a similarity measure, and the second representations to create rankings of the plurality of articles. The software receives a request for access to the content-aggregation service from the user and serves the user a content stream based at least in part on the rankings.
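
    The profile-expansion and ranking steps can be sketched as follows, assuming precomputed entity embeddings, cosine similarity as the nearness measure (so a larger distance from the pivot yields a smaller score), and a dot product as the similarity measure between the user profile and an article's aboutness scores. The embedding values, weights, and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_profile(pivots: Dict[str, float], emb: Dict[str, Sequence[float]], k: int = 2) -> Dict[str, float]:
    """Add the k nearest non-pivot entities of each pivot to the profile.

    An extended entity's nearness score is the pivot's weight times the
    cosine similarity, so entities farther from the pivot score lower.
    """
    profile = dict(pivots)
    others = [e for e in emb if e not in pivots]
    for pivot, weight in pivots.items():
        sims = {e: cosine(emb[pivot], emb[e]) for e in others}
        for e in sorted(sims, key=sims.get, reverse=True)[:k]:
            profile[e] = max(profile.get(e, 0.0), weight * sims[e])
    return profile


def rank_by_profile(profile: Dict[str, float], aboutness: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank articles by the dot product of the profile with their aboutness scores."""
    def sim(article: str) -> float:
        return sum(profile.get(e, 0.0) * s for e, s in aboutness[article].items())
    return sorted(aboutness, key=sim, reverse=True)


# Hypothetical 2-d embeddings and a single pivot entity.
emb = {"nba": [1, 0], "basketball": [0.9, 0.1], "lakers": [0.8, 0.2], "opera": [0, 1]}
profile = expand_profile({"nba": 1.0}, emb, k=2)
aboutness = {"game_recap": {"lakers": 0.9}, "concert_review": {"opera": 0.9}}
```

    Here the pivot "nba" pulls in "basketball" and "lakers" but not "opera", so an article about the Lakers ranks first even though the user never engaged with that entity directly.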

    DETECTING ABUSIVE LANGUAGE USING CHARACTER N-GRAM FEATURES

    Publication No.: US20180032907A1

    Publication date: 2018-02-01

    Application No.: US15224434

    Filing date: 2016-07-29

    Applicant: Yahoo! Inc.

    CPC classification number: G06N20/00 G06F16/24578 G06F17/271

    Abstract: Methods and apparatus for detecting abusive language are disclosed. In one embodiment, a set of character N-grams is ascertained for a set of text. Feature values for a plurality of features of the set of text are determined, based, at least in part, on the set of character N-grams. A computer-generated model is applied to the feature values for the plurality of features to generate a score for the set of text, where the model includes a plurality of weights, each of the weights corresponding to one of the features. It may then be determined whether the set of text includes abusive language based, at least in part, on the score.
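
    The scoring step can be sketched as a linear model over character N-gram counts, with one weight per N-gram feature. This sketch assumes trigrams, count features, a logistic link, and a 0.5 threshold; the weights shown are hypothetical stand-ins for values a trained model (for example, logistic regression) would supply.

```python
import math
from collections import Counter
from typing import Dict


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams, padding with spaces so word edges are features."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def abuse_score(text: str, weights: Dict[str, float], bias: float = 0.0, n: int = 3) -> float:
    """Weighted sum of n-gram feature values, squashed to (0, 1)."""
    feats = char_ngrams(text, n)
    z = bias + sum(weights.get(g, 0.0) * c for g, c in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_abusive(text: str, weights: Dict[str, float], threshold: float = 0.5, **kwargs) -> bool:
    return abuse_score(text, weights, **kwargs) >= threshold


# Hypothetical learned weights: one weight per character trigram.
weights = {"idi": 2.0, "dio": 2.0}
```

    Character N-grams are attractive here because partial matches survive spelling variation that would defeat whole-word features.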

    COLLABORATIVE PERSONALIZATION VIA SIMULTANEOUS EMBEDDING OF USERS AND THEIR PREFERENCES

    Publication No.: US20180285774A1

    Publication date: 2018-10-04

    Application No.: US15476685

    Filing date: 2017-03-31

    Applicant: Yahoo! Inc.

    Abstract: A method is provided, including: processing interactions by a plurality of users with a plurality of content items, the content items being provided over a network in response to user requests received over the network, wherein each content item is associated with one or more entities; for each user, determining a user entity set that includes entities associated with content items with which the user interacted; embedding the users and the entities in a vector space, wherein the embedding is configured to place a given user, and the entities of the given user's user entity set, in proximity to each other in the vector space; for each user, performing a proximity search in the vector space to identify a set of nearest entities to the user in the vector space; and, for each user, generating a user profile using the identified set of nearest entities to the user.
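
    The last two steps (proximity search and profile generation) can be sketched as below, assuming the joint embedding of users and entities has already been trained elsewhere and that Euclidean distance is the proximity measure. The vectors and names are hypothetical; the training step itself is not shown.

```python
import math
from typing import Dict, List, Sequence


def nearest_entities(user_vec: Sequence[float], entity_vecs: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Return the k entities closest to the user in the shared vector space."""
    def distance(entity: str) -> float:
        return math.dist(user_vec, entity_vecs[entity])
    return sorted(entity_vecs, key=distance)[:k]


def build_profiles(
    user_vecs: Dict[str, Sequence[float]],
    entity_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> Dict[str, List[str]]:
    """One profile per user: the user's k nearest entities."""
    return {u: nearest_entities(v, entity_vecs, k) for u, v in user_vecs.items()}


# Hypothetical 2-d embeddings produced by a prior joint-embedding step.
entity_vecs = {"nba": (0.0, 0.0), "lakers": (0.1, 0.0), "opera": (5.0, 5.0), "ballet": (5.1, 5.0)}
user_vecs = {"u1": (0.02, 0.0), "u2": (5.0, 5.1)}
profiles = build_profiles(user_vecs, entity_vecs, k=2)
```

    Because users and entities share one space, a user's nearest entities can include entities that only similar users interacted with, which is where the collaborative effect comes from.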

    MULTILABEL LEARNING VIA SUPERVISED JOINT EMBEDDING OF DOCUMENTS AND LABELS

    Publication No.: US20180285459A1

    Publication date: 2018-10-04

    Application No.: US15471455

    Filing date: 2017-03-28

    Applicant: Yahoo! Inc.

    Abstract: A method implemented by at least one server computer is provided, including the following operations: receiving a plurality of training documents, each training document being defined by a sequence of words, each training document having one or more labels associated therewith; embedding the training documents, the words, and the labels in a vector space, wherein the embedding is configured to locate a given training document and its associated labels in proximity to each other in the vector space; embedding a new document in the vector space; performing a proximity search in the vector space to identify a set of nearest labels to the new document in the vector space; associating the nearest labels with the new document.
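
    The inference steps (embed a new document, search for the nearest labels, attach them) can be sketched as follows. This assumes the word and label vectors come from an already trained joint embedding and, as a simplification, embeds the new document as the mean of its known word vectors; the abstract does not say how new documents are embedded. All vectors and names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def mean_vector(words: Sequence[str], word_vecs: Dict[str, Sequence[float]]) -> Optional[List[float]]:
    """Average the vectors of known words; None if no word is known."""
    vecs = [word_vecs[w] for w in words if w in word_vecs]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_labels(words, word_vecs, label_vecs, k: int = 2) -> List[str]:
    """Embed the document, then return its k nearest labels by cosine similarity."""
    doc = mean_vector(words, word_vecs)
    if doc is None:
        return []
    return sorted(label_vecs, key=lambda lab: cosine(doc, label_vecs[lab]), reverse=True)[:k]


# Hypothetical 3-d vectors from a trained joint embedding.
word_vecs = {"goal": [1, 0, 0], "match": [0.9, 0.1, 0], "election": [0, 1, 0]}
label_vecs = {"sports": [1, 0, 0], "politics": [0, 1, 0], "science": [0, 0, 1]}
```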

    Scalable Multilingual Named-Entity Recognition

    Publication No.: US20180203843A1

    Publication date: 2018-07-19

    Application No.: US15406586

    Filing date: 2017-01-13

    Applicant: Yahoo! Inc.

    Abstract: Software on a website serves a user of an online content aggregation service a first article that the user views. The software extracts named entities from the first article using a named-entity recognizer. The named-entity recognizer uses a sequence of word embeddings as inputs to a conditional random field (CRF) tool to assign labels to each of the word embeddings. Each of the word embeddings is associated with a word in the first article and is trained using an entire topical article from a corpus of topical articles as a context for the word. The software then creates rankings for articles ingested by the content aggregation service based at least in part on the named entities and serves the user a second article using the rankings.
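
    A conditional random field assigns labels by finding the highest-scoring label sequence, combining per-token (emission) scores with label-to-label (transition) scores. The sketch below shows only decoding with the Viterbi algorithm, assuming emission scores come from a hypothetical linear layer over word embeddings and that the weights and transition scores were learned elsewhere; CRF training is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def emission_scores(
    embeddings: Sequence[Sequence[float]], label_weights: Dict[str, Sequence[float]]
) -> List[Dict[str, float]]:
    """Per-token label scores from a linear layer over word embeddings."""
    return [
        {y: sum(w * x for w, x in zip(wv, emb)) for y, wv in label_weights.items()}
        for emb in embeddings
    ]


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> List[str]:
    """Highest-scoring label sequence for a linear-chain CRF."""
    if not emissions:
        return []
    best = [{y: emissions[0][y] for y in labels}]  # best path score ending in y
    back = []  # back-pointers, one dict per step after the first
    for t in range(1, len(emissions)):
        scores, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            ptr[y] = prev
        best.append(scores)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


labels = ["O", "B-PER", "I-PER"]
# Hypothetical transitions: I-PER should follow B-PER, never O.
transitions = {("O", "I-PER"): -10.0, ("B-PER", "I-PER"): 2.0}
# Hypothetical emission scores for "Barack Obama spoke".
emissions = [
    {"O": 0.0, "B-PER": 2.0, "I-PER": 1.0},
    {"O": 0.5, "B-PER": 0.8, "I-PER": 1.0},
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.5},
]
```

    Decoding this example yields `["B-PER", "I-PER", "O"]`, tagging "Barack Obama" as a person name.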

    Baseline Interest Profile for Recommendations Using a Geographic Location

    Publication No.: US20180077249A1

    Publication date: 2018-03-15

    Application No.: US15265777

    Filing date: 2016-09-14

    Applicant: Yahoo! Inc.

    CPC classification number: H04L67/18 G06F16/9535 G06N20/00 H04L67/10

    Abstract: Software for a content-aggregation website generates a first representation of interests for a geographical location. The representation includes a plurality of entities that are derived from a corpus of documents. Each of the plurality of entities is associated with an expected value that is based on engagement signals from users in the geographical location and that is weighted using a sparse-polarity approach to be discriminative with respect to other entities. Each of the ingested articles is represented by a second representation that associates an aboutness score with each of the plurality of entities. The software uses the first representation, a similarity measure, and the second representations to create rankings of the ingested articles. The software then receives a request for access to the content-aggregation service from a new or infrequent user in the geographical location and serves that user a content stream based on the rankings.
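
    The abstract does not define the sparse-polarity weighting, so the sketch below swaps in a plainly different, simpler discriminative rule: each entity's value is its click-through rate in the location minus its global click-through rate, so entities that are merely popular everywhere score near zero. Ranking for a new user then uses a dot product with each article's aboutness scores. All counts and names are hypothetical.

```python
from typing import Dict, List


def location_profile(
    local_clicks: Dict[str, int],
    local_views: Dict[str, int],
    global_clicks: Dict[str, int],
    global_views: Dict[str, int],
    min_views: int = 1,
) -> Dict[str, float]:
    """Per-entity local engagement rate minus the global rate (not sparse-polarity)."""
    profile = {}
    for entity, views in local_views.items():
        if views < min_views or global_views.get(entity, 0) == 0:
            continue  # too little data to estimate a rate
        local_rate = local_clicks.get(entity, 0) / views
        global_rate = global_clicks.get(entity, 0) / global_views[entity]
        profile[entity] = local_rate - global_rate  # > 0: more popular here than overall
    return profile


def rank_for_new_user(profile: Dict[str, float], aboutness: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank articles by the dot product of the location profile with aboutness scores."""
    def sim(article: str) -> float:
        return sum(profile.get(e, 0.0) * s for e, s in aboutness[article].items())
    return sorted(aboutness, key=sim, reverse=True)


profile = location_profile(
    local_clicks={"sf_giants": 30, "weather": 10},
    local_views={"sf_giants": 100, "weather": 100},
    global_clicks={"sf_giants": 50, "weather": 100},
    global_views={"sf_giants": 1000, "weather": 1000},
)
aboutness = {"giants_recap": {"sf_giants": 0.9}, "forecast": {"weather": 0.9}}
```

    With these counts the local team scores well above the generic weather entity, so a new user from that location is served the game recap first, before any personal engagement history exists.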
