Detecting abusive language using character N-gram features

    Publication No.: US11010687B2

    Publication Date: 2021-05-18

    Application No.: US15224434

    Filing Date: 2016-07-29

    Applicant: Oath Inc.

    Abstract: Methods and apparatus for detecting abusive language are disclosed. In one embodiment, a set of character N-grams is ascertained for a set of text. Feature values for a plurality of features of the set of text are determined, based, at least in part, on the set of character N-grams. A computer-generated model is applied to the feature values for the plurality of features to generate a score for the set of text, where the model includes a plurality of weights, each of the weights corresponding to one of the features. It may then be determined whether the set of text includes abusive language based, at least in part, on the score.
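    The pipeline in this abstract (character N-grams, then feature values, then a weighted model score, then a decision) can be illustrated with a minimal Python sketch. It is not the patented implementation. Assumptions: feature values are raw counts of character 2- to 4-grams, the model is a plain linear sum of weight times feature value, and `TOY_WEIGHTS`, `bias`, and `threshold` are hypothetical hand-picked numbers standing in for learned weights.

```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count every character N-gram of length n_min..n_max in the lower-cased text."""
    text = text.lower()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def abuse_score(text: str, weights: dict, bias: float = 0.0) -> float:
    """Linear model: bias + sum(weight[g] * count[g]); N-grams without a weight add 0."""
    features = char_ngrams(text)
    return bias + sum(weights.get(gram, 0.0) * value for gram, value in features.items())


def is_abusive(text: str, weights: dict, bias: float = 0.0, threshold: float = 0.0) -> bool:
    """Flag the text when its score exceeds the decision threshold."""
    return abuse_score(text, weights, bias) > threshold


# Hypothetical toy weights; a trained model would hold one learned weight per N-gram feature.
TOY_WEIGHTS = {"idi": 2.0, "iot": 1.0, "than": -1.0}

if __name__ == "__main__":
    for sample in ["You idiot", "Thank you"]:
        print(sample, abuse_score(sample, TOY_WEIGHTS, bias=-1.0))
```

    Character N-grams are a natural fit here because they still match obfuscated or misspelled variants of a word, which whole-word features would miss.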

    Scalable and effective document summarization framework

    Publication No.: US10255356B2

    Publication Date: 2019-04-09

    Application No.: US16055575

    Filing Date: 2018-08-06

    Applicant: Oath Inc.

    IPC Classification: G06F17/30 G06F17/27

    Abstract: Systems, methods, and apparatuses are disclosed for adaptively generating a summary of web-based content based on an attribute of a mobile communication device having transmitted a request for the web-based content. By adaptively generating the summary based on an attribute of the mobile communication device such as an amount of visual space available or a number of characters permitted in the interface, a display of the web-based content may be controlled on the mobile communication device in a way that was not previously available. This enables control of displaying web-based content that has been adaptively generated to be displayed on limited display screens based on a learned attribute of the mobile communication device requesting the web-based content.
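    To make "a summary adapted to the number of characters permitted" concrete, here is a minimal extractive sketch in Python. The abstract does not say how sentences are chosen, so the scoring below (average word frequency) and the greedy fill are assumptions, and `max_chars` is a hypothetical stand-in for the device attribute.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize_for_device(text: str, max_chars: int) -> str:
    """Extractive summary that fits a device's character budget.

    Sentences are scored by the average document frequency of their words,
    picked greedily from best to worst while they still fit, then emitted in
    their original order.
    """
    sentences = split_sentences(text)
    freq = Counter(WORD.findall(text.lower()))

    def score(sentence: str) -> float:
        tokens = WORD.findall(sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # sorted() is stable, so equally scored sentences keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        cost = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_chars:
            chosen.append(i)
            used += cost
    return " ".join(sentences[i] for i in sorted(chosen))


if __name__ == "__main__":
    article = "Cats purr. Dogs bark loudly at night. Cats sleep."
    for budget in (1000, 25, 5):  # e.g. tablet, small phone widget, tiny watch face
        print(budget, repr(summarize_for_device(article, budget)))
```

    The same article yields different summaries per budget, which is the adaptive behavior the abstract describes; a production system would presumably learn the budget per device rather than hard-code it.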

    Collaborative personalization via simultaneous embedding of users and their preferences

    Publication No.: US11488028B2

    Publication Date: 2022-11-01

    Application No.: US15476685

    Filing Date: 2017-03-31

    Applicant: Oath Inc.

    Abstract: A method is provided, including: processing interactions by a plurality of users with a plurality of content items, the content items being provided over a network in response to user requests received over the network, wherein each content item is associated with one or more entities; for each user, determining a user entity set that includes entities associated with content items with which the user interacted; embedding the users and the entities in a vector space, wherein the embedding is configured to place a given user, and the entities of the given user's user entity set, in proximity to each other in the vector space; for each user, performing a proximity search in the vector space to identify a set of nearest entities to the user in the vector space; for each user, generating a user profile using the identified set of nearest entities to the user.
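    The steps of this method (build each user's entity set, embed users near their entities, run a proximity search, keep the nearest entities as the profile) can be sketched in Python as follows. This is a simplification, not the claimed method: entity vectors are assumed to be given (hypothetical toy values), and each user is placed at the centroid of its entities instead of being trained jointly with them. The centroid still satisfies the "user lies near its entities" property, so the proximity search behaves as described.

```python
import math
from collections import defaultdict


def user_entity_sets(interactions, item_entities):
    """Map each user to the entities of the content items they interacted with."""
    sets = defaultdict(set)
    for user, item in interactions:
        sets[user].update(item_entities.get(item, ()))
    return dict(sets)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def embed_users(entity_sets, entity_vecs):
    """Place each user at the centroid of its entities (stand-in for joint training)."""
    users = {}
    for user, entities in entity_sets.items():
        vecs = [entity_vecs[e] for e in sorted(entities) if e in entity_vecs]
        if vecs:
            users[user] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return users


def user_profile(user_vec, entity_vecs, k=2):
    """Proximity search: the k entities most cosine-similar to the user vector."""
    return sorted(entity_vecs, key=lambda e: cosine(user_vec, entity_vecs[e]), reverse=True)[:k]


# Hypothetical toy data.
ENTITY_VECS = {"nba": [1.0, 0.0], "nfl": [0.9, 0.1], "stocks": [0.0, 1.0], "bonds": [0.1, 0.9]}
ITEM_ENTITIES = {"a1": ["nba"], "a2": ["stocks"]}
INTERACTIONS = [("alice", "a1"), ("bob", "a2")]

if __name__ == "__main__":
    users = embed_users(user_entity_sets(INTERACTIONS, ITEM_ENTITIES), ENTITY_VECS)
    for name, vec in users.items():
        print(name, user_profile(vec, ENTITY_VECS))
```

    Note that Alice's profile picks up `nfl` even though she only interacted with an `nba` item; that generalization is what the shared vector space buys.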

    Baseline interest profile for recommendations using a geographic location

    Publication No.: US10834211B2

    Publication Date: 2020-11-10

    Application No.: US16558200

    Filing Date: 2019-09-02

    Applicant: Oath Inc.

    Abstract: Software for a content-aggregation website generates a first representation of interests for a geographical location. The representation includes a plurality of entities that are derived from a corpus of documents. Each of the plurality of entities is associated with an expected value that is based on engagement signals from users in the geographical location and that is weighted using a sparse-polarity approach to be discriminative with respect to other entities. Each of the ingested articles is represented by the second representation that associates an aboutness score with each of the plurality of entities. The software uses the first representation, a similarity measure, and a second representation to create rankings of a plurality of ingested articles received. Then the software receives a request for access to the content-aggregation service from a new user from the geographical location and serves the new or infrequent user a content stream based on the rankings.
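    A rough Python sketch of the flow (location profile from engagement signals, articles as entity aboutness scores, similarity ranking, stream for a cold-start user) follows. The abstract does not define the "sparse-polarity" weighting, so this sketch swaps in a plainly different, hypothetical rule: an entity is kept only if its engagement rate in the location exceeds its global rate, and its weight is that lift. Cosine similarity is likewise an assumed choice of similarity measure.

```python
import math
from collections import defaultdict


def location_profile(events, location):
    """Entity weights for one location from (location, entity, engaged) events.

    Hypothetical discriminative weighting (not the patent's sparse-polarity method):
    keep entities whose local engagement rate beats their global rate, weighted by the lift.
    """
    loc_views, loc_hits = defaultdict(int), defaultdict(int)
    all_views, all_hits = defaultdict(int), defaultdict(int)
    for loc, entity, engaged in events:
        all_views[entity] += 1
        all_hits[entity] += int(engaged)
        if loc == location:
            loc_views[entity] += 1
            loc_hits[entity] += int(engaged)
    profile = {}
    for entity, views in loc_views.items():
        lift = loc_hits[entity] / views - all_hits[entity] / all_views[entity]
        if lift > 0:
            profile[entity] = lift
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse entity -> score dictionaries."""
    dot = sum(w * q.get(e, 0.0) for e, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def stream_for_new_user(location, events, articles, n=10):
    """Rank articles (id -> entity aboutness scores) against the location's profile."""
    profile = location_profile(events, location)
    ranked = sorted(articles, key=lambda a: cosine(profile, articles[a]), reverse=True)
    return ranked[:n]


# Hypothetical toy engagement log and article representations.
EVENTS = [("SF", "giants", True), ("SF", "giants", True), ("SF", "yankees", False),
          ("NY", "yankees", True), ("NY", "yankees", True), ("NY", "giants", False)]
ARTICLES = {"a_yankees": {"yankees": 0.8}, "a_giants": {"giants": 0.9}}

if __name__ == "__main__":
    print(stream_for_new_user("SF", EVENTS, ARTICLES))
    print(stream_for_new_user("NY", EVENTS, ARTICLES))
```

    The point of the location profile is that a brand-new user, with no history of their own, still gets a ranked stream that reflects what people nearby engage with.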

    User profile expansion for personalization and recommendation

    Publication No.: US10776433B2

    Publication Date: 2020-09-15

    Application No.: US15280976

    Filing Date: 2016-09-29

    Applicant: Oath Inc.

    Abstract: Software for a website hosting a content-aggregation service generates a first representation of interests for a user. The first representation includes a plurality of entities including pivot entities and extended entities, where the extended entities result from a nearest-neighbor search of word embeddings. Each of the extended entities is associated with a nearness score that is weighted using a distance of the extended entity from one of the pivot entities. For each of a plurality of articles ingested by the content-aggregation service, the software generates a second representation that associates an aboutness score with each of the plurality of entities. The software uses the first representation, a similarity measure, and the second representations to create rankings of the plurality of articles. The software receives a request for access to the content-aggregation service from the user and serves the user a content stream based at least in part on the rankings.
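    Profile expansion as described here (pivot entities, nearest neighbors in an embedding space, distance-weighted nearness scores, then similarity ranking of articles) can be sketched in Python. Assumptions: the embeddings are hypothetical toy vectors, the nearness score is simply the cosine similarity to the pivot (so closer neighbors score higher), pivots get weight 1.0, and the similarity measure is a dot product over entity scores.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_profile(pivots, embeddings, k=1):
    """Pivots get weight 1.0; each pivot's k nearest neighbours are added with a
    nearness score equal to their cosine similarity to that pivot (assumed weighting).
    An entity reached from several pivots keeps its best score."""
    profile = {p: 1.0 for p in pivots}
    candidates = [e for e in embeddings if e not in pivots]
    for pivot in pivots:
        if pivot not in embeddings:
            continue  # no embedding, so no neighbours to find
        nearest = sorted(candidates, key=lambda e: cosine(embeddings[pivot], embeddings[e]),
                         reverse=True)[:k]
        for entity in nearest:
            score = cosine(embeddings[pivot], embeddings[entity])
            profile[entity] = max(profile.get(entity, 0.0), score)
    return profile


def rank_articles(profile, articles):
    """Dot-product similarity between the profile and each article's aboutness scores."""
    def sim(aboutness):
        return sum(w * aboutness.get(e, 0.0) for e, w in profile.items())
    return sorted(articles, key=lambda a: sim(articles[a]), reverse=True)


# Hypothetical toy embeddings and article aboutness scores.
EMB = {"python": [1.0, 0.0], "golang": [0.6, 0.8], "soccer": [0.0, 1.0]}
ARTICLES = {"a_ball": {"soccer": 0.9}, "a_go": {"golang": 0.9}}

if __name__ == "__main__":
    profile = expand_profile({"python"}, EMB, k=1)
    print(profile)
    print(rank_articles(profile, ARTICLES))
```

    Without expansion, a user whose only pivot is `python` would score zero against both articles; the extended entity `golang` is what lets the Go article rank first.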

    Scalable multilingual named-entity recognition

    Publication No.: US10699077B2

    Publication Date: 2020-06-30

    Application No.: US15406586

    Filing Date: 2017-01-13

    Applicant: Oath Inc.

    Abstract: Software on a website serves a user of an online content aggregation service a first article that the user views. The software extracts named entities from the first article using a named-entity recognizer. The named-entity recognizer uses a sequence of word embeddings as inputs to a conditional random field (CRF) tool to assign labels to each of the word embeddings. Each of the word embeddings is associated with a word in the first article and is trained using an entire topical article from a corpus of topical articles as a context for the word. The software then creates rankings for articles ingested by the content aggregation service based at least in part on the named entities and serves the user a second article using the rankings.
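    The labeling step (word embeddings fed to a CRF that assigns one label per word) can be illustrated with Viterbi decoding for a linear-chain CRF in Python. This covers inference only, not CRF training or the article-level embedding training the abstract mentions. The BIO label set, the three-dimensional embeddings (last component is a constant bias feature), and the emission and transition weights are all hypothetical hand-set values chosen for illustration.

```python
LABELS = ["O", "B-PER", "I-PER"]


def emission(vec, weights):
    """Emission score: dot product of a word embedding with a label's weight vector."""
    return sum(a * b for a, b in zip(vec, weights))


def viterbi(embeddings, emit_w, trans):
    """Highest-scoring label sequence under a linear-chain CRF.

    emit_w: label -> weight vector; trans: (prev_label, label) -> score,
    with "<s>" as the start state. Missing transitions score 0.
    """
    best = [{y: emission(embeddings[0], emit_w[y]) + trans.get(("<s>", y), 0.0) for y in LABELS}]
    back = [{}]
    for t in range(1, len(embeddings)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[t - 1][p] + trans.get((p, y), 0.0))
            best[t][y] = (best[t - 1][prev] + trans.get((prev, y), 0.0)
                          + emission(embeddings[t], emit_w[y]))
            back[t][y] = prev
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for t in range(len(embeddings) - 1, 0, -1):  # follow back-pointers to the start
        path.append(back[t][path[-1]])
    return path[::-1]


def entity_spans(tokens, tags):
    """Join each B- tag and its following I- tags into one entity string."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Hypothetical toy embeddings and CRF parameters.
EMB = {"yesterday": [0.0, 0.0, 1.0], "barack": [1.0, 0.0, 1.0],
       "obama": [1.0, 1.0, 1.0], "spoke": [0.0, 0.0, 1.0]}
EMIT_W = {"O": [0.0, 0.0, 0.5], "B-PER": [2.0, -1.0, 0.0], "I-PER": [1.0, 1.0, 0.0]}
TRANS = {("<s>", "I-PER"): -10.0, ("O", "I-PER"): -10.0, ("B-PER", "I-PER"): 1.0}
TOKENS = ["yesterday", "barack", "obama", "spoke"]

if __name__ == "__main__":
    tags = viterbi([EMB[w] for w in TOKENS], EMIT_W, TRANS)
    print(tags, entity_spans(TOKENS, tags))
```

    The large negative transitions into `I-PER` from the start state or from `O` encode the BIO constraint that an inside tag must follow a begin tag; the extracted entities could then feed the article ranking the abstract describes.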

    Multilabel learning via supervised joint embedding of documents and labels

    Publication No.: US10552501B2

    Publication Date: 2020-02-04

    Application No.: US15471455

    Filing Date: 2017-03-28

    Applicant: Oath Inc.

    Abstract: A method implemented by at least one server computer is provided, including the following operations: receiving a plurality of training documents, each training document being defined by a sequence of words, each training document having one or more labels associated therewith; embedding the training documents, the words, and the labels in a vector space, wherein the embedding is configured to locate a given training document and its associated labels in proximity to each other in the vector space; embedding a new document in the vector space; performing a proximity search in the vector space to identify a set of nearest labels to the new document in the vector space; associating the nearest labels to the new document.
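    The prediction path of this method (embed a new document, search for the nearest labels, assign them) can be sketched in Python. This replaces the claimed supervised joint training with a deliberately simple stand-in: word vectors are assumed to be given (hypothetical toy values), a document is the mean of its known word vectors, and each label sits at the centroid of the training documents that carry it, which keeps documents and their labels close in the shared space.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def embed_text(text, word_vecs):
    """Mean of the vectors of known words; None if no word is known."""
    vecs = [word_vecs[w] for w in text.lower().split() if w in word_vecs]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def embed_labels(training_docs, word_vecs):
    """Place each label at the centroid of the training documents that carry it."""
    members = {}
    for text, labels in training_docs:
        vec = embed_text(text, word_vecs)
        if vec is None:
            continue
        for label in labels:
            members.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in members.items()}


def predict_labels(text, word_vecs, label_vecs, k=1):
    """Embed the new document, then return its k nearest labels by cosine similarity."""
    vec = embed_text(text, word_vecs)
    if vec is None:
        return []
    return sorted(label_vecs, key=lambda label: cosine(vec, label_vecs[label]), reverse=True)[:k]


# Hypothetical toy word vectors and multi-label training set.
WORD_VECS = {"goal": [1.0, 0.0], "match": [0.9, 0.1], "vote": [0.0, 1.0], "senate": [0.1, 0.9]}
TRAIN = [("goal match", {"sports"}), ("vote senate", {"politics"}),
         ("match vote", {"sports", "politics"})]

if __name__ == "__main__":
    label_vecs = embed_labels(TRAIN, WORD_VECS)
    print(predict_labels("late goal", WORD_VECS, label_vecs, k=1))
    print(predict_labels("senate vote", WORD_VECS, label_vecs, k=2))
```

    Because labels live in the same space as documents, adding a new label only requires placing one more vector, and multi-label output is just a larger `k` (or a similarity threshold).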

    Baseline Interest Profile for Recommendations Using a Geographic Location

    Publication No.: US20190387067A1

    Publication Date: 2019-12-19

    Application No.: US16558200

    Filing Date: 2019-09-02

    Applicant: Oath Inc.

    Abstract: Software for a content-aggregation website generates a first representation of interests for a geographical location. The representation includes a plurality of entities that are derived from a corpus of documents. Each of the plurality of entities is associated with an expected value that is based on engagement signals from users in the geographical location and that is weighted using a sparse-polarity approach to be discriminative with respect to other entities. Each of the ingested articles is represented by the second representation that associates an aboutness score with each of the plurality of entities. The software uses the first representation, a similarity measure, and a second representation to create rankings of a plurality of ingested articles received. Then the software receives a request for access to the content-aggregation service from a new user from the geographical location and serves the new or infrequent user a content stream based on the rankings.

    Scalable and effective document summarization framework

    Publication No.: US20180349490A1

    Publication Date: 2018-12-06

    Application No.: US16055575

    Filing Date: 2018-08-06

    Applicant: Oath Inc.

    IPC Classification: G06F17/30 G06F17/27

    Abstract: Systems, methods, and apparatuses are disclosed for adaptively generating a summary of web-based content based on an attribute of a mobile communication device having transmitted a request for the web-based content. By adaptively generating the summary based on an attribute of the mobile communication device such as an amount of visual space available or a number of characters permitted in the interface, a display of the web-based content may be controlled on the mobile communication device in a way that was not previously available. This enables control of displaying web-based content that has been adaptively generated to be displayed on limited display screens based on a learned attribute of the mobile communication device requesting the web-based content.

    Scalable and effective document summarization framework

    Publication No.: US10810242B2

    Publication Date: 2020-10-20

    Application No.: US16377525

    Filing Date: 2019-04-08

    Applicant: Oath Inc.

    Abstract: Systems, methods, and apparatuses are disclosed for adaptively generating a summary of web-based content based on an attribute of a mobile communication device having transmitted a request for the web-based content. By adaptively generating the summary based on an attribute of the mobile communication device such as an amount of visual space available or a number of characters permitted in the interface, a display of the web-based content may be controlled on the mobile communication device in a way that was not previously available. This enables control of displaying web-based content that has been adaptively generated to be displayed on limited display screens based on a learned attribute of the mobile communication device requesting the web-based content.