Systems and Methods for Multimodal Multilabel Tagging of Video

    Publication No.: US20200084519A1

    Publication date: 2020-03-12

    Application No.: US16124840

    Filing date: 2018-09-07

    Applicant: Oath Inc.

    Abstract: Multimodal multilabel tagging of video content may include labeling the video content with topical tags that are identified based on extracted features from two or more modalities of the video content. The two or more modalities may include (i) a video modality for the object, images, and/or visual elements of the video content, (ii) a text modality for the speech, dialog, and/or text of the video content, and/or (iii) an audio modality for non-speech sounds and/or sound characteristics of the video content. Combinational multimodal multilabel tagging may include combining two or more features from the same or different modality in order to increase the contextual understanding of the features and generate contextually relevant tags. Video content may be labeled with global tags relating to overall topics of the video content, and different sets of local tags relating to topics at different segments of the video content.
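
    The combination of features across modalities, and the split between local and global tags, can be sketched roughly as follows. This is an illustration only, not the patented method: the rule table, the feature names, and the choice to promote a tag to global when it appears in at least a given share of segments are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical rules: a tag fires only when every listed
# (modality, feature) pair co-occurs in the same segment.
RULES = {
    "basketball": {("video", "ball"), ("audio", "crowd_cheer"), ("text", "dunk")},
    "cooking": {("video", "pan"), ("text", "recipe")},
    "interview": {("video", "face"), ("text", "question")},
}


def local_tags(segment_features):
    """Return the tags whose required features are all present in one segment."""
    present = set(segment_features)
    return sorted(tag for tag, required in RULES.items() if required <= present)


def tag_video(segments, global_share=0.6):
    """Tag each segment, then promote frequent tags to global tags.

    Assumption: a tag is global when it appears in at least
    `global_share` of the segments. The abstract does not say how
    global tags are derived.
    """
    per_segment = [local_tags(features) for features in segments]
    counts = Counter(tag for tags in per_segment for tag in tags)
    threshold = global_share * len(segments)
    global_tags = sorted(tag for tag, n in counts.items() if n >= threshold)
    return global_tags, per_segment


# Made-up extracted features for four segments of one video.
segments = [
    [("video", "ball"), ("audio", "crowd_cheer"), ("text", "dunk")],
    [("video", "ball"), ("audio", "crowd_cheer"), ("text", "dunk"),
     ("video", "face"), ("text", "question")],
    [("video", "face"), ("text", "question")],
    [("video", "ball"), ("text", "dunk"), ("audio", "crowd_cheer")],
]

global_tags, per_segment = tag_video(segments)
print("global:", global_tags)
for index, tags in enumerate(per_segment):
    print("segment", index, tags)
```

    Requiring features from several modalities before a tag fires is one simple way to model the "contextual understanding" the abstract mentions: a ball on screen alone does not yield the tag, but a ball together with crowd noise and the word "dunk" does.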

SYSTEM AND METHOD FOR LEARNING SCENE EMBEDDINGS VIA VISUAL SEMANTICS AND APPLICATION THEREOF

    Publication No.: US20200097764A1

    Publication date: 2020-03-26

    Application No.: US16142155

    Filing date: 2018-09-26

    Applicant: OATH INC.

    Abstract: The present teaching relates to method, system, and programming for responding to an image related query. Information related to each of a plurality of images is received, wherein the information represents concepts co-existing in the image. Visual semantics for each of the plurality of images are created based on the information related thereto. Representations of scenes of the plurality of images are obtained via machine learning, based on the visual semantics of the plurality of images, wherein the representations capture concepts associated with the scenes.

    Latent user communities
    Type: Granted invention patent

    Publication No.: US11444909B2

    Publication date: 2022-09-13

    Application No.: US16699962

    Filing date: 2019-12-02

    Applicant: Oath Inc.

    Abstract: A method implemented by at least one server computer is provided, including: providing, over the Internet, access to a plurality of topics, wherein each topic includes, and further provides access to, a plurality of posted items; recording interaction data for the plurality of topics, the interaction data identifying user activity occurring within each of the topics; analyzing the interaction data to identify clusters of topics that exhibit similar behavioral patterns; for each cluster of topics, generating a community that includes the topics in the cluster; providing, over the Internet, access to the communities, wherein accessing a given community further provides access to the topics included in that community, which further provide access to the posted items that are included in the topics within that community.
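
    The step of grouping topics with similar behavioral patterns into communities can be sketched as below. The abstract does not name a clustering algorithm, so this example assumes one: each topic is represented by a hypothetical activity vector (for instance, interaction counts per time slot), topics whose cosine similarity reaches a threshold are linked, and each connected group becomes a community. The topic names, counts, and threshold are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_communities(activity, threshold=0.9):
    """Group topics whose activity vectors are similar.

    Assumption: similarity linking plus union-find stands in for
    whatever clustering the patent actually uses.
    """
    topics = sorted(activity)
    parent = {topic: topic for topic in topics}

    def find(topic):
        # Follow parent links to the group representative, compressing the path.
        while parent[topic] != topic:
            parent[topic] = parent[parent[topic]]
            topic = parent[topic]
        return topic

    for i, first in enumerate(topics):
        for second in topics[i + 1:]:
            if cosine(activity[first], activity[second]) >= threshold:
                parent[find(first)] = find(second)

    groups = {}
    for topic in topics:
        groups.setdefault(find(topic), []).append(topic)
    return sorted(groups.values())


# Made-up interaction counts per time slot for four topics.
activity = {
    "nba": [9, 1, 0, 8],
    "nfl": [8, 2, 0, 9],
    "stocks": [0, 7, 9, 1],
    "crypto": [1, 8, 8, 0],
}

communities = build_communities(activity)
print(communities)
```

    Each returned list is one community. A community page would then link to its topics, and each topic to its posted items, as the abstract describes.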

Scalable multilingual named-entity recognition

    Publication No.: US10699077B2

    Publication date: 2020-06-30

    Application No.: US15406586

    Filing date: 2017-01-13

    Applicant: Oath Inc.

    Abstract: Software on a website serves a user of an online content aggregation service a first article that the user views. The software extracts named entities from the first article using a named-entity recognizer. The named-entity recognizer uses a sequence of word embeddings as inputs to a conditional random field (CRF) tool to assign labels to each of the word embeddings. Each of the word embeddings is associated with a word in the first article and is trained using an entire topical article from a corpus of topical articles as a context for the word. The software then creates rankings for articles ingested by the content aggregation service based at least in part on the named entities and serves the user a second article using the rankings.
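
    The labeling step can be illustrated with Viterbi decoding, the standard way to pick the best label sequence in a linear-chain CRF. The abstract does not say which CRF variant or decoder is used, so that choice is an assumption here. The label set, the transition scores, and the per-token emission scores are likewise hand-set placeholders: in a real system the emission scores would be computed from the word embeddings with learned weights.

```python
LABELS = ["O", "B-ORG", "I-ORG"]

# Hypothetical transition scores (previous label, current label).
# The large penalty on O -> I-ORG discourages an entity that has no start.
TRANSITION = {
    ("O", "O"): 0.5, ("O", "B-ORG"): 0.0, ("O", "I-ORG"): -5.0,
    ("B-ORG", "O"): 0.0, ("B-ORG", "B-ORG"): -1.0, ("B-ORG", "I-ORG"): 1.0,
    ("I-ORG", "O"): 0.0, ("I-ORG", "B-ORG"): -1.0, ("I-ORG", "I-ORG"): 0.5,
}


def viterbi(emissions):
    """Return the highest-scoring label sequence.

    `emissions` holds one dict per token, mapping each label to a score.
    """
    best = [dict(emissions[0])]  # best path score ending in each label
    back = []                    # back-pointers for tokens 1..n-1
    for scores in emissions[1:]:
        column, pointers = {}, {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION[(p, cur)])
            column[cur] = best[-1][prev] + TRANSITION[(prev, cur)] + scores[cur]
            pointers[cur] = prev
        best.append(column)
        back.append(pointers)

    # Trace the best path backwards from the best final label.
    path = [max(LABELS, key=lambda label: best[-1][label])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


tokens = ["Oath", "Inc", "filed", "patents"]
# Placeholder emission scores, one dict per token.
emissions = [
    {"O": 0.2, "B-ORG": 2.0, "I-ORG": 0.5},
    {"O": 1.0, "B-ORG": 0.3, "I-ORG": 1.2},
    {"O": 2.0, "B-ORG": 0.1, "I-ORG": 0.1},
    {"O": 2.0, "B-ORG": 0.2, "I-ORG": 0.1},
]

for token, label in zip(tokens, viterbi(emissions)):
    print(token, label)
```

    The B-ORG and I-ORG labels mark the start and continuation of an organization name. Extracted entities of this kind are what the abstract then uses to rank further articles.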

Multilabel learning via supervised joint embedding of documents and labels

    Publication No.: US10552501B2

    Publication date: 2020-02-04

    Application No.: US15471455

    Filing date: 2017-03-28

    Applicant: Oath Inc.

    Abstract: A method implemented by at least one server computer is provided, including the following operations: receiving a plurality of training documents, each training document being defined by a sequence of words, each training document having one or more labels associated therewith; embedding the training documents, the words, and the labels in a vector space, wherein the embedding is configured to locate a given training document and its associated labels in proximity to each other in the vector space; embedding a new document in the vector space; performing a proximity search in the vector space to identify a set of nearest labels to the new document in the vector space; associating the nearest labels to the new document.
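
    The final two operations, embedding a new document and searching for its nearest labels, can be sketched as follows. Training of the joint embedding is omitted entirely: the two-dimensional word and label vectors below are invented stand-ins for a trained space. Averaging word vectors to embed the new document and ranking labels by cosine similarity are also assumptions, since the abstract specifies neither.

```python
import math

# Hypothetical pre-trained vectors sharing one space (invented values).
WORD_VECS = {
    "goal": (0.9, 0.1), "striker": (0.8, 0.2), "match": (0.7, 0.3),
    "election": (0.1, 0.9), "senate": (0.2, 0.8), "vote": (0.15, 0.85),
}
LABEL_VECS = {
    "sports": (0.85, 0.15),
    "politics": (0.15, 0.85),
    "finance": (0.5, 0.5),
}


def embed_document(words):
    """Embed a document as the mean of its known word vectors (an assumption)."""
    vectors = [WORD_VECS[w] for w in words if w in WORD_VECS]
    if not vectors:
        raise ValueError("no known words in document")
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_labels(doc_vec, k=2):
    """Return the k labels closest to the document vector."""
    ranked = sorted(LABEL_VECS, key=lambda name: cosine(doc_vec, LABEL_VECS[name]),
                    reverse=True)
    return ranked[:k]


doc = "the striker scored a late goal in the match".split()
labels = nearest_labels(embed_document(doc), k=2)
print(labels)
```

    With a large label set, the exhaustive ranking in `nearest_labels` would normally be replaced by an approximate nearest-neighbor index. The brute-force version is used here only to keep the example short.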

Computerized system and method for formatted transcription of multimedia content

    Publication No.: US10332506B2

    Publication date: 2019-06-25

    Application No.: US14843185

    Filing date: 2015-09-02

    Applicant: OATH INC.

    Abstract: Disclosed are systems and methods for improving interactions with and between computers in content searching, generating, hosting and/or providing systems supported by or configured with personal computing devices, servers and/or platforms. The systems interact to identify and retrieve data within or across platforms, which can be used to improve the quality of data used in processing interactions between or among processors in such systems. The disclosed systems and methods provide systems and methods for automatic creation of a formatted, readable transcript of multimedia content, which is derived, extracted, determined, or otherwise identified from the multimedia content. The formatted, readable transcript can be utilized to increase accuracy and efficiency in search engine optimization, as well as identification of relevant digital content available for communication to a user.

Systems and methods for multimodal multilabel tagging of video

    Publication No.: US10965999B2

    Publication date: 2021-03-30

    Application No.: US16806544

    Filing date: 2020-03-02

    Applicant: Oath Inc.

    Abstract: Multimodal multilabel tagging of video content may include labeling the video content with topical tags that are identified based on extracted features from two or more modalities of the video content. The two or more modalities may include (i) a video modality for the object, images, and/or visual elements of the video content, (ii) a text modality for the speech, dialog, and/or text of the video content, and/or (iii) an audio modality for non-speech sounds and/or sound characteristics of the video content. Combinational multimodal multilabel tagging may include combining two or more features from the same or different modality in order to increase the contextual understanding of the features and generate contextually relevant tags. Video content may be labeled with global tags relating to overall topics of the video content, and different sets of local tags relating to topics at different segments of the video content.