-
Publication No.: US12153588B2
Publication Date: 2024-11-26
Application No.: US18167724
Filing Date: 2023-02-10
Applicant: ROKU, INC.
Inventor: Peter Martigny, Fedor Bartosh, Danish Shaikh, Vinh Nguyen, Manasi Deshmukh, Ratul Ray, Nitish Aggarwal, Srimaruti Manoj Nimmagadda, Kapil Kumar, Sameer Girolkar
IPC: G06F16/2457, G06F16/242, G06F16/9535
Abstract: A content retrieval system may receive a query associated with a plurality of content items in a repository. For each content item of the plurality of content items, a respective first similarity score and a respective second similarity score may be generated based on the similarity between embeddings, generated from the query and for the content item, that are indicative of a first data type and a second data type, respectively; and a respective normalized similarity score may be generated based on a combination of the respective first and second similarity scores. A set of content items whose respective normalized similarity scores satisfy a similarity score threshold may be identified. An exact-match (lexical) search may yield respective mapping scores for content items, which may also be ranked. An output may be generated that is indicative of content items identified both in the set of content items with high-ranking similarity scores and in the set of content items with high-ranking mapping scores.
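The hybrid flow in this abstract (two embedding similarities per item, combined into one normalized score and thresholded, plus a lexical ranking, with the output limited to items that rank highly in both) can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure, the combination rule, the lexical scoring, or what counts as "high-ranking", so this sketch assumes cosine similarity, a weighted average, token overlap, and a top-N cutoff. All names (`Item`, `hybrid_retrieve`, `weight_a`, `top_n`) are hypothetical, and this is not presented as the patented implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    title: str
    emb_a: list[float]  # hypothetical embedding for the first data type
    emb_b: list[float]  # hypothetical embedding for the second data type


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lexical_score(query: str, title: str) -> int:
    """Assumed exact-match mapping score: count of query tokens present in the title."""
    title_tokens = set(title.lower().split())
    return sum(1 for tok in query.lower().split() if tok in title_tokens)


def hybrid_retrieve(
    query: str,
    q_emb_a: list[float],
    q_emb_b: list[float],
    items: list[Item],
    threshold: float = 0.5,
    weight_a: float = 0.5,
    top_n: int = 10,
) -> list[str]:
    # Semantic path: per-type similarities combined as a weighted average,
    # which stays in [-1, 1] when 0 <= weight_a <= 1.
    semantic: dict[str, float] = {}
    for it in items:
        s_a = cosine(q_emb_a, it.emb_a)
        s_b = cosine(q_emb_b, it.emb_b)
        combined = weight_a * s_a + (1 - weight_a) * s_b
        if combined >= threshold:  # keep only items that satisfy the threshold
            semantic[it.item_id] = combined
    top_semantic = sorted(semantic, key=semantic.get, reverse=True)[:top_n]

    # Lexical path: rank items that share at least one token with the query.
    lexical = {it.item_id: lexical_score(query, it.title) for it in items}
    lexical = {k: v for k, v in lexical.items() if v > 0}
    top_lexical = set(sorted(lexical, key=lexical.get, reverse=True)[:top_n])

    # Output: items ranked highly by both paths, in semantic order.
    return [item_id for item_id in top_semantic if item_id in top_lexical]


items = [
    Item("a", "space adventure movie", [1.0, 0.0], [1.0, 0.0]),
    Item("b", "cooking show", [0.0, 1.0], [0.0, 1.0]),
    Item("c", "space documentary", [0.9, 0.1], [0.8, 0.2]),
]
print(hybrid_retrieve("space movie", [1.0, 0.0], [1.0, 0.0], items))  # → ['a', 'c']
```

Intersecting the two ranked lists favors precision: an item must be both semantically close and lexically matched to appear. A union or a weighted fusion of the two rankings would be an alternative design that trades precision for recall.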
-
Publication No.: US20250036638A1
Publication Date: 2025-01-30
Application No.: US18911887
Filing Date: 2024-10-10
Applicant: ROKU, INC.
Inventor: Peter Martigny, Fedor Bartosh, Danish Shaikh, Vinh Nguyen, Manasi Deshmukh, Ratul Ray, Nitish Aggarwal, Srimaruti Manoj Nimmagadda, Kapil Kumar, Sameer Girolkar
IPC: G06F16/2457, G06F16/242, G06F16/9535
Abstract: A content retrieval system may receive a query associated with a plurality of content items in a repository. For each content item of the plurality of content items, a respective first similarity score and a respective second similarity score may be generated based on the similarity between embeddings, generated from the query and for the content item, that are indicative of a first data type and a second data type, respectively; and a respective normalized similarity score may be generated based on a combination of the respective first and second similarity scores. A set of content items whose respective normalized similarity scores satisfy a similarity score threshold may be identified. An exact-match (lexical) search may yield respective mapping scores for content items, which may also be ranked. An output may be generated that is indicative of content items identified both in the set of content items with high-ranking similarity scores and in the set of content items with high-ranking mapping scores.
-
Publication No.: US20250103894A1
Publication Date: 2025-03-27
Application No.: US18423825
Filing Date: 2024-01-26
Applicant: Roku, Inc.
Inventor: Abhishek Majumdar, Yuxi Liu, Kapil Kumar, Nitish Aggarwal, Manasi Deshmukh, Danish Nasir Shaikh, Ravi Tiwari
IPC: G06N3/092, G06F16/2457, G06N3/0455
Abstract: Retrieving content items in response to a query in a way that increases user satisfaction and the likelihood that users consume a retrieved content item is not trivial. One retrieval strategy may include dividing the content items into buckets according to a dimension of the content items and retrieving the top K items from different buckets to balance semantic affinity against that dimension. Choosing the optimal K for each bucket for a given query can be challenging. Reinforcement learning can be used to train and implement an agent model that chooses the optimal K for each bucket.
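The bucketed retrieval strategy can be sketched in Python as follows. The abstract does not name the bucketing dimension or the agent's architecture, so this sketch assumes hypothetical bucket labels (such as popularity tiers) and substitutes an epsilon-greedy multi-armed bandit, one of the simplest reinforcement-learning methods, for the patent's agent model. Each arm is a candidate K allocation across buckets, and the reward would come from observed engagement; the toy loop replaces that with a hypothetical fixed reward. Names such as `bucketed_top_k` and `EpsilonGreedyKAgent` are illustrative only.

```python
import random
from collections import defaultdict


def bucketed_top_k(scored_items, bucket_of, k_per_bucket):
    """Return the top-K items (by semantic score) from each bucket.

    scored_items: list of (item_id, semantic_score) pairs for one query.
    bucket_of: item_id -> bucket label (hypothetical, e.g. a popularity tier).
    k_per_bucket: bucket label -> K for that bucket (missing label means K = 0).
    """
    buckets = defaultdict(list)
    for item_id, score in scored_items:
        buckets[bucket_of[item_id]].append((score, item_id))
    results = []
    for label, members in buckets.items():
        members.sort(reverse=True)  # highest semantic score first
        k = k_per_bucket.get(label, 0)
        results.extend(item_id for _, item_id in members[:k])
    return results


class EpsilonGreedyKAgent:
    """Stand-in agent: an epsilon-greedy bandit over candidate K allocations."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = arms  # list of dicts: bucket label -> K
        self.epsilon = epsilon
        self.counts = [0] * len(arms)
        self.values = [0.0] * len(arms)  # running mean reward per arm
        self.rng = random.Random(seed)

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # Incremental update of the sample mean for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


scored = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6)]
bucket_of = {"a": "popular", "b": "popular", "c": "niche", "d": "niche"}
print(bucketed_top_k(scored, bucket_of, {"popular": 1, "niche": 1}))  # → ['a', 'c']

agent = EpsilonGreedyKAgent([{"popular": 2, "niche": 0}, {"popular": 1, "niche": 1}], seed=42)
for _ in range(500):
    arm = agent.choose()
    shown = bucketed_top_k(scored, bucket_of, agent.arms[arm])
    # Hypothetical reward: assume users engage more when a niche item is shown.
    reward = 0.8 if "c" in shown else 0.2
    agent.update(arm, reward)
print(agent.values)
```

Unlike the agent described in the abstract, this bandit ignores the query and learns a single allocation for all queries; conditioning on query features (for example, with a contextual bandit or a policy network) would be needed to choose K per query.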
-
Publication No.: US20250045575A1
Publication Date: 2025-02-06
Application No.: US18423802
Filing Date: 2024-01-26
Applicant: Roku, Inc.
Inventor: Abhishek Majumdar, Kapil Kumar, Nitish Aggarwal, Danish Nasir Shaikh, Manasi Deshmukh, Apoorva Jakalannanavar Halappa Manjula
IPC: G06N3/08
Abstract: Pre-trained large language models may be trained on a large data set that does not necessarily align with specific tasks, business goals, and requirements. Pre-trained large language models can solve generic semantic-relationship or question-answering problems but may not be suited to retrieving or recommending content items that are semantically relevant to a query. A machine learning model can be built using transfer learning from pre-trained large language models. Training data can significantly affect the performance of machine learning models, especially those developed using transfer learning, including their generalization, fairness, and adaptation to specific domains. To address some of these concerns, a popularity bucketing strategy can be implemented to debias the training data. Optionally, an ensemble of models can be used to generate diverse training data.
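The abstract does not describe how popularity buckets are formed or how the training data is debiased. As one possible reading, the Python sketch below assumes equal-size popularity tiers by rank and stratified down-sampling to a fixed number of (query, item) training pairs per tier, so that very popular items no longer dominate the training set. The function names, the popularity signal, and the sampling rule are all hypothetical; the optional model ensemble is not covered.

```python
import random
from collections import defaultdict


def popularity_buckets(popularity: dict[str, float], num_buckets: int) -> dict[str, int]:
    """Assign items to equal-size popularity tiers by rank; 0 = least popular."""
    ranked = sorted(popularity, key=popularity.get)  # ascending popularity
    n = len(ranked)
    # i * num_buckets // n lies in [0, num_buckets - 1] for i in [0, n - 1].
    return {item: i * num_buckets // n for i, item in enumerate(ranked)}


def debias_sample(pairs, popularity, num_buckets, per_bucket, seed=0):
    """Keep at most `per_bucket` (query, item) training pairs per popularity tier."""
    bucket_of = popularity_buckets(popularity, num_buckets)
    grouped = defaultdict(list)
    for query, item in pairs:
        grouped[bucket_of[item]].append((query, item))
    rng = random.Random(seed)
    sample = []
    for b in sorted(grouped):
        group = grouped[b]
        # Sample without replacement; small tiers are kept whole.
        sample.extend(rng.sample(group, min(per_bucket, len(group))))
    return sample


popularity = {"hit": 1000, "mid": 50, "rare": 2, "rare2": 1}  # e.g. view counts
pairs = [(f"q{i}", "hit") for i in range(8)] + [("q8", "mid"), ("q9", "rare")]
sample = debias_sample(pairs, popularity, num_buckets=2, per_bucket=2)
print(len(sample))  # → 3
```

In this toy data the popular tier supplies 9 of 10 original pairs but only 2 of 3 sampled pairs. Rank-based tiers are used here because they stay balanced regardless of how skewed the raw popularity counts are.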
-