USING A LARGE LANGUAGE MODEL TO IMPROVE TRAINING DATA

    Publication No.: US20250045535A1

    Publication Date: 2025-02-06

    Application No.: US18423789

    Application Date: 2024-01-26

    Applicant: Roku, Inc.

    Abstract: Training data can significantly impact the performance of machine learning models. Its impact may be more significant in transfer learning. Different data sources can be used to generate training data used in transfer learning. The training data originating from user interaction logs may be subject to presentation bias. The training data originating from model-generated labeled data may have false positives. Poor-quality training data may cause the machine learning model to perform poorly. To address some of these concerns, a checker having one or more models can check for false positives and for labeled data entries that may have been subject to presentation bias. Such entries may be removed or modified. In some cases, the checker can generate a test that can be used to test the machine learning model and penalize the machine learning model if the model generates an incorrect prediction.
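
    The abstract does not say how the checker decides which entries to remove or modify. As a minimal sketch, assuming the checker is any callable that returns a relevance score in [0, 1] for a (query, item) pair (for example, an LLM prompted to judge relevance), the filtering step might look like the Python below. The `Entry` fields, the `clean` function, the 0.5 threshold and the top-3 position cutoff are all hypothetical choices for illustration, not details taken from the patent.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Entry:
    query: str
    item: str
    label: int          # 1 = positive, 0 = negative
    source: str         # "log" (user interactions) or "model" (model-generated label)
    position: int = 0   # display rank when logged; only meaningful for "log" entries


# Hypothetical checker signature: (query, item) -> relevance score in [0, 1].
Checker = Callable[[str, str], float]


def clean(entries: List[Entry], checker: Checker,
          threshold: float = 0.5, top_positions: int = 3) -> List[Entry]:
    """Drop or relabel suspect entries (rules and thresholds are illustrative assumptions)."""
    cleaned: List[Entry] = []
    for e in entries:
        score = checker(e.query, e.item)
        if e.source == "model" and e.label == 1 and score < threshold:
            continue  # likely false positive from the labeling model: remove it
        if (e.source == "log" and e.label == 1
                and e.position < top_positions and score < threshold):
            continue  # click may reflect prominent placement, not relevance: remove it
        if (e.source == "log" and e.label == 0
                and e.position >= top_positions and score >= threshold):
            cleaned.append(replace(e, label=1))  # shown low, likely unseen: relabel
            continue
        cleaned.append(e)
    return cleaned
```

    Removing top-position clicks the checker disagrees with, and relabeling low-position non-clicks it confidently accepts, are only two plausible readings of "removed or modified"; a real system would tune both rules against held-out relevance judgments.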

    SEARCH SYSTEMS BASED ON USER RELEVANCE AND REVENUE GENERATION

    Publication No.: US20240430538A1

    Publication Date: 2024-12-26

    Application No.: US18744191

    Application Date: 2024-06-14

    Applicant: Roku, Inc.

    Abstract: Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for determining a list of recommended items in response to a user query. An embodiment can generate an ordered relevance list of items, and determine an initial reward value based on an array of relevance scores and an array of revenue values corresponding to the ordered relevance list of items, a parameter alpha assigned to the array of relevance scores, and a parameter beta assigned to the array of revenue values. The embodiment can generate a next list of recommended items from an initial list of recommended items, and further calculate a next reward value associated with the next list of recommended items, and determine a list of recommended items in response to the query based on a comparison of the initial reward value and the next reward value.
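
    The abstract names the inputs to the reward (relevance scores, revenue values, alpha, beta) but not the formula. The sketch below assumes a DCG-style reward, the sum over positions of (alpha * relevance + beta * revenue) / log2(position + 2), and assumes each "next list" is produced by swapping two adjacent items and is kept only when its reward beats the current one. The function names and the swap-based search are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def reward(relevance: Sequence[float], revenue: Sequence[float],
           alpha: float, beta: float) -> float:
    """Assumed DCG-style reward: discounted sum of alpha*relevance + beta*revenue."""
    return sum((alpha * rel + beta * rev) / math.log2(pos + 2)
               for pos, (rel, rev) in enumerate(zip(relevance, revenue)))


def recommend(items: Sequence[str], relevance: Sequence[float],
              revenue: Sequence[float], alpha: float = 1.0,
              beta: float = 1.0) -> Tuple[List[str], float]:
    def score(order: List[int]) -> float:
        return reward([relevance[i] for i in order],
                      [revenue[i] for i in order], alpha, beta)

    # Initial list: items ordered by relevance alone, with its reward value.
    order = sorted(range(len(items)), key=lambda i: relevance[i], reverse=True)
    best = score(order)
    improved = True
    while improved:  # terminates: every accepted swap strictly increases the reward
        improved = False
        for k in range(len(order) - 1):
            # Next candidate list: swap two adjacent items (hypothetical move).
            candidate = order[:]
            candidate[k], candidate[k + 1] = candidate[k + 1], candidate[k]
            candidate_reward = score(candidate)
            if candidate_reward > best:  # keep the next list only if its reward is higher
                order, best = candidate, candidate_reward
                improved = True
    return [items[i] for i in order], best
```

    For this particular assumed reward, which is a position-discounted sum of per-item values, sorting by alpha * relevance + beta * revenue would reach the same ordering directly; the swap loop is kept only to mirror the initial-versus-next reward comparison the abstract describes, and it would matter more for rewards with cross-item terms.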

    PERSONALIZED RETRIEVAL SYSTEM
    Type: Invention application publication

    Publication No.: US20240346084A1

    Publication Date: 2024-10-17

    Application No.: US18398495

    Application Date: 2023-12-28

    Applicant: Roku, Inc.

    CPC classification number: G06F16/9035 G06F16/9038 G06F40/40

    Abstract: Disclosed are system, method and/or computer program product embodiments that retrieve items for a user based on a query using a two-tower deep machine learning model. An example embodiment provides input to a context tower, wherein the input includes the query and one or more of a query embedding corresponding to the query or a graph user embedding corresponding to the user. The context tower generates a context embedding in a vector space based on the input. The model determines a measure of similarity between the context embedding and each of a plurality of item embeddings in the vector space that are generated by an item tower and represent a plurality of candidate items. A relevancy score is calculated for each candidate item based on the measure of similarity between the context embedding and the corresponding item embedding. The relevancy scores are used for item retrieval and/or ranking.
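
    A minimal sketch of the scoring step, assuming precomputed embeddings and cosine similarity as the measure of similarity. The real context tower is a trained neural network; here `context_tower` is a hypothetical stand-in that averages the query embedding with the optional graph user embedding and L2-normalizes the result, so the example stays runnable with the standard library alone.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def context_tower(query_emb: Sequence[float],
                  user_emb: Optional[Sequence[float]] = None) -> List[float]:
    """Hypothetical stand-in for the learned context tower: mean of inputs, L2-normalized."""
    inputs = [query_emb] + ([user_emb] if user_emb is not None else [])
    mean = [sum(col) / len(inputs) for col in zip(*inputs)]
    return normalize(mean)


def retrieve(query_emb: Sequence[float], item_embs: Dict[str, Sequence[float]],
             user_emb: Optional[Sequence[float]] = None,
             k: int = 10) -> List[Tuple[str, float]]:
    context = context_tower(query_emb, user_emb)
    # Relevancy score per candidate = cosine similarity in the shared vector space.
    scores = [(item, cosine(context, emb)) for item, emb in item_embs.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

    In a deployed system the item embeddings would typically be computed offline by the item tower, and the exhaustive loop over candidates would usually be replaced by an approximate nearest-neighbour index.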

    HETEROGENEOUS GRAPH NEURAL NETWORK USING OFFSET TEMPORAL LEARNING FOR SEARCH PERSONALIZATION

    Publication No.: US20240346309A1

    Publication Date: 2024-10-17

    Application No.: US18582249

    Application Date: 2024-02-20

    Applicant: Roku, Inc.

    CPC classification number: G06N3/08 G06N3/042

    Abstract: Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for training a heterogeneous graph neural network (GNN) to generate user embeddings corresponding to users and item embeddings corresponding to items. An example embodiment generates a first user interaction graph for a first time window and a second user interaction graph for a second time window, wherein each graph represents users and items as nodes and user-item interactions within the respective time window as edges, samples user-item node pairs from the second user interaction graph, and trains the heterogeneous GNN based on user-item node pairs from the first user interaction graph that correspond to the sampled user-item node pairs from the second user interaction graph. User and item embeddings generated by the trained GNN may be used to determine a relevancy of a given item with respect to a given user.
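
    The key idea is the time offset: the graph the GNN operates on comes from the first window, while the training targets are user-item interactions drawn from the second window. The sketch below covers only that data-preparation step, assuming interactions arrive as (user, item, timestamp) tuples and that a sampled pair is usable only when both its user node and item node already appear in the first-window graph. The function names and the node-key format are hypothetical, and the GNN itself is omitted.

```python
import random
from collections import defaultdict
from typing import Dict, Set, Tuple

Node = Tuple[str, str]  # ("u", user_id) or ("i", item_id); hypothetical key format


def build_graph(interactions, start, end) -> Dict[Node, Set[Node]]:
    """Bipartite user-item adjacency for interactions with start <= t < end."""
    graph: Dict[Node, Set[Node]] = defaultdict(set)
    for user, item, t in interactions:
        if start <= t < end:
            graph[("u", user)].add(("i", item))
            graph[("i", item)].add(("u", user))
    return graph


def offset_training_pairs(interactions, window1, window2, num_samples, seed=0):
    g1 = build_graph(interactions, *window1)  # structure the GNN message-passes over
    g2 = build_graph(interactions, *window2)  # source of later training targets
    later_edges = sorted({(u[1], i[1])
                          for u, nbrs in g2.items() if u[0] == "u"
                          for i in nbrs})
    # Keep only pairs whose user and item nodes already exist in the first graph
    # (membership tests on a defaultdict do not insert keys).
    eligible = [(u, i) for u, i in later_edges
                if ("u", u) in g1 and ("i", i) in g1]
    rng = random.Random(seed)
    sampled = rng.sample(eligible, min(num_samples, len(eligible)))
    return g1, sampled
```

    Training on pairs drawn from the later window means the model must predict future interactions from earlier structure rather than recover edges it can already see during message passing, which is one plausible reading of "offset temporal learning" in the title.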
