REINFORCEMENT LEARNING (RL) MODEL FOR OPTIMIZING LONG TERM REVENUE

    Publication No.: US20240273575A1

    Publication Date: 2024-08-15

    Application No.: US18108090

    Filing Date: 2023-02-10

    Applicant: ROKU, INC.

    CPC classification number: G06Q30/0269 G06Q30/0261

    Abstract: Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for optimizing user experience/engagement and revenue. An example embodiment operates by a computer-implemented method for providing one or more advertisements to a media device. The method includes receiving, by at least one computer processor, a user state associated with a user of the media device, where the user state corresponds to a time step. The method further includes receiving a revenue value associated with the user of the media device, where the revenue value corresponds to the time step. The method also includes determining an action associated with the user based on the user state and the revenue value. The action includes one or more parameters associated with the one or more advertisements. The method further includes providing the action to the user.
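The abstract describes a per-time-step loop: receive a user state, receive a revenue value, determine an action holding ad parameters. One way to make that loop optimize long-term rather than immediate revenue is to treat revenue as the RL reward. The sketch below assumes tabular Q-learning (the abstract does not name an RL algorithm). Everything else is a hypothetical illustration and not from the patent: the `AdAction` type with a single `ads_per_break` parameter, the two discrete user states, and the `toy_environment` user model.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AdAction:
    """Hypothetical ad parameters; the abstract only says an action holds
    'one or more parameters associated with the one or more advertisements'."""
    ads_per_break: int


ACTIONS = [AdAction(n) for n in range(4)]  # hypothetical action space: 0-3 ads per break


class RevenuePolicy:
    """Tabular Q-learning sketch: the revenue value received at each time step is the
    reward, so the policy maximizes discounted long-term revenue, not immediate revenue."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (user_state, action) -> estimated long-term revenue
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.prev = None  # (user_state, action) from the previous time step

    def greedy(self, user_state):
        # Best-known action for this state; ties go to the earliest entry in ACTIONS.
        return max(ACTIONS, key=lambda a: self.q[(user_state, a)])

    def step(self, user_state, revenue):
        """Receive the user state and revenue value for one time step; return an action."""
        if self.prev is not None:
            # Q-learning update: credit this step's revenue to the previous action.
            target = revenue + self.gamma * max(self.q[(user_state, a)] for a in ACTIONS)
            self.q[self.prev] += self.alpha * (target - self.q[self.prev])
        # Epsilon-greedy: usually exploit, occasionally explore other ad parameters.
        if self.rng.random() < self.epsilon:
            action = self.rng.choice(ACTIONS)
        else:
            action = self.greedy(user_state)
        self.prev = (user_state, action)
        return action


def toy_environment(state, action):
    """Hypothetical user model: three or more ads per break earn more revenue now but
    fatigue the user, and a fatigued user generates no revenue for one step."""
    if state == "fatigued":
        return 0.0, "engaged"
    next_state = "fatigued" if action.ads_per_break >= 3 else "engaged"
    return float(action.ads_per_break), next_state


def train(steps=50_000, seed=0):
    # Hypothetical training loop against the toy environment.
    policy = RevenuePolicy(seed=seed)
    state, revenue = "engaged", 0.0
    for _ in range(steps):
        action = policy.step(state, revenue)
        revenue, state = toy_environment(state, action)
    return policy
```

Under these toy assumptions with γ = 0.9, two ads per break is worth 2 / (1 − 0.9) = 20 in discounted revenue from the engaged state. Three ads is worth only 3 + 0.9 · 0.9 · 20 = 19.2, because of the fatigued step that follows. So `train().greedy("engaged")` should settle on `AdAction(ads_per_break=2)` even though three ads pays more immediately, which is the long-term-revenue trade-off the title refers to.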

    Interest-based conversational recommendation system

    Publication No.: US12190864B1

    Publication Date: 2025-01-07

    Application No.: US18734961

    Filing Date: 2024-06-05

    Applicant: Roku, Inc.

    Abstract: Disclosed herein are system, method and/or computer program product embodiments, and/or combinations thereof, for training a conversational recommendation system. An embodiment generates a probabilistic pseudo-user neural network model based on at least one interest probability distribution corresponding to a pseudo-user profile. The embodiment trains, using the pseudo-user neural network model, the conversational recommendation system to learn a recommendation policy, where the conversational recommendation system includes an interest-exploration engine and a prompt-decision engine. The training includes performing an iterative learning process that includes selecting an interest-exploration strategy based on one or more of the following: an interest-exploration policy, an earlier pseudo-user response generated by the pseudo-user neural network model, content data, and pseudo-user interaction history. The embodiment then generates, using the trained conversational recommendation system, a real-time recommendation having high play probability based on the minimal number of iterations of conversation between a user and the trained conversational recommendation system.
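The training loop in this abstract pairs a simulated pseudo-user with a recommender that alternates between asking about interests and deciding when to recommend. The sketch below is a heavily simplified, hypothetical illustration of that interaction. The pseudo-user is a plain categorical interest distribution rather than the patent's neural network model. The interest-exploration step always asks about the most probable remaining interest rather than selecting among strategies. The prompt-decision step recommends once the normalized belief in one interest reaches a hypothetical `threshold`. The only thing "learned" is a count-based prior built from pseudo-user interaction history. `GENRES`, `PseudoUser`, and `ConversationalRecommender` are illustrative names, and the pseudo-user's interests are assumed to be a subset of `GENRES`.

```python
import random
from collections import Counter

GENRES = ("comedy", "drama", "horror", "sci-fi")  # hypothetical catalog interests


class PseudoUser:
    """Simplified pseudo-user: samples a current interest from a fixed categorical
    distribution (a stand-in for the patent's pseudo-user neural network model)."""

    def __init__(self, interest_probs, seed=0):
        self.genres = list(interest_probs)
        self.weights = list(interest_probs.values())
        self.rng = random.Random(seed)
        self.target = None

    def start_session(self):
        # Sample the interest this pseudo-user has in mind for one conversation.
        self.target = self.rng.choices(self.genres, weights=self.weights)[0]

    def answer(self, genre):
        # Truthful yes/no reply to "Are you in the mood for <genre>?"
        return genre == self.target

    def plays(self, genre):
        return genre == self.target


class ConversationalRecommender:
    def __init__(self, threshold=0.6):
        if not 0.0 < threshold <= 1.0:
            raise ValueError("threshold must be in (0, 1]")
        self.threshold = threshold
        # Interaction history as Laplace-smoothed play counts: the learned prior.
        self.play_counts = Counter({g: 1 for g in GENRES})

    def converse(self, user):
        """Run one conversation and return (played_genre, turns_used)."""
        user.start_session()
        belief = dict(self.play_counts)  # unnormalized belief over remaining interests
        turns = 0
        while True:
            turns += 1
            best = max(belief, key=belief.get)
            if belief[best] / sum(belief.values()) >= self.threshold:
                # Prompt-decision step: confident enough to recommend now.
                if user.plays(best):
                    self.play_counts[best] += 1  # learn from the interaction history
                    return best, turns
                del belief[best]  # a rejected recommendation rules this interest out
            elif user.answer(best):
                # Interest-exploration step asked about `best` and got a "yes".
                belief = {best: belief[best]}
            else:
                del belief[best]
```

With `threshold=0.6` and a pseudo-user who always wants horror, successive sessions take 4, 2, 2, 2, 1, 1 turns. As play counts accumulate, the prompt-decision step stops asking and recommends directly, which mirrors (in a much cruder form) the abstract's goal of a high-play-probability recommendation in few conversation turns.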
