Recommending sequences of content with bootstrapped reinforcement learning
Abstract:
Systems and methods provide a recommendation system for recommending sequential content. The training of a reinforcement learning (RL) agent is bootstrapped from passive data. The RL agent of the sequential recommendation system is trained using the passive data over a number of epochs involving interactions between the sequential recommendation system and user devices. At each epoch, available active data from previous epochs is obtained, and transition probabilities are generated from the passive data and at least one parameter derived from the currently available active data. Recommended content is selected based on a current state and the generated transition probabilities, and the active data is updated with data from the current epoch, based on the recommended content and the resulting new state. A clustering approach can also be employed when deriving parameters from active data to balance model expressiveness and data sparsity.
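The per-epoch loop described above can be illustrated with a minimal Python sketch. It is not the claimed method: the abstract does not say how the active-data parameter enters the model, so the sketch assumes a single acceptance rate `alpha` that mixes a "user follows the recommendation" outcome with the passive transition row. All names (`passive_probs`, `acceptance_rate`, `transition_probs`, `recommend`, `run_epoch`, `user_step`), the add-one smoothing, the one-step greedy selection, and the single interaction per epoch are hypothetical simplifications.

```python
def passive_probs(passive_sessions, n_items):
    """Estimate item-to-item transition probabilities from passive sessions
    (browsing sequences logged without recommendations), add-one smoothed."""
    counts = [[1.0] * n_items for _ in range(n_items)]
    for session in passive_sessions:
        for s, s_next in zip(session, session[1:]):
            counts[s][s_next] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def acceptance_rate(active_data):
    """Assumed active-data parameter: smoothed fraction of past interactions
    in which the user moved to the recommended item."""
    accepted = sum(1 for (_, action, s_next) in active_data if s_next == action)
    return (accepted + 1.0) / (len(active_data) + 2.0)


def transition_probs(passive, alpha, state, action):
    """Assumed model: with probability alpha the user follows the
    recommendation, otherwise they behave as in the passive data."""
    row = passive[state]
    return [alpha * (1.0 if j == action else 0.0) + (1.0 - alpha) * row[j]
            for j in range(len(row))]


def recommend(passive, alpha, state, rewards):
    """Pick the item with the highest expected one-step reward."""
    def expected_reward(action):
        probs = transition_probs(passive, alpha, state, action)
        return sum(p * r for p, r in zip(probs, rewards))
    return max(range(len(rewards)), key=expected_reward)


def run_epoch(passive, active_data, state, rewards, user_step):
    """One simplified epoch: derive alpha from the active data so far,
    recommend, observe the new state, and append to the active data."""
    alpha = acceptance_rate(active_data)
    action = recommend(passive, alpha, state, rewards)
    new_state = user_step(state, action)
    active_data.append((state, action, new_state))
    return new_state
```

A caller would build `passive` once with `passive_probs`, then call `run_epoch` repeatedly, passing a `user_step` callback that reports the state the user actually moved to. The clustering variant mentioned in the abstract could be approximated, under the same assumptions, by keeping one `alpha` per cluster of states instead of a single global value.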