DETERMINING TARGET POLICY PERFORMANCE VIA OFF-POLICY EVALUATION IN EMBEDDING SPACES

    Publication No.: US20230394332A1

    Publication Date: 2023-12-07

    Application No.: US17804991

    Filing Date: 2022-06-01

    Applicant: Adobe Inc.

    IPC Classification: G06N5/04 G06F11/34 G06F11/30

    Abstract: The present disclosure describes methods, systems, and non-transitory computer-readable media for generating a projected value metric that projects a performance of a target policy within a digital action space. For instance, in one or more embodiments, the disclosed systems identify a target policy for performing digital actions represented within a digital action space. The disclosed systems further determine a set of sampled digital actions performed according to a logging policy and represented within the digital action space. Utilizing an embedding model, the disclosed systems generate a set of action embedding vectors representing the set of sampled digital actions within an embedding space. Further, utilizing the set of action embedding vectors, the disclosed systems generate a projected value metric indicating a projected performance of the target policy.
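
The flow in the abstract (sampled actions from a logging policy, an embedding of those actions, then a value estimate for the target policy) can be illustrated with a minimal sketch. It is not the claimed method: it assumes a context-free setting, a deterministic embedding that maps each action to a discrete embedding id rather than a vector, and a marginalized importance-weighting estimator chosen here only for illustration. All names (`embedding_table`, `projected_value`, the policies, and the logged data) are hypothetical.

```python
import numpy as np


def marginal_embedding_probs(policy, embedding_table, n_embeddings):
    """p(e | policy): sum policy[a] over all actions a that map to embedding id e."""
    return np.bincount(embedding_table, weights=policy, minlength=n_embeddings)


def projected_value(actions, rewards, logging_policy, target_policy, embedding_table):
    """Estimate the target policy's value from logged data.

    Assumption (not from the source): importance weights are computed over
    embedding ids instead of raw actions, and every embedding id the target
    policy can reach has non-zero probability under the logging policy.
    """
    n_embeddings = int(embedding_table.max()) + 1
    p_log = marginal_embedding_probs(logging_policy, embedding_table, n_embeddings)
    p_tgt = marginal_embedding_probs(target_policy, embedding_table, n_embeddings)
    sampled_embeddings = embedding_table[actions]  # stand-in for the embedding model
    weights = p_tgt[sampled_embeddings] / p_log[sampled_embeddings]
    return float(np.mean(weights * rewards))


# Hypothetical toy data: 4 actions, grouped into 2 embedding ids.
embedding_table = np.array([0, 0, 1, 1])
logging_policy = np.array([0.25, 0.25, 0.25, 0.25])
target_policy = np.array([0.1, 0.1, 0.4, 0.4])
logged_actions = np.array([0, 1, 2, 3])
logged_rewards = np.array([0.0, 0.0, 1.0, 1.0])

estimate = projected_value(
    logged_actions, logged_rewards, logging_policy, target_policy, embedding_table
)
print(estimate)
```

In this toy setup the reward depends only on the embedding id, so weighting by embedding probabilities rather than per-action probabilities is sufficient. With a large action space this kind of grouping can keep the importance weights smaller than per-action weights would be.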