Processing Multimodal User Input for Assistant Systems

    Publication No.: US20230222605A1

    Publication Date: 2023-07-13

    Application No.: US18185258

    Filing Date: 2023-03-16

    Abstract: In one embodiment, a method includes: receiving, at a head-mounted device, a speech input from a user and a visual input captured by cameras of the head-mounted device, wherein the visual input comprises subjects and attributes associated with the subjects, and wherein the speech input comprises a co-reference to one or more of the subjects; resolving entities corresponding to the subjects associated with the co-reference based on the attributes and the co-reference; and presenting, at the head-mounted device, communication content responsive to the speech input and the visual input, wherein the communication content comprises information associated with results of executing tasks corresponding to the resolved entities.
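
The resolving step in the abstract can be illustrated with a minimal sketch. It is not the method claimed in the application. The `Subject` structure, the `resolve_coreference` function, and the word-overlap scoring are all hypothetical and chosen only to show how attributes from the visual input might narrow down which subject a spoken reference such as "that red mug" points to.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Subject:
    """A subject detected in the visual input (hypothetical structure)."""
    entity_id: str
    category: str                                       # e.g. "mug"
    attributes: Set[str] = field(default_factory=set)   # e.g. {"red"}


def resolve_coreference(speech: str, subjects: List[Subject]) -> List[Subject]:
    """Return the subjects whose category and attributes best match the speech.

    Assumption: a toy score of one point per attribute word that appears in
    the speech input, plus one point if the category word appears.
    """
    words = set(re.findall(r"[a-z]+", speech.lower()))
    scored = [
        (len(s.attributes & words) + (1 if s.category in words else 0), s)
        for s in subjects
    ]
    best = max((score for score, _ in scored), default=0)
    if best == 0:
        return []  # nothing in view matches the spoken reference
    return [s for score, s in scored if score == best]


# Hypothetical scene: three subjects seen by the device cameras.
scene = [
    Subject("e1", "mug", {"red"}),
    Subject("e2", "mug", {"blue"}),
    Subject("e3", "book", {"red"}),
]
resolved = resolve_coreference("How much is that red mug?", scene)
print([s.entity_id for s in resolved])  # → ['e1']
```

In this sketch, ties are returned together rather than broken arbitrarily, so a caller could ask the user a follow-up question when more than one subject matches equally well.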
