System and method for accent-agnostic frame-level wake word detection

    Publication No.: US12272357B2

    Publication date: 2025-04-08

    Application No.: US17929280

    Filing date: 2022-09-01

    Abstract: A method includes accessing, using at least one processor of an electronic device, a machine learning model. The machine learning model is a trained student model that is trained using audio samples in a plurality of accent types. The method also includes receiving, using the at least one processor, an audio input from an audio input device. The method further includes providing, using the at least one processor, the audio input to the trained student model. The method also includes receiving, using the at least one processor, an output from the trained student model including frame-level probabilities associated with the audio input. In addition, the method includes instructing, using the at least one processor, at least one action based on the frame-level probabilities associated with the audio input.
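
The abstract does not say how the frame-level probabilities are turned into an action. The following is a minimal sketch of one common post-processing approach, not the patented method: smooth the per-frame probabilities with a trailing moving average and trigger only when the smoothed value stays above a threshold for several consecutive frames. The function names `smooth` and `detect_wake_word`, and the parameters `threshold`, `min_frames`, and `window`, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def smooth(probs: Sequence[float], window: int = 3) -> List[float]:
    """Trailing moving average over per-frame wake word probabilities."""
    out: List[float] = []
    for i in range(len(probs)):
        start = max(0, i - window + 1)
        chunk = probs[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def detect_wake_word(
    probs: Sequence[float],
    threshold: float = 0.6,   # assumed value, not from the patent
    min_frames: int = 3,      # assumed value, not from the patent
    window: int = 3,          # assumed value, not from the patent
) -> bool:
    """Return True if the smoothed probability stays at or above
    `threshold` for at least `min_frames` consecutive frames."""
    run = 0
    for p in smooth(probs, window):
        run = run + 1 if p >= threshold else 0
        if run >= min_frames:
            return True
    return False


if __name__ == "__main__":
    # Stand-in for the trained student model's frame-level output.
    frame_probs = [0.1, 0.2, 0.9, 0.95, 0.9, 0.92, 0.9, 0.1]
    if detect_wake_word(frame_probs):
        print("wake word detected: instruct the follow-up action")
```

Requiring a run of consecutive frames, rather than a single frame above the threshold, is a design choice that trades a little latency for fewer false triggers on isolated spikes.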

    System and method for speaker verification for voice assistant

    Publication No.: US20230419962A1

    Publication date: 2023-12-28

    Application No.: US18047609

    Filing date: 2022-10-18

    CPC classification number: G10L15/22 G10L2015/088 G10L15/08

    Abstract: A method includes obtaining audio data and identifying an utterance of a wake word or phrase in the audio data. The method also includes generating an embedding vector based on the utterance from the audio data and accessing a set of previously-generated vectors representing previous utterances of the wake word or phrase. The method further includes performing clustering on the embedding vector and the set of previously-generated vectors to identify a cluster including the embedding vector, where the identified cluster is associated with a speaker. The method also includes updating a speaker vector associated with the speaker based on the embedding vector and determining, using a speaker verification model, a similarity score between the updated speaker vector and the embedding vector. In addition, the method includes determining, based on the similarity score, whether a speaker providing the utterance matches the speaker associated with the identified cluster.
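
The steps in this abstract (cluster the new embedding, update the speaker vector, score, decide) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract names neither the clustering algorithm nor the speaker verification model, so the sketch substitutes nearest-centroid assignment for clustering and plain cosine similarity for the verification model. The names `cosine`, `mean`, and `verify`, the `clusters` layout, and the `threshold` value are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def verify(
    embedding: Vector,
    clusters: Dict[str, List[Vector]],  # speaker -> previous wake word embeddings
    threshold: float = 0.8,             # assumed value, not from the patent
) -> Tuple[str, float, bool]:
    # Stand-in for clustering: assign the embedding to the nearest centroid.
    speaker = max(clusters, key=lambda s: cosine(embedding, mean(clusters[s])))
    # Update the speaker vector so that it includes the new embedding.
    speaker_vector = mean(clusters[speaker] + [embedding])
    # Stand-in for the speaker verification model: cosine similarity.
    score = cosine(speaker_vector, embedding)
    return speaker, score, score >= threshold


if __name__ == "__main__":
    previous = {
        "alice": [[1.0, 0.0], [0.9, 0.1]],
        "bob": [[0.0, 1.0], [0.1, 0.9]],
    }
    speaker, score, matched = verify([0.95, 0.05], previous)
    print(speaker, round(score, 3), matched)
```

Because the updated speaker vector already includes the new embedding, the score is biased upward for small clusters; a real system would likely account for this when choosing the threshold.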
