Speaker Verification with Multitask Speech Models
Abstract:
A method includes obtaining a speaker identification (SID) model trained to predict speaker embeddings from utterances spoken by different speakers, the SID model including a trained audio encoder and a trained SID head. The method also includes receiving a plurality of synthetic speech detection (SSD) training utterances that include a set of human-originated speech samples and a set of synthetic speech samples. The method also includes training, using the trained audio encoder, an SSD head on the SSD training utterances to learn to detect the presence of synthetic speech in audio encodings encoded by the trained audio encoder. The method further includes providing, for execution on a computing device, a multitask neural network model for performing both SID tasks and SSD tasks on input audio data in parallel.
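The arrangement described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy dimensions, the random weights standing in for the already trained encoder and SID head, the logistic-regression form of the SSD head, the assumption that the multitask model combines the shared encoder with both heads, and all function names. The sketch only shows the general pattern of keeping a trained encoder fixed, fitting a new SSD head on its encodings, and then running both heads on one shared encoding. The two heads are called one after the other here; in a real deployment they could run in parallel.

```python
import math
import random

random.seed(0)
DIM_IN, DIM_ENC, DIM_EMB = 8, 4, 3  # hypothetical toy sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

# Random stand-ins for the *already trained* SID model (assumption).
ENCODER_W = rand_matrix(DIM_ENC, DIM_IN)
SID_W = rand_matrix(DIM_EMB, DIM_ENC)

def encode(features):
    """Frozen audio encoder: input features -> audio encoding."""
    return [math.tanh(z) for z in matvec(ENCODER_W, features)]

def sid_head(encoding):
    """SID head: audio encoding -> unit-length speaker embedding."""
    e = matvec(SID_W, encoding)
    norm = math.sqrt(sum(x * x for x in e)) or 1.0
    return [x / norm for x in e]

def train_ssd_head(samples, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression SSD head on encodings.

    Only the head parameters (w, b) are updated; ENCODER_W is never touched.
    Labels: 0 = human-originated, 1 = synthetic.
    """
    w, b = [0.0] * DIM_ENC, 0.0
    encodings = [encode(s) for s in samples]
    for _ in range(epochs):
        for enc, y in zip(encodings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, enc)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, enc)]
            b -= lr * g
    return w, b

def multitask_forward(features, ssd_params):
    """Run SID and SSD on one shared encoding of the input."""
    enc = encode(features)  # computed once, shared by both heads
    w, b = ssd_params
    p_synth = sigmoid(sum(wi * xi for wi, xi in zip(w, enc)) + b)
    return sid_head(enc), p_synth

def toy_sample(mean):
    """Made-up feature vector; real inputs would be acoustic features."""
    return [random.gauss(mean, 0.3) for _ in range(DIM_IN)]

# Fabricated toy data purely to exercise the code path.
samples = [toy_sample(+1.0) for _ in range(20)] + [toy_sample(-1.0) for _ in range(20)]
labels = [0] * 20 + [1] * 20

ssd_params = train_ssd_head(samples, labels)
embedding, p_synthetic = multitask_forward(toy_sample(-1.0), ssd_params)
print("speaker embedding:", embedding)
print("P(synthetic):", p_synthetic)
```

Because the encoder is shared, the encoding is computed once per input and reused by both heads, which is the main practical appeal of attaching an SSD head to an existing SID encoder rather than running two separate models.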