PASSIVE AND CONTINUOUS MULTI-SPEAKER VOICE BIOMETRICS

    Publication No.: US20210326421A1

    Publication Date: 2021-10-21

    Application No.: US17231672

    Filing Date: 2021-04-15

    Abstract: Embodiments described herein provide for a voice biometrics system executing machine-learning architectures capable of passive, active, continuous, or static operations, or a combination thereof. The system passively and/or continuously (in some cases in addition to actively and/or statically) enrolls speakers as they speak into or around an edge device (e.g., car, television, radio, phone). The system identifies users on the fly without requiring a new speaker to mirror prompted utterances for reconfiguring operations, and manages speaker profiles as speakers provide utterances. The machine-learning architectures implement a passive and continuous voice biometrics system, possibly without prior knowledge of speaker identities. The system creates identities in an unsupervised manner, sometimes passively enrolling and recognizing known or unknown speakers. The system offers personalization and security across a wide range of applications, including media content for over-the-top services, IoT devices (e.g., personal assistants, vehicles), and call centers.
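The unsupervised, on-the-fly enrollment described above can be sketched as clustering of utterance embeddings: match each new embedding against stored profiles and either update the best match or create a new profile. This is a minimal illustration, not the patented method; the `PassiveEnroller` class, the running-mean profile update, and the cosine threshold of 0.7 are all assumptions for the sketch.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class PassiveEnroller:
    """Maintains speaker profiles as running-mean embeddings,
    enrolling unknown speakers on the fly (unsupervised)."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold
        self.profiles = []  # list of (centroid, utterance_count)

    def observe(self, embedding):
        """Match an utterance embedding to a profile, or enroll a new one.
        Returns the index of the matched/created profile."""
        embedding = np.asarray(embedding, dtype=float)
        best, best_score = None, -1.0
        for i, (centroid, _) in enumerate(self.profiles):
            score = cosine(embedding, centroid)
            if score > best_score:
                best, best_score = i, score
        if best is not None and best_score >= self.threshold:
            centroid, n = self.profiles[best]
            # Running-mean update of the matched speaker profile.
            self.profiles[best] = ((centroid * n + embedding) / (n + 1), n + 1)
            return best
        self.profiles.append((embedding, 1))  # passively enroll a new speaker
        return len(self.profiles) - 1
```

Repeated utterances from the same speaker refine one profile, while a sufficiently dissimilar voice silently creates a new one, with no prompted enrollment phrase required.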

    ROBUST SPOOFING DETECTION SYSTEM USING DEEP RESIDUAL NEURAL NETWORKS

    Publication No.: US20210233541A1

    Publication Date: 2021-07-29

    Application No.: US17155851

    Filing Date: 2021-01-22

    Abstract: Embodiments described herein provide for systems and methods for implementing a neural network architecture for spoof detection in audio signals. The neural network architecture contains layers defining embedding extractors that extract embeddings from input audio signals. Spoofprint embeddings are generated for particular system enrollees to detect attempts to spoof an enrollee's voice. Optionally, voiceprint embeddings are generated for the system enrollees to recognize an enrollee's voice. The voiceprints are extracted using features related to the enrollee's voice, whereas the spoofprints are extracted using features related to how the enrollee speaks, along with other artifacts. The spoofprints facilitate detection of efforts to fool voice biometrics using synthesized speech (e.g., deepfakes) that spoofs and emulates the enrollee's voice.
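The voiceprint/spoofprint pairing above suggests a two-gate decision: a caller must both match the enrollee's voiceprint and produce a spoofprint consistent with genuine speech. The sketch below assumes the embeddings are already extracted; the `verify` function, thresholds, and cosine-similarity gating are illustrative assumptions, not the claimed architecture.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enrolled_voiceprint, enrolled_spoofprint,
           test_voiceprint, test_spoofprint,
           voice_thresh=0.7, spoof_thresh=0.7):
    """Two-gate check (illustrative thresholds): the test audio must
    match the enrollee's voiceprint AND its spoofprint must look like
    genuine speech rather than a deepfake/replay artifact."""
    same_speaker = cosine(enrolled_voiceprint, test_voiceprint) >= voice_thresh
    genuine = cosine(enrolled_spoofprint, test_spoofprint) >= spoof_thresh
    return same_speaker and genuine
```

A deepfake that closely matches the voiceprint can still fail the second gate, since synthesis artifacts shift the spoofprint even when the voice timbre is emulated.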

    SYSTEM AND METHOD FOR CLUSTER-BASED AUDIO EVENT DETECTION

    Publication No.: US20190096424A1

    Publication Date: 2019-03-28

    Application No.: US16200283

    Filing Date: 2018-11-26

    Abstract: Methods, systems, and apparatuses for audio event detection are described, in which the type of sound data is determined at the cluster level rather than at the frame level. The techniques provided are thus more robust to the local behavior of features of an audio signal or audio recording. The audio event detection is performed by using Gaussian mixture models (GMMs) to classify each cluster or by extracting an i-vector from each cluster. Each cluster may be classified based on an i-vector classification using a support vector machine or probabilistic linear discriminant analysis. The audio event detection significantly reduces potential smoothing error and avoids any dependency on accurate window-size tuning. Segmentation may be performed using a generalized likelihood ratio and a Bayesian information criterion, and the segments may be clustered using hierarchical agglomerative clustering. Audio frames may be clustered using K-means and GMMs.
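The cluster-level (rather than frame-level) decision can be sketched as: group frame features with K-means, then run the classifier once per cluster on a cluster-level summary. This is a minimal sketch only; the toy `kmeans` loop, the use of the cluster mean as its summary, and the `classifier` callback are assumptions standing in for the GMM/i-vector classifiers named in the abstract.

```python
import numpy as np

def kmeans(frames, k, iters=20, seed=0):
    """Tiny K-means over frame-level feature vectors.
    Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    frames = np.asarray(frames, float)
    centroids = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest centroid.
        dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = frames[labels == j].mean(0)
    return labels, centroids

def classify_clusters(frames, labels, classifier):
    """Decide the event type once per cluster, not per frame, by
    classifying each cluster's mean feature vector."""
    frames = np.asarray(frames, float)
    return {int(j): classifier(frames[labels == j].mean(0))
            for j in np.unique(labels)}
```

Because every frame in a cluster inherits one decision, a few noisy frames cannot flip the label the way they can under frame-by-frame classification with smoothing.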

    CENTRALIZED SYNTHETIC SPEECH DETECTION SYSTEM USING WATERMARKING

    Publication No.: US20250029614A1

    Publication Date: 2025-01-23

    Application No.: US18777278

    Filing Date: 2024-07-18

    Abstract: Disclosed are systems and methods including software processes executed by a server for: obtaining, by a computer, an audio signal including synthetic speech; extracting, by the computer, metadata from a watermark of the audio signal by applying a set of keys associated with a plurality of text-to-speech (TTS) services to the audio signal, the metadata indicating an origin of the synthetic speech in the audio signal; and generating, by the computer, based on the extracted metadata, a notification indicating that the audio signal includes the synthetic speech.
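The key-trial extraction step can be illustrated with a toy keyed watermark: try each TTS service's key in turn and accept the first decode whose keyed checksum validates. The XOR keystream, the 4-byte checksum, and both function names are purely illustrative assumptions; the patent's actual watermarking scheme is not specified in the abstract.

```python
import hashlib

def embed_watermark(payload, key):
    """Toy watermark: XOR the payload with a keyed keystream and
    prefix a keyed checksum so extraction can be validated."""
    stream = hashlib.sha256(key).digest()
    body = bytes(p ^ stream[i % len(stream)] for i, p in enumerate(payload))
    check = hashlib.sha256(key + payload).digest()[:4]
    return check + body

def try_keys(watermark, keys):
    """Apply each TTS service's key; return (service, metadata) for the
    first key whose checksum validates, else None (no known origin)."""
    check, body = watermark[:4], watermark[4:]
    for service, key in keys.items():
        stream = hashlib.sha256(key).digest()
        payload = bytes(b ^ stream[i % len(stream)] for i, b in enumerate(body))
        if hashlib.sha256(key + payload).digest()[:4] == check:
            return service, payload
    return None
```

A successful decode both identifies the originating TTS service and recovers the embedded metadata, which is what drives the downstream synthetic-speech notification.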

    CHANNEL-COMPENSATED LOW-LEVEL FEATURES FOR SPEAKER RECOGNITION

    Publication No.: US20230290357A1

    Publication Date: 2023-09-14

    Application No.: US18321353

    Filing Date: 2023-05-22

    CPC classification number: G10L17/20 G10L17/02 G10L17/04 G10L17/18 G10L19/028

    Abstract: A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers, and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.
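The training loop above (degrade the signal, extract features, compare against handcrafted features of the clean signal, update weights until the loss threshold is met) can be sketched with a linear front-end standing in for the CNN. The linear model, gradient-descent update, and all hyperparameters are assumptions made so the sketch stays self-contained.

```python
import numpy as np

def handcrafted_features(signal, W_ref):
    """Stand-in for handcrafted features of the clean signal."""
    return signal @ W_ref

def train_compensator(clean, noise, W_ref, lr=0.1, loss_thresh=1e-4, steps=5000):
    """Train a linear front-end (stand-in for the CNN) so features of the
    *degraded* signal match handcrafted features of the *clean* signal."""
    rng = np.random.default_rng(0)
    W = rng.normal(size=W_ref.shape) * 0.1
    degraded = clean + noise                    # channel-noise simulator output
    target = handcrafted_features(clean, W_ref)  # loss-function reference
    loss = float("inf")
    for _ in range(steps):
        pred = degraded @ W
        err = pred - target
        loss = float((err ** 2).mean())
        if loss < loss_thresh:                  # predetermined threshold loss
            break
        # Gradient step on the mean-squared feature difference.
        W -= lr * degraded.T @ err / len(degraded)
    return W, loss
```

Once trained, the front-end maps noisy inputs to features that approximate those of the clean signal, which is the property the speaker-recognition DNN then consumes.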

    CROSS-LINGUAL SPEAKER RECOGNITION

    Publication No.: US20230137652A1

    Publication Date: 2023-05-04

    Application No.: US17977521

    Filing Date: 2022-10-31

    Abstract: Disclosed are systems and methods including computing processes executing machine-learning architectures for voice biometrics, in which the machine-learning architecture implements one or more language compensation functions. Embodiments include an embedding extraction engine (sometimes referred to as an "embedding extractor") that extracts speaker embeddings and determines a speaker similarity score for determining or verifying the likelihood that speakers in different audio signals are the same speaker. The machine-learning architecture further includes a multi-class language classifier that determines a language likelihood score indicating the likelihood that a particular audio signal includes a particular spoken language. The features and functions of the machine-learning architecture described herein may implement the various language compensation techniques to provide more accurate speaker recognition results, regardless of the language spoken by the speaker.
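One simple way to picture combining a speaker similarity score with a language likelihood score is to discount the similarity when the two utterances are probably in different languages. The penalty scheme below is an assumption for illustration only, not the patented compensation function; `compensated_score` and its `penalty` parameter are hypothetical names.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two speaker embeddings."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def softmax(z):
    """Turn multi-class language-classifier logits into probabilities."""
    z = np.asarray(z, float)
    e = np.exp(z - z.max())
    return e / e.sum()

def compensated_score(emb_a, emb_b, lang_logits_a, lang_logits_b, penalty=0.1):
    """Illustrative language compensation: subtract a small penalty
    scaled by the probability the two utterances are in different
    languages, so cross-lingual mismatch weighs less on the verdict."""
    p_a, p_b = softmax(lang_logits_a), softmax(lang_logits_b)
    p_same_language = float(p_a @ p_b)  # prob. both are the same language
    return cosine(emb_a, emb_b) - penalty * (1.0 - p_same_language)
```

When both utterances are confidently classified as the same language, the penalty vanishes and the score reduces to plain embedding similarity.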
