AUDIOVISUAL DEEPFAKE DETECTION
    Invention Application

    Publication Number: US20250037507A1

    Publication Date: 2025-01-30

    Application Number: US18919049

    Filing Date: 2024-10-17

    Abstract: The embodiments execute machine-learning architectures for biometric-based identity recognition (e.g., speaker recognition, facial recognition) and deepfake detection (e.g., speaker deepfake detection, facial deepfake detection). The machine-learning architecture includes layers defining multiple scoring components, including sub-architectures for speaker deepfake detection, speaker recognition, facial deepfake detection, facial recognition, and a lip-sync estimation engine. The machine-learning architecture extracts and analyzes various types of low-level features from both audio data and visual data, combines the various scores, and uses the scores to determine the likelihood that the audiovisual data contains deepfake content and the likelihood that a claimed identity of a person in the video matches the identity of an expected or enrolled person. This enables the machine-learning architecture to perform identity recognition and verification, and deepfake detection, in an integrated fashion, for both audio data and visual data.
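The abstract describes combining per-component scores (audio deepfake, visual deepfake, lip-sync) into overall likelihoods. A minimal sketch of one plausible late-fusion step is below; the function names, weights, and threshold are hypothetical illustrations, not the patented implementation:

```python
def fuse_scores(scores, weights):
    """Weighted late fusion of per-component scores into one decision score.

    scores/weights: dicts keyed by component name (hypothetical names);
    each score is assumed to be a likelihood in [0, 1].
    """
    assert set(scores) == set(weights), "every component needs a weight"
    total = sum(weights[k] * scores[k] for k in scores)
    return total / sum(weights.values())


def classify(scores, weights, deepfake_threshold=0.5):
    """Return the fused score and a thresholded deepfake decision."""
    fused = fuse_scores(scores, weights)
    return {"fused_score": fused, "is_deepfake": fused >= deepfake_threshold}
```

In a deployed system the fusion weights and threshold would typically be learned or calibrated on labeled data rather than fixed by hand.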

    BEHAVIORAL BIOMETRICS USING KEYPRESS TEMPORAL INFORMATION

    Publication Number: US20240169040A1

    Publication Date: 2024-05-23

    Application Number: US18515128

    Filing Date: 2023-11-20

    CPC classification number: G06F21/316

    Abstract: Embodiments include a computing device that executes software routines and/or one or more machine-learning architectures, including a neural network-based embedding extraction system that produces an embedding vector representing a user's keypress behavior. The system extracts a behaviorprint embedding vector from keypress features and later references it when authenticating users. Embodiments may extract and evaluate keypress features, such as keypress sequences, keypress pressure or volume, and temporal keypress features, such as the duration of keypresses and the interval between keypresses, among others. Some embodiments employ a deep neural network architecture that generates a behaviorprint embedding vector representation of the keypress duration and interval features, which is used for enrollment and at inference time to authenticate users.
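The temporal keypress features named above (per-key duration and inter-key interval) can be sketched as a simple feature-extraction step; the event format here is a hypothetical illustration, upstream of any neural embedding:

```python
def keypress_features(events):
    """Compute temporal keypress features from raw key events.

    events: list of (key, press_time, release_time) tuples, in seconds,
    ordered by press time (hypothetical input format).
    Returns (durations, intervals): hold time per key, and the gap
    between one key's release and the next key's press.
    """
    durations = [release - press for _, press, release in events]
    intervals = [
        events[i + 1][1] - events[i][2]  # next press minus current release
        for i in range(len(events) - 1)
    ]
    return durations, intervals
```

In the described system, feature vectors like these would feed the embedding extractor that produces the behaviorprint used for enrollment and authentication.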

    SYSTEMS AND METHODS OF SPEAKER-INDEPENDENT EMBEDDING FOR IDENTIFICATION AND VERIFICATION FROM AUDIO

    Publication Number: US20210280171A1

    Publication Date: 2021-09-09

    Application Number: US17192464

    Filing Date: 2021-03-04

    Abstract: Embodiments described herein provide for audio processing operations that evaluate characteristics of audio signals that are independent of the speaker's voice. A neural network architecture trains and applies discriminatory neural networks tasked with modeling and classifying speaker-independent characteristics. The task-specific models generate or extract feature vectors from input audio data based on the trained embedding extraction models. The embeddings from the task-specific models are concatenated to form a deep-phoneprint (DP) vector for the input audio signal. The DP vector is a low-dimensional representation of each of the speaker-independent characteristics of the audio signal and is applied in various downstream operations.
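The concatenation step that forms the DP vector from task-specific embeddings can be sketched as follows; the task names are hypothetical placeholders for whatever speaker-independent characteristics the task-specific models cover:

```python
def deep_phoneprint(task_embeddings):
    """Concatenate per-task embedding vectors into a single DP vector.

    task_embeddings: dict mapping a task name (e.g., hypothetical
    "channel" or "codec" characteristics) to its embedding, a list
    of floats. Tasks are concatenated in sorted-name order so the
    DP vector layout is reproducible across calls.
    """
    dp_vector = []
    for task_name in sorted(task_embeddings):
        dp_vector.extend(task_embeddings[task_name])
    return dp_vector
```

Keeping a fixed concatenation order matters because downstream models consume the DP vector positionally.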

    END-TO-END SPEAKER RECOGNITION USING DEEP NEURAL NETWORK

    Publication Number: US20190392842A1

    Publication Date: 2019-12-26

    Application Number: US16536293

    Filing Date: 2019-08-08

    Abstract: The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable for performing speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
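A loss of the kind the abstract describes (cosine similarity with positive and negative margins, over a cohort of negatives) can be sketched in hinge form; the margin values and function shape here are illustrative assumptions, not the claimed formulation:

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_cosine_loss(anchor, positive, negatives,
                        pos_margin=0.9, neg_margin=0.3):
    """Hinge-style triplet loss over a cohort of negatives.

    Penalizes the anchor-positive pair when its similarity falls below
    pos_margin, and each anchor-negative pair whose similarity rises
    above neg_margin (margin values are hypothetical).
    """
    loss = max(0.0, pos_margin - cosine(anchor, positive))
    for negative in negatives:
        loss += max(0.0, cosine(anchor, negative) - neg_margin)
    return loss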

    DEEPFAKE DETECTION
    Invention Publication (Pending - Published)

    Publication Number: US20240355334A1

    Publication Date: 2024-10-24

    Application Number: US18388457

    Filing Date: 2023-11-09

    CPC classification number: G10L17/06

    Abstract: Disclosed are systems and methods including software processes executed by a server that detect audio-based synthetic speech (“deepfakes”) in a call conversation. The server applies an NLP engine to transcribe call audio and analyze the text for anomalous patterns to detect synthetic speech. Additionally or alternatively, the server executes a voice “liveness” detection system for detecting machine speech, such as synthetic speech or replayed speech. The system performs phrase repetition detection, background change detection, and passive voice liveness detection in call audio signals to detect liveness of a speech utterance. An automated model update module allows the liveness detection model to adapt to new types of presentation attacks based on human-provided feedback.
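The phrase repetition detection mentioned above can be sketched as a text-side check on the NLP transcript; this exact n-gram scan is a hypothetical simplification of whatever anomaly analysis the system performs:

```python
def repeated_phrases(transcript, n=3):
    """Flag exact n-gram repetitions in a call transcript.

    Verbatim repeated phrases are one simple cue that replayed or
    synthetic speech may be present (a hypothetical heuristic; the
    disclosed system combines this with other liveness signals).
    Returns the set of repeated n-word phrases.
    """
    words = transcript.lower().split()
    seen, repeats = set(), set()
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        if gram in seen:
            repeats.add(" ".join(gram))
        seen.add(gram)
    return repeats
```

On real transcripts this would likely run on normalized text (punctuation stripped, filler words handled) and alongside the audio-side liveness detectors rather than alone.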
