ROBUST SPREAD-SPECTRUM SPEECH WATERMARKING USING LINEAR PREDICTION AND DEEP SPECTRAL SHAPING

    Publication No.: US20250095662A1

    Publication Date: 2025-03-20

    Application No.: US18883681

    Application Date: 2024-09-12

    Abstract: Embodiments disclosed herein include software processes executed by a computer for encoding and decoding watermarks for a speech signal in a call signal communicated via telephony channels. An encoder uses Linear Predictive Coding (LPC) to analyze the call signal's spectral envelope and embeds the watermark into the LPC log-spectrum of the speech signal of the call signal. The encoder may reduce the watermark's strength at a formant peak of the speech signal, balancing the watermark's robustness and detectability. A deep decoder includes a neural network architecture trained on watermarked and watermark-free speech signals having various types of degradation to extract a feature vector of a call signal and compute a watermark detection score for one or more frames or for the call signal. At inference time, the deep decoder detects the watermark when the watermark detection score satisfies a detection threshold.
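
The abstract names the ingredients (an LPC log-spectrum, a spread-spectrum mark, reduced strength at formant peaks) but not the exact procedure. The following is a minimal Python sketch under stated assumptions, not the patented encoder: the function names, the `strength` and `peak_scale` parameters, the use of local envelope maxima as stand-ins for formant peaks, and the non-blind correlation detector (used here in place of the deep decoder) are all hypothetical. It only perturbs the envelope; resynthesizing speech from the marked envelope is omitted.

```python
import cmath
import math
import random

def lpc(frame, order):
    """Autocorrelation method with Levinson-Durbin; returns ([1, a1..ap], error)."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def log_envelope(a, gain, n_bins):
    """log|H(w)| for H = sqrt(gain) / A(e^{jw}), sampled on (0, pi)."""
    env = []
    for b in range(n_bins):
        w = math.pi * (b + 0.5) / n_bins
        a_w = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        env.append(0.5 * math.log(gain) - math.log(abs(a_w)))
    return env

def pn_sequence(key, n):
    """Keyed pseudo-random +/-1 spreading sequence."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(env, key, strength=0.2, peak_scale=0.25):
    """Add the keyed sequence to the log-envelope, weaker at local maxima."""
    pn = pn_sequence(key, len(env))
    marked = []
    for b, v in enumerate(env):
        is_peak = 0 < b < len(env) - 1 and env[b - 1] < v > env[b + 1]
        weight = strength * (peak_scale if is_peak else 1.0)
        marked.append(v + weight * pn[b])
    return marked

def correlate(marked, reference, key):
    """Non-blind score: mean of (marked - reference) times the keyed sequence."""
    pn = pn_sequence(key, len(marked))
    return sum((m - r) * p for m, r, p in zip(marked, reference, pn)) / len(pn)

# Toy frame: a resonant second-order process driven by seeded Gaussian noise.
rng = random.Random(7)
frame = [0.0, 0.0]
for _ in range(240):
    frame.append(1.3 * frame[-1] - 0.8 * frame[-2] + rng.gauss(0.0, 1.0))
frame = frame[2:]

coeffs, gain = lpc(frame, order=8)
envelope = log_envelope(coeffs, gain, n_bins=64)
marked = embed(envelope, key=1234)
score_right = correlate(marked, envelope, key=1234)
score_wrong = correlate(marked, envelope, key=4321)
```

With the matching key, the score equals the average embedding weight, so it is positive; with a different key the products partly cancel. A detector would compare such a score against a threshold, which is the role the abstract assigns to the deep decoder's detection score.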

    VOICE MODIFICATION DETECTION USING PHYSICAL MODELS OF SPEECH PRODUCTION

    Publication No.: US20230015189A1

    Publication Date: 2023-01-19

    Application No.: US17953156

    Application Date: 2022-09-26

    Abstract: A computer may train a single-class machine learning model using normal speech recordings. The machine learning model or any other model may estimate the normal range of parameters of a physical speech production model based on the normal speech recordings. For example, the computer may use a source-filter model of speech production, where voiced speech is represented by a pulse train and unvoiced speech by random noise, and a combination of the pulse train and the random noise is passed through an auto-regressive filter that emulates the human vocal tract. The computer leverages the fact that intentional modification of human voice introduces errors into the source-filter model or any other physical model of speech production. The computer may identify anomalies in the physical model to generate a voice modification score for an audio signal. The voice modification score may indicate a degree of abnormality of human voice in the audio signal.
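
The source-filter model described above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the filter coefficients `A`, the pitch period, the use of 2:1 decimation as a crude stand-in for voice modification, and the `residual_ratio` measure, which is a hypothetical indicator and not the scoring method of the patent.

```python
import random

def excitation(n, voiced, pitch_period=80, seed=0):
    """Pulse train for voiced speech, seeded Gaussian noise for unvoiced speech."""
    if voiced:
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.1) for _ in range(n)]

def ar_filter(source, a):
    """All-pole synthesis: y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], with a[0] == 1."""
    y = []
    for n, x in enumerate(source):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def inverse_filter(signal, a):
    """Analysis filter A(z): e[n] = sum_k a[k] * y[n-k]."""
    return [sum(a[k] * signal[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(signal))]

def residual_ratio(signal, a):
    """Residual energy over signal energy; small when the model fits the signal."""
    residual = inverse_filter(signal, a)
    return sum(e * e for e in residual) / sum(s * s for s in signal)

A = [1.0, -1.3, 0.8]                      # stable toy "vocal tract" filter
source = excitation(400, voiced=True)
voiced_speech = ar_filter(source, A)
recovered = inverse_filter(voiced_speech, A)

modified = voiced_speech[::2]             # crude modification: shifts the resonance
ratio_normal = residual_ratio(voiced_speech, A)
ratio_modified = residual_ratio(modified, A)
```

Inverse filtering the unmodified signal returns the original pulse train, so its residual ratio is small. The decimated signal no longer matches the filter, and its ratio is larger, which is the kind of model error the abstract describes.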

    Voice modification detection using physical models of speech production

    Publication No.: US11495244B2

    Publication Date: 2022-11-08

    Application No.: US16375785

    Application Date: 2019-04-04

    Abstract: A computer may train a single-class machine learning model using normal speech recordings. The machine learning model or any other model may estimate the normal range of parameters of a physical speech production model based on the normal speech recordings. For example, the computer may use a source-filter model of speech production, where voiced speech is represented by a pulse train and unvoiced speech by random noise, and a combination of the pulse train and the random noise is passed through an auto-regressive filter that emulates the human vocal tract. The computer leverages the fact that intentional modification of human voice introduces errors into the source-filter model or any other physical model of speech production. The computer may identify anomalies in the physical model to generate a voice modification score for an audio signal. The voice modification score may indicate a degree of abnormality of human voice in the audio signal.

    PRESENTATION ATTACKS IN REVERBERANT CONDITIONS

    Publication No.: US20240311474A1

    Publication Date: 2024-09-19

    Application No.: US18598595

    Application Date: 2024-03-07

    Abstract: Embodiments include a computing device that executes software routines and/or one or more machine-learning architectures including obtaining training audio signals having corresponding training impulse responses associated with reverberation degradation, training a machine-learning model of a presentation attack detection engine to generate one or more acoustic parameters by executing the presentation attack detection engine using the training impulse responses of the training audio signals and a loss function, obtaining an audio signal having an acoustic impulse response associated with reverberation degradation caused by one or more rooms, generating the one or more acoustic parameters for the audio signal by executing the machine-learning model using the audio signal as input, and generating an attack score for the audio signal based upon the one or more parameters generated by the machine-learning model.
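
As a rough illustration of how training audio can be paired with an impulse response that carries reverberation degradation, the Python sketch below builds a synthetic room impulse response and convolves a dry signal with it. The abstract does not say how its training data is produced, so the exponentially decaying noise model and the names `synthetic_rir` and `convolve` are assumptions.

```python
import math
import random

def synthetic_rir(t60_s, fs, length_s, seed=0):
    """Gaussian noise under an exponential envelope that falls 60 dB after t60_s."""
    rng = random.Random(seed)
    decay = math.log(1000.0) / (t60_s * fs)   # amplitude factor 10^-3 equals 60 dB
    n = int(length_s * fs)
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * i) for i in range(n)]
    rir[0] = 1.0                              # direct path
    return rir

def convolve(x, h):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

FS = 8000
dry = [math.sin(2.0 * math.pi * 440.0 * n / FS) for n in range(400)]
rir = synthetic_rir(t60_s=0.2, fs=FS, length_s=0.1)
wet = convolve(dry, rir)
training_pair = (wet, rir)   # reverberant audio with its impulse response
```

A model such as the one in the abstract would be trained on many pairs of this kind, with the acoustic parameters of each impulse response serving as targets.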

    Joint estimation of acoustic parameters from single-microphone speech

    Publication No.: US12087319B1

    Publication Date: 2024-09-10

    Application No.: US17079082

    Application Date: 2020-10-23

    CPC classification number: G10L25/30 G06N3/048 G06N3/08

    Abstract: Embodiments described herein provide for end-to-end joint determination of degradation parameter scores for certain types of degradation. Degradation parameters include those describing additive noise and multiplicative noise, such as Signal-to-Noise Ratio (SNR), reverberation time (T60), and Direct-to-Reverberant Ratio (DRR). Various neural network architectures are described such that the inherent interplay between the degradation parameters is considered in both the degradation parameter score and degradation score determination. The neural network architectures are trained according to computer generated audio datasets.
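
The three parameters named above have standard signal-processing definitions, and for computer-generated data they can be computed directly from the known noise and impulse response. The Python sketch below shows one conventional way to do so, assuming Schroeder backward integration with a line fit between -5 dB and -25 dB for T60 and a 2.5 ms direct-path window for DRR. These choices, the function names, and the toy impulse response are assumptions; the abstract describes neural networks that estimate such values from speech, which this sketch does not attempt.

```python
import math

def snr_db(signal, noise):
    """Signal-to-Noise Ratio from separately known signal and noise."""
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    return 10.0 * math.log10(p_sig / p_noise)

def energy_decay_db(rir):
    """Schroeder backward integration, normalised to 0 dB at the first sample."""
    edc, acc = [0.0] * len(rir), 0.0
    for n in range(len(rir) - 1, -1, -1):
        acc += rir[n] ** 2
        edc[n] = acc
    return [10.0 * math.log10(max(e / edc[0], 1e-30)) for e in edc]

def t60_from_rir(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """Fit a line to the decay curve between lo_db and hi_db, extrapolate to -60 dB."""
    pts = [(n / fs, d) for n, d in enumerate(energy_decay_db(rir)) if hi_db <= d <= lo_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))   # dB per second
    return -60.0 / slope

def drr_db(rir, fs, direct_ms=2.5):
    """Energy in a window around the strongest tap over the energy after it."""
    peak = max(range(len(rir)), key=lambda n: abs(rir[n]))
    half = round(direct_ms * fs / 1000.0)
    lo, hi = max(0, peak - half), peak + half + 1
    direct = sum(v * v for v in rir[lo:hi])
    late = sum(v * v for v in rir[hi:])
    return 10.0 * math.log10(direct / late)

# Toy impulse response: unit direct path plus an exponentially decaying tail.
FS = 8000
TRUE_T60 = 0.3
decay = math.log(1000.0) / (TRUE_T60 * FS)
rir = [1.0] + [0.1 * math.exp(-decay * n) for n in range(1, 4000)]

t60_estimate = t60_from_rir(rir, FS)
drr_estimate = drr_db(rir, FS)
snr_example = snr_db([2.0, -2.0], [1.0, -1.0])
```

For this noiseless toy response the fitted T60 should land very close to the 0.3 s used to build it. Measured impulse responses have a noise floor, which is why the fit is restricted to the upper part of the decay curve.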
