-
Publication No.: US20240363125A1
Publication date: 2024-10-31
Application No.: US18646493
Filing date: 2024-04-25
Inventors: Elie KHOURY, Ganesh SIVARAMAN, Tianxiang CHEN, Nikolay GAUBITCH, David LOONEY, Amit GUPTA, Vijay BALASUBRAMANIYAN, Nicholas KLEIN, Anthony STANKUS
CPC classification: G10L17/26, G10L17/02, G10L17/04, G10L25/60, H04M3/2281, H04M3/5183, H04M2201/405
Abstract: Disclosed are systems and methods including software processes executed by a server that detect audio-based synthetic speech (“deepfakes”) in a call conversation. Embodiments include systems and methods for detecting fraudulent presentation attacks using multiple functional engines that implement various fraud-detection techniques, to produce calibrated scores and/or fused scores. A computer may, for example, evaluate the audio quality of speech signals within audio signals, where speech signals contain the speech portions having speaker utterances.
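Note: the abstract mentions calibrated and fused scores without saying how they are computed. The following is only a minimal sketch of one common approach, assuming logistic calibration per engine followed by a weighted average. The function names, the `scale` and `offset` parameters, and the weights are hypothetical and are not taken from the patent.

```python
import math


def calibrate(raw_score, scale, offset):
    """Map a raw engine score to (0, 1) with a logistic function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(scale * raw_score + offset)))


def fuse(calibrated_scores, weights):
    """Weighted average of calibrated scores; weights are illustrative only."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, calibrated_scores)) / total
```

In this sketch each engine gets its own `scale` and `offset`, which would normally be fitted on held-out data before the scores are fused.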
-
Publication No.: US12020724B2
Publication date: 2024-06-25
Application No.: US17841794
Filing date: 2022-06-16
Applicant: Clearspeed Inc.
Inventors: James A. Kane
CPC classification: G10L25/60, G10L25/63, G10L25/84, H04M3/2227, H04M3/2236
Abstract: The present disclosure provides methods and systems that may be used for providing quality control for audio samples. The audio samples may be speech samples of a user. The user may be participating in an audio interview.
-
Publication No.: US11967332B2
Publication date: 2024-04-23
Application No.: US17477592
Filing date: 2021-09-17
IPC classification: G10L21/0232, G10L25/60, G10L25/75
CPC classification: G10L21/0232, G10L25/60, G10L25/75
Abstract: A computer-implemented method for correcting muffled speech caused by facial coverings is disclosed. The computer-implemented method includes monitoring a user's speech for speech distortion. The computer-implemented method further includes determining that the user's speech is distorted. The computer-implemented method further includes determining that a cause of the user's speech distortion is based, at least in part, on a presence of a particular type of facial covering. The computer-implemented method further includes automatically correcting the speech distortion of the user based, at least in part, on the particular type of facial covering causing the speech distortion.
-
Publication No.: US20240127848A1
Publication date: 2024-04-18
Application No.: US18079342
Filing date: 2022-12-12
Inventors: Carl Lorenz DIENER
IPC classification: G10L25/60, G10L19/005, G10L25/30, G10L25/69
CPC classification: G10L25/60, G10L19/005, G10L25/30, G10L25/69, H04L41/0681
Abstract: This document relates to training and employing a quality estimation model. One example includes a method or technique that can be performed on a computing device. The method or technique can include providing degraded audio signals to one or more packet loss concealment models, and obtaining enhanced audio signals output by the one or more packet loss concealment models. The method or technique can also include obtaining quality labels for the enhanced audio signals and training a quality estimation model to estimate audio signal quality based at least on the enhanced audio signals and the quality labels.
-
Publication No.: US20240119958A1
Publication date: 2024-04-11
Application No.: US18488623
Filing date: 2023-10-17
Applicant: Google LLC
Inventors: Anshul Kothari, Gaurav Bhaya, Tarun Jain
CPC classification: G10L25/60, G06N20/00, G10L25/03, H04L12/282, G10L2015/226
Abstract: Coordinating signal processing among computing devices in a voice-driven computing environment is provided. A first and second digital assistant can detect an input audio signal, perform a signal quality check, and provide indications that the first and second digital assistants are operational to process the input audio signal. A system can select the first digital assistant for further processing. The system can receive, from the first digital assistant, data packets including a command. The system can generate, for a network connected device selected from a plurality of network connected devices, an action data structure based on the data packets, and transmit the action data structure to the selected network connected device.
-
Publication No.: US20240119956A1
Publication date: 2024-04-11
Application No.: US17992473
Filing date: 2022-11-22
IPC classification: G10L25/18, G10L21/007, G10L21/0232, G10L25/60
CPC classification: G10L25/18, G10L21/007, G10L21/0232, G10L25/60
Abstract: A computer-implemented data augmentation method comprising receiving a dataset to be processed and, upon the received dataset being unclassified into classes, performing a clustering algorithm to partition the dataset, whereby the clusters formed are interpreted as the signal classes. The method further includes forming a sample dataset by gathering, for each class of a plurality of classes, at least two sample signals, then applying a discrete Fourier transform (DFT) to each sample signal of the sample dataset. The method includes computing frequency parameters of each sample signal to determine, based on a spectral coherence threshold, frequency bands: relevant bands that characterize a class. The method further includes injecting random noise in a phase spectrum of the non-relevant frequency bands of each sample signal of the sample dataset, to generate a set of augmented sample signals, and applying an inverse DFT to each of the generated augmented sample signals.
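Note: the phase-noise step in this abstract can be illustrated with a small sketch. It assumes the relevant bins are already known (the spectral-coherence selection is not reproduced here), uses Gaussian phase noise, and relies on a naive DFT to stay dependency-free. The names `augment`, `relevant_bins`, and `noise_std` are hypothetical and not taken from the patent.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def augment(signal, relevant_bins, rng, noise_std=1.0):
    """Add Gaussian noise to the phase of every non-relevant bin.

    Magnitudes are kept, and each modified bin is mirrored as a complex
    conjugate so the augmented time signal stays real. DC (and the Nyquist
    bin for even lengths) is left untouched.
    """
    n = len(signal)
    spectrum = dft(signal)
    out = list(spectrum)
    for k in range(1, (n - 1) // 2 + 1):
        if k in relevant_bins:
            continue  # bins assumed to characterize the class stay as-is
        magnitude, phase = cmath.polar(spectrum[k])
        out[k] = cmath.rect(magnitude, phase + rng.gauss(0.0, noise_std))
        out[n - k] = out[k].conjugate()
    return [sample.real for sample in idft(out)]
```

Calling `augment(signal, {1}, random.Random(0))` several times with different seeds yields signals that share the magnitude spectrum and the phase of bin 1 but differ elsewhere.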
-
Publication No.: US11924368B2
Publication date: 2024-03-05
Application No.: US17608823
Filing date: 2019-05-07
Inventors: Sachiko Kurihara, Noboru Harada
IPC classification: H04M1/24, G10L21/0232, G10L25/60, G10L25/84, H04M3/08, H04M3/22, H04M3/26, G10L21/0208
CPC classification: H04M3/2236, G10L21/0232, G10L25/60, G10L25/84, H04M3/26, G10L2021/02082
Abstract: The aim is to improve the accuracy of an evaluation in an acoustic quality evaluation test performed by comparing an evaluation target sound and a reference sound. A data correction apparatus 3 compares, in a call performed between a near-end terminal 1 and a far-end terminal 2, an evaluation target sound in which a voice output from the near-end terminal 1 is recorded and a reference sound in which a voice spoken by a call partner using the far-end terminal 2 is recorded, to correct test data used in a listening test for evaluating acoustic quality of the call. A correction target determination unit 31 determines, as a correction target section, a voiced section that does not include the voice of the call partner detected from an acoustic signal representing the reference sound. A correction execution unit 32 updates the correction target section of the acoustic signal representing the reference sound with a predetermined non-voice signal.
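Note: the correction step described in this abstract can be sketched as below. The sketch assumes that voiced sections have already been detected and labelled as containing the call partner's voice or not, and that the predetermined non-voice signal is a constant value. The function name and the `(start, end, is_partner)` tuple layout are hypothetical.

```python
def correct_reference(samples, voiced_sections, non_voice_value=0.0):
    """Overwrite voiced sections that lack the call partner's voice.

    voiced_sections is an iterable of (start, end, is_partner) tuples with
    end exclusive. Sections where is_partner is False are treated as
    correction targets and filled with non_voice_value.
    """
    corrected = list(samples)
    for start, end, is_partner in voiced_sections:
        if is_partner:
            continue
        start = max(0, start)
        end = min(len(corrected), end)
        if start < end:
            corrected[start:end] = [non_voice_value] * (end - start)
    return corrected
```

The input list is copied, so the original reference recording stays available for comparison.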
-
Publication No.: US20240046950A1
Publication date: 2024-02-08
Application No.: US17881355
Filing date: 2022-08-04
CPC classification: G10L21/034, G06T7/70, G06F3/167, G10L15/063, G10L25/60, G10L15/25, G06T2207/30201
Abstract: An electronic device includes an imager capturing one or more images of a subject engaging the electronic device and an audio input receiving acoustic signals having audible frequencies from the mouth of the subject engaging the electronic device. One or more processors determine from the one or more images of the subject whether the mouth of the subject is oriented on-axis relative to the audio input or off-axis relative to the audio input. The one or more processors adjust a gain of the audio input associated with a subset of the audible frequencies when the mouth of the subject is oriented off-axis relative to the audio input.
-
Publication No.: US11887617B2
Publication date: 2024-01-30
Application No.: US17260684
Filing date: 2019-05-31
Inventors: Ki Hoon Shin, Jonguk Yoo, Sangmoon Lee
IPC classification: G10L21/0216, G10L21/0272, G10L25/18, G10L25/60
CPC classification: G10L21/0216, G10L21/0272, G10L25/18, G10L25/60, G10L2021/02166
Abstract: An electronic device for speech recognition includes a multi-channel microphone array required for remote speech recognition. The electronic device improves efficiency and performance of speech recognition of the electronic device in a space where noise other than speech to be recognized exists. A control method includes receiving a plurality of audio signals output from a plurality of sources through a plurality of microphones and analyzing the audio signals and obtaining information on directions in which the audio signals are input and information on input times of the audio signals. A target source for speech recognition among the plurality of sources is determined on the basis of the obtained information on the directions in which the plurality of audio signals are input, and the obtained information on the input times of the plurality of audio signals, and an audio signal obtained from the determined target source is processed.
-
Publication No.: US11869513B2
Publication date: 2024-01-09
Application No.: US17142775
Filing date: 2021-01-06
Inventors: Iván López Espejo, Santiago Prieto Calero, Ana Iriarte Ruiz, David Roncal Redín, Miguel Ángel Sánchez Yoldi, Eduardo Azanza Ladrón
Abstract: Methods of authenticating a user or speaker are provided. These methods include obtaining an input speech signal and user credentials identifying the user or speaker. The input speech signal includes a single-channel signal or a multi-channel speech signal. The methods further include extracting a speech voiceprint from the input speech signal, and retrieving a reference voiceprint associated with the user credentials. The methods still further include determining a voiceprint correspondence between the speech voiceprint and the reference voiceprint, and authenticating the user or speaker depending on said voiceprint correspondence. The methods yet further include updating the reference voiceprint depending on the speech voiceprint corresponding to the authenticated user or speaker. Computer programs, systems and computing systems are also provided which are suitable for performing said methods of authenticating a user or speaker.
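Note: the compare-then-update flow in this abstract can be sketched as follows. The sketch assumes voiceprints are fixed-length numeric vectors, that correspondence is measured by cosine similarity against a threshold, and that the reference is updated by a running average. None of these choices is stated in the patent, and the names `threshold` and `rate` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate_and_update(speech_vp, reference_vp, threshold=0.8, rate=0.1):
    """Return (accepted, reference); the reference changes only on acceptance."""
    score = cosine_similarity(speech_vp, reference_vp)
    if score < threshold:
        return False, list(reference_vp)
    # Blend the accepted voiceprint into the stored reference.
    updated = [(1 - rate) * r + rate * s for r, s in zip(reference_vp, speech_vp)]
    return True, updated
```

Updating only after a successful match keeps a rejected sample from shifting the stored reference.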