-
Publication No.: US12100413B2
Publication date: 2024-09-24
Application No.: US17801614
Filing date: 2021-02-26
Inventors: Nobutaka Ono, Robin Scheibler
IPC classification: H04R3/00, G10L21/0272, G10L21/028, H04R1/40
CPC classification: G10L21/028, G10L21/0272, H04R1/406, H04R3/005
Abstract: A sound source separation program causes a computer to acquire an acoustic signal, convert the acquired acoustic signal from the time domain to the frequency domain, and perform sound source separation on the converted signal by updating a demixing matrix through elementary row operations so as to iteratively minimize an objective function that includes a quadratic form of a separation vector and the determinant of the demixing matrix.
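The abstract does not spell out the update rule. As a minimal sketch, assuming the elementary row operations take the rank-one form W <- W - v w_k^T (w_k being row k of the demixing matrix), the code below shows how such an update affects the determinant term of the objective. How v is chosen to minimize the objective is not shown, exact real fractions stand in for the complex matrices used in practice, and the function names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def row_operation_update(W, k, v):
    """Return W - v w_k^T, where w_k is row k of W (hypothetical helper).

    For every row i != k this subtracts v[i] times row k (an elementary row
    operation, which leaves the determinant unchanged); row k itself is
    scaled by (1 - v[k]), so det(W_new) = (1 - v[k]) * det(W).
    """
    wk = list(W[k])
    return [[W[i][j] - v[i] * wk[j] for j in range(len(wk))] for i in range(len(W))]


W = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
v = [Fraction(1, 2), Fraction(1, 4), Fraction(-1, 3)]
W_new = row_operation_update(W, 1, v)
print(det(W), det(W_new))  # 18 27/2
```

Because the rows other than k change only by multiples of row k, the log-determinant term can be updated by adding log|1 - v_k| instead of recomputing the determinant after each step.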
-
Publication No.: US12087297B2
Publication date: 2024-09-10
Application No.: US17930822
Filing date: 2022-09-09
Applicant: Google LLC
Inventors: Matthew Sharifi, Victor Carbune
IPC classification: G10L15/00, G10L15/02, G10L15/22, G10L21/0208, G10L21/0272, G10L25/78, G10L25/87
CPC classification: G10L15/22, G10L15/02, G10L21/0208, G10L21/0272, G10L25/78, G10L25/87
Abstract: A method includes receiving a first instance of raw audio data corresponding to a voice-based command and receiving a second instance of the raw audio data corresponding to an utterance of audible contents for an audio-based communication spoken by a user. When a voice filtering recognition routine determines to activate voice filtering for at least the voice of the user, the method also includes obtaining a respective speaker embedding of the user and processing, using the respective speaker embedding, the second instance of the raw audio data to generate enhanced audio data for the audio-based communication that isolates the utterance of the audible contents spoken by the user and excludes at least a portion of the one or more additional sounds that are not spoken by the user. The method also includes executing.
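The abstract processes audio with a speaker embedding, presumably through a trained voice-filtering model that it does not describe. As a much simpler, hypothetical stand-in, the sketch below keeps only the frames whose frame-level embedding is close (by cosine similarity) to the user's speaker embedding and silences the rest. The embeddings, the 0.7 threshold, and the function names are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_frames(frames, frame_embeddings, speaker_embedding, threshold=0.7):
    """Keep frames whose embedding matches the target speaker; zero out the rest.

    Hypothetical frame gating, not the neural voice filter of the patent.
    """
    out = []
    for frame, emb in zip(frames, frame_embeddings):
        if cosine(emb, speaker_embedding) >= threshold:
            out.append(list(frame))
        else:
            out.append([0.0] * len(frame))
    return out
```

Hard frame gating discards overlapping speech from the user as well; a learned model that predicts a soft mask conditioned on the embedding avoids that, which is one reason such systems are usually neural.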
-
Publication No.: US20240274148A1
Publication date: 2024-08-15
Application No.: US18645793
Filing date: 2024-04-25
Applicant: Intel Corporation
IPC classification: G10L21/0272, G10L25/30
CPC classification: G10L21/0272, G10L25/30
Abstract: Systems and methods for audio source separation. A deep-learning-based system uses an azimuth angle to separate an audio signal originating from a selected location from other sounds. Techniques are disclosed for steering a virtual direction of a microphone toward a selected speaker. A deep-learning-based audio regression method, which can be implemented as a neural network, learns to separate out various speakers by leveraging the spectral and spatial characteristics of all sources. The neural network can focus on multiple sources in multiple respective target directions and cancel out other sounds. A user can choose which source to listen to. The network can use both a time-domain signal and a frequency-domain signal to separate out the target signal and generate a separated audio output. The direction of the selected speaker relative to the microphone array can be input to the system as a vector.
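The abstract describes a neural network that takes the target direction as a vector input. To show what steering toward a selected azimuth means geometrically, the sketch below swaps in classic far-field delay-and-sum beamforming, which is not the disclosed deep-learning method. It assumes a speed of sound of 343 m/s, integer-sample delays, and hypothetical function names.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def direction_vector(azimuth_deg):
    """Unit vector in the horizontal plane pointing from the array toward the source."""
    a = math.radians(azimuth_deg)
    return (math.cos(a), math.sin(a), 0.0)


def steering_delays(mic_positions, direction, fs):
    """Integer-sample delays that time-align a far-field plane wave from `direction`.

    Microphones closer to the source (larger projection onto `direction`)
    hear the wavefront earlier, so they receive a larger compensating delay.
    """
    proj = [sum(p * u for p, u in zip(pos, direction)) for pos in mic_positions]
    ref = min(proj)
    return [round((x - ref) / SPEED_OF_SOUND * fs) for x in proj]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay and average the results."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += ch[i - d]
    return [s / len(channels) for s in out]


mics = [(0.0, 0.0, 0.0), (0.343, 0.0, 0.0)]
delays = steering_delays(mics, direction_vector(0.0), fs=1000)
# An impulse from azimuth 0 reaches the second microphone one sample earlier.
x0 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
x1 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(delays, delay_and_sum([x0, x1], delays))
```

Signals from the steered direction add coherently, while sounds from other directions are misaligned and partly cancel; a learned network can go much further, but the direction vector plays the same role as the input here.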
-
Publication No.: US12039981B2
Publication date: 2024-07-16
Application No.: US18394143
Filing date: 2023-12-22
Inventors: Linhao Dong, Zhiyun Fan, Zejun Ma
IPC classification: G10L21/0272, G10L17/04
CPC classification: G10L17/04
Abstract: A method, apparatus, device, and storage medium for speaker change point detection. The method includes: acquiring target voice data to be detected; extracting, from the target voice data, an acoustic feature characterizing its acoustic information; encoding the acoustic feature to obtain frame-level speaker characterization vectors of the target voice data; integrating and firing the frame-level speaker characterization vectors based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations bounded by speaker change points in the target voice data; and determining timestamps corresponding to the speaker change points according to that sequence of speaker characterizations.
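A minimal sketch of the generic continuous integrate-and-fire accumulation, assuming per-frame weights in [0, 1] (for example from a sigmoid on the encoder output, which is not shown) and a firing threshold of 1.0. Each fired vector is a weighted sum of frame-level speaker vectors, and the firing frame indices can be turned into timestamps by multiplying by an assumed frame shift. All names and parameter values are hypothetical.

```python
def cif(frames, weights, threshold=1.0):
    """Continuous integrate-and-fire over frame-level vectors.

    Assumes every weight lies in [0, threshold], so at most one firing
    can happen per frame. Returns (fired_vectors, firing_frame_indices).
    """
    dim = len(frames[0])
    acc_w = 0.0
    acc_v = [0.0] * dim
    fired, fire_idx = [], []
    for t, (h, a) in enumerate(zip(frames, weights)):
        if acc_w + a < threshold:
            # keep integrating: weighted sum of frame vectors
            acc_w += a
            acc_v = [s + a * x for s, x in zip(acc_v, h)]
        else:
            # fire: use just enough of this frame's weight to reach the threshold
            part = threshold - acc_w
            fired.append([s + part * x for s, x in zip(acc_v, h)])
            fire_idx.append(t)
            # the leftover weight starts the next integration
            rest = a - part
            acc_w = rest
            acc_v = [rest * x for x in h]
    return fired, fire_idx


def fire_times(fire_idx, frame_shift_s=0.01):
    """Convert firing frame indices to seconds (10 ms frame shift assumed)."""
    return [t * frame_shift_s for t in fire_idx]


vectors, idx = cif([[1.0], [2.0], [3.0], [4.0]], [0.5, 0.75, 0.5, 0.25])
print(vectors, idx)  # [[1.5], [3.0]] [1, 3]
```

Splitting the weight of the firing frame between two integrations is what makes the mechanism "continuous": a boundary can fall inside a frame instead of only at frame edges.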
-
Publication No.: US12009794B2
Publication date: 2024-06-11
Application No.: US18311833
Filing date: 2023-05-03
IPC classification: H03G3/32, A63F13/215, A63F13/54, A63F13/87, G10L21/0272, G10L25/84, H03G3/20, H03G3/30, H03G3/34, H03G5/16, H04R1/10
CPC classification: H03G3/32, A63F13/215, A63F13/54, A63F13/87, G10L21/0272, G10L25/84, H03G3/20, H03G3/3005, H03G3/3089, H03G3/342, H03G5/16, H03G5/165, H04R1/1091
Abstract: A system comprising audio processing circuitry is provided. The audio processing circuitry is operable to receive audio signals and to process them to detect the strength of a chat component and the strength of a game component of the audio signals. The audio processing circuitry is operable to automatically control a volume setting based on one or both of the detected strength of the chat component and the detected strength of the game component. The combined game-and-chat audio signals may comprise a left channel signal and a right channel signal, and processing them may comprise measuring the strength of a vocal-band signal component that is common to the left and right channel signals.
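One way to measure the strength of a component common to the left and right channels, offered here as an assumption rather than the patented method, is the real part of the left/right cross-spectrum summed over a vocal band. A center-panned chat voice contributes fully, while content present in only one channel contributes nothing. The 300 to 3400 Hz band, the naive DFT, and the function names are illustrative choices.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT coefficient X[k] computed directly (fine for short frames)."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))


def common_vocal_strength(left, right, fs, lo_hz=300.0, hi_hz=3400.0):
    """Real part of the L/R cross-spectrum summed over the vocal band.

    Large when the same in-band signal is present in both channels
    (e.g. center-panned chat voice); zero for one-sided content.
    """
    n = len(left)
    total = 0.0
    for k in range(n // 2 + 1):
        if lo_hz <= k * fs / n <= hi_hz:
            total += (dft_bin(left, k) * dft_bin(right, k).conjugate()).real
    return total


fs, n = 8000, 64
voice_like = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]  # 1 kHz tone
print(common_vocal_strength(voice_like, voice_like, fs))  # approximately 1024
```

A volume controller could then compare this value with an estimate of game-audio strength and adjust the gain; that control rule is left out because the abstract does not specify it.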
-
Publication No.: US12002452B2
Publication date: 2024-06-04
Application No.: US18069663
Filing date: 2022-12-21
Applicant: Google LLC
Inventors: Jason Sanders, Gabriel Taubman, John J. Lee
IPC classification: G10L15/22, G06F16/683, G10L15/08, G10L15/18, G10L15/26, G10L21/0272, G10L25/48, H04M3/493, G10L21/0208
CPC classification: G10L15/08, G06F16/685, G10L15/1815, G10L15/22, G10L15/26, G10L21/0272, G10L25/48, H04M3/4936, G10L2015/225, G10L21/0208, H04M2201/40, H04M2203/352
Abstract: Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio; separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio; identifying concepts related to the background audio; generating a set of terms related to the identified concepts; influencing a speech recognizer based on at least one of the terms related to the background audio; and obtaining a recognized version of the user speech data using the speech recognizer.
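The abstract does not say how the recognizer is influenced. A simple, hypothetical way to illustrate the idea is n-best rescoring: each candidate transcription gets a bonus for every word that matches a term derived from the background-audio concepts. Production recognizers more commonly bias during decoding; the bonus value, the single-word matching, and the names here are assumptions.

```python
def bias_hypotheses(hypotheses, bias_terms, bonus=1.0):
    """Re-rank (text, log_score) pairs, adding `bonus` per matching bias term.

    Hypothetical n-best rescoring; only single-word terms are matched.
    """
    terms = {t.lower() for t in bias_terms}
    rescored = []
    for text, score in hypotheses:
        hits = sum(1 for w in text.lower().split() if w in terms)
        rescored.append((text, score + bonus * hits))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Background audio recognized as music might yield terms such as these.
nbest = [("who sings this song", -2.0), ("who signs this son", -1.5)]
print(bias_hypotheses(nbest, ["song", "sings"]))
```

Without the bias terms the acoustically likelier but wrong hypothesis would win; the background-derived terms tip the ranking toward the transcription that fits the context.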
-
Publication No.: US20240177711A1
Publication date: 2024-05-30
Application No.: US18070739
Filing date: 2022-11-29
IPC classification: G10L15/22, G06Q30/015, G10L21/0272, G10L25/63, H04M3/51
CPC classification: G10L15/22, G06Q30/015, G10L21/0272, G10L25/63, H04M3/5175, H04M2203/402
Abstract: The technology disclosed herein provides sales guidance to an agent during a real-time communication session based on background sound identified during that session. In a particular embodiment, a method includes receiving audio from a first endpoint operated by a first user. The audio is received over a real-time communication session established between the first endpoint and a second endpoint operated by an agent of a contact center. The method further includes identifying, in the audio, sound other than the voice of the first user and determining a characteristic of the first user indicated by that sound. During the communication session, the method includes providing sales guidance to the agent based on the characteristic.
-
Publication No.: US20240146867A1
Publication date: 2024-05-02
Application No.: US18407825
Filing date: 2024-01-09
Inventors: Hiroyuki Honma, Yuki Yamamoto
IPC classification: H04N5/92, G06V20/40, G06V40/16, G10L19/00, G10L19/008, G10L21/0272, G11B27/30, H04N9/802, H04N19/46, H04R1/40, H04R3/00
CPC classification: H04N5/9202, G06V20/46, G06V40/16, G06V40/161, G10L19/00, G10L19/008, G10L21/0272, G11B27/3081, H04N9/802, H04N19/46, H04R1/40, H04R3/00, G06F2218/22
Abstract: The present technique relates to a video-audio processing apparatus, a video-audio processing method, and a program, each of which enables a desired object sound to be separated more simply and accurately.
A video-audio processing apparatus includes a display control portion configured to cause one or more video objects based on a video signal to be displayed; an object selecting portion configured to select a predetermined video object from among the one or more video objects; and an extraction portion configured to extract the audio signal of the video object selected by the object selecting portion as an audio object signal. The present technique can be applied to a video-audio processing apparatus.
-
Publication No.: US11967340B2
Publication date: 2024-04-23
Application No.: US18340767
Filing date: 2023-06-23
Applicant: ActionPower Corp.
Inventors: Subong Choi, Dongchan Shin, Jihwa Lee
IPC classification: G10L25/78, G10L21/0272, G10L25/18, G10L25/30
CPC classification: G10L25/78, G10L21/0272, G10L25/18, G10L25/30
Abstract: Disclosed is a method, performed by a computing device according to an exemplary embodiment of the present disclosure, for detecting a voice in audio data. The method includes obtaining audio data; generating image data based on a spectrum of the obtained audio data; analyzing the generated image data by utilizing a pre-trained neural network model; and determining, based on the analysis of the image data, whether an automated response system (ARS) voice is included in the audio data.
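The step of generating image data from the spectrum of the audio can be sketched as a log-magnitude spectrogram rescaled to 0 to 255 grayscale pixel values, with rows as frequency bins and columns as frames. The frame length, hop size, Hann window, and function name are assumptions; the pre-trained network that classifies the image is out of scope.

```python
import cmath
import math


def spectrogram_image(x, frame_len=64, hop=32):
    """Log-magnitude spectrogram mapped to 0-255 pixels, shape (bins, frames)."""
    # Hann window reduces spectral leakage between bins
    win = [0.5 - 0.5 * math.cos(2 * math.pi * t / (frame_len - 1)) for t in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = x[start:start + frame_len]
        db = []
        for k in range(n_bins):
            X = sum(seg[t] * win[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                    for t in range(frame_len))
            db.append(20 * math.log10(abs(X) + 1e-10))  # small floor avoids log(0)
        frames.append(db)
    lo = min(min(f) for f in frames)
    hi = max(max(f) for f in frames)
    span = (hi - lo) or 1.0
    # transpose to (frequency, time) and map the dB range onto pixel values
    return [[round(255 * (frames[t][k] - lo) / span) for t in range(len(frames))]
            for k in range(n_bins)]


tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]  # energy at bin 8
img = spectrogram_image(tone)
print(len(img), len(img[0]))  # 33 7
```

Treating the spectrogram as an image lets standard image-classification networks be reused; the dB scaling matters because raw magnitudes would leave most pixels near zero.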
-
Publication No.: US11929088B2
Publication date: 2024-03-12
Application No.: US15990559
Filing date: 2018-05-25
Inventors: Randall Deetz, Trausti Thormundsson, Stuart Whitfield Hutson, Thorarinn Vikingur Sveinsson, Yair Kerner
IPC classification: G10L21/0364, G06F3/16, G10L15/22, G10L21/0208, G10L21/0232, G10L21/0272, H04M3/56, G10L15/26
CPC classification: G10L21/0364, G06F3/162, G10L21/0208, G10L21/0232, H04M3/568, G10L2015/228, G10L15/26, G10L21/0272
Abstract: Systems and methods provide input and output mode control for audio processing on a user device. Audio processing may be configured by monitoring audio activity on a device having at least one microphone and a digital audio processing unit; collecting information from the monitoring, including an identification of at least one application that uses audio processing; and determining a context for the audio processing, the context including at least one of a hardware, software, audio-signal, or environmental context. An audio signal processing configuration is determined based on the application and the determined context, an associated audio signal processing mode is selected, and an optimized audio signal is generated.
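The mode-selection step can be pictured as a rule table keyed on the application and its context. The sketch below is hypothetical throughout: the context fields, application names, and mode names are invented for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    app: str        # hypothetical application id, e.g. "voip" or "voice_assistant"
    headset: bool   # hardware context: is a headset connected?
    noisy: bool     # environmental context: is background noise high?


def select_mode(ctx):
    """Hypothetical rule table mapping (application, context) to a processing mode."""
    if ctx.app == "voip":
        return "headset_voice" if ctx.headset else "speakerphone_echo_cancellation"
    if ctx.app == "voice_assistant":
        return "asr_noise_suppression" if ctx.noisy else "asr_default"
    return "passthrough"


print(select_mode(Context(app="voip", headset=False, noisy=True)))
```

A table like this keeps the policy separate from the signal-processing code, so new applications or contexts can be supported by adding rules rather than changing the processing chain.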
-