Patent search: ipc:G10L21/0272 (page 1)

1.

Granted invention patent
Sound source separation program, sound source separation method, and sound source separation device (Active)

Publication No.: US12100413B2

Publication date: 2024-09-24

Application No.: US17801614

Filing date: 2021-02-26

Applicant: Tokyo Metropolitan Public University Corporation

Inventors: Nobutaka Ono, Robin Scheibler

IPC classification: H04R3/00, G10L21/0272, G10L21/028, H04R1/40

CPC classification: G10L21/028, G10L21/0272, H04R1/406, H04R3/005

Abstract: A sound source separation program causes a computer to acquire an acoustic signal, convert the acquired acoustic signal from a time region to a frequency region, and perform sound source separation on the acoustic signal converted to the frequency region by performing updating based on elementary row operation on a demixing matrix to iteratively minimize an objective function including a quadratic form of a separation vector and a determinant of the demixing matrix.

2.

Granted invention patent
Voice filtering other speakers from calls and audio messages (Active)

Publication No.: US12087297B2

Publication date: 2024-09-10

Application No.: US17930822

Filing date: 2022-09-09

Applicant: Google LLC

Inventors: Matthew Sharifi, Victor Carbune

IPC classification: G10L15/00, G10L15/02, G10L15/22, G10L21/0208, G10L21/0272, G10L25/78, G10L25/87

CPC classification: G10L15/22, G10L15/02, G10L21/0208, G10L21/0272, G10L25/78, G10L25/87

Abstract: A method includes receiving a first instance of raw audio data corresponding to a voice-based command and receiving a second instance of the raw audio data corresponding to an utterance of audible contents for an audio-based communication spoken by a user. When a voice filtering recognition routine determines to activate voice filtering for at least the voice of the user, the method also includes obtaining a respective speaker embedding of the user and processing, using the respective speaker embedding, the second instance of the raw audio data to generate enhanced audio data for the audio-based communication that isolates the utterance of the audible contents spoken by the user and excludes at least a portion of the one or more additional sounds that are not spoken by the user. The method also includes executing [...]

3.

Published invention application
SOUND SOURCE SEPARATION USING ANGULAR LOCATION (Pending, published)

Publication No.: US20240274148A1

Publication date: 2024-08-15

Application No.: US18645793

Filing date: 2024-04-25

Applicant: Intel Corporation

Inventors: Jesus Ferrer Romero, Hector Cordourier Maruri, Georg Stemmer, Willem Beltman

IPC classification: G10L21/0272, G10L25/30

CPC classification: G10L21/0272, G10L25/30

Abstract: Systems and methods for audio source separation. A deep-learning-based system uses an azimuth angle location to separate an audio signal originating from a selected location from other sound. Techniques are disclosed for steering a virtual direction of a microphone towards a selected speaker. A deep-learning-based audio regression method, which can be implemented as a neural network, learns to separate out various speakers by leveraging spectral and spatial characteristics of all sources. The neural network can focus on multiple sources in multiple respective target directions, and cancel out other sounds. A user can choose which source to listen to. The network can use the time-domain signal and a frequency-domain signal to separate out the target signal and generate a separated audio output. The direction of the selected speaker relative to the microphone array can be input to the system as a vector.

4.

Granted invention patent
Method, apparatus, device, and storage medium for speaker change point detection (Active)

Publication No.: US12039981B2

Publication date: 2024-07-16

Application No.: US18394143

Filing date: 2023-12-22

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventors: Linhao Dong, Zhiyun Fan, Zejun Ma

IPC classification: G10L21/0272, G10L17/04

CPC classification: G10L17/04

Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors at a voice frame level of the target voice data; integrating and firing the speaker characterization vectors at the voice frame level of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations bounded by speaker change points in the target voice data; and determining a timestamp corresponding to the speaker change points, according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data.

5.

Granted invention patent
Automatic volume control for combined game and chat audio (Active)

Publication No.: US12009794B2

Publication date: 2024-06-11

Application No.: US18311833

Filing date: 2023-05-03

Applicant: Voyetra Turtle Beach, Inc.

Inventors: Richard Kulavik, Shobha Devi Kuruba Buchannagari, Carmine Bonanno

IPC classification: H03G3/32, A63F13/215, A63F13/54, A63F13/87, G10L21/0272, G10L25/84, H03G3/20, H03G3/30, H03G3/34, H03G5/16, H04R1/10

CPC classification: H03G3/32, A63F13/215, A63F13/54, A63F13/87, G10L21/0272, G10L25/84, H03G3/20, H03G3/3005, H03G3/3089, H03G3/342, H03G5/16, H03G5/165, H04R1/1091

Abstract: A system comprising audio processing circuitry is provided. The audio processing circuitry is operable to receive audio signals. The audio processing circuitry is operable to process the audio signals to detect strength of a chat component of the audio signals and strength of a game component of the audio signals. The audio processing circuitry is operable to automatically control a volume setting based on one or both of: the detected strength of the chat component, and the detected strength of the game component. The combined-game-and-chat audio signals may comprise a left channel signal and a right channel signal. The processing of the combined-game-and-chat audio signals may comprise measuring strength of a vocal-band signal component that is common to the left channel signal and the right channel signal.

6.

Granted invention patent
Background audio identification for speech disambiguation (Active)

Publication No.: US12002452B2

Publication date: 2024-06-04

Application No.: US18069663

Filing date: 2022-12-21

Applicant: Google LLC

Inventors: Jason Sanders, Gabriel Taubman, John J. Lee

IPC classification: G10L15/22, G06F16/683, G10L15/08, G10L15/18, G10L15/26, G10L21/0272, G10L25/48, H04M3/493, G10L21/0208

CPC classification: G10L15/08, G06F16/685, G10L15/1815, G10L15/22, G10L15/26, G10L21/0272, G10L25/48, H04M3/4936, G10L2015/225, G10L21/0208, H04M2201/40, H04M2203/352

Abstract: Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.

7.

Published invention application
REAL-TIME PROVISION OF GUIDANCE TO SALES-FOCUSED AGENTS OF A CONTACT CENTER BASED ON IDENTIFIABLE BACKGROUND SOUNDS (Pending, published)

Publication No.: US20240177711A1

Publication date: 2024-05-30

Application No.: US18070739

Filing date: 2022-11-29

Applicant: Avaya Management L.P.

Inventors: Rusty Gerald Nelson, Paul Roller Michaelis, Kevin Archer, Gregory Paul Schin

IPC classification: G10L15/22, G06Q30/015, G10L21/0272, G10L25/63, H04M3/51

CPC classification: G10L15/22, G06Q30/015, G10L21/0272, G10L25/63, H04M3/5175, H04M2203/402

Abstract: The technology disclosed herein enables provision of sales guidance to an agent on a real-time communication session based on background sound identified during the communication session. In a particular embodiment, a method includes receiving audio from a first endpoint operated by a first user. The audio is received over a real-time communication session established between the first endpoint and a second endpoint operated by an agent of a contact center. The method further includes identifying sound other than a voice of the first user from the audio and determining a characteristic of the first user indicated by the sound. During the communication session, the method includes providing sales guidance to the agent based on the characteristic.

8.

Published invention application
APPARATUS AND METHOD FOR VIDEO-AUDIO PROCESSING, AND PROGRAM FOR SEPARATING AN OBJECT SOUND CORRESPONDING TO A SELECTED VIDEO OBJECT (Pending, published)

Publication No.: US20240146867A1

Publication date: 2024-05-02

Application No.: US18407825

Filing date: 2024-01-09

Applicant: Sony Group Corporation

Inventors: Hiroyuki Honma, Yuki Yamamoto

IPC classification: H04N5/92, G06V20/40, G06V40/16, G10L19/00, G10L19/008, G10L21/0272, G11B27/30, H04N9/802, H04N19/46, H04R1/40, H04R3/00

CPC classification: H04N5/9202, G06V20/46, G06V40/16, G06V40/161, G10L19/00, G10L19/008, G10L21/0272, G11B27/3081, H04N9/802, H04N19/46, H04R1/40, H04R3/00, G06F2218/22

Abstract: The present technique relates to an apparatus and a method for video-audio processing, and a program, each of which enables a desired object sound to be more simply and accurately separated. A video-audio processing apparatus includes a display control portion configured to cause a video object based on a video signal to be displayed; an object selecting portion configured to select the predetermined video object from the one video object or among a plurality of the video objects; and an extraction portion configured to extract an audio signal of the video object selected by the object selecting portion as an audio object signal. The present technique can be applied to a video-audio processing apparatus.

9.

Granted invention patent
Method for detecting speech in audio data (Active)

Publication No.: US11967340B2

Publication date: 2024-04-23

Application No.: US18340767

Filing date: 2023-06-23

Applicant: ActionPower Corp.

Inventors: Subong Choi, Dongchan Shin, Jihwa Lee

IPC classification: G10L25/78, G10L21/0272, G10L25/18, G10L25/30

CPC classification: G10L25/78, G10L21/0272, G10L25/18, G10L25/30

Abstract: Disclosed is a method for detecting a voice from audio data, performed by a computing device according to an exemplary embodiment of the present disclosure. The method includes obtaining audio data; generating image data based on a spectrum of the obtained audio data; analyzing the generated image data by utilizing a pre-trained neural network model; and determining whether an automated response system (ARS) voice is included in the audio data, based on the analysis of the image data.

10.

Granted invention patent
Input/output mode control for audio processing (Active)

Publication No.: US11929088B2

Publication date: 2024-03-12

Application No.: US15990559

Filing date: 2018-05-25

Applicant: SYNAPTICS INCORPORATED

Inventors: Randall Deetz, Trausti Thormundsson, Stuart Whitfield Hutson, Thorarinn Vikingur Sveinsson, Yair Kerner

IPC classification: G10L21/0364, G06F3/16, G10L15/22, G10L21/0208, G10L21/0232, G10L21/0272, H04M3/56, G10L15/26

CPC classification: G10L21/0364, G06F3/162, G10L21/0208, G10L21/0232, H04M3/568, G10L2015/228, G10L15/26, G10L21/0272

Abstract: Systems and methods provide input and output mode control for audio processing on a user device. Audio processing may be configured by monitoring audio activity on a device having at least one microphone and a digital audio processing unit, collecting information from the monitoring of the activity, including an identification of at least one application utilizing audio processing, and determining a context for the audio processing, the context including at least one of a hardware, software, audio signal and/or environmental context. An audio signal processing configuration is determined based on the application and determined context, an associated audio signal processing mode is selected, and an optimized audio signal is generated.
