Patent search ipc:"G10L21/02" Page 1

1.

Granted patent
Pre-conditioning audio for echo cancellation in machine perception (in force)

Publication No.: US12080317B2

Publication date: 2024-09-03

Application No.: US17639317

Filing date: 2020-08-27

Applicant: DOLBY LABORATORIES LICENSING CORPORATION

Inventors: Hadis Nosrati, Glenn N. Dickins, Nicholas Luke Appleton

IPC classes: G10L15/20, G10L21/02, G10L21/0208, G10L21/0316

CPC classes: G10L21/0316, G10L15/20, G10L21/0208, G10L2021/02082

Abstract: An apparatus and method of pre-conditioning audio for machine perception. Machine perception differs from human perception, and different processing parameters are used for machine perception applications (e.g., speech to text processing) as compared to those used for human perception applications (e.g., voice communications). These different parameters may result in pre-conditioned audio that is worsened for human perception yet improved for machine perception.

2.

Granted patent
Heliumspeech unscrambling method and system for saturation diving based on multi-objective optimization (in force)

Publication No.: US12039988B1

Publication date: 2024-07-16

Application No.: US18424695

Filing date: 2024-01-26

Applicant: Nantong University

Inventors: Shibing Zhang, Jianrong Wu

IPC classes: G10L21/02, G10L15/06, G10L15/20, G10L25/51

CPC classes: G10L21/02, G10L15/063, G10L15/20, G10L25/51, G10L2015/0631

Abstract: The present application discloses a method and a system for saturation diving heliumspeech unscrambling based on multi-objective optimization. In a system including at least a diver and a filter, a working language phonetic symbol library and a common working word library for divers are constructed. The divers read them one by one, and a phonetic symbol standard speech library, a phonetic symbol heliumspeech library and a common working word speech library are generated. The filter uses the multi-objective optimization algorithm to design its impulse response coefficients, corrects and unscrambles the tagged and sampled heliumspeech signal word by word, and continuously updates the impulse response coefficients to complete the heliumspeech unscrambling.

3.

Granted patent
Noise floor estimation and noise reduction (in force)

Publication No.: US12033649B2

Publication date: 2024-07-09

Application No.: US17793539

Filing date: 2021-01-18

Applicant: DOLBY INTERNATIONAL AB

Inventors: Giulio Cengarle, Antonio Mateos Sole, Davide Scaini

IPC classes: G10L21/02

CPC classes: G10L21/02

Abstract: Embodiments are disclosed for noise floor estimation and noise reduction. In an embodiment, a method comprises: obtaining an audio signal; dividing the audio signal into a plurality of buffers; determining time-frequency samples for each buffer of the audio signal; for each buffer and for each frequency, determining a median (or mean) and a measure of an amount of variation of energy based on the samples in the buffer and samples in neighboring buffers that together span a specified time range of the audio signal; combining the median (or mean) and the measure of the amount of variation of energy into a cost function; for each frequency: determining a signal energy of a particular buffer of the audio signal that corresponds to a minimum value of the cost function; selecting the signal energy as the estimated noise floor of the audio signal; and reducing, using the estimated noise floor, noise in the audio signal.
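
Illustrative note: the estimation steps listed in this abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented method. The abstract does not say how the median and the variation measure are combined, so the weighted sum, the use of the population standard deviation as the variation measure, and the names `estimate_noise_floor`, `neighbors` and `weight` are all hypothetical. The noise-reduction step is omitted.

```python
from statistics import median, pstdev


def estimate_noise_floor(energy, neighbors=1, weight=1.0):
    """Estimate one noise-floor energy per frequency bin.

    energy[b][t][f] is the energy of buffer b, frame t, frequency bin f.
    `neighbors` and `weight` are hypothetical tuning parameters.
    """
    num_buffers = len(energy)
    num_bins = len(energy[0][0])
    floor = []
    for f in range(num_bins):
        best_cost = None
        best_energy = None
        for b in range(num_buffers):
            lo = max(0, b - neighbors)
            hi = min(num_buffers, b + neighbors + 1)
            # Samples at this frequency from the buffer and its neighbours.
            window = [frame[f] for buf in energy[lo:hi] for frame in buf]
            # Assumed cost: median energy plus weighted spread of energy.
            cost = median(window) + weight * pstdev(window)
            if best_cost is None or cost < best_cost:
                best_cost = cost
                # Signal energy of the buffer that minimises the cost.
                best_energy = median(frame[f] for frame in energy[b])
        floor.append(best_energy)
    return floor
```

A quiet, steady buffer yields both a low median and a low spread, so it minimises the cost and its energy is taken as the floor for that frequency.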

4.

Published patent application
AUDIO QUALITY CONVERSION DEVICE AND CONTROL METHOD THEREFOR (pending, published)

Publication No.: US20240212699A1

Publication date: 2024-06-27

Application No.: US18568678

Filing date: 2022-06-09

Applicant: COCHL.INC.

Inventors: Yoon Chang HAN, Su Bin LEE, Jeong Soo PARK, Il Young JEONG, Don Moon LEE, Hyun Gui LIM

IPC classes: G10L21/02, G10L25/30

CPC classes: G10L21/02, G10L25/30

Abstract: An audio quality conversion device according to the present invention includes: a control unit having, mounted therein, an artificial neural network that learns using a plurality of pieces of audio data recorded in recording environments differing with respect to a predetermined audio event, and environmental data related to the recording environments corresponding to respective audio data; and an audio input unit receiving outside sounds to generate audio recording data, wherein the control unit converts, on the basis of a learning result of the artificial neural network, the audio recording data generated by means of the audio input unit.

5.

Granted patent
Audio encoder for encoding an audio signal, method for encoding an audio signal and computer program under consideration of a detected peak spectral region in an upper frequency band (in force)

Publication No.: US12014747B2

Publication date: 2024-06-18

Application No.: US18308293

Filing date: 2023-04-27

Applicant: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.

Inventors: Markus Multrus, Christian Neukam, Markus Schnell, Benjamin Schubert

IPC classes: G10L19/26, G10L19/02, G10L19/028, G10L19/03, G10L19/032, G10L19/04, G10L19/12, G10L19/16, G10L21/007, G10L21/02, G10L21/0208, G10L21/0324, G10L21/038, G10L25/15, G10L25/18

CPC classes: G10L19/265, G10L19/0204, G10L19/03, G10L19/032, G10L19/12, G10L19/16, G10L19/26, G10L21/007, G10L21/02, G10L21/0208, G10L21/0324, G10L25/15, G10L25/18, G10L19/02, G10L19/028, G10L19/04, G10L21/038

Abstract: An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.

6.

Published patent application
RELEVANCE BASED SOURCE SELECTION FOR FAR-FIELD VOICE SYSTEMS (pending, published)

Publication No.: US20240194189A1

Publication date: 2024-06-13

Application No.: US18077180

Filing date: 2022-12-07

Applicant: Avago Technologies International Sales Pte. Limited

Inventors: Qutubuddin SAIFEE, Raghuram ANNADANA, Sunil Kashinath SHRIPAD, Manoj SINGHAL

IPC classes: G10L15/08, G10L15/02, G10L21/02

CPC classes: G10L15/08, G10L15/02, G10L21/02

Abstract: An electronic device includes a far-field voice (FFV) processor including a source selection module. The source selection module receives a set of audio signals and determines, for each audio signal, whether the audio signal is relevant to an application. The source selection module receives several separate probability computations, with each probability computation providing a probability of the presence of a particular characteristic. Additionally, the source selection module receives one or more applications as well as relevance information (e.g., one or more relevant characteristics) associated with the one or more applications. The source selection module can use the respective probabilities to determine if one or more characteristics are present in an audio signal, and compare the characteristic(s) to the relevance information for the application. Using this information, the source selection module can determine, for each audio signal, to which respective application the audio signal is relevant.

7.

Granted patent
Mixed speech recognition method and apparatus, and computer-readable storage medium (in force)

Publication No.: US11996091B2

Publication date: 2024-05-28

Application No.: US16989844

Filing date: 2020-08-10

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventors: Jun Wang, Jie Chen, Dan Su, Dong Yu

IPC classes: G10L15/20, G10L15/02, G10L15/16, G10L15/22, G10L17/06, G10L21/02, G10L21/0272, G10L21/0208

CPC classes: G10L15/20, G10L15/02, G10L15/16, G10L15/22, G10L17/06, G10L21/02, G10L21/0272, G10L2015/223, G10L2021/02087

Abstract: A mixed speech recognition method, a mixed speech recognition apparatus, and a computer-readable storage medium are provided. The mixed speech recognition method includes: monitoring speech input and detecting an enrollment speech and a mixed speech; acquiring speech features of a target speaker based on the enrollment speech; and determining speech belonging to the target speaker in the mixed speech based on the speech features of the target speaker. The enrollment speech includes preset speech information, and the mixed speech is non-enrollment speech inputted after the enrollment speech.

8.

Granted patent
Method and apparatus for voice perception management in a multi-user environment (in force)

Publication No.: US11978467B2

Publication date: 2024-05-07

Application No.: US17870759

Filing date: 2022-07-21

Applicant: Dell Products, LP

Inventors: Peng Lip Goh, Deeder M. Aurongzeb, Eng Kang Chng

IPC classes: H04R3/00, G10L21/02, G10L21/0216, G10L25/84, H04R1/08, H04R1/32, G10L21/0208

CPC classes: G10L21/0216, G10L25/84, H04R1/08, H04R1/323, G10L2021/02087

Abstract: A speakerphone includes a processor, a memory device, a power management unit, a first microphone to receive audio waves, a second microphone to receive audio waves, and a third microphone to receive audio waves. The speakerphone may also include a digital signal processor (DSP) to detect a single-user mode activated at the speakerphone, process the audio waves received by the first microphone, second microphone, and third microphone to determine the wave phases of the audio waves received by the first microphone, second microphone, and third microphone, calculate a direction of a voice of a single user relative to the speakerphone, and process the voice of the single user and filter other voices detected by the first microphone, second microphone, and third microphone from the user's voice.

9.

Granted patent
Trained generative model speech coding (in force)

Publication No.: US11978464B2

Publication date: 2024-05-07

Application No.: US17757122

Filing date: 2021-01-22

Applicant: GOOGLE LLC

Inventors: Willem Bastiaan Kleijn, Andrew Storus

IPC classes: G10L19/00, G10L19/038, G10L19/04, G10L21/02, G06N3/02

CPC classes: G10L19/038, G10L19/04, G10L21/02, G06N3/02, G10L19/00

Abstract: A method includes receiving sampled audio data corresponding to utterances and training a machine learning (ML) model, using the sampled audio data, to generate a high-fidelity audio stream from a low bitrate input bitstream. The training of the ML model includes de-emphasizing the influence of low-probability distortion events in the sampled audio data on the trained ML model, where the de-emphasizing of the distortion events is achieved by the inclusion of a term in an objective function of the ML model, which term encourages low-variance predictive distributions of a next sample in the sampled audio data, based on previous samples of the audio data.

10.

Granted patent
Voice driven dynamic menus (in force)

Publication No.: US11934636B2

Publication date: 2024-03-19

Application No.: US18190530

Filing date: 2023-03-27

Applicant: Snap Inc.

Inventors: Jesse Chand

IPC classes: G06F3/0482, G06F3/04817, G06F3/16, G10L21/003, G10L21/02, G10L21/0316, G10L25/78, G11B27/031, G11B27/34

CPC classes: G06F3/0482, G06F3/167, G11B27/031, G11B27/34, G06F3/04817, G06F3/165, G06F2203/04803, G10L21/003, G10L21/02, G10L21/0316, G10L25/78

Abstract: Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
