专利检索 cpc:"G10L2025/932" 第 1 页

1.

发明申请
Enhancement of Multichannel Audio 有权
标题翻译：增强多声道音频

公开(公告)号：US20120310635A1

公开(公告)日：2012-12-06

申请号：US13571344

申请日：2012-08-10

申请人： Hannes Muesch

发明人： Hannes Muesch

IPC分类号： G10L21/02

CPC分类号： G10L25/78 , G10L19/012 , G10L19/018 , G10L21/02 , G10L21/0205 , G10L21/0364 , G10L25/93 , G10L2025/932 , G10L2025/937

摘要： The invention relates to audio signal processing. More specifically, the invention relates to enhancing multichannel audio, such as television audio, by applying a gain to the audio that has been smoothed between portions of the audio. The invention relates to methods, apparatus for performing such methods, and to software stored on a computer-readable medium for causing a computer to perform such methods.

摘要翻译： 本发明涉及音频信号处理。更具体地，本发明涉及通过对音频部分之间已被平滑的音频应用增益来增强诸如电视音频的多声道音频。本发明涉及用于执行这些方法的方法，装置以及存储在计算机可读介质上的软件，用于使计算机执行这些方法。

2.

发明申请
AUDIO SIGNAL PROCESSING SYSTEM AND AUDIO SIGNAL PROCESSING METHOD 有权
标题翻译：音频信号处理系统和音频信号处理方法

公开(公告)号：US20120095755A1

公开(公告)日：2012-04-19

申请号：US13330100

申请日：2011-12-19

申请人： Takeshi Otani , Taro Togawa , Masanao Suzuki , Yasuji Ota

发明人： Takeshi Otani , Taro Togawa , Masanao Suzuki , Yasuji Ota

IPC分类号： G10L21/02

CPC分类号： H04R3/00 , G10L21/0208 , G10L21/0216 , G10L25/00 , G10L25/18 , G10L2025/932 , H04R2430/03 , H04R2499/11

摘要： An audio signal processing system including a time-frequency conversion unit which converts an audio signal in time domain into frequency domain in frame units so as to calculate a frequency spectrum of the audio signal, a spectral change calculation unit which calculates an amount of change between a frequency spectrum of a first frame and a frequency spectrum of a second frame before the first frame based on the frequency spectrum of the first frame and the frequency spectrum of the second frame, and a judgment unit which judges the type of the noise which is included in the audio signal of the first frame in accordance with the amount of spectral change.

摘要翻译： 一种音频信号处理系统，包括时间 - 频率转换单元，其将时域中的音频信号以帧为单位转换成频域，以便计算音频信号的频谱;频谱变化计算单元，其计算基于第一帧的频谱和第二帧的频谱的第一帧的频谱和第一帧之前的第二帧的频谱，以及判断单元，其判断噪声的类型是根据光谱变化量包括在第一帧的音频信号中。

3.

发明授权
Voiced/unvoiced classification of speech for excitation codebook selection in celp speech decoding during frame erasures 失效
标题翻译：在帧擦除期间，在celp语音解码中激活码本选择的语音/语音分类

公开(公告)号：US5732389A

公开(公告)日：1998-03-24

申请号：US482708

申请日：1995-06-07

申请人： Peter Kroon , Yair Shoham

发明人： Peter Kroon , Yair Shoham

IPC分类号： G10L19/08 , G10L11/06 , G10L19/00 , G10L19/04 , H03M7/30 , H04B14/04 , G10L9/14

CPC分类号： G10L25/93 , G10L2025/932

摘要： A CELP speech decoder includes a first portion comprising an adaptive codebook and a second portion comprising a fixed codebook. The CS-ACELP decoder generates a speech excitation signal selectively based on output signals from said first and second portions when said decoder fails to receive reliably at least a portion of a current frame of compressed speech information. The decoder does this by classifying the speech signal to be generated as periodic (voiced) or non-periodic (unvoiced) and then generating an excitation signal based on this classification. If the speech signal is classified as periodic, the excitation signal is generated based on the output signal from the first portion and not on the output signal from the second portion. If the speech signal is classified as non-periodic, the excitation signal is generated based on the output signal from said second portion and not on the output signal from said first portion.

摘要翻译： CELP语音解码器包括包括自适应码本的第一部分和包括固定码本的第二部分。当所述解码器不可靠地接收到压缩语音信息的当前帧的至少一部分时，CS-ACELP解码器基于来自所述第一和第二部分的输出信号选择性地产生语音激励信号。解码器通过将要生成的语音信号分类为周期性（有声）或非周期性（无声），然后基于该分类产生激励信号来实现。如果语音信号被分类为周期性，则基于来自第一部分的输出信号而不是来自第二部分的输出信号产生激励信号。如果语音信号被分类为非周期性，则基于来自所述第二部分的输出信号而不是来自所述第一部分的输出信号产生激励信号。

4.

发明授权
Method and device for discriminating voiced and unvoiced sounds 失效
标题翻译：用于辨别有声和无声的声音的方法和装置

公开(公告)号：US5664052A

公开(公告)日：1997-09-02

申请号：US48034

申请日：1993-04-14

申请人： Masayuki Nishiguchi , Jun Matsumoto

发明人： Masayuki Nishiguchi , Jun Matsumoto

IPC分类号： G10L11/02 , G06F17/18 , G10L11/06 , G10L15/02 , G10L15/04 , G10L9/00

CPC分类号： G10L25/93 , G10L2025/783 , G10L2025/932

摘要： A method and a device for discriminating a voiced sound from an unvoiced sound or background noise in speech signals are disclosed. Each block or frame of input speech signals is divided into plural sub-blocks and the standard deviation, effective value or the peak value is detected in a detection unit for detecting statistical characteristics from one sub-block to another. A bias detection unit detects a bias on the time scale of the standard deviation, effective value or the peak value to decide whether the speech signals are voiced or unvoiced from one block to another.

摘要翻译： 公开了一种在语音信号中识别有声声音与无声或背景噪声的方法和装置。输入语音信号的每个块或帧被分成多个子块，并且在用于检测从一个子块到另一个子块的统计特性的检测单元中检测标准偏差，有效值或峰值。偏置检测单元检测标准偏差，有效值或峰值的时间标度上的偏差，以确定语音信号是从一个块到另一个是浊音还是清音。

5.

发明公开
NEURAL TEMPORAL BEAMFORMER FOR NOISE REDUCTION IN SINGLE-CHANNEL AUDIO SIGNALS 审中-公开

公开(公告)号：US20240257827A1

公开(公告)日：2024-08-01

申请号：US18160278

申请日：2023-01-26

申请人： Synaptics Incorporated

发明人： Saeed MOSAYYEBPOUR KASKARI

IPC分类号： G10L25/78 , G10L21/0208 , G10L25/93

CPC分类号： G10L25/78 , G10L21/0208 , G10L25/93 , G10L2025/932

摘要： This disclosure provides methods, devices, and systems for audio signal processing. The present implementations more specifically relate to multi-frame beamforming using neural network supervision. In some aspects, a speech enhancement system may include a linear filter, a deep neural network (DNN), a voice activity detector (VAD), and an IFC calculator. The DNN infers a probability of speech (pDNN) in a current frame of a single-channel audio signal based on a neural network model. The VAD determines whether speech is present or absent in the current audio frame based on the probability of speech pDNN. The IFC calculator may estimate an IFC vector based on the output of the DNN (such as the probability of speech pDNN) and the output of the VAD (such as an indication of whether speech is present in the current frame). The linear filter uses the IFC vector to suppress noise in the current audio frame.

6.

发明申请
METHODS AND SYSTEMS FOR CLASSIFYING AUDIO SEGMENTS OF AN AUDIO SIGNAL 审中-公开

公开(公告)号：US20170309297A1

公开(公告)日：2017-10-26

申请号：US15135671

申请日：2016-04-22

申请人： XEROX CORPORATION

发明人： Harish Arsikere , Arunasish Sen , Prathosh Aragulla Prasad

IPC分类号： G10L25/51 , G10L25/93 , G10L25/87 , G10L25/21 , G10L25/90 , G10L25/18

CPC分类号： G10L25/51 , G10L25/18 , G10L25/21 , G10L25/87 , G10L25/90 , G10L25/93 , G10L2025/932 , G10L2025/937

摘要： The disclosed embodiments illustrate a method for classifying one or more audio segments of an audio signal. The method includes determining one or more first features of a first audio segment of the one or more audio segments. The method further includes determining one or more second features based on the one or more first features. The method includes determining one or more third features of the first audio segment, wherein each of the one or more third features is determined based on a second feature of the one or more second features of the first audio segment and at least one second feature associated with a second audio segment. Additionally, the method includes classifying the first audio segment either in an interrogative category or a non-interrogative category based on one or more of the one or more second features and the one or more third features.

7.

发明申请
Voice Activity Detector for Audio Signals 有权
标题翻译：语音信号检测器

公开(公告)号：US20150243300A1

公开(公告)日：2015-08-27

申请号：US14701622

申请日：2015-05-01

申请人： DOLBY LABORATORIES LICENSING CORPORATION

发明人： Hannes Muesch

IPC分类号： G10L25/78 , G10L19/012

CPC分类号： G10L25/78 , G10L19/012 , G10L19/018 , G10L21/02 , G10L21/0205 , G10L21/0364 , G10L25/93 , G10L2025/932 , G10L2025/937

摘要： According to one aspect, a method for detecting voice activity is disclosed, the method including receiving a frame of an input audio signal, the input audio signal having an sample rate; dividing the frame into a plurality of subbands based on the sample rate, the plurality of subbands including at least a lowest subband and a highest subband; filtering the lowest subband with a moving average filter to reduce an energy of the lowest subband; estimating a noise level for each of the plurality of subbands; calculating a signal to noise ratio value for each of the plurality of subbands; and determining a speech activity level of the frame based on an average of the calculated signal to noise ratio values and a weighted average of an energy of each of the plurality of subbands. Other aspects include audio decoders that decode audio that was encoded using the methods described herein.

摘要翻译： 根据一个方面，公开了一种用于检测语音活动的方法，所述方法包括接收输入音频信号的帧，所述输入音频信号具有采样率; 基于所述采样率将所述帧划分成多个子带，所述多个子带至少包括最低子带和最高子带; 用移动平均滤波器对最低子带进行滤波，以减少最低子带的能量; 估计所述多个子带中的每一个的噪声电平; 计算所述多个子带中的每一个的信噪比值; 以及基于所计算的信噪比值的平均值和所述多个子带中的每一个的能量的加权平均值来确定所述帧的语音活动水平。其他方面包括解码使用本文描述的方法编码的音频的音频解码器。

8.

发明授权
Enhancement of multichannel audio 有权
标题翻译：增强多声道音频

公开(公告)号：US08271276B1

公开(公告)日：2012-09-18

申请号：US13463600

申请日：2012-05-03

申请人： Hannes Muesch

发明人： Hannes Muesch

IPC分类号： G10L19/14

CPC分类号： G10L25/78 , G10L19/012 , G10L19/018 , G10L21/02 , G10L21/0205 , G10L21/0364 , G10L25/93 , G10L2025/932 , G10L2025/937

摘要： The invention relates to audio signal processing. More specifically, the invention relates to enhancing multichannel audio, such as television audio, by applying a gain to the audio that has been smoothed between segments of the audio. The invention relates to methods, apparatus for performing such methods, and to software stored on a computer-readable medium for causing a computer to perform such methods.

摘要翻译： 本发明涉及音频信号处理。更具体地，本发明涉及通过对已经在音频的片段之间被平滑的音频应用增益来增强诸如电视音频的多声道音频。本发明涉及用于执行这些方法的方法，装置以及存储在计算机可读介质上的软件，用于使计算机执行这些方法。

9.

发明授权
Method and device for discriminating voiced and unvoiced sounds 失效
标题翻译：用于辨别有声和无声的声音的方法和装置

公开(公告)号：US5809455A

公开(公告)日：1998-09-15

申请号：US753347

申请日：1996-11-25

申请人： Masayuki Nishiguchi , Jun Matsumoto

发明人： Masayuki Nishiguchi , Jun Matsumoto

IPC分类号： G10L11/02 , G06F17/18 , G10L11/06 , G10L15/02 , G10L15/04 , G10L3/02

CPC分类号： G10L25/93 , G10L2025/783 , G10L2025/932

摘要： A method and a device for discriminating a voiced sound from an unvoiced sound or background noise in speech signals are disclosed. Each block or frame of input speech signals is divided into plural sub-blocks and the standard deviation, effective value or the peak value is detected in a detection unit for detecting statistical characteristics from one sub-block to another. A bias detection unit detects a bias on the time scale of the standard deviation, effective value or the peak value to decide whether the speech signals are voiced or unvoiced from one block to another.

摘要翻译： 公开了一种在语音信号中识别有声声音与无声或背景噪声的方法和装置。输入语音信号的每个块或帧被分成多个子块，并且在用于检测从一个子块到另一个子块的统计特性的检测单元中检测标准偏差，有效值或峰值。偏置检测单元检测标准偏差，有效值或峰值的时间标度上的偏差，以确定语音信号是从一个块到另一个是浊音还是清音。

10.

发明公开
Joint Segmenting and Automatic Speech Recognition 审中-公开

公开(公告)号：US20230343332A1

公开(公告)日：2023-10-26

申请号：US18304064

申请日：2023-04-20

申请人： Google LLC

发明人： Ronny Huang , Shuo-yiin Chang , David Rybach , Rohit Prakash Prabhavalkar , Tara N. Sainath , Cyril Allauzen , Charles Caleb Peyser , Zhiyun Lu

IPC分类号： G10L15/04 , G10L25/93 , G10L15/197 , G10L15/06 , G10L15/22 , G10L15/02

CPC分类号： G10L15/197 , G10L15/02 , G10L15/04 , G10L15/063 , G10L15/22 , G10L25/93 , G10L2015/025 , G10L2025/932

摘要： A joint segmenting and ASR model includes an encoder and decoder. The encoder configured to: receive a sequence of acoustic frames characterizing one or more utterances; and generate, at each output step, a higher order feature representation for a corresponding acoustic frame. The decoder configured to: receive the higher order feature representation and generate, at each output step: a probability distribution over possible speech recognition hypotheses, and an indication of whether the corresponding output step corresponds to an end of speech segment. The j oint segmenting and ASR model trained on a set of training samples, each training sample including: audio data characterizing a spoken utterance; and a corresponding transcription of the spoken utterance, the corresponding transcription having an end of speech segment ground truth token inserted into the corresponding transcription automatically based on a set of heuristic-based rules and exceptions applied to the training sample.

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类