SPEECH RECOGNITION APPARATUS BASED ON CEPSTRUM FEATURE VECTOR AND METHOD THEREOF
    1.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20130138437A1

    Publication date: 2013-05-30

    Application No.: US13558236

    Filing date: 2012-07-25

    IPC classification: G10L15/20

    CPC classification: G10L15/20

    Abstract: A speech recognition apparatus includes a reliability estimating unit configured to estimate reliability of a time-frequency segment from an input speech signal; and a reliability reflecting unit configured to reflect the reliability of the time-frequency segment to a normalized cepstrum feature vector extracted from the input speech signal and a cepstrum average vector included for each state of an HMM in decoding. Further, the speech recognition apparatus includes a cepstrum transforming unit configured to transform the cepstrum feature vector and the average vector through a discrete cosine transformation matrix and calculate a transformed cepstrum vector. Furthermore, the speech recognition apparatus includes an output probability calculating unit configured to calculate an output probability value of time-frequency segments of the input speech signal by applying the transformed cepstrum vector to the cepstrum feature vector and the average vector.
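
    Illustrative sketch (not taken from the patent): one way to picture how per-band reliability could enter a cepstral-domain score is to map the difference between the feature vector and the HMM state mean back to the log-spectral domain with the inverse DCT, then weight each band by its reliability. The function names, the unit variance, and the scoring formula below are assumptions made for illustration only.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix as a list of n rows (one row per cepstral index)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows


def to_cepstrum(log_spectrum):
    """Forward DCT: log-spectral bands -> cepstral coefficients."""
    dct = dct_matrix(len(log_spectrum))
    return [sum(c * x for c, x in zip(row, log_spectrum)) for row in dct]


def to_log_spectrum(cepstrum):
    """Inverse DCT; for an orthonormal matrix the inverse is the transpose."""
    n = len(cepstrum)
    dct = dct_matrix(n)
    return [sum(dct[k][m] * cepstrum[k] for k in range(n)) for m in range(n)]


def weighted_log_score(cepstrum, mean_cepstrum, reliability, variance=1.0):
    """Reliability-weighted squared-distance score (hypothetical formula).

    reliability[m] is assumed to lie in [0, 1] for frequency band m:
    0 ignores the band, 1 trusts it fully.
    """
    diff = [c - u for c, u in zip(cepstrum, mean_cepstrum)]
    spec_diff = to_log_spectrum(diff)
    return -0.5 * sum(r * d * d for r, d in zip(reliability, spec_diff)) / variance


# Usage: band 2 is corrupted by noise but marked unreliable, so it is ignored.
mean_cep = to_cepstrum([0.8, 1.5, 0.7, -0.5])
clean_score = weighted_log_score(to_cepstrum([1.0, 2.0, 0.5, -1.0]),
                                 mean_cep, [1.0, 1.0, 0.0, 1.0])
noisy_score = weighted_log_score(to_cepstrum([1.0, 2.0, 5.5, -1.0]),
                                 mean_cep, [1.0, 1.0, 0.0, 1.0])
```

    With every reliability set to 1 the score reduces to the ordinary squared-distance score, because an orthonormal DCT preserves Euclidean length.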

    Apparatus and method for recognizing content using audio signal
    5.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08886635B2

    Publication date: 2014-11-11

    Application No.: US13639834

    Filing date: 2012-06-08

    IPC classification: G06F17/30 G10L25/54

    Abstract: The present invention relates to an apparatus and method for recognizing content using an audio signal. The content recognition apparatus includes a query fingerprint extraction unit for forming frames having a preset frame length for an audio signal, and generating frame-based feature vectors for respective frames, thus extracting a query fingerprint. A reference fingerprint DB stores reference fingerprints to be compared with the query fingerprint and pieces of content information corresponding to the reference fingerprints. A fingerprint matching unit determines a reference fingerprint matching the query fingerprint. In this case, the query fingerprint extraction unit forms the frames while varying a frame shift size that is an interval between start points of neighboring frames in a partial section. According to the present invention, there can be provided a content recognition apparatus and method which can maintain the accuracy and reliability of matching while promptly providing results.
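
    Illustrative sketch (not taken from the patent): the idea of keeping a fixed frame length while varying the frame shift inside a partial section can be pictured as below. The function name, the parameters, and the choice of a single denser section are assumptions made for illustration; the abstract does not say how the section or the shift sizes are chosen.

```python
def frame_starts(num_samples, frame_len, shift, dense_range=None, dense_shift=None):
    """Return the start index of every full frame of length frame_len.

    Frames normally advance by `shift` samples. While a frame starts inside
    dense_range = (begin, end), the next frame advances by `dense_shift`
    instead (hypothetical parameters for the "partial section").
    """
    if shift <= 0 or frame_len <= 0:
        raise ValueError("frame_len and shift must be positive")
    if dense_range is not None and (dense_shift is None or dense_shift <= 0):
        raise ValueError("dense_shift must be positive when dense_range is set")

    starts = []
    pos = 0
    while pos + frame_len <= num_samples:
        starts.append(pos)
        in_dense = dense_range is not None and dense_range[0] <= pos < dense_range[1]
        pos += dense_shift if in_dense else shift
    return starts


def extract_frames(signal, frame_len, shift, dense_range=None, dense_shift=None):
    """Slice the signal into frames at the start points computed above."""
    return [signal[s:s + frame_len]
            for s in frame_starts(len(signal), frame_len, shift,
                                  dense_range, dense_shift)]


# Usage: 100 samples, 20-sample frames, shift 20, but shift 10 in samples 0..39.
uniform = frame_starts(100, 20, 20)                     # [0, 20, 40, 60, 80]
varied = frame_starts(100, 20, 20, (0, 40), 10)         # [0, 10, 20, 30, 40, 60, 80]
```

    A smaller shift yields more overlapping frames in the chosen section, so more feature vectors are available there at the cost of extra computation.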

    Apparatus and Method for Recognizing Content Using Audio Signal
    7.
    Type: Invention application
    Status: In force

    Publication No.: US20130318071A1

    Publication date: 2013-11-28

    Application No.: US13639834

    Filing date: 2012-06-08

    IPC classification: G06F17/30

    Abstract: The present invention relates to an apparatus and method for recognizing content using an audio signal. The content recognition apparatus includes a query fingerprint extraction unit for forming frames having a preset frame length for an audio signal, and generating frame-based feature vectors for respective frames, thus extracting a query fingerprint. A reference fingerprint DB stores reference fingerprints to be compared with the query fingerprint and pieces of content information corresponding to the reference fingerprints. A fingerprint matching unit determines a reference fingerprint matching the query fingerprint. In this case, the query fingerprint extraction unit forms the frames while varying a frame shift size that is an interval between start points of neighboring frames in a partial section. According to the present invention, there can be provided a content recognition apparatus and method which can maintain the accuracy and reliability of matching while promptly providing results.

    METHOD AND SYSTEM FOR GENERATING SEARCH NETWORK FOR VOICE RECOGNITION
    8.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20130138441A1

    Publication date: 2013-05-30

    Application No.: US13585475

    Filing date: 2012-08-14

    IPC classification: G10L15/04

    CPC classification: G10L15/083 G10L15/187

    Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers.
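
    Illustrative sketch (not taken from the patent): composing a pronunciation-rule transducer with another weighted finite state transducer can be pictured with a small epsilon-free composition in the tropical semiring, where weights along a path are added. The dictionary layout, the toy rule "n may surface as m at cost 0.5", and the second transducer are all hypothetical.

```python
from collections import deque


def compose(a, b):
    """Epsilon-free composition of two weighted transducers.

    Each transducer is a dict with "start", "finals" (a set) and "arcs",
    a list of (src, in_label, out_label, weight, dst). An arc of `a` is
    paired with an arc of `b` when a's output label equals b's input label;
    the paired arc carries the sum of both weights.
    """
    start = (a["start"], b["start"])
    arcs, finals = [], set()
    seen, queue = {start}, deque([start])
    while queue:
        qa, qb = queue.popleft()
        if qa in a["finals"] and qb in b["finals"]:
            finals.add((qa, qb))
        for sa, ia, oa, wa, da in a["arcs"]:
            if sa != qa:
                continue
            for sb, ib, ob, wb, db in b["arcs"]:
                if sb != qb or ib != oa:
                    continue
                dst = (da, db)
                arcs.append(((qa, qb), ia, ob, wa + wb, dst))
                if dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    return {"start": start, "finals": finals, "arcs": arcs}


# Hypothetical pronunciation rule: "n" stays "n" (cost 0) or becomes "m" (cost 0.5).
rule = {"start": 0, "finals": {0},
        "arcs": [(0, "n", "n", 0.0, 0), (0, "n", "m", 0.5, 0), (0, "a", "a", 0.0, 0)]}
# Hypothetical second transducer mapping surface phones to model labels.
phones = {"start": 0, "finals": {0},
          "arcs": [(0, "n", "N", 0.0, 0), (0, "m", "M", 0.0, 0), (0, "a", "A", 0.0, 0)]}

network = compose(rule, phones)
```

    Real toolkits also handle epsilon labels and optimize the result (determinization, minimization); those steps are omitted here to keep the sketch short.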

    APPARATUS AND METHOD FOR CREATING ACOUSTIC MODEL
    9.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20120109650A1

    Publication date: 2012-05-03

    Application No.: US13284095

    Filing date: 2011-10-28

    IPC classification: G10L15/14

    CPC classification: G10L15/144 G10L15/285

    Abstract: Disclosed herein are an apparatus and a method for creating an acoustic model. The apparatus includes a binary tree creation unit, an information creation unit, and a binary tree reduction unit. The binary tree creation unit creates a binary tree by repeatedly merging a plurality of Gaussian components for each Hidden Markov Model (HMM) state of an acoustic model based on a distance measure reflecting a variation in likelihood score. The information creation unit creates information about the largest size of the acoustic model in accordance with a platform including a speech recognizer. The binary tree reduction unit reduces the binary tree in accordance with the information about the largest size of the acoustic model.
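
    Illustrative sketch (not taken from the patent): repeatedly merging the closest pair of Gaussian components until a size limit is met can be pictured as below for one-dimensional components. The moment-matching merge and the cost (the drop in weighted expected log-likelihood caused by the merge) are common choices assumed here for illustration; the abstract does not specify the actual distance measure, and the sequence of merges only implicitly defines the binary tree.

```python
import math
from itertools import combinations


def merge(g1, g2):
    """Moment-matching merge of two components given as (weight, mean, variance)."""
    w1, m1, v1 = g1
    w2, m2, v2 = g2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def merge_cost(g1, g2):
    """Assumed distance: loss in weighted log-likelihood if g1 and g2 are merged."""
    w, _, v = merge(g1, g2)
    return 0.5 * (w * math.log(v)
                  - g1[0] * math.log(g1[2])
                  - g2[0] * math.log(g2[2]))


def reduce_components(components, max_components):
    """Greedily merge the cheapest pair until at most max_components remain."""
    if max_components < 1:
        raise ValueError("max_components must be at least 1")
    comps = list(components)
    while len(comps) > max_components:
        i, j = min(combinations(range(len(comps)), 2),
                   key=lambda p: merge_cost(comps[p[0]], comps[p[1]]))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps


# Usage: two nearly identical components are merged first; the distant one survives.
state = [(0.25, 0.0, 1.0), (0.25, 0.1, 1.0), (0.5, 10.0, 1.0)]
reduced = reduce_components(state, max_components=2)
```

    Here max_components stands in for the platform-dependent size limit mentioned in the abstract; in practice it would be derived from the memory or compute budget of the target speech recognizer.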
