Frame-level combination of deep neural network and gaussian mixture models
    Type: granted invention patent
    Legal status: in force

    Publication number: US09240184B1

    Publication date: 2016-01-19

    Application number: US13765002

    Filing date: 2013-02-12

    CPC classification number: G10L15/22 G10L15/142

    Abstract: A method and system for frame-level merging of HMM state predictions determined by different techniques is disclosed. An audio input signal may be transformed into a first and a second sequence of feature vectors, the sequences corresponding to each other and to a temporal sequence of frames of the audio input signal on a frame-by-frame basis. The first sequence may be processed by a neural network (NN) to determine NN-based state predictions, and the second sequence may be processed by a Gaussian mixture model (GMM) to determine GMM-based state predictions. The NN-based and GMM-based state predictions may be merged as weighted sums for each of a plurality of HMM states on a frame-by-frame basis to determine merged state predictions. The merged state predictions may then be applied to the HMMs to determine speech content of the audio input signal.
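
The frame-level merging described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation. It assumes that both models emit one score per HMM state for every frame, that the scores are on a comparable scale (for example, emission probabilities), and that there is one NN weight and one GMM weight per state. The function name `merge_state_predictions`, the weight values, and the toy scores are all hypothetical.

```python
from typing import List, Sequence


def merge_state_predictions(
    nn_scores: Sequence[Sequence[float]],
    gmm_scores: Sequence[Sequence[float]],
    nn_weights: Sequence[float],
    gmm_weights: Sequence[float],
) -> List[List[float]]:
    """Merge NN-based and GMM-based scores as per-state weighted sums.

    nn_scores[t][s] and gmm_scores[t][s] are the scores for HMM state s
    at frame t. The weights are indexed by state (an assumption of this
    sketch; the abstract only says "weighted sums").
    """
    if len(nn_scores) != len(gmm_scores):
        raise ValueError("both sequences must cover the same frames")
    if len(nn_weights) != len(gmm_weights):
        raise ValueError("need one NN and one GMM weight per state")

    merged = []
    for nn_frame, gmm_frame in zip(nn_scores, gmm_scores):
        if not (len(nn_frame) == len(gmm_frame) == len(nn_weights)):
            raise ValueError("each frame needs one score per state")
        # Weighted sum for every state of the current frame.
        merged.append([
            w_nn * s_nn + w_gmm * s_gmm
            for s_nn, s_gmm, w_nn, w_gmm in zip(
                nn_frame, gmm_frame, nn_weights, gmm_weights
            )
        ])
    return merged


# Toy input: 2 frames, 3 HMM states (values are illustrative only).
nn_scores = [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]
gmm_scores = [[0.25, 0.5, 0.25], [0.5, 0.5, 0.0]]
nn_weights = [0.75, 0.75, 0.75]
gmm_weights = [0.25, 0.25, 0.25]

merged = merge_state_predictions(nn_scores, gmm_scores, nn_weights, gmm_weights)
print(merged)  # [[0.4375, 0.3125, 0.25], [0.125, 0.875, 0.0]]
```

In a full recognizer the merged per-frame scores would then be passed to HMM decoding (for example, a Viterbi search), which this sketch does not cover.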

