Invention Grant
US06366885B1 Speech driven lip synthesis using viseme based hidden markov models (Granted)
Abstract:
A method of speech driven lip synthesis which applies viseme based training models to units of visual speech. The audio data is grouped into a small set of visually distinct visemes rather than the larger set of phonemes. These visemes then form the basis for a Hidden Markov Model (HMM) state sequence or the output nodes of a neural network. During the training phase, audio and visual features are extracted from the input speech, which is then aligned according to the apparent viseme sequence, with the corresponding audio features being used to calculate the HMM state output probabilities or the output of the neural network. During the synthesis phase, the acoustic input is aligned with the most likely viseme HMM sequence (in the case of an HMM based model) or with the nodes of the network (in the case of a neural network based system), which is then used for animation.
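The following is a minimal sketch of the synthesis-phase idea the abstract describes: acoustic feature frames are aligned to the most likely sequence of viseme HMM states via a Viterbi pass. The phoneme-to-viseme table, the Gaussian emission model, the feature dimensionality, and all parameter values are illustrative assumptions for demonstration, not the patented implementation.

```python
import numpy as np

# Hypothetical many-to-one mapping from phonemes to a smaller viseme set (assumption).
PHONEME_TO_VISEME = {
    "p": "BMP", "b": "BMP", "m": "BMP",
    "f": "FV",  "v": "FV",
    "aa": "A",  "ae": "A",
    "iy": "I",  "ih": "I",
}
VISEMES = sorted(set(PHONEME_TO_VISEME.values()))

def log_gaussian(x, mean, var):
    """Diagonal-covariance Gaussian log-likelihood of one acoustic feature vector."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def viterbi_viseme_path(features, means, variances, log_trans, log_init):
    """Align acoustic feature frames with the most likely viseme HMM state sequence."""
    T, n_states = len(features), len(means)
    delta = np.full((T, n_states), -np.inf)    # best log-probability ending in each state
    back = np.zeros((T, n_states), dtype=int)  # backpointers for path recovery
    for s in range(n_states):
        delta[0, s] = log_init[s] + log_gaussian(features[0], means[s], variances[s])
    for t in range(1, T):
        for s in range(n_states):
            scores = delta[t - 1] + log_trans[:, s]
            back[t, s] = int(np.argmax(scores))
            delta[t, s] = scores[back[t, s]] + log_gaussian(features[t], means[s], variances[s])
    # Trace back the best state path and report it as viseme labels for animation.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [VISEMES[s] for s in reversed(path)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, dim, T = len(VISEMES), 13, 20      # e.g. 13 MFCCs per frame (assumption)
    means = rng.normal(size=(n_states, dim))     # toy emission parameters in place of trained ones
    variances = np.ones((n_states, dim))
    log_trans = np.log(np.full((n_states, n_states), 1.0 / n_states))
    log_init = np.log(np.full(n_states, 1.0 / n_states))
    frames = rng.normal(size=(T, dim))
    print(viterbi_viseme_path(frames, means, variances, log_trans, log_init))
```

In a trained system the emission means, variances, and transition probabilities would come from the training phase described above, and the recovered viseme sequence would drive the lip animation frame by frame.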