System and method for segmentation and recognition of speech signals

Granted invention patent

US06278972B1 System and method for segmentation and recognition of speech signals (status: in force)



Patent title: System and method for segmentation and recognition of speech signals
Application number: US09225891
Filing date: 1999-01-04
Publication number: US06278972B1
Publication date: 2001-08-21
Inventors: Ning Bi, Chienchung Chang
Applicants: Ning Bi, Chienchung Chang
Main classification: G10L 15/04
IPC classification: G10L 15/04


Abstract:

A system and method for forming a segmented speech signal from an input speech signal having a plurality of frames. The input speech signal is converted from a time domain signal to a frequency domain signal having a plurality of speech frames, wherein each speech frame in the frequency domain signal is represented by at least one spectral value associated with the speech frame. A spectral difference value is then determined for each pair of adjacent frames in the frequency domain signal, wherein the spectral difference value for each pair of adjacent frames is representative of a difference between the at least one spectral value associated with each frame in the pair of adjacent frames. An initial cluster boundary is set between each pair of adjacent frames in the frequency domain signal, and a variance value is assigned to each cluster in the frequency domain signal, wherein the variance value for each cluster is equal to one of the determined spectral difference values. Next, a plurality of cluster merge parameters is calculated, wherein each of the cluster merge parameters is associated with a pair of adjacent clusters in the frequency domain signal. A minimum cluster merge parameter is selected from the plurality of cluster merge parameters. A merged cluster is then formed by canceling a cluster boundary between the clusters associated with the minimum merge parameter and assigning a merged variance value to the merged cluster, wherein the merged variance value is representative of the variance values assigned to the clusters associated with the minimum merge parameter. The process is repeated in order to form a plurality of merged clusters, and the segmented speech signal is formed in accordance with the plurality of merged clusters.
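The clustering steps in the abstract can be illustrated with a minimal Python (NumPy) sketch. This is not the patented implementation. The abstract leaves several details unspecified, so all of the following are assumptions here: the spectral representation, the distance measure, how each initial cluster's variance is chosen, the form of the cluster merge parameter, and the stopping rule. The sketch assumes log-magnitude FFT spectra of non-overlapping frames and Euclidean spectral differences. It sets each initial variance to the smaller difference to a neighbouring frame. It uses a hypothetical merge parameter: the size-weighted sum of both clusters' variances plus the difference across the boundary being cancelled. Merging stops when a target number of segments remains. The function names `frame_spectra`, `spectral_differences` and `segment` are hypothetical.

```python
import numpy as np


def frame_spectra(signal, frame_len=256, eps=1e-8):
    """Split a 1-D signal into non-overlapping frames and return one
    log-magnitude spectrum per frame, shape (n_frames, frame_len // 2 + 1)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Time domain -> frequency domain. The log compresses the dynamic range;
    # eps avoids log(0) in empty bins.
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)


def spectral_differences(spectra):
    """Euclidean distance between the spectra of each pair of adjacent frames.
    Entry j compares frame j with frame j + 1."""
    return np.linalg.norm(np.diff(spectra, axis=0), axis=1)


def segment(spectra, n_segments):
    """Merge adjacent clusters bottom-up until n_segments clusters remain.
    Returns (start_frame, end_frame) pairs; end_frame is exclusive."""
    n = len(spectra)
    n_segments = max(1, n_segments)
    if n < 2:
        return [(0, n)]
    d = spectral_differences(spectra)
    # Start with one cluster per frame, i.e. a boundary between every pair of
    # adjacent frames. Assumption: each cluster's initial variance is the
    # smaller spectral difference to its neighbouring frame(s).
    clusters = []
    for i in range(n):
        neighbours = d[max(i - 1, 0): min(i + 1, n - 1)]
        clusters.append({"start": i, "end": i + 1, "var": float(neighbours.min())})

    while len(clusters) > n_segments:
        # Hypothetical merge parameter for each adjacent pair (a, b): the
        # size-weighted sum of both variances plus the spectral difference
        # across the boundary that merging would cancel, d[a["end"] - 1].
        params = [
            (a["end"] - a["start"]) * a["var"]
            + (b["end"] - b["start"]) * b["var"]
            + d[a["end"] - 1]
            for a, b in zip(clusters, clusters[1:])
        ]
        k = int(np.argmin(params))  # minimum cluster merge parameter
        a, b = clusters[k], clusters[k + 1]
        # Cancel the boundary. The merged variance is the merge parameter
        # normalised by the merged size, so it reflects both clusters' variances.
        merged_var = float(params[k]) / (b["end"] - a["start"])
        clusters[k:k + 2] = [{"start": a["start"], "end": b["end"], "var": merged_var}]
    return [(c["start"], c["end"]) for c in clusters]
```

Consider a signal whose first five 256-sample frames hold a tone at FFT bin 8 and whose last five hold a tone at bin 40. Its largest spectral difference falls between frames 4 and 5, and `segment(spectra, 2)` should return `[(0, 5), (5, 10)]`. Each pass recomputes every merge parameter, which costs O(N) per merge and O(N²) overall. For long utterances, a priority queue over adjacent pairs would be the usual refinement.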
