Duration prediction modeling in speech synthesis
    1.
    Invention grant
    Duration prediction modeling in speech synthesis (in force)

    Publication number: US07840408B2

    Publication date: 2010-11-23

    Application number: US11551025

    Filing date: 2006-10-19

    Applicant: Lifu Yi; Jie Hao

    Inventor: Lifu Yi; Jie Hao

    CPC classification number: G10L13/10 G10L15/148

    Abstract: The present invention provides a method and apparatus for training a duration prediction model, a method and apparatus for duration prediction, and a method and apparatus for speech synthesis. Said method for training a duration prediction model comprises: generating an initial duration prediction model with a plurality of attributes related to duration prediction and at least part of the possible attribute combinations of said plurality of attributes, in which each of said plurality of attributes and said attribute combinations is included as an item; calculating the importance of each said item in said duration prediction model; deleting the item having the lowest calculated importance; re-generating a duration prediction model with the remaining items; determining whether said re-generated duration prediction model is an optimal model; and repeating said step of calculating importance and the following steps if said duration prediction model is determined not to be an optimal model.

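    The training loop in the abstract above (build items from attributes and their combinations, score importance, prune the weakest item, re-fit, stop when optimal) can be sketched as follows. The covariance-based importance measure and the threshold stopping test are stand-in assumptions; the patent does not fix either here.

    ```python
    import itertools

    def fit(items, X, y):
        """Toy stand-in for the unspecified model-fitting step: each item's
        weight is the covariance of its (combined) feature with the target y."""
        n = len(y)
        mean_y = sum(y) / n
        weights = {}
        for item in items:
            cols = item if isinstance(item, tuple) else (item,)
            # an attribute combination is represented here by the
            # elementwise product of its attribute columns
            feat = [1.0] * n
            for c in cols:
                feat = [f * x for f, x in zip(feat, X[c])]
            mean_f = sum(feat) / n
            weights[item] = sum((f - mean_f) * (v - mean_y)
                                for f, v in zip(feat, y)) / n
        return weights

    def train_duration_model(attributes, X, y, max_combo_size=2, threshold=0.05):
        """Backward elimination over items (attributes plus combinations):
        delete the least important item and re-fit, until every remaining
        item clears an importance threshold (an assumed optimality test)."""
        items = list(attributes)
        for size in range(2, max_combo_size + 1):
            items += list(itertools.combinations(attributes, size))
        model = fit(items, X, y)
        while len(items) > 1:
            importance = {it: abs(w) for it, w in model.items()}
            least = min(importance, key=importance.get)
            if importance[least] > threshold:   # model deemed optimal
                break
            items.remove(least)                 # delete lowest-importance item
            model = fit(items, X, y)            # re-generate with the rest
        return model
    ```

    With a target that depends only on attribute "a", the weak attribute "b" and the "a"/"b" combination are pruned first, leaving "a" as the surviving item.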

    METHOD AND APPARATUS PERTAINING TO THE PROCESSING OF SAMPLED AUDIO CONTENT USING A MULTI-RESOLUTION SPEECH RECOGNITION SEARCH PROCESS
    2.
    Invention application
    METHOD AND APPARATUS PERTAINING TO THE PROCESSING OF SAMPLED AUDIO CONTENT USING A MULTI-RESOLUTION SPEECH RECOGNITION SEARCH PROCESS (pending, published)

    Publication number: US20080162129A1

    Publication date: 2008-07-03

    Application number: US11617908

    Filing date: 2006-12-29

    Applicant: Yan Ming Cheng

    Inventor: Yan Ming Cheng

    CPC classification number: G10L15/148

    Abstract: One provides (101) a plurality of frames of sampled audio content and then processes (102) that plurality of frames using a speech recognition search process that comprises, at least in part, searching for at least two of state boundaries, subword boundaries, and word boundaries using different search resolutions.

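    The search-space reduction the abstract describes can be illustrated with candidate-boundary grids at different strides: a fine grid for state boundaries, coarser grids for subword and word boundaries. The specific stride values below are assumptions for illustration only.

    ```python
    def boundary_grid(num_frames, stride):
        """Candidate boundary positions searched at a given resolution
        (stride, in frames); a coarser stride means fewer hypotheses to score."""
        return list(range(0, num_frames + 1, stride))

    num_frames = 100
    state_grid = boundary_grid(num_frames, 1)     # every frame
    subword_grid = boundary_grid(num_frames, 5)   # every 5th frame
    word_grid = boundary_grid(num_frames, 10)     # every 10th frame
    ```

    Searching word boundaries on the coarse grid evaluates roughly a tenth as many hypotheses as the per-frame grid, which is the source of the claimed savings.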

    Acoustic model generating method for speech recognition
    3.
    Invention grant
    Acoustic model generating method for speech recognition (expired)

    Publication number: US5799277A

    Publication date: 1998-08-25

    Application number: US547794

    Filing date: 1995-10-25

    Applicant: Junichi Takami

    Inventor: Junichi Takami

    CPC classification number: G10L15/063 G10L15/144 G10L15/148 G10L2015/0635

    Abstract: The acoustic model generating method for speech recognition achieves a high representational effect with the minimum possible number of model parameters. Starting from an initial model having a small number of signal sources, the acoustic model for speech recognition is generated by successively and repeatedly selecting either splitting processing or merging processing for the signal sources. The merging processing is executed prior to the splitting processing. In the merging processing, when the merged result is not appropriate, the splitting processing is executed on the model obtained before the merging processing (without use of the merged result). Further, the splitting processing offers three methods: (1) splitting the signal source into two and reconstructing a shared structure between a plurality of states having common signal sources to be split; (2) splitting one state into two states corresponding to different phoneme-context categories in the phoneme-context direction; (3) splitting one state into two states corresponding to different speech sections in the time direction. One of the three methods is selected by computing the maximum likelihood for each of the three splitting steps and choosing the step for which the largest likelihood is obtained.

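    The selection rule in the last sentence of the abstract, try each of the three splits and keep the one yielding the largest likelihood, reduces to an argmax over the three candidates. The log-likelihood values below are toy stand-ins for the re-estimated likelihoods a real system would compute.

    ```python
    def choose_split(candidate_likelihoods):
        """Select the splitting method with the largest likelihood, per the
        abstract's rule. Keys name the three methods; values stand in for the
        likelihoods obtained by tentatively applying each split."""
        return max(candidate_likelihoods, key=candidate_likelihoods.get)

    # Toy log-likelihoods (assumed numbers, for illustration only).
    candidates = {
        "shared_structure_split": -120.4,  # (1) split source, share structure
        "context_split": -118.9,           # (2) split by phoneme-context category
        "temporal_split": -119.7,          # (3) split along the time axis
    }
    ```

    Here the phoneme-context split wins because it explains the data with the least likelihood loss.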

    METHOD AND APPARATUS FOR TRAINING A DURATION PREDICTION MODEL, METHOD AND APPARATUS FOR DURATION PREDICTION, METHOD AND APPARATUS FOR SPEECH SYNTHESIS
    4.
    Invention application
    METHOD AND APPARATUS FOR TRAINING A DURATION PREDICTION MODEL, METHOD AND APPARATUS FOR DURATION PREDICTION, METHOD AND APPARATUS FOR SPEECH SYNTHESIS (in force)

    Publication number: US20070129948A1

    Publication date: 2007-06-07

    Application number: US11551025

    Filing date: 2006-10-19

    Applicant: Lifu Yi; Jie Hao

    Inventor: Lifu Yi; Jie Hao

    CPC classification number: G10L13/10 G10L15/148

    Abstract: The present invention provides a method and apparatus for training a duration prediction model, a method and apparatus for duration prediction, and a method and apparatus for speech synthesis. Said method for training a duration prediction model comprises: generating an initial duration prediction model with a plurality of attributes related to duration prediction and at least part of the possible attribute combinations of said plurality of attributes, in which each of said plurality of attributes and said attribute combinations is included as an item; calculating the importance of each said item in said duration prediction model; deleting the item having the lowest calculated importance; re-generating a duration prediction model with the remaining items; determining whether said re-generated duration prediction model is an optimal model; and repeating said step of calculating importance and the following steps if said duration prediction model is determined not to be an optimal model.


    Method of enrolling phone-based speaker specific commands
    5.
    Invention grant
    Method of enrolling phone-based speaker specific commands (in force)

    Publication number: US06377924B1

    Publication date: 2002-04-23

    Application number: US09501884

    Filing date: 2000-02-10

    CPC classification number: G10L15/07 G10L15/148 G10L2015/025

    Abstract: A method of enrolling phone-based speaker-specific commands includes a first step of providing a set (H) of speaker-independent phone-based Hidden Markov Models (HMMs), a grammar (G) comprising a loop of phones with optional between-word silence (BWS), and two utterances U1 and U2 of the command produced by the enrollment speaker, wherein the first frames of the first utterance contain only background noise. The processor generates as output a sequence of phone-like HMMs and the number of HMMs in that sequence. The second step performs model mean adjustment to suit enrollment microphone and speaker characteristics and performs segmentation. The third step generates an HMM for each segment of utterance U1 except for silence. The fourth step re-estimates the HMMs using both utterances U1 and U2.


    Method of selectively assigning a penalty to a probability associated with a voice recognition system
    6.
    Invention grant
    Method of selectively assigning a penalty to a probability associated with a voice recognition system (in force)

    Publication number: US06233557B1

    Publication date: 2001-05-15

    Application number: US09256031

    Filing date: 1999-02-23

    CPC classification number: G10L15/148

    Abstract: A voice recognition system (204, 206, 207, 208) assigns a penalty to a score in a voice recognition system. The system generates a lower threshold for the number of frames assigned to at least one state of at least one model and an upper threshold for the number of frames assigned to at least one state of at least one model. The system assigns an out-of-state transition penalty to an out-of-state transition score in an allocation assignment algorithm if the lower threshold has not been met. The out-of-state transition penalty is proportional to the number of frames by which the dwell time is below the lower threshold. A self-loop penalty is applied to a self-loop score if the upper threshold number of frames assigned to a state has been exceeded. The self-loop penalty is proportional to the number of frames by which the dwell time is above the upper threshold.

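    A minimal sketch of the penalty rule described above, assuming a simple linear proportionality constant (`unit_penalty` is a tuning assumption, not a value from the patent):

    ```python
    def dwell_penalty(dwell_frames, lower, upper, unit_penalty=1.0):
        """Penalty proportional to how far the state dwell time falls outside
        [lower, upper]; zero inside the band."""
        if dwell_frames < lower:
            # leaving the state too early: penalize the out-of-state transition
            return unit_penalty * (lower - dwell_frames)
        if dwell_frames > upper:
            # lingering too long: penalize the self-loop
            return unit_penalty * (dwell_frames - upper)
        return 0.0
    ```

    A dwell of 3 frames against a lower threshold of 5 incurs a penalty of 2 units on the out-of-state transition score; a dwell of 25 frames against an upper threshold of 20 incurs 5 units on the self-loop score.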

    Method and apparatus for a parameter sharing speech recognition system
    7.
    Invention grant
    Method and apparatus for a parameter sharing speech recognition system (expired)

    Publication number: US6006186A

    Publication date: 1999-12-21

    Application number: US953026

    Filing date: 1997-10-16

    CPC classification number: G10L15/142 G10L15/148

    Abstract: A method and an apparatus for a parameter sharing speech recognition system are provided. Speech signals are received into a processor of a speech recognition system. The speech signals are processed using a speech recognition system hosting a shared hidden Markov model (HMM) produced by generating a number of phoneme models, some of which are shared. The phoneme models are generated by retaining as a separate phoneme model any triphone model having a number of trained frames available that exceeds a prespecified threshold. A shared phoneme model is generated to represent each group of triphone phoneme models for which the number of trained frames having a common biphone exceeds the prespecified threshold. A shared phoneme model is generated to represent each group of triphone phoneme models for which the number of trained frames having an equivalent effect on a phonemic context exceeds the prespecified threshold. A shared phoneme model is generated to represent each group of triphone phoneme models having the same center context. The generated phoneme models are trained, and shared phoneme model states are generated that are shared among the phoneme models. Shared probability distribution functions are generated that are shared among the phoneme model states. Shared probability sub-distribution functions are generated that are shared among the phoneme model probability distribution functions. The shared phoneme model hierarchy is reevaluated for further sharing in response to the shared probability sub-distribution functions. Signals representative of the received speech signals are generated.

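    The frame-count thresholding that drives sharing can be sketched as a one-level back-off: a triphone with enough training frames keeps its own model, otherwise it falls back to a shared model keyed by its center phone. This is a deliberate simplification of the patent's multi-level hierarchy (biphone, context-equivalence, and center-context sharing).

    ```python
    def assign_phoneme_models(triphone_frames, threshold):
        """Keep a triphone as a separate model only if its trained-frame
        count exceeds the threshold; otherwise back off to a shared model
        keyed by the center phone. Triphones are written 'left-center-right'."""
        separate, shared = [], {}
        for tri, frames in triphone_frames.items():
            left, center, right = tri.split("-")
            if frames > threshold:
                separate.append(tri)            # enough data: own model
            else:
                shared.setdefault(center, []).append(tri)  # pooled model
        return separate, shared
    ```

    Well-trained triphones survive as separate models while sparse ones pool their data, which is what keeps parameter counts bounded.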

    SPEECH SYNTHESIS DEVICE, SPEECH SYNTHESIS METHOD, SPEECH SYNTHESIS MODEL TRAINING DEVICE, SPEECH SYNTHESIS MODEL TRAINING METHOD, AND COMPUTER PROGRAM PRODUCT
    8.

    Publication number: US20180174570A1

    Publication date: 2018-06-21

    Application number: US15896774

    Filing date: 2018-02-14

    CPC classification number: G10L13/0335 G10L13/10 G10L15/148

    Abstract: A speech synthesis device of an embodiment includes a memory unit, a creating unit, a deciding unit, a generating unit and a waveform generating unit. The memory unit stores, as statistical model information of a statistical model, an output distribution of acoustic feature parameters including pitch feature parameters and a duration distribution. The creating unit creates a statistical model sequence from context information and the statistical model information. The deciding unit decides a pitch-cycle waveform count of each state using a duration based on the duration distribution of each state of each statistical model in the statistical model sequence, and pitch information based on the output distribution of the pitch feature parameters. The generating unit generates an output distribution sequence based on the pitch-cycle waveform count, and acoustic feature parameters based on the output distribution sequence. The waveform generating unit generates a speech waveform from the generated acoustic feature parameters.
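    The pitch-cycle waveform count the deciding unit computes can be read as simple arithmetic on the state duration and the pitch: the duration divided by the pitch period (1 / f0), rounded to whole cycles. This is a plausible reading of the abstract, not the patent's exact formula.

    ```python
    def pitch_cycle_count(duration_sec, f0_hz):
        """Number of pitch-cycle waveforms spanning a state: the duration
        divided by the pitch period, rounded to the nearest whole cycle."""
        pitch_period = 1.0 / f0_hz
        return round(duration_sec / pitch_period)
    ```

    For example, a 100 ms state at 200 Hz contains 20 pitch cycles, so 20 pitch-cycle waveforms would be generated for it.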

    Duration ratio modeling for improved speech recognition
    9.
    Invention grant
    Duration ratio modeling for improved speech recognition (in force)

    Publication number: US09542939B1

    Publication date: 2017-01-10

    Application number: US13600851

    Filing date: 2012-08-31

    CPC classification number: G10L15/02 G10L15/148 G10L2015/025

    Abstract: In speech recognition, the duration of a phoneme is taken into account when determining recognition scores. Specifically, the duration of a phoneme may be evaluated relative to the duration of neighboring phonemes. A phoneme that is interpreted to be significantly longer or shorter than its neighbors may be given a lower duration score. A duration score for a phoneme may be calculated and used to adjust a recognition score. In this manner a duration model may supplement an acoustic model and language model to improve speech recognition results.

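    The neighbor-relative duration scoring can be sketched as follows, with an assumed tolerance band and a linear scoring curve (both illustrative choices, not the patent's):

    ```python
    def duration_score(durations, i, tolerance=2.0):
        """Score phoneme i's duration against the mean of its immediate
        neighbors: ratios near 1 score high; ratios beyond `tolerance`
        (or its reciprocal) score 0."""
        neighbors = durations[max(0, i - 1):i] + durations[i + 1:i + 2]
        reference = sum(neighbors) / len(neighbors)
        ratio = durations[i] / reference
        if ratio > tolerance or ratio < 1.0 / tolerance:
            return 0.0  # significantly longer/shorter than its neighbors
        return 1.0 - abs(ratio - 1.0) / (tolerance - 1.0)
    ```

    Per the abstract, such a duration score would then be combined with the acoustic and language model scores to adjust the overall recognition score.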

    Acoustic model creating method, acoustic model creating apparatus, acoustic model creating program, and speech recognition apparatus
    10.
    Invention application
    Acoustic model creating method, acoustic model creating apparatus, acoustic model creating program, and speech recognition apparatus (pending, published)

    Publication number: US20050154589A1

    Publication date: 2005-07-14

    Application number: US10990626

    Filing date: 2004-11-18

    CPC classification number: G10L15/144 G10L15/148 G10L2015/027

    Abstract: Exemplary embodiments of the present invention enhance recognition ability by optimizing the state numbers of the respective HMMs. Exemplary embodiments provide a description-length computing unit that, using the Minimum Description Length (MDL) criterion, finds the description length of each syllable HMM for a plurality of candidate state numbers, ranging from a given value up to a maximum state number. An HMM selecting unit selects the HMM having the state number for which the description length found by the description-length computing unit is a minimum. An HMM re-training unit re-trains the syllable HMM selected by the HMM selecting unit using training speech data.

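    The MDL-based state-count selection can be sketched with the standard two-part description length (data cost plus model cost); the patent's exact parameterization of the MDL criterion may differ.

    ```python
    import math

    def description_length(log_likelihood, num_params, num_samples):
        """Two-part MDL code length in the standard form:
        -log L + (k/2) * log N."""
        return -log_likelihood + 0.5 * num_params * math.log(num_samples)

    def select_state_count(candidates, num_samples):
        """candidates: {state_count: (log_likelihood, num_params)} for one
        syllable HMM trained at several state counts. Returns the state
        count that minimizes the description length; per the abstract, the
        selected HMM would then be re-trained on the training speech."""
        return min(candidates,
                   key=lambda s: description_length(*candidates[s], num_samples))
    ```

    A larger state count raises the likelihood but also the parameter penalty, so the minimum balances fit against model size.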
