SYSTEM AND METHOD FOR ENRICHING TEXT-TO-SPEECH SYNTHESIS WITH AUTOMATIC DIALOG ACT TAGS
    12.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20130066632A1

    Publication date: 2013-03-14

    Application No.: US13232630

    Filing date: 2011-09-14

    IPC classification: G10L13/08

    CPC classification: G10L13/10

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for modifying the prosody of synthesized speech based on an associated speech act. A system configured according to the method embodiment (1) receives text, (2) performs an analysis of the text to determine and assign a speech act label to the text, and (3) converts the text to speech, where the speech prosody is based on the speech act label. The analysis compares the text to a corpus of previously tagged utterances to find a close match, determines a confidence score from a correlation of the text and the close match, and, if the confidence score is above a threshold value, retrieves the speech act label of the close match and assigns it to the text.
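
The labeling step in this abstract (closest match, confidence score, threshold) can be illustrated with a minimal Python sketch. It is not the patented implementation: the corpus contents, the function name `assign_speech_act`, the 0.6 threshold, and the use of `difflib.SequenceMatcher` as a stand-in for the "correlation" are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical corpus of previously tagged utterances (lower-case text, label).
TAGGED_CORPUS = [
    ("could you repeat that please", "request"),
    ("thank you very much", "thanks"),
    ("what time does the store open", "question"),
]

def assign_speech_act(text, corpus=TAGGED_CORPUS, threshold=0.6):
    """Return (label, score); label is None if no match clears the threshold."""
    best_label, best_score = None, 0.0
    for utterance, label in corpus:
        # The similarity ratio in [0, 1] stands in for the confidence score.
        score = SequenceMatcher(None, text.lower(), utterance).ratio()
        if score > best_score:
            best_label, best_score = label, score
    if best_score > threshold:
        return best_label, best_score
    return None, best_score
```

A synthesizer front end could then select prosody settings from the returned label and fall back to default prosody when the label is `None`.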

    PARTITIONING OF MARKUP LANGUAGE DOCUMENTS
    13.
    Type: Invention application
    Status: In force

    Publication No.: US20100125783A1

    Publication date: 2010-05-20

    Application No.: US12272617

    Filing date: 2008-11-17

    IPC classification: G06F17/24

    Abstract: A process and system for partitioning hybrid markup language documents (HMLDs) is disclosed. Content from an HMLD is copied to one or more output markup language documents (MLDs), which may be well-formed or valid MLDs. The HMLD is segmented at partition boundaries within the document, while state information is recorded in a tag stack. The state information is used to complete the output MLD, which may be sent to a software module for processing. The HMLDs and MLDs may be well-formed or valid extensible markup language (XML) documents.
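
The role of the tag stack can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the input contains nothing but element tags and text (no comments, CDATA, or processing instructions), a partition boundary is placed after every `per_chunk` closing tags of a chosen element name, and the names `partition`, `boundary_tag`, and `per_chunk` are hypothetical.

```python
import re

TOKEN = re.compile(r"<[^>]+>|[^<]+")   # one tag, or one run of text
NAME = re.compile(r"</?\s*([\w:.-]+)")  # element name of an open or close tag

def partition(doc, boundary_tag, per_chunk=1):
    """Split doc into well-formed pieces, closing and reopening open tags."""
    outputs, stack, current = [], [], []
    closed, has_content = 0, False
    for tok in TOKEN.findall(doc):
        current.append(tok)
        if tok.startswith("</"):
            stack.pop()                       # element closed: drop its state
            if NAME.match(tok).group(1) == boundary_tag:
                closed += 1
        elif tok.startswith("<"):
            if not tok.endswith("/>"):
                stack.append(tok)             # remember the full opening tag
            has_content = True
        elif tok.strip():
            has_content = True
        if closed >= per_chunk:
            # Partition boundary: complete this output using the tag stack,
            # then start the next output by reopening the same tags.
            closers = ["</%s>" % NAME.match(t).group(1) for t in reversed(stack)]
            outputs.append("".join(current + closers))
            current, closed, has_content = list(stack), 0, False
    if has_content:
        outputs.append("".join(current))
    return outputs
```

For example, `partition("<doc><sec><p>one</p><p>two</p></sec></doc>", "p")` yields two documents, each wrapping a single `<p>` element in its own `<doc><sec>` ancestors.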

    SYSTEM AND METHOD FOR DISCRIMINATIVE PRONUNCIATION MODELING FOR VOICE SEARCH
    14.
    Type: Invention application
    Status: In force

    Publication No.: US20100125457A1

    Publication date: 2010-05-20

    Application No.: US12274025

    Filing date: 2008-11-19

    IPC classification: G10L15/04

    CPC classification: G10L15/063 G10L2015/025

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for speech recognition. The method includes receiving speech utterances; assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1; for each received speech utterance, optimizing the pronunciation weight by (1) identifying word and phone alignments and corresponding likelihood scores, and (2) discriminatively adapting the pronunciation weight to minimize classification errors; and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information (MMI), maximum likelihood (MLE) training, minimum classification error (MCE) training, or other functions known to those of skill in the art. Speech utterances can be names. The speech utterances can be received as part of a multimodal search or input. The step of discriminatively adapting pronunciation weights can further include stochastically modeling pronunciations.
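
The constraint that pronunciation weights sum to 1 at the level of a unit of speech can be shown with a short Python sketch. It covers only the normalization, not the discriminative adaptation; the lexicon layout (word mapped to pronunciation variants with raw weights) and the function name `normalize_weights` are assumptions.

```python
def normalize_weights(lexicon):
    """lexicon: word -> {pronunciation: raw_weight}.

    Returns the same structure with each word's weights scaled to sum to 1.
    """
    normalized = {}
    for word, prons in lexicon.items():
        total = sum(prons.values())
        normalized[word] = {p: w / total for p, w in prons.items()}
    return normalized

# Hypothetical example: two pronunciation variants of one word.
weights = normalize_weights(
    {"tomato": {"t ah m ey t ow": 3.0, "t ah m aa t ow": 1.0}}
)
```

Any adaptation step that changes individual weights would need to re-apply this normalization so that the weights of each unit remain a distribution.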

    Method and system for preselection of suitable units for concatenative speech
    15.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07124083B2

    Publication date: 2006-10-17

    Application No.: US10702154

    Filing date: 2003-11-05

    IPC classification: G10L13/04

    CPC classification: G10L13/07 G10L2015/022

    Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method comprises a method of generating a triphone preselection cost database for use in speech synthesis, the method comprising 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in a triphone preselection cost database.
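
One possible reading of the three steps is sketched below in Python. The phoneme universe, the unit database, the mismatch-based cost function, and the names `context_cost` and `build_preselection` are all assumptions for illustration; the abstract does not specify how the preselection cost is computed.

```python
from itertools import product

PHONEMES = ["a", "b", "k", "t"]  # toy phoneme universe (assumption)

# Hypothetical unit database: (unit_id, label, recorded 5-phoneme context).
UNITS = [
    (0, "a", ("k", "b", "a", "t", "a")),
    (1, "a", ("t", "b", "a", "t", "k")),
    (2, "a", ("k", "k", "a", "b", "t")),
    (3, "t", ("b", "a", "t", "a", "k")),
]

def context_cost(target, recorded):
    """Toy cost: context mismatches, inner neighbours weighted above outer."""
    weights = (1, 2, 0, 2, 1)
    return sum(w for w, x, y in zip(weights, target, recorded) if x != y)

def build_preselection(triphones, top_n=1):
    """Map each triphone (u1, u2, u3) to the unit ids worth keeping."""
    table = {}
    for u1, u2, u3 in triphones:
        candidates = [u for u in UNITS if u[1] == u2]   # any unit labeled u2
        keep = set()
        for ua, ub in product(PHONEMES, repeat=2):      # vary outer context
            target = (ua, u1, u2, u3, ub)
            ranked = sorted(
                candidates, key=lambda u: (context_cost(target, u[2]), u[0])
            )
            keep.update(u[0] for u in ranked[:top_n])   # lowest-cost units
        table[(u1, u2, u3)] = sorted(keep)
    return table
```

Because the table is built offline, synthesis only has to search the preselected units for each triphone rather than every unit with a matching label.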

    System and method for unit selection text-to-speech using a modified Viterbi approach
    16.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08731931B2

    Publication date: 2014-05-20

    Application No.: US12818835

    Filing date: 2010-06-18

    IPC classification: G10L13/00 G10L13/06

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units; for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation; performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit; and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
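
The restricted search can be sketched in Python as a Viterbi-style dynamic program in which each unit only considers a pitch-limited sublist of the next list. The unit representation `(pitch_hz, target_cost)`, the `max_pitch_jump` window, the absolute pitch difference used as the join cost, and the function name are assumptions, not details taken from the patent.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")

def lowest_cost_path(lists, max_pitch_jump=20.0):
    """Find the cheapest path through pitch-ordered lists of speech units.

    lists: one list per target position; each entry is (pitch_hz, target_cost)
    and every list is sorted by pitch. Returns (total_cost, chosen_indices).
    """
    best = [(cost, None) for _, cost in lists[0]]  # (cost so far, backpointer)
    layers = [best]
    for prev, cur in zip(lists, lists[1:]):
        pitches = [p for p, _ in cur]
        nxt = [(INF, None)] * len(cur)
        for i, (pitch, _) in enumerate(prev):
            if best[i][0] == INF:
                continue
            # Sublist of the next list: units close enough in pitch to join.
            lo = bisect_left(pitches, pitch - max_pitch_jump)
            hi = bisect_right(pitches, pitch + max_pitch_jump)
            for j in range(lo, hi):
                join_cost = abs(cur[j][0] - pitch)      # toy join cost
                total = best[i][0] + join_cost + cur[j][1]
                if total < nxt[j][0]:
                    nxt[j] = (total, i)
        best = nxt
        layers.append(best)
    end = min(range(len(best)), key=lambda j: best[j][0])
    if best[end][0] == INF:
        raise ValueError("no joinable path through the lists")
    path = [end]
    for layer in reversed(layers[1:]):                  # follow backpointers
        path.append(layer[path[-1]][1])
    path.reverse()
    return best[end][0], path
```

Because each list is sorted by pitch, the sublist for a unit is a contiguous slice found by binary search, so the inner loop avoids scanning every pair of units in adjacent lists.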

    Tabulating triphone sequences by 5-phoneme contexts for speech synthesis
    17.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08566099B2

    Publication date: 2013-10-22

    Application No.: US13550074

    Filing date: 2012-07-16

    IPC classification: G10L13/00 G10L13/06

    CPC classification: G10L13/07 G10L2015/022

    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts are disclosed. The method includes identifying a set of triphone sequences and tabulating the set of triphone sequences using a plurality of contexts, where each context-specific triphone sequence of the plurality of context-specific triphone sequences has the top N triphone units, made of the triphone units having the lowest target costs when each triphone unit is individually combined into a 5-phoneme combination. Input texts having one of the contexts are received, and one of the context-specific triphone sequences is selected based on the context. Input text is then synthesized using the context-specific triphone sequence.

    System and method for adapting automatic speech recognition pronunciation by acoustic model restructuring

    Publication No.: US08548807B2

    Publication date: 2013-10-01

    Application No.: US12480848

    Filing date: 2009-06-09

    IPC classification: G10L15/04

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker, resulting in collected speech, and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method includes recognizing, via a processor, additional speech from the target speaker using the custom speech model.
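
The "weighted sum of acoustic models" can be illustrated with a small Python sketch. Everything concrete here is assumed: each native acoustic model is reduced to a one-dimensional Gaussian, the weights are taken as normalized counts of plausible phonemes from the lattice, and the names `NATIVE_MODELS`, `gaussian`, and `custom_model` are hypothetical.

```python
import math

# Hypothetical native acoustic models: one 1-D Gaussian (mean, std) per phoneme.
NATIVE_MODELS = {"iy": (2.0, 0.5), "ih": (1.0, 0.5)}

def gaussian(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def custom_model(lattice_counts):
    """lattice_counts: dictionary phoneme -> {plausible native phoneme: count}.

    Returns (weights, likelihood), where likelihood(phoneme, x) is the
    weighted sum of the native models for that dictionary phoneme.
    """
    weights = {}
    for phoneme, counts in lattice_counts.items():
        total = sum(counts.values())
        weights[phoneme] = {p: c / total for p, c in counts.items()}

    def likelihood(phoneme, x):
        return sum(
            w * gaussian(x, *NATIVE_MODELS[p])
            for p, w in weights[phoneme].items()
        )

    return weights, likelihood
```

In this sketch the dictionary entry for a word is untouched; only the acoustic score behind each dictionary phoneme changes, which mirrors the abstract's statement that the pronouncing dictionary does not change.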

    System and method for audibly presenting selected text
    19.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08239201B2

    Publication date: 2012-08-07

    Application No.: US12257994

    Filing date: 2008-10-24

    IPC classification: G10L13/08

    Abstract: Disclosed herein are methods for presenting speech from a selected text that is on a computing device. This method includes presenting text on a touch-sensitive display and having that text size within a threshold level so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user touch has been received, the computing device identifies and interprets the portion of text that is to be selected, and subsequently presents the text audibly to the user.

    METHOD AND SYSTEM FOR PRESELECTION OF SUITABLE UNITS FOR CONCATENATIVE SPEECH
    20.
    Type: Invention application
    Status: In force

    Publication No.: US20090094035A1

    Publication date: 2009-04-09

    Application No.: US12325809

    Filing date: 2008-12-01

    IPC classification: G10L13/08

    CPC classification: G10L13/07 G10L2015/022

    Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method comprises a method of generating a triphone preselection cost database for use in speech synthesis, the method comprising 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in a triphone preselection cost database.
