SYSTEM AND METHOD FOR SPEECH PERSONALIZATION BY NEED
    1.
    Patent application (legal status: in force)

    Publication number: US20100312556A1

    Publication date: 2010-12-09

    Application number: US12480864

    Filing date: 2009-06-09

    CPC classification number: G10L15/07 G10L15/10 G10L15/265

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions. The method can further store a speaker personalization profile having information for the modified set of allocated resources and recognize speech associated with the speaker based on the speaker personalization profile.
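As a rough illustration of the record-then-reallocate loop in this abstract, here is a minimal Python sketch. The `Resources` and `Metrics` classes, the thresholds, and the doubling/halving policy are all assumptions made for the example; the abstract does not specify how the modification is computed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Resources:
    bandwidth_kbps: int
    cpu_ms: int
    memory_mb: int
    storage_mb: int


@dataclass
class Metrics:
    confidence: float      # mean recognition confidence in [0, 1]
    repeat_requests: int   # how often the speaker was asked to repeat
    task_completed: bool


def adjust_resources(res: Resources, m: Metrics) -> Resources:
    """Hypothetical policy: give struggling speakers more, easy speakers less."""
    struggling = m.confidence < 0.6 or m.repeat_requests >= 2 or not m.task_completed
    if struggling:
        return replace(res, cpu_ms=res.cpu_ms * 2, memory_mb=res.memory_mb * 2)
    if m.confidence > 0.9 and m.repeat_requests == 0:
        return replace(res, cpu_ms=max(1, res.cpu_ms // 2))
    return res


base = Resources(bandwidth_kbps=64, cpu_ms=200, memory_mb=128, storage_mb=50)
# The adjusted allocation could be stored as part of a speaker personalization profile.
profile = adjust_resources(base, Metrics(confidence=0.45, repeat_requests=3, task_completed=False))
```

Additional speech from the same speaker would then be recognized with `profile` rather than `base`.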


    SYSTEM AND METHOD FOR ADAPTING AUTOMATIC SPEECH RECOGNITION PRONUNCIATION BY ACOUSTIC MODEL RESTRUCTURING
    2.
    Patent application (legal status: in force)

    Publication number: US20100312560A1

    Publication date: 2010-12-09

    Application number: US12480848

    Filing date: 2009-06-09

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
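The weighted-sum idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: each native phoneme model is a one-dimensional Gaussian, the phoneme lattice is reduced to aligned (dictionary phoneme, heard phoneme) pairs, and the weights are plain relative frequencies. None of these specifics come from the abstract.

```python
import math
from collections import Counter


def gaussian(mean, var):
    """Toy acoustic model: a 1-D Gaussian density over a single feature."""
    def pdf(x):
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return pdf


# Hypothetical acoustic models trained on typical native speech.
native_models = {"t": gaussian(0.0, 1.0), "d": gaussian(2.0, 1.0), "th": gaussian(4.0, 1.0)}


def mixture_weights(aligned_pairs):
    """aligned_pairs: (dictionary phoneme, phoneme found plausible in the lattice)."""
    counts = {}
    for dict_ph, heard_ph in aligned_pairs:
        counts.setdefault(dict_ph, Counter())[heard_ph] += 1
    return {p: {q: n / sum(c.values()) for q, n in c.items()} for p, c in counts.items()}


def custom_likelihood(weights, dict_ph, x):
    """The dictionary entry is unchanged; only its acoustic model becomes a weighted sum."""
    return sum(w * native_models[q](x) for q, w in weights[dict_ph].items())


# A speaker who mostly realizes dictionary "th" as "t".
pairs = [("th", "t"), ("th", "t"), ("th", "th"), ("th", "d")]
weights = mixture_weights(pairs)
```

With these weights, a "t"-like observation near 0.0 scores far higher under the custom "th" model than under the native "th" model alone, which is the effect the abstract aims for without editing the pronouncing dictionary.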


    SYSTEM AND METHOD FOR GENERALIZED PRESELECTION FOR UNIT SELECTION SYNTHESIS
    3.
    Patent application (legal status: in force)

    Publication number: US20110071836A1

    Publication date: 2011-03-24

    Application number: US12563654

    Filing date: 2009-09-21

    CPC classification number: G10L13/06 G10L13/00 G10L13/047

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.


    SYSTEM AND METHOD FOR SYNTHETIC VOICE GENERATION AND MODIFICATION
    4.
    Patent application (legal status: in force)

    Publication number: US20120035933A1

    Publication date: 2012-02-09

    Application number: US12852164

    Filing date: 2010-08-06

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
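A policy that maps each phonetic category to a source voice might look like the following Python sketch. The `Unit` record, the two tiny databases, and the fallback to any matching unit are illustrative assumptions, not details taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str
    category: str  # phonetic category, e.g. "vowel" or "consonant"
    voice: str     # which text-to-speech voice the unit was recorded for


# Two hypothetical voice databases, combined without parameterizing either voice.
voice_a = [Unit("a", "vowel", "A"), Unit("t", "consonant", "A")]
voice_b = [Unit("a", "vowel", "B"), Unit("t", "consonant", "B")]
combined = voice_a + voice_b

# Policy: for each phonetic category, which voice to draw units from.
policy = {"vowel": "A", "consonant": "B"}


def select_units(phones, database, policy):
    selected = []
    for phone in phones:
        candidates = [u for u in database if u.phone == phone]
        preferred = [u for u in candidates if policy.get(u.category) == u.voice]
        # Fall back to any matching unit if the policy has no preference;
        # this assumes every requested phone exists in the database.
        selected.append((preferred or candidates)[0])
    return selected


hybrid = select_units(["t", "a", "t"], combined, policy)
```

Here the synthetic voice takes its consonants from voice B and its vowels from voice A; a real system would pass the selected units on to waveform concatenation.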


    SYSTEM AND METHOD FOR AUTOMATIC DETECTION OF ABNORMAL STRESS PATTERNS IN UNIT SELECTION SYNTHESIS
    5.
    Patent application (legal status: in force)

    Publication number: US20120035917A1

    Publication date: 2012-02-09

    Application number: US12852146

    Filing date: 2010-08-06

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, a support vector machine, or maximum entropy. In this way, a text-to-speech unit selection speech synthesizer can produce more natural-sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.


    SYSTEM AND METHOD FOR PERSONALIZATION OF ACOUSTIC MODELS FOR AUTOMATIC SPEECH RECOGNITION
    6.
    Patent application (legal status: in force)

    Publication number: US20110066433A1

    Publication date: 2011-03-17

    Application number: US12561005

    Filing date: 2009-09-16

    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
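The selection step can be sketched in Python as follows. The recognizers here are stand-ins that return a made-up confidence, confidence is used as a proxy for recognition accuracy, and `available_workers` is an assumed way of expressing available computing resources; none of this is specified by the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def make_model(name, bias):
    """Stand-in recognizer returning (hypothesis, confidence); not a real ASR model."""
    def recognize(utterance):
        return utterance.lower(), max(0.0, 1.0 - abs(len(utterance) - bias) / 10)
    return name, recognize


speaker_independent = make_model("SI", 5)
speaker_dependent = [make_model(f"SD{i}", b) for i, b in enumerate([3, 8, 12, 20])]


def choose_dominant(utterance, available_workers):
    # The number of speaker-dependent models is capped by available resources;
    # one worker is always reserved for the speaker-independent model.
    n_sd = max(0, min(len(speaker_dependent), available_workers - 1))
    selected = [speaker_independent] + speaker_dependent[:n_sd]
    with ThreadPoolExecutor(max_workers=len(selected)) as pool:
        results = list(pool.map(lambda m: (m[0], *m[1](utterance)), selected))
    # The dominant model is the one with the best score on this utterance.
    return max(results, key=lambda r: r[2])


name, text, score = choose_dominant("HELLO YOU", available_workers=3)
```

With three workers the speaker-independent model and two speaker-dependent models run in parallel, and the highest-scoring one is returned as dominant.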


    SYSTEM AND METHOD FOR PERFORMING SPEECH SYNTHESIS WITH A CACHE OF PHONEME SEQUENCES
    7.
    Patent application (legal status: in force)

    Publication number: US20090043585A1

    Publication date: 2009-02-12

    Application number: US11836423

    Filing date: 2007-08-09

    CPC classification number: G10L13/08 G10L13/04

    Abstract: Disclosed are systems, methods, and computer-readable media for performing speech synthesis. The method embodiment comprises: applying a first part of a speech synthesizer to a text corpus to obtain a plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences; for each of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize each of the plurality of respective phoneme sequences; and adding the identified joins to a cache for use in speech synthesis.
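A minimal Python sketch of pre-populating such a cache is shown below. The tiny `LEXICON`, the `front_end` function, and the `join_cost` placeholder are assumptions for illustration, and joins are keyed by adjacent phoneme pairs as a simplification of joins between recorded units.

```python
# Hypothetical pronunciation lexicon used by the front end.
LEXICON = {"cat": ["k", "ae", "t"], "tack": ["t", "ae", "k"], "at": ["ae", "t"]}


def front_end(text):
    """First part of the synthesizer: text to phoneme sequence only, no audio."""
    phones = []
    for word in text.lower().split():
        phones.extend(LEXICON[word])
    return phones


def join_cost(left, right):
    """Placeholder for an expensive acoustic join computation."""
    return abs(ord(left[0]) - ord(right[0])) / 26


def build_join_cache(corpus):
    cache = {}
    for sentence in corpus:
        phones = front_end(sentence)
        # Every adjacent pair is a join the back end would need to compute.
        for left, right in zip(phones, phones[1:]):
            if (left, right) not in cache:
                cache[(left, right)] = join_cost(left, right)
    return cache


cache = build_join_cache(["cat at", "tack"])
```

At synthesis time the back end would look up `cache[(left, right)]` first and only compute a join on a miss.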


    SYSTEM AND METHOD FOR LOW-LATENCY WEB-BASED TEXT-TO-SPEECH WITHOUT PLUGINS
    8.
    Patent application (legal status: in force)

    Publication number: US20130144624A1

    Publication date: 2013-06-06

    Application number: US13308860

    Filing date: 2011-12-01

    CPC classification number: G10L13/04 G10L13/10

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
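The caching behavior can be illustrated with a short Python sketch. The `CachedTTS` class, the use of a SHA-256 digest of the phrase text as the unique identifier, and the `fake_tts_server` stand-in are assumptions; the abstract only states that cached audio is indexed by a unique identifier.

```python
import hashlib


class CachedTTS:
    def __init__(self, synthesize):
        self._synthesize = synthesize  # callable: phrase text -> audio bytes
        self._cache = {}
        self.server_calls = 0

    @staticmethod
    def key(phrase):
        # Assumed identifier scheme: hash of the exact phrase text.
        return hashlib.sha256(phrase.encode("utf-8")).hexdigest()

    def speak(self, phrase):
        k = self.key(phrase)
        if k not in self._cache:
            # Cache miss: ask the TTS server to synthesize this phrase.
            self.server_calls += 1
            self._cache[k] = self._synthesize(phrase)
        return self._cache[k]


def fake_tts_server(phrase):
    """Stand-in for a round trip to a TTS server."""
    return b"AUDIO:" + phrase.encode("utf-8")


tts = CachedTTS(fake_tts_server)
phrases = ["Welcome back,", "your order has shipped.", "Welcome back,"]
audio = [tts.speak(p) for p in phrases]
```

The third phrase is identical to the first, so it is served from the cache and the server is contacted only twice.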


    SYSTEM AND METHOD FOR UNIT SELECTION TEXT-TO-SPEECH USING A MODIFIED VITERBI APPROACH
    9.
    Patent application (legal status: in force)

    Publication number: US20110313772A1

    Publication date: 2011-12-22

    Application number: US12818835

    Filing date: 2010-06-18

    CPC classification number: G10L13/02 G10L13/04 G10L13/06 G10L13/07

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units; for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation; performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit; and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
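One way to picture the sublist-restricted search is the Python sketch below. It assumes each unit is a `(pitch_hz, target_cost)` pair, that "suitable for concatenation" means the pitch difference is within `max_pitch_jump`, and that the join cost is the absolute pitch difference. These choices are illustrative and are not taken from the application.

```python
import bisect
import math


def lowest_cost_path(columns, max_pitch_jump=20.0):
    """columns: one candidate list per target position; units are (pitch_hz, target_cost)."""
    columns = [sorted(col) for col in columns]  # order every list by pitch
    # best[i] = (total cost, path) of the cheapest path ending at unit i
    best = [(cost, [(pitch, cost)]) for pitch, cost in columns[0]]
    for cur, nxt in zip(columns, columns[1:]):
        nxt_pitches = [p for p, _ in nxt]
        new_best = [(math.inf, [])] * len(nxt)
        for (pitch, _), (total, path) in zip(cur, best):
            if math.isinf(total):
                continue  # unit not reachable from the start
            # Because the next list is pitch-ordered, the sublist of
            # concatenable units is a contiguous slice found by bisection.
            lo = bisect.bisect_left(nxt_pitches, pitch - max_pitch_jump)
            hi = bisect.bisect_right(nxt_pitches, pitch + max_pitch_jump)
            for j in range(lo, hi):
                n_pitch, n_cost = nxt[j]
                cand = total + abs(pitch - n_pitch) + n_cost
                if cand < new_best[j][0]:
                    new_best[j] = (cand, path + [nxt[j]])
        best = new_best
    return min(best, key=lambda b: b[0])


columns = [
    [(100.0, 1.0), (150.0, 0.5)],
    [(110.0, 2.0), (160.0, 0.1), (300.0, 0.0)],
    [(105.0, 0.5), (170.0, 3.0)],
]
cost, path = lowest_cost_path(columns)
```

Compared with a plain Viterbi search, each unit is only joined against its sublist rather than the whole next list, so the 300 Hz unit is never considered despite its zero target cost.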


    SYSTEM AND METHOD FOR INCREASING RECOGNITION RATES OF IN-VOCABULARY WORDS BY IMPROVING PRONUNCIATION MODELING
    10.
    Patent application (legal status: in force)

    Publication number: US20100145704A1

    Publication date: 2010-06-10

    Application number: US12328436

    Filing date: 2008-12-04

    CPC classification number: G06F17/277 G10L15/063 G10L15/187

    Abstract: Disclosed herein are systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying best potential pronunciations in a speech recognition context, and storing the identified best potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.
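The overgenerate-then-filter idea could be sketched in Python as below. The single-letter `RULES` table, the list-of-phoneme-lists representation, and the scoring by exact match against phoneme strings taken from labeled audio are simplifying assumptions; a real system would score candidates with a recognizer and use rules over short letter sequences.

```python
from itertools import product

# Hypothetical letter-to-phoneme conversion rules (one letter at a time here).
RULES = {"c": ["k", "s"], "a": ["ae", "ey"], "t": ["t"], "e": ["", "iy"]}


def overgenerate(word):
    """Expand every rule choice into a list of candidate phoneme lists."""
    options = [RULES[ch] for ch in word]
    return [tuple(p for p in combo if p) for combo in product(*options)]


def best_pronunciations(word, observed, top_n=1):
    """Keep the candidates that best match what was heard in labeled speech."""
    def score(candidate):
        return sum(candidate == obs for obs in observed)
    return sorted(overgenerate(word), key=score, reverse=True)[:top_n]


lexicon = {}
# Several examples of the same spoken word, as phoneme strings (assumed input).
observed = [("k", "ae", "t"), ("k", "ae", "t"), ("k", "ey", "t")]
lexicon["cat"] = best_pronunciations("cat", observed)
```

Four candidates are generated for "cat", and only the one best supported by the examples is stored in the lexicon.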

