SYSTEM AND METHOD FOR DISTRIBUTED VOICE MODELS ACROSS CLOUD AND DEVICE FOR EMBEDDED TEXT-TO-SPEECH
    1.
    Invention application (legal status: in force)

    Publication number: US20160086598A1

    Publication date: 2016-03-24

    Application number: US14953771

    Filing date: 2015-11-30

    CPC classification number: G10L13/04 G10L13/047 G10L13/07

    Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache.
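
The caching flow in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `UnitCache` class, the `fetch_from_server` stub (standing in for a real network request), and the byte-string unit format are all assumptions introduced here.

```python
from typing import Dict, Iterable, List, Set


class UnitCache:
    """Local cache of text-to-speech units with a core set that is never pruned."""

    def __init__(self, core_units: Dict[str, bytes]) -> None:
        self.core: Set[str] = set(core_units)
        self.units: Dict[str, bytes] = dict(core_units)

    def missing(self, needed: Iterable[str]) -> Set[str]:
        """Units the current synthesis context needs that are not cached."""
        return {u for u in needed if u not in self.units}

    def store(self, fetched: Dict[str, bytes]) -> None:
        """Add units received from the server to the local cache."""
        self.units.update(fetched)

    def prune(self, still_needed: Iterable[str]) -> None:
        """Drop everything except core units and units still needed."""
        keep = self.core | set(still_needed)
        self.units = {u: d for u, d in self.units.items() if u in keep}


def fetch_from_server(unit_ids: Set[str]) -> Dict[str, bytes]:
    # Hypothetical stand-in for the server request; returns fake audio bytes.
    return {u: f"audio:{u}".encode() for u in unit_ids}


def synthesize(cache: UnitCache, unit_ids: List[str]) -> bytes:
    # Toy "concatenation": join the cached units in order.
    return b"|".join(cache.units[u] for u in unit_ids)


cache = UnitCache({"a": b"audio:a"})          # core set
needed = ["a", "b", "c"]                      # derived from the context
cache.store(fetch_from_server(cache.missing(needed)))
speech = synthesize(cache, needed)
cache.prune([])                               # context changed: only core remains
```

Keeping the core set separate from the prunable units means pruning can never leave the voice unable to synthesize basic output.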

    SYSTEM AND METHOD FOR DATA-DRIVEN SOCIALLY CUSTOMIZED MODELS FOR LANGUAGE GENERATION
    2.
    Invention application (legal status: in force)

    Publication number: US20150332665A1

    Publication date: 2015-11-19

    Application number: US14275938

    Filing date: 2014-05-13

    Abstract: Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user's social group. Systems configured according to this disclosure can then use the resulting, personalized, text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user.

    System and Method for Adapting Automatic Speech Recognition Pronunciation by Acoustic Model Restructuring
    3.
    Invention application (legal status: in force)

    Publication number: US20150243282A1

    Publication date: 2015-08-27

    Application number: US14698183

    Filing date: 2015-04-28

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
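
One way to write the weighted sum described in this abstract is shown below. The notation is an assumption introduced for illustration: $q$ is a phoneme in the unchanged pronouncing dictionary, $P$ is the set of plausible phonemes from the lattice, $p(x \mid r)$ is the native acoustic model for phoneme $r$, and $w_{q,r}$ are the speaker-specific weights. The abstract does not state whether the weights are normalized; the constraint on the right is a common convention, not something the source confirms.

```latex
\[
\tilde{p}(x \mid q) \;=\; \sum_{r \in P} w_{q,r}\, p(x \mid r),
\qquad w_{q,r} \ge 0, \quad \sum_{r \in P} w_{q,r} = 1 .
\]
```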

    SYSTEM AND METHOD FOR SPEECH PERSONALIZATION BY NEED
    4.
    Invention application (legal status: in force)

    Publication number: US20150213794A1

    Publication date: 2015-07-30

    Application number: US14679508

    Filing date: 2015-04-06

    CPC classification number: G10L15/07 G10L15/10 G10L15/265

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions. The method can further store a speaker personalization profile having information for the modified set of allocated resources and recognize speech associated with the speaker based on the speaker personalization profile.
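
The record-then-adjust loop can be sketched as below. The abstract only says resources are modified "commensurate with" the recorded metrics, so the doubling and halving policy, the thresholds, the `Resources` field names, and the `profiles` dictionary are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Resources:
    bandwidth_kbps: int
    cpu_ms: int
    memory_mb: int
    storage_mb: int


def adjust(resources: Resources, confidence: float,
           low: float = 0.5, high: float = 0.9) -> Resources:
    """Scale processor time and memory from a recognition confidence score.

    Assumed policy: double both when confidence is poor, halve both when
    confidence is already high, otherwise leave the allocation unchanged.
    """
    if confidence < low:
        return replace(resources,
                       cpu_ms=resources.cpu_ms * 2,
                       memory_mb=resources.memory_mb * 2)
    if confidence > high:
        return replace(resources,
                       cpu_ms=max(1, resources.cpu_ms // 2),
                       memory_mb=max(1, resources.memory_mb // 2))
    return resources


# Speaker personalization profiles: speaker id -> last modified allocation.
profiles: Dict[str, Resources] = {}
start = Resources(bandwidth_kbps=64, cpu_ms=100, memory_mb=256, storage_mb=512)
profiles["speaker-1"] = adjust(start, confidence=0.3)
```

Storing the adjusted allocation per speaker lets a later session start from the modified resources instead of the defaults.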

    SYSTEM AND METHOD FOR PROSODICALLY MODIFIED UNIT SELECTION DATABASES
    5.
    Invention application (legal status: in force)

    Publication number: US20150325248A1

    Publication date: 2015-11-12

    Application number: US14275349

    Filing date: 2014-05-12

    Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

    System and Method for Increasing Recognition Rates of In-Vocabulary Words By Improving Pronunciation Modeling
    6.
    Invention application (legal status: in force)

    Publication number: US20150073797A1

    Publication date: 2015-03-12

    Application number: US14539221

    Filing date: 2014-11-12

    CPC classification number: G06F17/277 G10L15/063 G10L15/187

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes overgenerating potential pronunciations based on symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.
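
The overgeneration step can be illustrated with a small sketch. The rule table, the phoneme labels, and the `overgenerate` function are assumptions made for this example; the abstract does not specify a rule format, and the later steps (selecting pronunciations in a recognition context and retraining the rules) are not shown.

```python
from typing import Dict, List

# Hypothetical conversion rules: short letter sequence -> alternative
# pronunciations, each a space-separated phoneme string ("" means silent).
RULES: Dict[str, List[str]] = {
    "ph": ["f"],
    "o": ["ow", "aa"],
    "n": ["n"],
    "e": ["", "iy"],
}


def overgenerate(word: str, rules: Dict[str, List[str]]) -> List[List[str]]:
    """Return every phoneme list obtainable by segmenting `word` with `rules`."""
    if not word:
        return [[]]
    variants: List[List[str]] = []
    for letters, alternatives in rules.items():
        if word.startswith(letters):
            for tail in overgenerate(word[len(letters):], rules):
                for phones in alternatives:
                    variants.append(phones.split() + tail)
    return variants


candidates = overgenerate("phone", RULES)
```

For "phone" the table above yields four candidates (two choices for "o" times two for "e"); a recognizer would then keep only the variants that actually match speech data.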

    System and Method for Pronunciation Modeling
    7.
    Invention application (legal status: in force)

    Publication number: US20150006179A1

    Publication date: 2015-01-01

    Application number: US14488844

    Filing date: 2014-09-17

    CPC classification number: G10L15/187 G10L15/183 G10L2015/025

    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.

    System and Method for Generalized Preselection for Unit Selection Synthesis
    8.
    Invention application (legal status: in force)

    Publication number: US20140350940A1

    Publication date: 2014-11-27

    Application number: US14454123

    Filing date: 2014-08-07

    CPC classification number: G10L13/06 G10L13/00 G10L13/047

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.

    SYSTEM AND METHOD FOR HANDLING MISSING SPEECH DATA
    9.
    Invention application (legal status: in force)

    Publication number: US20140288937A1

    Publication date: 2014-09-25

    Application number: US14299745

    Filing date: 2014-06-09

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for handling missing speech data. The computer-implemented method includes receiving speech with a missing segment, generating a plurality of hypotheses for the missing segment, identifying a best hypothesis for the missing segment, and recognizing the received speech by inserting the identified best hypothesis for the missing segment. In another method embodiment, the final step is replaced with synthesizing the received speech by inserting the identified best hypothesis for the missing segment. In one aspect, the method further includes identifying a duration for the missing segment and generating the plurality of hypotheses of the identified duration for the missing segment. The step of identifying the best hypothesis for the missing segment can be based on speech context, a pronouncing lexicon, and/or a language model. Each hypothesis can have an identical acoustic score.
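
Because every hypothesis can carry an identical acoustic score, the choice among them can rest on the language model alone. The sketch below assumes a toy bigram table, a single-word missing segment, and a fixed penalty for unseen word pairs; none of these details come from the abstract.

```python
from typing import Dict, List, Tuple

Bigrams = Dict[Tuple[str, str], float]


def best_hypothesis(left: str, right: str, candidates: List[str],
                    bigram_logprob: Bigrams, floor: float = -10.0) -> str:
    """Pick the candidate word that best fits between `left` and `right`.

    All candidates are assumed to share the same acoustic score, so only the
    language-model score (sum of two bigram log-probabilities) decides.
    """
    def score(word: str) -> float:
        return (bigram_logprob.get((left, word), floor)
                + bigram_logprob.get((word, right), floor))

    return max(candidates, key=score)


# Toy bigram log-probabilities (made-up numbers for illustration only).
lm: Bigrams = {
    ("the", "cat"): -1.0, ("cat", "sat"): -1.5,
    ("the", "dog"): -1.2, ("dog", "sat"): -3.0,
}
filled = best_hypothesis("the", "sat", ["dog", "cat", "car"], lm)
```

The selected word is then inserted into the gap before recognition (or synthesis) of the full utterance proceeds.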
