-
Publication No.: US20240127790A1
Publication Date: 2024-04-18
Application No.: US18045893
Filing Date: 2022-10-12
Inventors: Saurabh TAHILIANI, Subham BISWAS
IPC Classification: G10L13/08, G06F40/30, G10L13/047, G10L13/07, G10L15/16, G10L15/187, G10L15/22, G10L19/005, G10L25/18
CPC Classification: G10L13/08, G06F40/30, G10L13/047, G10L13/07, G10L15/16, G10L15/187, G10L15/22, G10L19/005, G10L25/18
Abstract: A device may receive and convert audio data to text data in real-time, and may detect a network fluctuation that causes missing voice packets. The device may process partial text and context of the text data, with a model, to generate a new phrase, and may generate a response phoneme for the new phrase. The device may utilize a text embedding model to generate a text embedding for the response phoneme, and may process the audio data, with the model, to generate a target voice sequence. The device may utilize an audio embedding model to generate an audio embedding for the target voice sequence, and may combine the text embedding and the audio embedding to generate an embedding input vector. The device may process the embedding input vector, with an audio synthesis model, to generate a final voice response, and may provide the audio data and the final voice response.
-
Publication No.: US20190198009A1
Publication Date: 2019-06-27
Application No.: US16289263
Filing Date: 2019-02-28
CPC Classification: G10L13/07, G06F17/271, G06Q30/0601
Abstract: A method for merging incoming alerts for accessibility is described. A first input alert and a second input alert intended for presentation by a screen reader are received. If the first input alert and the second input alert have arrived within a specified time interval, the first input alert and the second input alert are combined into an output alert. The output alert is sent to a screen reader for presentation.
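The merging rule in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the `Alert` type, the `merge_alerts` function, joining texts with a space, and measuring the interval from the first alert of each group are all assumptions made for illustration. The abstract only states that two alerts arriving within a specified interval are combined into one output alert.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    """Hypothetical alert record: text for the screen reader plus arrival time in seconds."""
    text: str
    arrival: float


def merge_alerts(alerts: List[Alert], interval: float) -> List[str]:
    """Combine alerts arriving within `interval` seconds of the first alert in a group.

    Assumption: a group is anchored at its first alert, and combined text is
    the member texts joined with a space.
    """
    outputs: List[str] = []
    group: List[Alert] = []
    for alert in sorted(alerts, key=lambda a: a.arrival):
        # Close the current group once an alert falls outside the interval.
        if group and alert.arrival - group[0].arrival > interval:
            outputs.append(" ".join(a.text for a in group))
            group = []
        group.append(alert)
    if group:
        outputs.append(" ".join(a.text for a in group))
    return outputs


merged = merge_alerts(
    [Alert("Saved.", 0.0), Alert("3 new messages.", 0.2), Alert("Battery low.", 5.0)],
    interval=0.5,
)
```

With a 0.5 second interval, the first two alerts are sent to the screen reader as one output alert and the third is sent on its own.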
-
Publication No.: US20190130894A1
Publication Date: 2019-05-02
Application No.: US15796292
Filing Date: 2017-10-27
Inventors: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein
CPC Classification: G10L13/08, G06F17/24, G10L13/00, G10L13/04, G10L13/06, G10L13/07, G10L15/02, G10L21/00, G10L2021/0135, G11B27/022
Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
-
Publication No.: US09761218B2
Publication Date: 2017-09-12
Application No.: US14953771
Filing Date: 2015-11-30
Inventors: Benjamin J. Stern, Mark Charles Beutnagel, Alistair D. Conkie, Horst J. Schroeter, Amanda Joy Stent
IPC Classification: G10L13/07, G10L13/04, G10L13/047
CPC Classification: G10L13/04, G10L13/047, G10L13/07
Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify, in a local cache of text-to-speech units for a text-to-speech voice, an absent text-to-speech unit which is not in the local cache. The system can request from a server the absent text-to-speech unit. The system can then synthesize speech using the text-to-speech units and a received text-to-speech unit from the server.
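The cache-miss flow described in this abstract can be sketched roughly as follows. Everything named here is hypothetical: `synthesize_with_cache`, the `fetch_from_server` callback, and the use of raw byte concatenation as a stand-in for real concatenative synthesis. Storing the fetched unit back into the local cache is also an assumption; the abstract only says the absent unit is requested and then used.

```python
from typing import Callable, Dict, Iterable, List


def synthesize_with_cache(
    unit_ids: Iterable[str],
    local_cache: Dict[str, bytes],
    fetch_from_server: Callable[[str], bytes],
) -> bytes:
    """Join speech units in order, requesting any unit missing from the local cache."""
    pieces: List[bytes] = []
    for unit_id in unit_ids:
        if unit_id not in local_cache:
            # Absent unit: request it from the server (assumed to be kept afterwards).
            local_cache[unit_id] = fetch_from_server(unit_id)
        pieces.append(local_cache[unit_id])
    # Placeholder for synthesis: plain concatenation of the unit payloads.
    return b"".join(pieces)


requested: List[str] = []


def fake_server(unit_id: str) -> bytes:
    """Stand-in for a network call; records which units were requested."""
    requested.append(unit_id)
    return unit_id.upper().encode()


cache = {"h": b"H", "i": b"I"}
audio = synthesize_with_cache(["h", "e", "i"], cache, fake_server)
```

Only the unit `"e"` is requested from the stand-in server here; the other two units come from the local cache.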
-
Publication No.: US20170186418A1
Publication Date: 2017-06-29
Application No.: US15308731
Filing Date: 2014-06-05
Inventors: Paolo Mairano, Corinne Bos-Plachez, Sourav Nandy, Johan Wouters, Silvia Maria Antonella Quazza, Dong-Jian Yue
IPC Classification: G10L13/10, G10L13/047, G10L13/07
CPC Classification: G10L13/10, G10L13/047, G10L13/07, G10L13/08
Abstract: A text-to-speech (TTS) system includes components capable of supporting the generation of speech output in any of multiple styles, and may switch seamlessly from producing speech output in one style to producing speech output in another style. For example, a concatenative TTS system may include a speech base storing speech units associated with multiple speech styles, and a linguistic analysis component to generate a phonetic transcription specifying speech output in any of multiple styles. Text input may include a style indication associated with a particular segment of the input text. The linguistic analysis component may invoke encoded rules and/or components based upon the style indication, and generate a phonetic transcription specifying a speech style, which may be processed to generate output speech.
-
Publication No.: US20160093289A1
Publication Date: 2016-03-31
Application No.: US14499444
Filing Date: 2014-09-29
Inventor: Vincent Pollet
IPC Classification: G10L13/08, G10L13/047, G10L13/027
CPC Classification: G10L13/027, G10L13/047, G10L13/07, G10L13/08
Abstract: Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a first speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identified plurality of speech segments comprising a first speech segment having the first speaking style and a second speech segment having a second speaking style different from the first speaking style; and rendering the text as speech having the first speaking style, at least in part, by using the identified plurality of speech segments.
-
Publication No.: US09275631B2
Publication Date: 2016-03-01
Application No.: US13731268
Filing Date: 2012-12-31
Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
-
Publication No.: US09147393B1
Publication Date: 2015-09-29
Application No.: US13767987
Filing Date: 2013-02-15
Applicant: Boris Fridman-Mintz
Inventor: Boris Fridman-Mintz
IPC Classification: G10L13/08, G10L15/00, G10L13/00, G10L15/04, G10L13/07, G10L13/10, G10L13/06, G10L13/047, G10L13/04, G10L13/02, G10L13/033
CPC Classification: G10L15/04, G10L13/00, G10L13/02, G10L13/033, G10L13/04, G10L13/047, G10L13/06, G10L13/07, G10L13/08, G10L13/10, G10L15/02, G10L25/18, G10L2015/027
Abstract: Speech is modeled as a cognitively-driven sensory-motor activity where the form of speech is the result of categorization processes that any given subject recreates by focusing on creating sound patterns that are represented by syllables. These syllables are then combined in characteristic patterns to form words, which are, in turn, combined in characteristic patterns to form utterances. A speech recognition process first identifies syllables in an electronic waveform representing ongoing speech. The pattern of syllables is then deconstructed into a standard form that is used to identify words. The words are then concatenated to identify an utterance. Similarly, a speech synthesis process converts written words into patterns of syllables. The pattern of syllables is then processed to produce the characteristic rhythmic sound of naturally spoken words. The words are then assembled into an utterance which is also processed to produce natural-sounding speech.
-
Publication No.: US08798998B2
Publication Date: 2014-08-05
Application No.: US12754045
Filing Date: 2010-04-05
Applicants: Huicheng Song, Guoliang Zhang, Zhiwei Weng
Inventors: Huicheng Song, Guoliang Zhang, Zhiwei Weng
CPC Classification: G10L13/07
Abstract: Pre-saved concatenation cost data is compressed through speech segment grouping. Speech segments are assigned to a predefined number of groups based on their concatenation cost values with other speech segments. A representative segment is selected for each group. The concatenation cost between two segments in different groups may then be approximated by that between the representative segments of their respective groups, thereby reducing an amount of concatenation cost data to be pre-saved.
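The approximation step in this abstract can be sketched in Python under stated assumptions. The group assignment and the choice of representatives are taken as given inputs, since the abstract does not say how they are computed. The function names, the dense list-of-lists cost matrix, and the toy cost values are all hypothetical. The sketch covers only the different-group case that the abstract mentions.

```python
from typing import List, Sequence


def build_group_cost_table(
    cost: Sequence[Sequence[float]], representatives: Sequence[int]
) -> List[List[float]]:
    """Keep only the costs between group representatives (G x G instead of N x N)."""
    return [[cost[ri][rj] for rj in representatives] for ri in representatives]


def approx_cost(
    i: int, j: int, group_of: Sequence[int], group_cost: Sequence[Sequence[float]]
) -> float:
    """Approximate cost(i, j) by the cost between the representatives of their groups."""
    gi, gj = group_of[i], group_of[j]
    if gi == gj:
        # Same-group pairs are not covered by the abstract, so they are rejected here.
        raise ValueError("segments are in the same group")
    return group_cost[gi][gj]


# Toy data: 4 segments, 2 groups, segments 0 and 2 chosen as representatives.
full_cost = [
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 5.5, 6.5],
    [4.0, 4.5, 0.0, 1.0],
    [4.2, 4.8, 1.2, 0.0],
]
group_of = [0, 0, 1, 1]
table = build_group_cost_table(full_cost, representatives=[0, 2])
estimate = approx_cost(1, 3, group_of, table)
```

In this toy case the stored table has 4 entries rather than 16, and the cost from segment 1 to segment 3 (6.5 in the full matrix) is approximated by the representative cost 5.0.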
-
Publication No.: US20140210828A1
Publication Date: 2014-07-31
Application No.: US13750199
Filing Date: 2013-01-25
Applicant: APPLE INC.
IPC Classification: G06T11/60, G06F3/0488
CPC Classification: G06T11/60, G06F3/04842, G06F3/0488, G06F3/04883, G06F3/04886, G06T11/203, G06T2200/24, G10L13/07, G10L13/08
Abstract: Methods for presenting symbolic expressions such as mathematical, scientific, or chemical expressions, formulas, or equations are performed by a computing device. One method includes: displaying a first portion of a symbolic expression within a first area of a display screen; while in a first state in which the first area is selected for aural presentation, aurally presenting first information related to the first portion of the symbolic expression; while in the first state, detecting particular user input; in response to detecting the particular user input, performing the steps of: transitioning from the first state to a second state in which a second area, of the display, is selected for aural presentation; determining second information associated with a second portion, of the symbolic expression, that is displayed within the second area; and, in response to determining the second information, aurally presenting the second information.