SYSTEMS AND METHODS FOR MULTI-STYLE SPEECH SYNTHESIS
    6.
    Patent application
    Legal status: in force

    Publication No.: US20160093289A1

    Publication date: 2016-03-31

    Application No.: US14499444

    Filing date: 2014-09-29

    Inventor: Vincent Pollet

    Abstract: Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a first speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identified plurality of speech segments comprising a first speech segment having the first speaking style and a second speech segment having a second speaking style different from the first speaking style; and rendering the text as speech having the first speaking style, at least in part, by using the identified plurality of speech segments.
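
    The abstract above does not say how segments of a different style end up in the selected set. One plausible reading, sketched below purely as an assumption, is a unit-selection search in which a style mismatch adds a penalty rather than excluding a candidate. The names `Segment`, `select_segments`, `base_cost`, and `mismatch_penalty` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    unit: str         # phonetic unit this recording covers (hypothetical label)
    style: str        # speaking style of the recording
    base_cost: float  # stand-in for a target/join cost; lower is better


def select_segments(targets, inventory, style, mismatch_penalty=1.0):
    """Pick one segment per target unit, preferring the requested style.

    Assumption: a segment in another style stays eligible but pays
    `mismatch_penalty`, so it is used when no better same-style unit exists.
    """
    chosen = []
    for unit in targets:
        candidates = [s for s in inventory if s.unit == unit]
        if not candidates:
            raise ValueError(f"no segment available for unit {unit!r}")
        chosen.append(
            min(
                candidates,
                key=lambda s: s.base_cost
                + (0.0 if s.style == style else mismatch_penalty),
            )
        )
    return chosen


inventory = [
    Segment("h", "neutral", 0.2),
    Segment("h", "joyful", 0.5),
    Segment("i", "neutral", 0.1),
]
picked = select_segments(["h", "i"], inventory, style="joyful")
print([s.style for s in picked])  # ['joyful', 'neutral']
```

    In this toy inventory the unit "i" exists only in the neutral style, so the joyful rendering draws on segments of two styles, which matches the situation the abstract describes.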

    Speech synthesis system, speech synthesis program product, and speech synthesis method

    Publication No.: US09275631B2

    Publication date: 2016-03-01

    Application No.: US13731268

    Filing date: 2012-12-31

    IPC classification: G10L13/10 G10L13/00 G10L13/07

    CPC classification: G10L13/00 G10L13/07 G10L13/10

    Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.

    Syllable based speech processing method
    8.
    Granted patent
    Legal status: in force

    Publication No.: US09147393B1

    Publication date: 2015-09-29

    Application No.: US13767987

    Filing date: 2013-02-15

    Abstract: Speech is modeled as a cognitively-driven sensory-motor activity where the form of speech is the result of categorization processes that any given subject recreates by focusing on creating sound patterns that are represented by syllables. These syllables are then combined in characteristic patterns to form words, which are, in turn, combined in characteristic patterns to form utterances. A speech recognition process first identifies syllables in an electronic waveform representing ongoing speech. The pattern of syllables is then deconstructed into a standard form that is used to identify words. The words are then concatenated to identify an utterance. Similarly, a speech synthesis process converts written words into patterns of syllables. The pattern of syllables is then processed to produce the characteristic rhythmic sound of naturally spoken words. The words are then assembled into an utterance, which is also processed to produce natural-sounding speech.

    Pre-saved data compression for TTS concatenation cost
    9.
    Granted patent
    Legal status: in force

    Publication No.: US08798998B2

    Publication date: 2014-08-05

    Application No.: US12754045

    Filing date: 2010-04-05

    IPC classification: G10L13/00 G10L13/08

    CPC classification: G10L13/07

    Abstract: Pre-saved concatenation cost data is compressed through speech segment grouping. Speech segments are assigned to a predefined number of groups based on their concatenation cost values with other speech segments. A representative segment is selected for each group. The concatenation cost between two segments in different groups may then be approximated by that between the representative segments of their respective groups, thereby reducing an amount of concatenation cost data to be pre-saved.
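
    The grouping idea in the abstract above can be illustrated with a small sketch. The abstract does not name a grouping procedure, so the farthest-point seeding and nearest-representative assignment used here are assumptions, as are the function names `row_distance`, `compress_costs`, and `approx_cost`. The sketch also applies the group-level table to pairs within the same group, which the abstract does not address, and it assumes `num_groups` does not exceed the number of segments.

```python
def row_distance(a, b):
    """L1 distance between two segments' rows of concatenation costs."""
    return sum(abs(x - y) for x, y in zip(a, b))


def compress_costs(cost, num_groups):
    """Group segments by cost-row similarity and keep one cost per group pair.

    cost[i][j] is the concatenation cost of segment i followed by segment j.
    Returns (group_of, table): group_of[i] is the group index of segment i,
    and table[g][h] is the cost between the representatives of groups g and h.
    """
    n = len(cost)
    reps = [0]  # assumption: seed with segment 0, then add the farthest segments
    while len(reps) < num_groups:
        farthest = max(
            range(n),
            key=lambda i: min(row_distance(cost[i], cost[r]) for r in reps),
        )
        reps.append(farthest)
    group_of = [
        min(range(len(reps)), key=lambda g: row_distance(cost[i], cost[reps[g]]))
        for i in range(n)
    ]
    table = [[cost[a][b] for b in reps] for a in reps]
    return group_of, table


def approx_cost(group_of, table, i, j):
    """Approximate cost[i][j] by the cost between the two group representatives."""
    return table[group_of[i]][group_of[j]]


cost = [
    [0, 0, 5, 5],
    [0, 0, 5, 5],
    [7, 7, 1, 1],
    [7, 7, 1, 1],
]
group_of, table = compress_costs(cost, num_groups=2)
print(group_of)                            # [0, 0, 1, 1]
print(table)                               # [[0, 5], [7, 1]]
print(approx_cost(group_of, table, 1, 3))  # 5
```

    Here a 4 x 4 cost matrix (16 values) is replaced by a 2 x 2 table plus one group index per segment. With real data the approximation is lossy, and the number of groups trades stored data against accuracy.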

    ACCESSIBILITY TECHINQUES FOR PRESENTATION OF SYMBOLIC EXPRESSIONS
    10.
    Patent application
    Legal status: in force

    Publication No.: US20140210828A1

    Publication date: 2014-07-31

    Application No.: US13750199

    Filing date: 2013-01-25

    Applicant: APPLE INC.

    IPC classification: G06T11/60 G06F3/0488

    Abstract: Methods for presenting symbolic expressions such as mathematical, scientific, or chemical expressions, formulas, or equations are performed by a computing device. One method includes: displaying a first portion of a symbolic expression within a first area of a display screen; while in a first state in which the first area is selected for aural presentation, aurally presenting first information related to the first portion of the symbolic expression; while in the first state, detecting particular user input; in response to detecting the particular user input, performing the steps of: transitioning from the first state to a second state in which a second area, of the display, is selected for aural presentation; determining second information associated with a second portion, of the symbolic expression, that is displayed within the second area; in response to determining the second information, aurally presenting the second information.
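
    The state transitions in the abstract above can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the class `ExpressionReader`, its method names, and the use of returned strings in place of real speech output are all hypothetical, and the user input is reduced to a single "move to next area" action.

```python
class ExpressionReader:
    """Tracks which display area is selected for aural presentation."""

    def __init__(self, portions):
        # portions: list of (area_name, spoken_description) pairs, one per
        # displayed portion of the symbolic expression (hypothetical layout).
        self.portions = portions
        self.state = 0  # index of the area currently selected

    def present(self):
        """Return the text that would be spoken for the selected area."""
        return self.portions[self.state][1]

    def on_user_input(self):
        """Move selection to the next area, if any, and present its information."""
        if self.state + 1 < len(self.portions):
            self.state += 1
        return self.present()


reader = ExpressionReader([
    ("numerator", "x plus one"),
    ("denominator", "two"),
])
print(reader.present())        # x plus one
print(reader.on_user_input())  # two
```

    A real implementation would route the returned text to a screen reader or speech engine and would map specific gestures to different transitions.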