Concatenation of speech segments by use of a speech synthesizer
    41.
    Granted patent
    Legal status: In force

    Publication No.: US06366883B1

    Publication date: 2002-04-02

    Application No.: US09250405

    Filing date: 1999-02-16

    IPC classification: G10L13/08

    Abstract: In a speech synthesizer apparatus, a weighting coefficient training controller calculates acoustic distances in second acoustic feature parameters between one target phoneme from the same phoneme and the phoneme candidates other than the target phoneme based on first acoustic feature parameters and prosodic feature parameters, and determines weighting coefficient vectors for respective target phonemes defining degrees of contribution to the second acoustic feature parameters for respective phoneme candidates by executing a predetermined statistical analysis therefor. Then, a speech unit selector searches for a combination of phoneme candidates which correspond to a phoneme sequence of an input sentence and which minimizes a cost including a target cost representing approximate costs between a target phoneme and the phoneme candidates and a concatenation cost representing approximate costs between two phoneme candidates to be adjacently concatenated, and outputs index information on the selected combination of phoneme candidates. Further, a speech synthesizer synthesizes a speech signal corresponding to the input phoneme sequence by sequentially reading out speech segments of speech waveform signals corresponding to the index information and concatenating the read speech segments of the speech waveform signals.
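
The unit-selection search described in this abstract can be illustrated with a small dynamic-programming sketch. This is not the patented implementation: the function name `select_units`, the cost callables, and the toy candidate labels are all hypothetical, and the candidate lists are assumed to be non-empty.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    concat_cost: Callable[[str, str], float],
) -> Tuple[float, List[str]]:
    """Return (total cost, chosen candidate per position).

    candidates[pos] lists the phoneme candidates for position pos.
    The total cost is the sum of target costs plus the concatenation
    costs between adjacent choices (hypothetical cost functions).
    """
    # best holds, for each candidate at the current position,
    # the cheapest path ending in that candidate.
    best = [(target_cost(0, c), [c]) for c in candidates[0]]
    for pos in range(1, len(candidates)):
        new_best = []
        for c in candidates[pos]:
            # Cheapest way to reach candidate c from any previous candidate.
            cost, path = min(
                ((prev_cost + concat_cost(prev_path[-1], c), prev_path)
                 for prev_cost, prev_path in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(pos, c), path + [c]))
        best = new_best
    return min(best, key=lambda item: item[0])
```

The returned path plays the role of the "index information" in the abstract: a waveform concatenation stage would read out the segments it names in order.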

    Expressivity of voice synthesis
    42.
    Patent application
    Legal status: Lapsed

    Publication No.: US20020026315A1

    Publication date: 2002-02-28

    Application No.: US09872966

    Filing date: 2001-06-01

    IPC classification: G10L13/00

    CPC classification: G10L13/04 G10L13/07

    Abstract: Voice synthesis with improved expressivity is obtained in a voice synthesiser of source-filter type by making use of a library of source sound categories in the source module. Each source sound category corresponds to a particular morphological category and is derived from analysis of real vocal sounds, by inverse filtering so as to subtract the effect of the vocal tract. The library may be parametrical, that is, the stored data corresponds not to the inverse-filtered sounds themselves but to synthesis coefficients for resynthesising the inverse-filtered sounds using any suitable re-synthesis technique, such as the phase vocoder technique. The coefficients are derived by STFT analysis.

    Method and apparatus for speech synthesis and program recorded medium
    43.
    Granted patent
    Legal status: Lapsed

    Publication No.: US06081781A

    Publication date: 2000-06-27

    Application No.: US926037

    Filing date: 1997-09-09

    IPC classification: G10L13/04 G10L13/07 G10L13/02

    CPC classification: G10L13/07 G10L13/04

    Abstract: Data in the same range of the fundamental frequency F0 as speech segments are used as learning data to prepare a reference codebook CB_M for a spectrum envelope. The same learning data for a higher range than F0 and the same learning data for a lower range are subject to a linear stretch matching with respect to the learning data for the range F0. For each vector code in the reference codebook CB_M, the spectrum envelope is clustered to prepare a high range codebook CB_H and a low range codebook CB_L. The spectrum envelope of input speech segments is fuzzy vector quantized (S402) with the reference codebook, and depending on the synthesized F0, a high, middle or low codebook is selected. The selected codebook is used to decode the fuzzy vector quantized code, and the decoded output is subject to the inverse FFT. Alternatively, codebooks CB_MH and CB_ML, each comprising differential vectors for corresponding code vectors between CB_M and CB_H and between CB_M and CB_L, are prepared. The quantized code is decoded using either CB_MH or CB_ML, and the decoded differential vector is stretched in accordance with a difference in the fundamental frequency between the synthesized speech and the original speech for CB_M. The stretched differential vector is added to the code vector which was used for the fuzzy vector quantization.
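
A minimal sketch of the quantize-then-decode idea in this abstract follows. The abstract does not give the membership formula, so the fuzzy c-means style weighting used here is an assumption, as are the function names, the `fuzziness` parameter, and the F0 range edges passed to `pick_codebook`.

```python
from typing import List, Sequence

Vector = Sequence[float]


def fuzzy_memberships(x: Vector, codebook: Sequence[Vector],
                      fuzziness: float = 2.0) -> List[float]:
    """Membership of x in each code vector (assumed fuzzy c-means form)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    # An exact hit gets full membership, avoiding division by zero.
    for k, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if i == k else 0.0 for i in range(len(codebook))]
    exponent = 1.0 / (fuzziness - 1.0)
    inverse = [(1.0 / d) ** exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def decode(memberships: Sequence[float],
           codebook: Sequence[Vector]) -> List[float]:
    """Membership-weighted sum of the code vectors of the chosen codebook."""
    dim = len(codebook[0])
    return [sum(u * c[i] for u, c in zip(memberships, codebook))
            for i in range(dim)]


def pick_codebook(f0_hz: float, low_edge_hz: float, high_edge_hz: float,
                  cb_low, cb_mid, cb_high):
    """Choose the low, middle or high codebook from the synthesis F0."""
    if f0_hz < low_edge_hz:
        return cb_low
    if f0_hz > high_edge_hz:
        return cb_high
    return cb_mid
```

In this sketch the memberships are computed once against the reference (middle) codebook and then reused to decode with whichever codebook the target F0 selects, which mirrors the first variant described in the abstract.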

    Envelope-invariant analytical speech resynthesis using periodic signals derived from reharmonized frame spectrum
    44.
    Granted patent
    Legal status: Lapsed

    Publication No.: US5987413A

    Publication date: 1999-11-16

    Application No.: US869368

    Filing date: 1997-06-05

    CPC classification: G10L13/07 G10L21/04

    Abstract: Envelope-invariant method for audio signal synthesis from elementary audio waveforms stored in a dictionary, wherein: the waveforms are perfectly periodic and stored as one of their periods; synthesis is obtained by overlap-adding the waveforms obtained from time-domain repetition of the periodic waveforms with a weighting window whose size is approximately two times the period of the signals to weight, and whose relative position inside the period is fixed to any value identical for all the periods, each extracted from a reharmonized and thus periodic waveform, obtained by modifying, without changing the spectral envelope, the frequencies and amplitudes of harmonics in the spectrum of a frame of the original continuous speech waveform; whereby the time shift between two successive waveforms obtained by weighting the original signals is set according to the imposed fundamental frequency of the signal to synthesize.
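
The overlap-add step in this abstract can be sketched as follows, assuming a periodic Hann window (the abstract only requires a window about two periods long) and an integer target period in samples. The function names are hypothetical, and the reharmonization step that produces the stored period is not shown.

```python
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n (assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add_periodic(period: Sequence[float], target_period: int,
                         n_pulses: int) -> List[float]:
    """Overlap-add windowed copies of one stored period.

    The stored period is repeated twice in time, weighted by a window
    two periods long, and successive copies are shifted by
    target_period samples (the imposed pitch period).
    """
    p = len(period)
    window = hann(2 * p)
    grain = [period[i % p] * window[i] for i in range(2 * p)]
    out = [0.0] * (target_period * (n_pulses - 1) + 2 * p)
    for k in range(n_pulses):
        start = k * target_period
        for i, value in enumerate(grain):
            out[start + i] += value
    return out
```

When `target_period` equals the stored period length, the shifted windows sum to one in the fully overlapped region and the original periodic signal is reproduced there; a smaller or larger shift raises or lowers the pitch.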

    Synthesizing speech by converting phonemes to digital waveforms
    45.
    Granted patent
    Legal status: Lapsed

    Publication No.: US5970454A

    Publication date: 1999-10-19

    Application No.: US844859

    Filing date: 1997-04-23

    Applicant: Andrew Paul Breen

    Inventor: Andrew Paul Breen

    IPC classification: G10L13/06 G10L5/02

    CPC classification: G10L13/07

    Abstract: Synthetic speech is generated by production of a digital waveform from a text in phonemes. A linked database is used which comprises an extended text in phonemes and its equivalent in the form of a digital waveform. The two portions of the database are linked by a parameter which establishes equivalent points in both the phoneme text and the digital waveform. The input text (in phonemes) is analyzed to locate a matching portion in the phoneme portion of the database. This matching utilizes exact equivalence of phonemes where this is possible; otherwise relation between phonemes is utilized. The selection process identifies input phonemes in context whereby improved conversions are obtained. Having analyzed the input text into matching strings in the input form of the database, beginning and ending parameters for the sections are established. The output is produced by abutting sections of the digital waveform defined by the beginning and ending parameters.
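
The matching step can be sketched with a greedy longest-match search over the phoneme side of the database. The greedy strategy, the function name, and the `db_boundaries` layout are assumptions for illustration; the abstract's fallback to related phonemes is not modelled, so an unmatched phoneme simply raises an error.

```python
from typing import List, Sequence, Tuple


def longest_match_sections(
    input_phonemes: Sequence[str],
    db_phonemes: Sequence[str],
    db_boundaries: Sequence[int],
) -> List[Tuple[int, int]]:
    """Cover the input with database substrings, longest match first.

    db_boundaries[i] is the waveform sample index where db_phonemes[i]
    starts; one extra final entry marks the end of the last phoneme.
    Returns (begin, end) sample indices of the sections to abut.
    """
    sections = []
    pos = 0
    while pos < len(input_phonemes):
        best_len, best_start = 0, -1
        for start in range(len(db_phonemes)):
            length = 0
            while (pos + length < len(input_phonemes)
                   and start + length < len(db_phonemes)
                   and input_phonemes[pos + length] == db_phonemes[start + length]):
                length += 1
            if length > best_len:
                best_len, best_start = length, start
        if best_len == 0:
            raise ValueError(f"no database match for {input_phonemes[pos]!r}")
        sections.append((db_boundaries[best_start],
                         db_boundaries[best_start + best_len]))
        pos += best_len
    return sections
```

Preferring longer matches keeps phonemes in their recorded context, which is the motivation the abstract gives for context-aware selection.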

    Interpolating between representative frame waveforms of a prediction error signal for speech synthesis
    46.
    Granted patent
    Legal status: Lapsed

    Publication No.: US5890118A

    Publication date: 1999-03-30

    Application No.: US613093

    Filing date: 1996-03-08

    IPC classification: G10L11/00 G10L13/06 G10L9/04

    CPC classification: G10L13/07

    Abstract: A speech synthesis apparatus includes: a memory for storing a plurality of typical waveforms corresponding to a plurality of frames, the typical waveforms each previously obtained by extracting in units of at least one frame from a prediction error signal formed in predetermined units; a voiced speech source generator including an interpolation circuit for performing interpolation between the typical waveforms read out from the memory to obtain a plurality of interpolation signals each having at least one of an interpolation pitch period and a signal level which changes smoothly between the corresponding frames, and a superposition circuit for superposing the interpolation signals obtained by the interpolation circuit to form a voiced speech source signal; an unvoiced speech source generator for generating an unvoiced speech source signal; and a vocal tract filter selectively driven by the voiced speech source signal outputted from the voiced speech source generator and the unvoiced speech source signal from the unvoiced speech source generator to generate synthetic speech. Further, interpolation positions can be determined based on the pitch period.
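
A minimal sketch of the interpolation and superposition stages, assuming linear interpolation of both the waveform samples and the pitch period (the abstract only says they change smoothly). All function names are hypothetical.

```python
from typing import List, Sequence


def interpolate_waveforms(w_a: Sequence[float], w_b: Sequence[float],
                          steps: int) -> List[List[float]]:
    """Return `steps` waveforms moving linearly from w_a toward w_b.

    The first result equals w_a; w_b itself is not included because it
    would open the next frame pair.
    """
    return [[(1.0 - k / steps) * a + (k / steps) * b
             for a, b in zip(w_a, w_b)]
            for k in range(steps)]


def interpolate_periods(p_a: int, p_b: int, steps: int) -> List[int]:
    """Pitch periods (in samples) changing linearly from p_a toward p_b."""
    return [round(p_a + (p_b - p_a) * k / steps) for k in range(steps)]


def superpose(waveforms: Sequence[Sequence[float]],
              periods: Sequence[int]) -> List[float]:
    """Add each waveform into the output, one pitch period after the last."""
    starts = [0]
    for p in periods[:-1]:
        starts.append(starts[-1] + p)
    out = [0.0] * max(s + len(w) for s, w in zip(starts, waveforms))
    for s, w in zip(starts, waveforms):
        for i, value in enumerate(w):
            out[s + i] += value
    return out
```

The result stands in for the voiced source signal that would drive the vocal tract filter.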

    Apparatus for synthesizing speech by varying pitch
    47.
    Granted patent
    Legal status: Lapsed

    Publication No.: US5787398A

    Publication date: 1998-07-28

    Application No.: US702933

    Filing date: 1996-08-26

    Applicant: Andrew Lowry

    Inventor: Andrew Lowry

    IPC classification: G10L13/02 G10L13/06 G10L9/00

    Abstract: The pitch of synthesized speech signals is varied by separating the speech signals into a spectral component and an excitation component. The latter is multiplied by a series of overlapping window functions synchronous, in the case of voiced speech, with pitch timing mark information corresponding at least approximately to instants of vocal excitation, to separate it into windowed speech segments which are added together again after the application of a controllable time-shift. The spectral and excitation components are then recombined. The multiplication employs at least two windows per pitch period, each having a duration of less than one pitch period. Alternatively each window has a duration of less than twice the pitch period between timing marks and is asymmetric about the timing mark.

    Method of speech synthesis by means of concentration and partial overlapping of waveforms
    48.
    Granted patent
    Legal status: Lapsed

    Publication No.: US5774855A

    Publication date: 1998-06-30

    Application No.: US528713

    Filing date: 1995-09-15

    IPC classification: G10L13/04 G10L9/12

    Abstract: A synthesis method in which that part of each interval of the original signal which contains the fundamental information is left unchanged, and only the remaining part of the interval is altered. In this way, not only is processing time reduced, but the natural sound of the synthetic signal is also improved. The main part of the interval is an exact reproduction of the original signal. At least the waveforms associated with voiced sounds are subdivided into a plurality of intervals, corresponding to the responses of the vocal duct to a series of excitation impulses of the vocal cords, synchronous with the fundamental frequency of the signal. Each interval is subjected to a weighting. The signals resulting from the weighting are replaced with a replica thereof shifted in time by an amount that depends on prosodic information. The synthesis is then carried out by overlapping and adding the shifted signals. In each interval of original signal to be reproduced in synthesis, an unchanging part is identified, which contains the fundamental information and which is reproduced unaltered in the synthesized signal, and the operations of weighting, overlapping and adding involve only the remaining part of the interval. The division between the unchanging and variable parts is found by searching among all zero crossings for a suitable point.
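
The zero-crossing search at the end of this abstract can be sketched as below. The abstract does not state what makes a division "suitable", so the rule used here (first zero crossing at or after a minimum length for the unchanging part) is purely an assumption, as are the function names.

```python
from typing import List, Sequence, Tuple


def zero_crossings(samples: Sequence[float]) -> List[int]:
    """Indices i where the signal changes sign between i-1 and i."""
    return [i for i in range(1, len(samples))
            if (samples[i - 1] < 0 <= samples[i])
            or (samples[i - 1] >= 0 > samples[i])]


def split_interval(samples: Sequence[float],
                   min_fixed_len: int) -> Tuple[List[float], List[float]]:
    """Split one pitch interval into (unchanging part, variable part).

    Assumed rule: cut at the first zero crossing at or after
    min_fixed_len; if none exists, the whole interval stays unchanged.
    """
    for i in zero_crossings(samples):
        if i >= min_fixed_len:
            return list(samples[:i]), list(samples[i:])
    return list(samples), []
```

Only the second returned part would go through weighting, shifting and overlap-add; the first part is copied to the output unaltered.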

    Speech synthesis with weighted parameters at phoneme boundaries
    49.
    Granted patent
    Legal status: Lapsed

    Publication No.: US5659664A

    Publication date: 1997-08-19

    Application No.: US468640

    Filing date: 1995-06-06

    Applicant: Jaan Kaja

    Inventor: Jaan Kaja

    IPC classification: G10L9/02 G10L13/04 G10L5/04

    CPC classification: G10L13/07 G10L13/04 G10L25/15

    Abstract: The invention relates to a method and an arrangement for speech synthesis and provides an automatic mechanism for simulating human speech. The method provides a number of control parameters for controlling a speech synthesis device. The invention solves the problem of coarticulation by using an interpolation mechanism. The control parameters are stored in a matrix or a sequence list for each polyphone. The behaviour of the respective parameter with time is defined around each phoneme boundary, and polyphones are joined by forming a weighted mean value of the curves which are defined by their two associated matrices or sequence lists. The invention also provides an arrangement for carrying out the method.
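
The weighted mean at a phoneme boundary can be sketched as a cross-fade between the parameter curve of the outgoing polyphone and that of the incoming one. The linear weighting and the function name `join_curves` are assumptions; the abstract does not specify the weight profile.

```python
from typing import List, Sequence


def join_curves(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Weighted mean of two parameter curves over a boundary region.

    `left` is the curve from the outgoing polyphone and `right` the
    curve from the incoming one, sampled at the same instants. The
    weight moves linearly from the left curve to the right curve.
    """
    if not left or len(left) != len(right):
        raise ValueError("curves must be non-empty and of equal length")
    n = len(left)
    joined = []
    for i, (l_val, r_val) in enumerate(zip(left, right)):
        w = 0.5 if n == 1 else i / (n - 1)
        joined.append((1.0 - w) * l_val + w * r_val)
    return joined
```

For example, joining a flat curve at 1.0 with a flat curve at 3.0 over three samples gives a ramp through 2.0, so the control parameter passes smoothly from one polyphone to the next.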

    Method and apparatus for speech generation from phonetic codes
    50.
    Granted patent
    Legal status: Lapsed

    Publication No.: US5463715A

    Publication date: 1995-10-31

    Application No.: US998459

    Filing date: 1992-12-30

    Applicant: Richard T. Gagnon

    Inventor: Richard T. Gagnon

    IPC classification: G10L13/06 G10L9/00

    CPC classification: G10L13/07

    Abstract: Speech generation from phonetic code is carried out by a microcomputer based system which stores digitized waveform segments and appropriately joins the segments and outputs them to a digital to analog converter and then to a speaker. An allophone is generated for each phoneme designated by the phonetic codes according to the articulation type of each adjacent phoneme. Each phoneme is classified as neutral, labial, glottal, or medial according to its effect on the articulation of adjacent phonemes. Each phoneme is characterized by at least one center waveform dependent on the phonetic code, and an initial waveform and a final waveform, each of which depend on the phonetic code and the articulation type of the neighboring phoneme. Tables of waveform pointers are accessed according to phonetic code and articulation type, and other tables provide articulation types, times of each waveform portion, transition rate, fricative state, and pitch for each phonetic code. Adjacent waveforms are gradually blended together. Continuously varying center waveforms are afforded by indexing through successive waveform pointers at a given rate during the center phoneme period, the rate and the period being retrieved from the tables.
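
The table-driven allophone selection can be sketched as a lookup keyed by phonetic code and the neighbouring phoneme's articulation type. The `ARTICULATION` table below is a made-up toy assignment, not the patent's classification, and treating utterance edges as "neutral" is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical articulation types; the real table is not given in the abstract.
ARTICULATION: Dict[str, str] = {
    "p": "labial",
    "a": "neutral",
    "h": "glottal",
    "t": "medial",
}

# (initial-waveform key, center-waveform key, final-waveform key)
AllophoneKeys = Tuple[Tuple[str, str], str, Tuple[str, str]]


def allophone_keys(phonemes: Sequence[str]) -> List[AllophoneKeys]:
    """Build the waveform-pointer table keys for each phoneme.

    The initial waveform depends on the phonetic code and the previous
    phoneme's articulation type, the final waveform on the code and the
    next phoneme's type, and the center waveform on the code alone.
    """
    keys = []
    for i, ph in enumerate(phonemes):
        prev_type = ARTICULATION[phonemes[i - 1]] if i > 0 else "neutral"
        next_type = (ARTICULATION[phonemes[i + 1]]
                     if i < len(phonemes) - 1 else "neutral")
        keys.append(((ph, prev_type), ph, (ph, next_type)))
    return keys
```

Each key would index a table of waveform pointers; blending of adjacent waveforms and the timing tables mentioned in the abstract are left out of this sketch.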
