Abstract:
A process for speech analysis, and more specifically an automatic process for the analysis of continuous speech. The speech waveform is described by means of the resonant frequencies, or formants, that arise in the speech organs. The process determines suitable formant frequencies for an utterance by dividing the utterance into time frames and analyzing each frame by linear prediction, which yields the roots of the denominator polynomial and thereby a set of frequency values for each frame. The utterance is divided into voiced regions, and in each voiced region the centers of the vowel sounds are located to provide a number of starting points. Tracks are formed from the starting points by sorting the roots from frame to frame so that old and new roots are linked together. Factors of merit are calculated for the tracks relative to the formants, and the tracks are assigned to formants in accordance with these factors. The factors of merit take into account the bandwidth and continuity of each track and its relation to the formants. The process achieves a global optimization by delaying the formant allocation until a complete voiced region has been analyzed. Linking the tracks together in this way makes it possible to control the additional, false resonances that arise in connection with linear prediction.
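
The two basic operations named in this abstract, turning a root of the linear-prediction denominator polynomial into a frequency value and linking old roots to new roots from frame to frame, can be illustrated with a minimal Python sketch. The pole-to-frequency and pole-to-bandwidth formulas are the standard ones for a discrete-time resonance. Everything else is an assumption made for illustration and is not taken from the abstract: the function names `root_to_resonance` and `link_roots`, the greedy nearest-frequency linking rule, and the `max_jump_hz` threshold. The factors of merit and the delayed formant allocation are not modelled here.

```python
import cmath
import math


def root_to_resonance(root, sample_rate):
    """Map one complex LPC pole to (frequency_hz, bandwidth_hz).

    The pole angle gives the resonance frequency; the pole radius
    gives the 3 dB bandwidth (a radius closer to 1 means a narrower peak).
    """
    freq = cmath.phase(root) * sample_rate / (2.0 * math.pi)
    bandwidth = -math.log(abs(root)) * sample_rate / math.pi
    return freq, bandwidth


def link_roots(prev_freqs, new_freqs, max_jump_hz=300.0):
    """Greedily link each old frequency to the nearest unused new one.

    Hypothetical linking rule, assumed for illustration only.
    Returns (prev_index, new_index) pairs; new_index is None when no
    unused candidate lies within max_jump_hz of the old frequency.
    """
    unused = list(range(len(new_freqs)))
    links = []
    for i, f_old in enumerate(prev_freqs):
        best = min(unused, key=lambda j: abs(new_freqs[j] - f_old), default=None)
        if best is not None and abs(new_freqs[best] - f_old) <= max_jump_hz:
            links.append((i, best))
            unused.remove(best)
        else:
            links.append((i, None))
    return links
```

In a fuller implementation the roots would come from a polynomial root finder applied to each frame's prediction coefficients, and only roots with positive angle (one of each complex-conjugate pair) would be passed to `root_to_resonance`.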
Abstract:
The invention relates to a method and an arrangement for speech synthesis and provides an automatic mechanism for simulating human speech. The method provides a number of control parameters for controlling a speech synthesis device. The invention solves the problem of coarticulation by means of an interpolation mechanism. The control parameters are stored in a matrix or a sequence list for each polyphone. The behaviour of each parameter over time is defined around each phoneme boundary, and polyphones are joined by forming a weighted mean of the curves defined by their two associated matrices or sequence lists. The invention also provides an arrangement for carrying out the method.
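
The joining step described in this abstract, a weighted mean of two parameter curves around a phoneme boundary, can be sketched as follows. The abstract does not state how the weights vary, so the linear ramp used here is an assumption, as are the function name `blend_curves` and the convention that both curves are sampled at the same instants across the joining region.

```python
def blend_curves(left_curve, right_curve):
    """Blend two equally sampled parameter curves with a weighted mean.

    Assumption for illustration: the weight on right_curve rises
    linearly from 0 at the first sample to 1 at the last sample, so the
    result starts on the left polyphone's curve and ends on the right one.
    """
    if len(left_curve) != len(right_curve):
        raise ValueError("curves must have the same number of samples")
    n = len(left_curve)
    if n == 1:
        # A single sample has no ramp; fall back to the plain mean.
        return [0.5 * (left_curve[0] + right_curve[0])]
    blended = []
    for k, (a, b) in enumerate(zip(left_curve, right_curve)):
        w = k / (n - 1)
        blended.append((1.0 - w) * a + w * b)
    return blended
```

Each control parameter (for example a formant frequency track) would be blended independently in this way, one call per parameter row of the two polyphones being joined.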
Abstract:
A method for synthesizing speech using concatenation and Hanning windows, in which a synthetic waveform is formed by concatenating suitably selected parts of recorded human speech, the selected parts being windowed out with a Hanning window and copied into suitably selected locations in the synthetic waveform. The method is adapted to synthesize unvoiced consonants and includes the step of palindromically copying suitably selected parts of the recorded human speech to form a synthesized waveform for the unvoiced consonant using concatenation. The method may be used for diphone, or polyphone, synthesis. The advantage of this palindromic synthesis method is that when the copying process has been reversed the second time there is either no repetition of identical blocks, or else the time difference between repetitions is markedly larger than with known methods, thus minimizing unwanted periodic artifacts in the synthesized speech.
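
One way to picture palindromic copying is as an overlap-add loop in which the source block index runs forward to the end of the recorded segment, then backward, then forward again. The sketch below is an assumption-laden illustration and not the claimed method: the block selection rule, the 50% overlap, the periodic form of the Hanning window, and the names `hann`, `pingpong_indices` and `palindromic_synthesis` are all hypothetical. In this sketch only the order of block selection reverses; the samples inside each block are not time-reversed.

```python
import math


def hann(n):
    """Periodic Hanning window; at 50% overlap successive copies sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def pingpong_indices(num_blocks, count):
    """Block indices 0, 1, ..., N-1, N-2, ..., 1, 0, 1, ... (count entries)."""
    if num_blocks < 1:
        raise ValueError("need at least one source block")
    if num_blocks == 1:
        return [0] * count
    period = 2 * (num_blocks - 1)
    indices = []
    for k in range(count):
        p = k % period
        indices.append(p if p < num_blocks else period - p)
    return indices


def palindromic_synthesis(source, block_len, out_blocks):
    """Overlap-add Hanning-windowed source blocks in ping-pong order."""
    if block_len < 2 or block_len % 2 != 0:
        raise ValueError("block_len must be a positive even number")
    if len(source) < block_len or out_blocks < 1:
        raise ValueError("source too short or out_blocks < 1")
    hop = block_len // 2
    num_blocks = (len(source) - block_len) // hop + 1
    window = hann(block_len)
    out = [0.0] * (hop * (out_blocks - 1) + block_len)
    for pos, idx in enumerate(pingpong_indices(num_blocks, out_blocks)):
        src_start = idx * hop
        out_start = pos * hop
        for k in range(block_len):
            out[out_start + k] += window[k] * source[src_start + k]
    return out
```

With a recorded unvoiced segment as `source`, increasing `out_blocks` lengthens the consonant without looping the same block at a short, fixed period, which is the kind of periodic artifact the abstract says the palindromic order is meant to reduce.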