Abstract:
Predictive coding of signals, i.e., the reduction of redundancy in a signal by subtracting from it that part which can be predicted from its past, is a well-known technique for reducing the channel capacity required to transmit a signal with specified fidelity. It has been widely applied to signals, such as television signals, which have regularly repeating intervals of information, but has not been satisfactorily applied to signals, such as speech, which exhibit characteristics that vary from speaker to speaker and from time to time for one speaker. According to this invention, an adaptive predictor is employed which is readjusted periodically to match the time-varying characteristics of a speech signal.
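The idea of a periodically readjusted predictor can be illustrated with a minimal sketch. It is not the patented apparatus: the first-order predictor, the least-squares fit, the frame length, and the function names `fit_first_order`, `encode`, and `decode` are all assumptions made for illustration, and quantization of the residual is omitted. Each frame gets its own predictor coefficient, and only the coefficient and the prediction residual are passed to the decoder.

```python
import math


def fit_first_order(frame):
    """Least-squares coefficient a minimizing sum (x[n] - a*x[n-1])^2 in one frame.

    A first-order predictor is an illustrative assumption, not the source's design.
    """
    num = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    den = sum(frame[n - 1] ** 2 for n in range(1, len(frame)))
    return num / den if den > 0 else 0.0


def encode(signal, frame_len):
    """Return one (coefficient, residual) pair per frame.

    The predictor is readjusted at every frame boundary; frame_len is a
    hypothetical parameter chosen by the caller.
    """
    frames = []
    prev = 0.0  # last sample seen so far; the decoder tracks the same value
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        a = fit_first_order(frame)
        residual = []
        for x in frame:
            residual.append(x - a * prev)  # subtract the predictable part
            prev = x
        frames.append((a, residual))
    return frames


def decode(frames):
    """Rebuild the signal by adding the prediction back to each residual sample."""
    out = []
    prev = 0.0
    for a, residual in frames:
        for e in residual:
            x = e + a * prev
            out.append(x)
            prev = x
    return out


if __name__ == "__main__":
    # Toy signal whose character changes halfway, standing in for speech.
    signal = [math.sin(0.1 * n) for n in range(200)]
    signal += [math.sin(0.5 * n) for n in range(200)]
    frames = encode(signal, frame_len=100)
    residual_energy = sum(e * e for _, res in frames for e in res)
    signal_energy = sum(x * x for x in signal)
    print("coefficients per frame:", [round(a, 3) for a, _ in frames])
    print("residual/signal energy:", residual_energy / signal_energy)
```

In this toy case the fitted coefficient changes when the signal changes, and the residual carries far less energy than the signal itself, which is the property that lowers the required channel capacity.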
Abstract:
A short-time spectral analysis of a nonstationary signal, such as a speech signal, does not ordinarily yield control signal information sufficient for subsequent synthesis. However, more reliable control signals for a speech synthesizer can be obtained by making use of natural constraints, applicable to a speech wave, in the analysis procedure. For frequencies below 5 kHz, the human vocal tract can be modeled as an acoustic tube in which only plane waves propagate. Thus, for vowels and vowel-like sounds, the speech output of the vocal tract at any instant of time can be assumed to be a weighted sum of its past values and the input to the vocal tract at that instant of time. In the described invention, a speech wave is represented by the output of a linear filter which simulates an acoustic tube and which is excited by a combination of a quasi-periodic pulse train and white noise. The parameters of this filter are derived from the speech wave such that the mean-squared error between the synthetic speech samples at the output of the filter and the input speech samples is a minimum.
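The "weighted sum of past values plus input" model can be sketched as follows. This is a hedged illustration, not the procedure of the invention: it swaps in the standard autocorrelation method with the Levinson-Durbin recursion, which minimizes the mean-squared one-step prediction error, and the function names, the pitch period, and the noise gain are hypothetical.

```python
import random


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for weights a[0..order-1] such that
    x[n] is approximated by sum_k a[k] * x[n-1-k].

    Returns (weights, remaining prediction-error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] hold the weights
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def excitation(n_samples, pitch_period, noise_gain, seed=0):
    """Pulse train plus white noise (pitch_period and noise_gain are
    hypothetical illustration parameters)."""
    rng = random.Random(seed)
    return [(1.0 if n % pitch_period == 0 else 0.0)
            + noise_gain * rng.gauss(0.0, 1.0)
            for n in range(n_samples)]


def synthesize(a, u):
    """All-pole filter: each output is the input plus a weighted sum of
    past outputs."""
    out = []
    for n, u_n in enumerate(u):
        past = sum(a[k] * out[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(u_n + past)
    return out


if __name__ == "__main__":
    true_a = [1.3, -0.6]  # a stable two-pole filter standing in for a vocal tract
    x = synthesize(true_a, excitation(4000, pitch_period=80, noise_gain=1.0, seed=1))
    est_a, err = levinson_durbin(autocorr(x, 2), 2)
    print("estimated weights:", [round(v, 3) for v in est_a])
```

Run on a signal generated by a known two-pole filter, the estimated weights come out close to the true ones, which is the sense in which the filter parameters are "derived from the speech wave". A real analysis would use a higher order and short overlapping frames.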