Abstract:
A method is provided for prosody generation by unit selection from an imitation speech database. A rule-based text-to-speech method produces a set of intonation events by selecting the syllables that carry a pitch peak, a pitch dip, or a combination of the two, and produces the parameters needed to generate a pitch curve for each event. The synthetic pitch curve shape generated by the rule-based method is then used to select the best-matching units from an imitation speech database of a speaker's prosody, and these units are concatenated to produce the final prosody.
Abstract:
Techniques for language instruction and teaching are described. Methods focus on the sound distinctions that learners have trouble discriminating. Learners practice discriminating these sounds. A learning system is developed using databases of speech from people discriminating these sounds. An embodiment of a method according to the present disclosure can use sets of words that differ by only a single syllable containing a sound that is difficult to pronounce, as a way to teach the pronunciation of a word. The sets of similar words can be of a desired number or have a desired number of constituent members. Embodiments of systems can include user interfaces and an automated speech recognition system, including suitable automated speech recognition software, that can interact with a user, e.g., in a pedagogical setting. Related software products including computer-readable instructions resident in a computer-readable medium are described. Hidden Markov model (HMM) and dynamic time warping (DTW) algorithms may be used in the embodiments.
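As an illustration of how DTW could be applied here, the sketch below computes the classic dynamic time warping distance between two feature sequences and picks the reference word closest to a learner's utterance. It assumes one scalar feature per frame for brevity (a real recognizer would typically use multi-dimensional acoustic features), and the names `dtw_distance` and `closest_reference` are hypothetical, not taken from the disclosure.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers.

    Uses the standard recurrence with absolute difference as the local
    cost and steps of (1,0), (0,1), and (1,1).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def closest_reference(utterance, references):
    """Return the label of the reference sequence nearest to the utterance.

    `references` maps a word label to its (assumed) feature sequence.
    """
    return min(references, key=lambda w: dtw_distance(utterance, references[w]))
```

Because DTW tolerates differences in speaking rate, a learner's slower rendition of a word can still align with the reference at zero or low cost, while the other member of a similar-word set scores a larger distance.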
Abstract:
A prosody modification system for use in text-to-speech includes an input receiving a sequence of prosodic data vectors Pn, measured at times Tn, which sample a sound waveform. A prosody data warping module directly derives new prosodic data vectors Qn from the original data vectors Pn using a function that is controlled by warping parameters A0, . . . Ak. The function avoids round-off errors in deriving quantized values, has derivatives with respect to A0, . . . Ak, Pn, and Tn that are continuous, and has sufficiently high complexity to model intentional prosody of the sound waveform yet sufficiently low complexity to avoid modeling micro-prosody of the sound waveform. The smoothness and simplicity of the function ensure that micro-prosodic perturbations and errors in measurement of Tn are transferred directly to the output Qn. The errors are thus reversed during re-synthesis and therefore eliminated, resulting in micro-prosodic perturbations being preserved during re-synthesis.
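One way to picture such a warping function is sketched below. It treats each Pn as a single pitch value and scales it by the exponential of a low-order polynomial in Tn with coefficients A0, . . . Ak. This particular form is an assumption chosen only because it is smooth in all of its arguments and has few parameters; the abstract does not name a specific function, and `warp` and `warp_sequence` are hypothetical names.

```python
import math

def warp(p, t, a):
    """Map an original pitch value p measured at time t to a new value.

    The gain is exp(A0 + A1*t + ... + Ak*t**k), computed in floating
    point directly from the original value, so no intermediate
    quantization step is introduced.
    """
    log_gain = sum(ak * t ** k for k, ak in enumerate(a))
    return p * math.exp(log_gain)

def warp_sequence(P, T, a):
    """Apply the warp to every (Pn, Tn) pair, giving the sequence Qn."""
    return [warp(p, t, a) for p, t in zip(P, T)]
```

With all coefficients at zero the output equals the input, and with a constant gain the small frame-to-frame perturbations in Pn reappear in Qn at the same relative size, which is the behavior the abstract attributes to a smooth, low-complexity function.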