摘要:
Disclosed are various elements of a toolkit used for generating a TTS voice for use in a spoken dialog system. The invention in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method for preparing a text-to-speech (TTS) voice for testing and verification. The method comprises processing a TTS voice to be ready for testing, synthesizing words utilizing the TTS voice, presenting to a person a smallest possible subset that contains at least N instances of a group of units in the TTS voice, receiving information from the person associated with corrections needed to the TTS voice and making corrections to the TTS voice according to the received information.
摘要:
Disclosed herein are various aspects of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method of tracking progress in developing a text-to-speech (TTS) voice. The method comprises insuring that a corpus of recorded speech contains reading errors and matches an associated written text, creating a tuple for each utterance in the corpus and tracking progress for each utterance utilizing the tuple. Various parameters may be tracked using the tuple but the tuple provides a means for enabling multiple workers to efficiently process a database of utterance in preparation of a TTS voice.
摘要:
Disclosed are various elements of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. One embodiment of the invention relates to a method of generating a database for a TTS voice. The method comprises matching every spoken word associated with a TTS voice database with a smallest set of possible pronunciations for each word. The smallest set is generated by automatically determining a dialect and linguistic context using linguistic rules, empirically determining idiosyncratic speaker characteristics and determining a subject domain. The method further comprises dynamically generating a pronunciation dictionary on a word-by-word basis using the smallest set.
摘要:
Disclosed are various elements of a toolkit used for generating a TTS voice for use in a spoken dialog system. The invention in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method for preparing a text-to-speech (TTS) voice for testing and verification. The method comprises processing a TTS voice to be ready for testing, synthesizing words utilizing the TTS voice, presenting to a person a smallest possible subset that contains at least N instances of a group of units in the TTS voice, receiving information from the person associated with corrections needed to the TTS voice and making corrections to the TTS voice according to the received information.
摘要:
Disclosed herein are various aspects of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method of tracking progress in developing a text-to-speech (TTS) voice. The method comprises insuring that a corpus of recorded speech contains reading errors and matches an associated written text, creating a tuple for each utterance in the corpus and tracking progress for each utterance utilizing the tuple. Various parameters may be tracked using the tuple but the tuple provides a means for enabling multiple workers to efficiently process a database of utterance in preparation of a TTS voice.
摘要:
Disclosed herein are various innovations associated with a toolkit used for generating a TTS voice for use in a spoken dialog system. The inventions in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method of enabling human workers to find errors when developing a text-to-speech (TTS) voice. The method comprises presenting a graphical user interface wherein after a first pass of automatic speech recognition (ASR) of a speech corpus is complete, the interface presents to a worker a graphical representation of an alignment of the ASR results, associated words and phonemes and the audio, receiving a graphical input from the worker associated with a selection of a word or phoneme and presenting the audio associated with the selected word or phoneme.
摘要:
Disclosed herein are various aspects of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method of tracking progress in developing a text-to-speech (TTS) voice. The method comprises insuring that a corpus of recorded speech contains reading errors and matches an associated written text, creating a tuple for each utterance in the corpus and tracking progress for each utterance utilizing the tuple. Various parameters may be tracked using the tuple but the tuple provides a means for enabling multiple workers to efficiently process a database of utterance in preparation of a TTS voice.
摘要:
The present invention provides various elements of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. One embodiment of the invention relates to a method of correcting a database associated with the development of a text-to-speech (TTS) voice. The method comprises generating a pronunciation dictionary for use with a TTS voice, generating a TTS voice to a stage wherein it is prepared to be tested before being deployed, identifying mislabeled phonetic units associated with the TTS voice, for each identified mislabeled phonetic unit, linking to an entry within the pronunciation dictionary to correct the entry and deleting utterances and all associated data for unacceptable utterances.
摘要:
Disclosed are various elements of a toolkit used for generating a TTS voice for use in a spoken dialog system. The invention in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method for preparing a text-to-speech (TTS) voice for testing and verification. The method comprises processing a TTS voice to be ready for testing, synthesizing words utilizing the TTS voice, presenting to a person a smallest possible subset that contains at least N instances of a group of units in the TTS voice, receiving information from the person associated with corrections needed to the TTS voice and making corrections to the TTS voice according to the received information.