System for generating speech and non-speech audio messages
    1.
    Type: Granted invention patent
    Status: In force

    Publication No.: US06760704B1

    Publication date: 2004-07-06

    Application No.: US09676104

    Filing date: 2000-09-29

    Applicant: Steven M. Bennett

    Inventor: Steven M. Bennett

    IPC classification: G10L13/00

    CPC classification: G10L13/00

    Abstract: An audio information system is provided that may be used to form and convey an audio message having speech overlapped with non-speech audio. The system has components to store a context indicator having non-speech audio that signifies a characteristic of a speech content stream, to merge the context indicator with the speech content stream to form an integrated message, and to output the integrated message. The message has non-speech audio from the context indicator overlapping the speech audio. The system also has mechanisms to vary the format of the integrated message generated in order to train the user on non-speech cues. In addition, other aspects of the present invention relating to the audio information system receiving content and generating an audio message are described.

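
    The merging step described in this abstract can be pictured as a sample-wise overlay of two audio buffers. The following is only a minimal sketch under stated assumptions: audio is represented as lists of floats in the range [-1.0, 1.0], and the function name `merge_with_context` and the `cue_gain` and `offset` parameters are hypothetical, not taken from the patent.

```python
def merge_with_context(speech, cue, cue_gain=0.3, offset=0):
    """Overlay a non-speech cue on a speech stream, sample by sample.

    speech, cue: lists of float samples (assumed range [-1.0, 1.0]).
    cue_gain:    hypothetical attenuation so the cue stays behind the speech.
    offset:      sample index in the speech at which the cue starts.
    """
    length = max(len(speech), offset + len(cue))
    mixed = [0.0] * length
    for i, s in enumerate(speech):
        mixed[i] += s
    for i, c in enumerate(cue):
        mixed[offset + i] += cue_gain * c
    # Clamp to the nominal range so the sum cannot overflow on output.
    return [max(-1.0, min(1.0, x)) for x in mixed]


integrated = merge_with_context([0.5, 0.5, 0.5], [1.0, 1.0], cue_gain=0.5, offset=1)
```

    Because the cue is added rather than concatenated, the listener hears it at the same time as the speech, which is the overlap the abstract refers to.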

    Device and method for reproduction of sounds with independently variable duration and pitch
    2.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US06748357B1

    Publication date: 2004-06-08

    Application No.: US09008946

    Filing date: 1998-01-20

    Applicant: Takashi Saruhashi

    Inventor: Takashi Saruhashi

    IPC classification: G10L13/00

    Abstract: The waveform generation device can reproduce waveform data of various sounds stored in a memory at a reproduction speed that reproduces the waveforms in real time. It is provided with a storage system for storing the waveform data of a waveform sequence in which plural waveforms are arranged in time series; a pitch information input system for entering pitch information; a time information generating system for generating time information that changes at a rate independent of the pitch to be reproduced; and a reproduction means for reproducing the read waveform data at the pitch given by the pitch information input system. The reproduction means includes a reading system that reads the waveform data from the storage system at a desired reading speed that corresponds to the time information and the pitch information yet is independent of the rate at which the time information changes.

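
    One way to picture the separation of duration and pitch described in this abstract is to keep two independent counters: a time position that selects which stored waveform is current, and a phase that controls how fast that waveform is read. The sketch below is an illustration under assumptions, not the patented design: the stored sequence is modelled as a list of single-cycle waveforms, and `render_cycles`, `pitch_hz` and `time_rate` are hypothetical names.

```python
def render_cycles(cycles, pitch_hz, time_rate, sample_rate, n_out):
    """Read a time-ordered list of single-cycle waveforms.

    pitch_hz:  how many cycles are read per second (sets the pitch only).
    time_rate: how many stored cycles the time position advances per second
               (sets the duration only).
    """
    phase = 0.0      # position inside the current cycle, in [0, 1)
    position = 0.0   # position in the stored sequence, in cycles
    out = []
    for _ in range(n_out):
        cycle = cycles[min(int(position), len(cycles) - 1)]
        out.append(cycle[int(phase * len(cycle)) % len(cycle)])
        phase = (phase + pitch_hz / sample_rate) % 1.0
        position += time_rate / sample_rate
    return out


cycles = [[0, 1, 2, 3], [10, 11, 12, 13]]
low = render_cycles(cycles, pitch_hz=2, time_rate=1, sample_rate=8, n_out=16)
high = render_cycles(cycles, pitch_hz=4, time_rate=1, sample_rate=8, n_out=16)
```

    In both calls the switch from the first stored cycle to the second happens at output sample 8, because it depends only on `time_rate`; doubling `pitch_hz` changes only how quickly each cycle is traversed.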

    Method and apparatus for generating speech from an electronic form
    3.
    Type: Granted invention patent
    Status: In force

    Publication No.: US06697781B1

    Publication date: 2004-02-24

    Application No.: US09760687

    Filing date: 2001-01-16

    IPC classification: G10L13/00

    CPC classification: G10L13/00

    Abstract: A speech-generating computer apparatus for generating speech from electronic forms, a method of controlling a computer, and a computer-readable medium containing program code embodying an application program for performing a method of generating speech are provided. The computer has a speech-generating function and at least one screen reader program. The at least one screen reader program generates human-perceptible speech with the speech-generating function. The computer determines whether a particular screen reader program is active and initializes an object in the format of the particular screen reader program that is active.


    Method and system for proofreading and correcting dictated text
    4.
    Type: Granted invention patent
    Status: In force

    Publication No.: US06611802B2

    Publication date: 2003-08-26

    Application No.: US09330668

    Filing date: 1999-06-11

    IPC classification: G10L13/00

    Abstract: A method of proofreading and correcting dictated text contained in an electronic document comprises the steps of: selecting proofreading criteria for identifying textual errors contained in the electronic document; playing back each word contained in the electronic document; and marking as a textual error each played-back word that does not conform to at least one of the proofreading criteria. The method can further comprise the step of editing each marked textual error identified in the marking step. In particular, the editing step can include reviewing each marked textual error identified in the marking step; accepting user-specified changes to each marked textual error reviewed in the reviewing step; and unmarking each marked textual error corrected by the user in the accepting step. Also, the reviewing step can include highlighting each word in the electronic document corresponding to a textual error marked in the marking step; and displaying an explanation for each marked textual error in a user interface. Moreover, the reviewing step can further include suggesting a recommended change to the marked textual error; displaying the recommended change in the user interface; and accepting a user-specified preference to substitute the recommended change for the marked textual error.

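
    The mark, review and unmark steps in this abstract can be sketched as operations on a table of flagged word positions. This is a minimal illustration only: audio playback is not modelled, and the two criteria, the `LEXICON` word list and the function names are hypothetical examples rather than the criteria used in the patent.

```python
LEXICON = {"the", "cat", "sat", "on", "mat"}  # hypothetical word list

# Each criterion returns True when the word at index i conforms.
criteria = {
    "in_lexicon": lambda ws, i: ws[i].lower() in LEXICON,
    "not_repeated": lambda ws, i: i == 0 or ws[i].lower() != ws[i - 1].lower(),
}


def mark_errors(words, criteria):
    """Return {index: [names of violated criteria]} for non-conforming words."""
    marked = {}
    for i in range(len(words)):
        failed = [name for name, conforms in criteria.items() if not conforms(words, i)]
        if failed:
            marked[i] = failed  # the list doubles as the explanation shown to the user
    return marked


def apply_correction(words, marked, index, replacement):
    """Accept a user-specified change and unmark the corrected error."""
    words[index] = replacement
    marked.pop(index, None)


words = ["the", "cat", "sat", "sat", "on", "teh", "mat"]
marked = mark_errors(words, criteria)
apply_correction(words, marked, 5, "the")
```

    Keeping the violated criterion names alongside each marked index gives the review step something to display as the explanation for that error.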

    Method and system for recorded word concatenation
    5.
    Type: Granted invention patent
    Status: In force

    Publication No.: US06601030B2

    Publication date: 2003-07-29

    Application No.: US09198105

    Filing date: 1998-11-23

    Applicant: Ann K. Syrdal

    Inventor: Ann K. Syrdal

    IPC classification: G10L13/00

    CPC classification: G10L13/08

    Abstract: A method and system are provided for performing recorded word concatenation to create a natural-sounding sequence of, for example, words, numbers, phrases, or sounds. The method and system may include a tonal pattern identification unit that identifies tonal patterns, such as pitch accents, phrase accents and boundary tones, for utterances in a particular domain, such as telephone numbers, credit card numbers, or the spelling of words; a script designer that designs a script for recording a string of words, numbers, sounds, etc., based on an appropriate rhythm and pitch range, in order to obtain natural prosody for utterances in the particular domain with minimum coarticulation between concatenative units; a script recorder that records a speaker's utterances of the domain strings; a recording editor that edits the recorded strings by marking the beginning and end of each word, number, etc. in the string and including or inserting pauses according to the tonal patterns; and a concatenation unit that concatenates the edited recordings into a smooth and natural-sounding string of words, numbers, letters of the alphabet, etc., for audio output.


    Text-to-speech e-mail reader with multi-modal reply processor
    6.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US06246983B1

    Publication date: 2001-06-12

    Application No.: US09129649

    Filing date: 1998-08-05

    IPC classification: G10L13/00

    Abstract: A multi-user e-mail reader system allows several users to access their e-mail accounts simultaneously and have the e-mail messages played back with speech synthesis. The user navigates through various functional states of the system using either touch-tone keypad commands or, optionally, voice commands interpreted by a speech recognizer. Users can send reply e-mail messages without the use of a computer, by invoking the system's text processor. The text processor operates in conjunction with a keypad-to-ASCII conversion mechanism that allows fully punctuated and properly addressed e-mail messages to be composed from the touch-tone phone. Digital audio sound file attachments may be recorded through the telephone handset and attached to an outgoing e-mail message. A system for storing canned messages allows the user to quickly send pre-composed reply messages, either as stored or after editing using the text processor. The text processor uses a virtual cursor pointer that may be indexed forward and backward at different granularities, depending on whether the system is in play mode or record mode. The granularity can also be changed by the user.

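
    The abstract does not say how the keypad-to-ASCII conversion works. As one plausible illustration, the sketch below assumes a multi-tap scheme in which repeated presses of a key cycle through its characters and `#` separates consecutive characters on the same key. The letter groups for keys 2 to 9 follow the standard telephone keypad; the characters on keys 1 and 0, the `#` separator and the function name are assumptions.

```python
from itertools import groupby

# Keys 2-9 use the standard telephone letter groups.
# The punctuation on "1" and the space on "0" are assumptions.
KEYS = {
    "1": ".,@", "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz", "0": " ",
}


def decode_multitap(presses):
    """Decode a multi-tap key sequence, e.g. '44#444' -> 'hi'."""
    text = []
    for key, run in groupby(presses):
        if key == "#":  # separator between characters on the same key
            continue
        letters = KEYS[key]
        count = len(list(run))
        text.append(letters[(count - 1) % len(letters)])
    return "".join(text)


print(decode_multitap("44#444"))
```

    Putting `.`, `,` and `@` on a key is what would let a caller type a punctuated message and an e-mail address from a telephone, which is the capability the abstract describes.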

    Apparatus and method for speech-text-transmit communication over data networks
    7.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US06173250B2

    Publication date: 2001-01-09

    Application No.: US09089855

    Filing date: 1998-06-03

    Applicant: Kenneth Jong

    Inventor: Kenneth Jong

    IPC classification: G10L13/00

    Abstract: An apparatus and method for speech-text-transmit communication over data networks includes speech recognition devices, which translate speech signals input to the terminal into text, and text-to-speech conversion devices, which translate text data received from a data network into speech output signals. The speech input signals are translated into text based on phonemes obtained from a spectral analysis of the speech input signals. The text data is transmitted to a receiving party over the data network as a plurality of text data packets such that a continuous stream of text data is obtained. The receiving party's terminal receives the text data and may immediately display the text data and/or translate it into speech output signals using the text-to-speech conversion device. The text-to-speech conversion device uses speech pattern data stored in a speech pattern database to synthesize a human voice, and the speech output signals are played using a speech output device.


    Speech synthesis for tasks with word and prosody dictionaries
    8.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US06826530B1

    Publication date: 2004-11-30

    Application No.: US09621544

    Filing date: 2000-07-21

    IPC classification: G10L13/00

    CPC classification: G10L13/047, A63F2300/6063

    Abstract: A plurality of tasks are set in a speech synthesizing process, the tasks differing in at least one of the speaker, the emotion or situation at the time the speech is made, and the content of the speech, and word dictionaries, prosody dictionaries, and waveform dictionaries corresponding to the respective tasks are organized. When a character string to be synthesized is input with the task specified through, for example, a game system, a speech synthesizing process is performed using the word dictionary, the prosody dictionary, and the waveform dictionary corresponding to the specified task. Therefore, a speech message can be generated depending on the personality of a speaker, the emotion or situation at the time when a speech is made, and the contents of the speech.


    Frame erasure compensation method in a variable rate speech coder
    9.
    Type: Granted invention patent
    Status: In force

    Publication No.: US06584438B1

    Publication date: 2003-06-24

    Application No.: US09557283

    Filing date: 2000-04-24

    IPC classification: G10L13/00

    Abstract: A frame erasure compensation method in a variable-rate speech coder includes quantizing, with a first encoder, a pitch lag value for a current frame and a first delta pitch lag value equal to the difference between the pitch lag value for the current frame and the pitch lag value for the previous frame. A second, predictive encoder quantizes only a second delta pitch lag value for the previous frame (equal to the difference between the pitch lag value for the previous frame and the pitch lag value for the frame prior to that frame). If the frame prior to the previous frame is processed as a frame erasure, the pitch lag value for the previous frame is obtained by subtracting the first delta pitch lag value from the pitch lag value for the current frame. The pitch lag value for the erasure frame is then obtained by subtracting the second delta pitch lag value from the pitch lag value for the previous frame. Additionally, a waveform interpolation method may be used to smooth discontinuities caused by changes in the coder pitch memory.

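
    The two subtractions described in this abstract can be written out directly. The sketch below covers only the pitch lag arithmetic, not quantization or waveform interpolation; the function and parameter names are hypothetical.

```python
def recover_pitch_lags(current_lag, first_delta, second_delta):
    """Rebuild the pitch lags of the previous frame and the erased frame.

    current_lag:  pitch lag of the current frame (from the first encoder).
    first_delta:  current_lag minus the previous frame's lag.
    second_delta: previous frame's lag minus the lag of the frame before it
                  (the frame that was erased).
    """
    previous_lag = current_lag - first_delta
    erased_lag = previous_lag - second_delta
    return previous_lag, erased_lag


# True lags 40 (erased), 43 (previous), 45 (current) give deltas 2 and 3.
previous_lag, erased_lag = recover_pitch_lags(45, first_delta=2, second_delta=3)
```

    The recovery works backwards in time: the current frame carries an absolute pitch lag, so the chain of deltas can be unwound from it even though the erased frame itself was never received.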

    Speech duration processing method and apparatus for Chinese text-to-speech system
    10.
    Type: Granted invention patent
    Status: In force

    Publication No.: US06542867B1

    Publication date: 2003-04-01

    Application No.: US09536750

    Filing date: 2000-03-28

    IPC classification: G10L13/00

    CPC classification: G10L13/10, G10L13/08

    Abstract: The duration of speech varies according to the characteristics of the speech pronounced and the pronouncing habits of the speaker. For the speech duration processing method and apparatus of this invention, a large amount of natural speech was analyzed, and the following was found: the speech duration of a monosyllable varies according to factors of the syllable such as its phonemes, tone, phrase construction, location in the phrase, location in the sentence, and preceding and following connected phonemes. Using these varying factors, a “speech duration parameter storage portion” for speech duration parameters is constructed. By retrieving the speech duration parameters and combining them with the basic speech duration of a syllable during syllable speech duration calculation, the speech duration of each monosyllable in any sentence can be accurately decided. As shown by experimental results, a text-to-speech system using the speech duration processing apparatus of this invention can synthesize speech with natural speech duration.

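
    The abstract says that stored duration parameters are combined with a syllable's basic duration but does not say how. The sketch below assumes, purely for illustration, that each contextual factor contributes a multiplicative scale. All table contents, factor names and numeric values are hypothetical placeholders.

```python
# Hypothetical basic durations in milliseconds, keyed by syllable.
BASE_DURATION_MS = {"ma": 200.0, "shi": 240.0}

# Hypothetical "speech duration parameter storage": (factor, value) -> scale.
DURATION_PARAMS = {
    ("tone", 3): 1.25,
    ("position_in_sentence", "final"): 1.5,
    ("position_in_phrase", "initial"): 0.75,
}


def syllable_duration(syllable, context):
    """Scale the basic duration by every stored parameter that applies.

    context maps factor names to values, e.g. {"tone": 3}.
    Factors without a stored parameter leave the duration unchanged.
    """
    duration = BASE_DURATION_MS[syllable]
    for factor in context.items():
        duration *= DURATION_PARAMS.get(factor, 1.0)
    return duration


ms = syllable_duration("ma", {"tone": 3, "position_in_sentence": "final"})
```

    An additive model, or a lookup keyed on combinations of factors, would fit the abstract's wording equally well; the multiplicative form is chosen here only because it is short.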