-
Publication number: US12170094B2
Publication date: 2024-12-17
Application number: US18047572
Application date: 2022-10-18
Applicant: QUALCOMM Incorporated
Inventor: Stephane Villette, Sen Li, Pravin Kumar Ramadas, Daniel Jared Sinder
Abstract: A device includes one or more processors configured to input one or more segments of an input media stream into a feature extractor. The one or more processors are further configured to pass an output of the feature extractor into an utterance classifier to produce at least one representation of at least one utterance class of a plurality of utterance classes. The one or more processors are further configured to pass the output of the feature extractor and the at least one representation into a segment matcher to produce a media output segment identifier.
-
Publication number: US20240233706A1
Publication date: 2024-07-11
Application number: US18562962
Application date: 2022-05-23
Applicant: Microsoft Technology Licensing, LLC
Inventor: Xu TAN, Tao Qin, Sheng Zhao, Tie-Yan Liu
IPC: G10L13/10, G10L13/047, G10L13/06
CPC classification number: G10L13/10, G10L13/047, G10L13/06, G10L2013/105
Abstract: According to implementations of the subject matter described herein, a solution is proposed for text to speech. In this solution, an initial phoneme sequence corresponding to text is generated, the initial phoneme sequence comprising feature representations of a plurality of phonemes. A first phoneme sequence is generated by inserting a feature representation of an additional phoneme into the initial phoneme sequence, the additional phoneme being related to a characteristic of spontaneous speech. The duration of a phoneme among the plurality of phonemes and the additional phoneme is determined by using an expert model corresponding to the phoneme, and a second phoneme sequence is generated based on the first phoneme sequence. Spontaneous-style speech corresponding to the text is determined based on the second phoneme sequence. In this way, spontaneous-style speech with more varying rhythms can be generated based on spontaneous-style additional phonemes and multiple expert models.
-
Publication number: US11978431B1
Publication date: 2024-05-07
Application number: US17326886
Application date: 2021-05-21
Applicant: Amazon Technologies, Inc.
Inventor: Arnaud Joly, Simon Slangen, Alexis Pierre Moinet, Thomas Renaud Drugman, Panagiota Karanasou, Syed Ammar Abbas, Sri Vishnu Kumar Karlapati
IPC: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32
CPC classification number: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32
Abstract: A speech-processing system receives input data representing text. One or more encoders trained to predict audio properties corresponding to the text process the text to predict those properties. A speech decoder processes phoneme embeddings as well as the predicted properties to create data representing synthesized speech.
-
Publication number: US11942072B2
Publication date: 2024-03-26
Application number: US17439197
Application date: 2021-02-03
Applicant: Sang Rae Park
Inventor: Sang Rae Park
CPC classification number: G10L13/10, G10L13/033, G10L13/06, G10L15/22, G10L15/26, G10L19/0018
Abstract: Disclosed is a wireless communication device including a voice recognition portion configured to convert a voice signal input through a microphone into a syllable information stream using voice recognition, an encoding portion configured to encode the syllable information stream to generate digital transmission data, a transmission portion configured to modulate the digital transmission data into a transmission signal and transmit the transmission signal through an antenna, a reception portion configured to demodulate a reception signal received through the antenna into digital reception data and output the digital reception data, a decoding portion configured to decode the digital reception data to generate the syllable information stream, and a voice synthesis portion configured to convert the syllable information stream into the voice signal using voice synthesis and output the voice signal through a speaker.
-
Publication number: US20240071343A1
Publication date: 2024-02-29
Application number: US18272175
Application date: 2022-01-13
Applicant: RIFFIT INC
Inventor: Leonardus H.T. Van Der Ploeg, Deepak Savadatti
CPC classification number: G10H1/0025, G10L13/06, G10L13/10, G10H2210/056, G10H2210/111, G10H2250/455, G10L2013/105
Abstract: Described herein are musical translation devices and methods of use thereof. Exemplary uses of musical translation devices include optimizing the understanding and/or recall of an input message for a user and improving a cognitive process in a user.
-
Publication number: US11676571B2
Publication date: 2023-06-13
Application number: US17154372
Application date: 2021-01-21
Applicant: QUALCOMM Incorporated
Inventor: Kyungguen Byun, Sunkuk Moon, Shuhua Zhang, Vahid Montazeri, Lae-Hoon Kim, Erik Visser
IPC: G10L13/10, G10L13/06, G10L15/22, G10L13/00, G10L13/047, G10L13/033, G10L19/02, G10L25/63, G06N3/045, G10L21/013
CPC classification number: G10L13/047, G06N3/045, G10L13/033, G10L19/02, G10L25/63, G10L2021/0135
Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.
-
Publication number: US20230005468A1
Publication date: 2023-01-05
Application number: US17779518
Application date: 2019-11-26
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Mizuki NAGANO, Yusuke IJIMA, Nozomi KOBAYASHI
IPC: G10L13/10, G06F40/268, G10L13/047, G10L13/06
Abstract: A pause estimation model learning apparatus includes: a morphological analysis unit configured to perform morphological analysis on training text data to provide M types of information, M being an integer that is equal to or larger than 2; a feature selection unit configured to combine N pieces of information, among the M pieces of information, to be an input feature when a predetermined condition is satisfied, and select a predetermined one of the N pieces of information to be the input feature when the condition is not satisfied, N being an integer that is equal to or larger than 2 and equal to or smaller than M; and a learning unit configured to learn a pause estimation model by using the input feature selected by the feature selection unit and a pause correct label.
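The feature selection step in this abstract can be illustrated with a minimal sketch. The abstract does not say what the condition is or which pieces of morphological information are used, so the field names (`surface`, `pos`), the example condition, and the function `select_feature` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Sequence


def select_feature(
    info: Dict[str, str],
    combine_keys: Sequence[str],
    fallback_key: str,
    condition: Callable[[Dict[str, str]], bool],
) -> str:
    """Return one input feature for a single morpheme.

    If the (hypothetical) condition holds, the N pieces of information
    named in combine_keys are joined into one combined feature;
    otherwise a single predetermined piece (fallback_key) is used.
    """
    if condition(info):
        return "+".join(info[key] for key in combine_keys)
    return info[fallback_key]


# Assumed example condition: combine only for particles.
def is_particle(info: Dict[str, str]) -> bool:
    return info["pos"] == "particle"


particle = {"surface": "wa", "pos": "particle"}
noun = {"surface": "neko", "pos": "noun"}

combined = select_feature(particle, ["surface", "pos"], "pos", is_particle)
single = select_feature(noun, ["surface", "pos"], "pos", is_particle)
```

Here `combined` joins both pieces of information, while `single` falls back to the part-of-speech tag alone.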
-
Publication number: US11488577B2
Publication date: 2022-11-01
Application number: US16907006
Application date: 2020-06-19
Inventor: Zhipeng Chen, Jinfeng Bai, Lei Jia
IPC: G10L13/047, G06N3/08, G10L13/06, G10L13/08
Abstract: The present application discloses a training method and an apparatus for a speech synthesis model, electronic device, and storage medium. The method includes: taking a syllable input sequence, a phoneme input sequence and a Chinese character input sequence of a current sample as inputs of an encoder of a model to be trained, to obtain encoded representations of these three sequences at an output end of the encoder; fusing the encoded representations of these three sequences, to obtain a weighted combination of these three sequences; taking the weighted combination as an input of an attention module, to obtain a weighted average of the weighted combination at each moment at an output end of the attention module; taking the weighted average as an input of a decoder of the model to be trained, to obtain a speech Mel spectrum of the current sample at an output end of the decoder.
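The fusion step in this abstract (three encoded representations merged into one weighted combination) might look roughly like the sketch below. It assumes the three encoded sequences already share the same length and dimension and that the weights are given non-negative scalars; the abstract specifies neither, and the function `fuse_encodings` is a hypothetical name.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_encodings(
    encodings: Sequence[Sequence[Vector]],
    weights: Sequence[float],
) -> List[Vector]:
    """Weighted combination of several encoded sequences.

    Assumption: every sequence has the same number of time steps and
    the same vector dimension. Weights are normalized to sum to 1.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    steps = len(encodings[0])
    dim = len(encodings[0][0])
    return [
        [sum(w * enc[t][d] for w, enc in zip(norm, encodings)) for d in range(dim)]
        for t in range(steps)
    ]


# Toy encoder outputs: one time step, two dimensions each.
syllable_enc = [[1.0, 0.0]]
phoneme_enc = [[0.0, 1.0]]
character_enc = [[1.0, 1.0]]

fused = fuse_encodings([syllable_enc, phoneme_enc, character_enc], [1.0, 1.0, 2.0])
```

In a trained model the weights would typically be learned rather than fixed, and the fused sequence would then be passed to the attention module.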
-
Publication number: US20220277728A1
Publication date: 2022-09-01
Application number: US17631695
Application date: 2020-06-17
Applicant: Microsoft Technology Licensing, LLC
Inventor: Shaofei Zhang, Lei He
IPC: G10L13/08, G10L13/047, G10L13/06, G10L25/30
Abstract: The present disclosure provides a method and apparatus for generating speech through neural text-to-speech (TTS) synthesis. A text input may be obtained. A phone feature of the text input may be generated. Context features of the text input may be generated based on a set of sentences associated with the text input. A speech waveform corresponding to the text input may be generated based on the phone feature and the context features.
-
Publication number: US11348569B2
Publication date: 2022-05-31
Application number: US16841839
Application date: 2020-04-07
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Masatsune Tamura, Masahiro Morita
IPC: G10L25/18, G10L13/06, G10L13/047
Abstract: A speech processing device includes a hardware processor configured to receive input speech and extract speech frames from the input speech. The hardware processor is configured to calculate a spectrum parameter for each of the speech frames, calculate a first phase spectrum for each of the speech frames, calculate a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum, calculate a band group delay parameter in a predetermined frequency band from the group delay spectrum, and calculate a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum. The hardware processor is configured to generate a speech waveform based on the spectrum parameter, the band group delay parameter, and the band group delay compensation parameter.
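To make the group delay terms in this abstract concrete, the sketch below computes a group delay spectrum as the negative slope of the phase spectrum across adjacent frequency bins, then averages it within frequency bands. The per-band averaging, the band edges, and all function names are assumptions for illustration; the abstract does not state how the band group delay parameter is derived, and the compensation parameter is not modeled here.

```python
import cmath
import math
from typing import List, Sequence


def dft_half(frame: Sequence[float]) -> List[complex]:
    """Naive DFT of a real frame, bins 0 .. N/2."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def group_delay(spectrum: Sequence[complex], n_fft: int) -> List[float]:
    """Group delay in samples between adjacent bins.

    The phase difference is taken from X[k+1] * conj(X[k]), which
    avoids explicit phase unwrapping for delays below N/2 samples.
    """
    step = 2 * math.pi / n_fft  # bin spacing in radians per sample
    return [
        -cmath.phase(spectrum[k + 1] * spectrum[k].conjugate()) / step
        for k in range(len(spectrum) - 1)
    ]


def band_group_delay(gd: Sequence[float], band_edges: Sequence[int]) -> List[float]:
    """Mean group delay per band (assumed definition of the band parameter)."""
    return [
        sum(gd[lo:hi]) / (hi - lo)
        for lo, hi in zip(band_edges, band_edges[1:])
    ]


# A unit impulse delayed by 3 samples has a constant group delay of 3.
n_fft = 16
frame = [0.0] * n_fft
frame[3] = 1.0

gd = group_delay(dft_half(frame), n_fft)
bands = band_group_delay(gd, [0, 4, 8])
```

For the delayed impulse, every entry of `gd` and both band averages come out at approximately 3 samples, which matches the delay of the input.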