Patent search ipc:"G10L13/06" Page 1

1.

Granted invention patent
Media segment prediction for media generation (Status: in force)

Publication (announcement) No.: US12170094B2

Publication (announcement) date: 2024-12-17

Application No.: US18047572

Application date: 2022-10-18

Applicant: QUALCOMM Incorporated

Inventor: Stephane Villette, Sen Li, Pravin Kumar Ramadas, Daniel Jared Sinder

IPC: G10L13/06, G10L17/02, G10L21/01, G10L25/54, G10L15/26

Abstract: A device includes one or more processors configured to input one or more segments of an input media stream into a feature extractor. The one or more processors are further configured to pass an output of the feature extractor into an utterance classifier to produce at least one representation of at least one utterance class of a plurality of utterance classes. The one or more processors are further configured to pass the output of the feature extractor and the at least one representation into a segment matcher to produce a media output segment identifier.

2.

Published invention application
TEXT-BASED SPEECH GENERATION (Status: under examination, published)

Publication (announcement) No.: US20240233706A1

Publication (announcement) date: 2024-07-11

Application No.: US18562962

Application date: 2022-05-23

Applicant: Microsoft Technology Licensing, LLC

Inventor: Xu TAN, Tao Qin, Sheng Zhao, Tie-Yan Liu

IPC: G10L13/10, G10L13/047, G10L13/06

CPC classification number: G10L13/10, G10L13/047, G10L13/06, G10L2013/105

Abstract: According to implementations of the subject matter described herein, a solution is proposed for text to speech. In this solution, an initial phoneme sequence corresponding to text is generated, the initial phoneme sequence comprising feature representations of a plurality of phonemes. A first phoneme sequence is generated by inserting a feature representation of an additional phoneme into the initial phoneme sequence, the additional phoneme being related to a characteristic of spontaneous speech. The duration of a phoneme among the plurality of phonemes and the additional phoneme is determined by using an expert model corresponding to the phoneme, and a second phoneme sequence is generated based on the first phoneme sequence. Spontaneous-style speech corresponding to the text is determined based on the second phoneme sequence. In this way, spontaneous-style speech with more varying rhythms can be generated based on spontaneous-style additional phonemes and multiple expert models.

3.

Granted invention patent
Synthetic speech processing by representing text by phonemes exhibiting predicted volume and pitch using neural networks (Status: in force)

Publication (announcement) No.: US11978431B1

Publication (announcement) date: 2024-05-07

Application No.: US17326886

Application date: 2021-05-21

Applicant: Amazon Technologies, Inc.

Inventor: Arnaud Joly, Simon Slangen, Alexis Pierre Moinet, Thomas Renaud Drugman, Panagiota Karanasou, Syed Ammar Abbas, Sri Vishnu Kumar Karlapati

IPC: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32

CPC classification number: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32

Abstract: A speech-processing system receives input data representing text. One or more encoders trained to predict audio properties corresponding to the text process the text to predict those properties. A speech decoder processes phoneme embeddings as well as the predicted properties to create data representing synthesized speech.

4.

Granted invention patent
Wireless communication device using voice recognition and voice synthesis (Status: in force)

Publication (announcement) No.: US11942072B2

Publication (announcement) date: 2024-03-26

Application No.: US17439197

Application date: 2021-02-03

Applicant: Sang Rae Park

Inventor: Sang Rae Park

IPC: G10L13/10, G10L13/033, G10L13/06, G10L15/22, G10L15/26, G10L19/00

CPC classification number: G10L13/10, G10L13/033, G10L13/06, G10L15/22, G10L15/26, G10L19/0018

Abstract: Disclosed is a wireless communication device including a voice recognition portion configured to convert a voice signal input through a microphone into a syllable information stream using voice recognition, an encoding portion configured to encode the syllable information stream to generate digital transmission data, a transmission portion configured to modulate from the digital transmission data to a transmission signal and transmit the transmission signal through an antenna, a reception portion configured to demodulate from a reception signal received through the antenna to a digital reception data and output the digital reception data, a decoding portion configured to decode the digital reception data to generate the syllable information stream and a voice synthesis portion configured to convert the syllable information stream into the voice signal using voice synthesis and output the voice signal through a speaker.

5.

Published invention application
SYSTEMS AND METHODS FOR TRANSPOSING SPOKEN OR TEXTUAL INPUT TO MUSIC (Status: under examination, published)

Publication (announcement) No.: US20240071343A1

Publication (announcement) date: 2024-02-29

Application No.: US18272175

Application date: 2022-01-13

Applicant: RIFFIT INC

Inventor: Leonardus H.T. Van Der Ploeg, Deepak Savadatti

IPC: G10H1/00, G10L13/06, G10L13/10

CPC classification number: G10H1/0025, G10L13/06, G10L13/10, G10H2210/056, G10H2210/111, G10H2250/455, G10L2013/105

Abstract: Described herein are musical translation devices and methods of use thereof. Exemplary uses of musical translation devices include optimizing the understanding and/or recall of an input message for a user and improving a cognitive process in a user.

6.

Granted invention patent
Synthesized speech generation (Status: in force)

Publication (announcement) No.: US11676571B2

Publication (announcement) date: 2023-06-13

Application No.: US17154372

Application date: 2021-01-21

Applicant: QUALCOMM Incorporated

Inventor: Kyungguen Byun, Sunkuk Moon, Shuhua Zhang, Vahid Montazeri, Lae-Hoon Kim, Erik Visser

IPC: G10L13/10, G10L13/06, G10L15/22, G10L13/00, G10L13/047, G10L13/033, G10L19/02, G10L25/63, G06N3/045, G10L21/013

CPC classification number: G10L13/047, G06N3/045, G10L13/033, G10L19/02, G10L25/63, G10L2021/0135

Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.

7.

Invention application
PAUSE ESTIMATION MODEL LEARNING APPARATUS, PAUSE ESTIMATION APPARATUS, METHODS AND PROGRAMS FOR THE SAME (Status: in force)

Publication (announcement) No.: US20230005468A1

Publication (announcement) date: 2023-01-05

Application No.: US17779518

Application date: 2019-11-26

Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventor: Mizuki NAGANO, Yusuke IJIMA, Nozomi KOBAYASHI

IPC: G10L13/10, G06F40/268, G10L13/047, G10L13/06

Abstract: A pause estimation model learning apparatus includes: a morphological analysis unit configured to perform morphological analysis on training text data to provide M types of information, M being an integer that is equal to or larger than 2; a feature selection unit configured to combine N pieces of information, among the M pieces of information, to be an input feature when a predetermined certain condition is satisfied, and select a predetermined one of the N pieces of information to be the input feature when the certain condition is not satisfied, N being an integer that is equal to or larger than 2 and equal to or smaller than M; and a learning unit configured to learn a pause estimation model by using the input feature selected by the feature selection unit and a pause correct label.

8.

Granted invention patent
Training method and apparatus for a speech synthesis model, and storage medium (Status: in force)

Publication (announcement) No.: US11488577B2

Publication (announcement) date: 2022-11-01

Application No.: US16907006

Application date: 2020-06-19

Applicant: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.

Inventor: Zhipeng Chen, Jinfeng Bai, Lei Jia

IPC: G10L13/047, G06N3/08, G10L13/06, G10L13/08

Abstract: The present application discloses a training method and an apparatus for a speech synthesis model, electronic device, and storage medium. The method includes: taking a syllable input sequence, a phoneme input sequence and a Chinese character input sequence of a current sample as inputs of an encoder of a model to be trained, to obtain encoded representations of these three sequences at an output end of the encoder; fusing the encoded representations of these three sequences, to obtain a weighted combination of these three sequences; taking the weighted combination as an input of an attention module, to obtain a weighted average of the weighted combination at each moment at an output end of the attention module; taking the weighted average as an input of a decoder of the model to be trained, to obtain a speech Mel spectrum of the current sample at an output end of the decoder.

9.

Invention application
Paragraph synthesis with cross utterance features for neural TTS (Status: in force)

Publication (announcement) No.: US20220277728A1

Publication (announcement) date: 2022-09-01

Application No.: US17631695

Application date: 2020-06-17

Applicant: Microsoft Technology Licensing, LLC

Inventor: Shaofei Zhang, Lei He

IPC: G10L13/08, G10L13/047, G10L13/06, G10L25/30

Abstract: The present disclosure provides a method and apparatus for generating speech through neural text-to-speech (TTS) synthesis. A text input may be obtained. A phone feature of the text input may be generated. Context features of the text input may be generated based on a set of sentences associated with the text input. A speech waveform corresponding to the text input may be generated based on the phone feature and the context features.

10.

Granted invention patent
Speech processing device, speech processing method, and computer program product using compensation parameters (Status: in force)

Publication (announcement) No.: US11348569B2

Publication (announcement) date: 2022-05-31

Application No.: US16841839

Application date: 2020-04-07

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventor: Masatsune Tamura, Masahiro Morita

IPC: G10L25/18, G10L13/06, G10L13/047

Abstract: A speech processing device includes a hardware processor configured to receive input speech and extract speech frames from the input speech. The hardware processor is configured to calculate a spectrum parameter for each of the speech frames, calculate a first phase spectrum for each of the speech frames, calculate a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum, calculate a band group delay parameter in a predetermined frequency band from the group delay spectrum, and calculate a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum. The hardware processor is configured to generate a speech waveform based on the spectrum parameter, the band group delay parameter, and the band group delay compensation parameter.
