Patent search ap:("Google LLC") AND inv:"Chun-an Chan"

1.

Granted invention patent
Text-to-speech synthesis using an autoencoder (status: in force)

Publication (announcement) number: US10249289B2

Publication (announcement) date: 2019-04-02

Application number: US15649311

Application date: 2017-07-13

Applicant: Google LLC

Inventor: Byung Ha Chun , Javier Gonzalvo , Chun-an Chan , Ioannis Agiomyrgiannakis , Vincent Ping Leung Wan , Robert Andrew James Clark , Jakub Vit

IPC: G10L13/06 , G10L19/00 , G10L25/30 , G10L13/027 , G10L13/047

Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.
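
Illustrative note (not part of the patent record): the selection step in this abstract, choosing a speech unit from a collection based on the representation output by the encoder, can be pictured as a nearest-neighbour lookup. The sketch below is a minimal illustration under stated assumptions: the abstract does not name a distance measure, so Euclidean distance is assumed, and the function name, unit ids, and two-dimensional embeddings are hypothetical.

```python
import math


def select_speech_unit(query, unit_embeddings):
    """Return the id of the stored unit whose embedding is closest to `query`.

    Assumption: closeness is Euclidean distance; the abstract only says the
    unit is selected "based on" the encoder's speech unit representation.
    """
    if not unit_embeddings:
        raise ValueError("unit_embeddings must not be empty")
    return min(unit_embeddings, key=lambda uid: math.dist(query, unit_embeddings[uid]))


# Hypothetical toy collection: unit id -> stored acoustic embedding.
units = {
    "unit_a": (0.0, 1.0),
    "unit_b": (1.0, 0.0),
    "unit_c": (0.9, 0.2),
}

# Hypothetical encoder output for one linguistic unit of the input text.
encoder_output = (1.0, 0.1)

selected = select_speech_unit(encoder_output, units)
print(selected)
```

A real system would search a much larger unit inventory, likely with an approximate index; the linear scan here only shows the shape of the lookup.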

2.

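Invention publication
Attention-Based Clockwork Hierarchical Variational Encoder (status: under examination, published)

Publication (announcement) number: US20240038214A1

Publication (announcement) date: 2024-02-01

Application number: US18487227

Application date: 2023-10-16

Applicant: Google LLC

Inventor: Robert Clark , Chun-an Chan , Vincent Wan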

IPC: G10L13/10 , G10L25/30

CPC classification number: G10L13/10 , G10L25/30 , G10L2013/105

Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable and generating a plurality of fixed-length predicted frames based on the predicted duration for the syllable.
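
Illustrative note (not part of the patent record): the last step of this abstract turns a predicted syllable duration into a number of fixed-length frames. The arithmetic can be sketched as follows. The 5 ms frame length, the round-up rule, and the example durations are assumptions chosen for illustration; the abstract specifies none of them.

```python
import math

FRAME_MS = 5  # assumed fixed frame length in milliseconds


def frames_for_duration(duration_ms, frame_ms=FRAME_MS):
    """Number of fixed-length frames needed to cover a predicted duration.

    Assumption: partial frames are rounded up so the whole duration is covered.
    """
    if duration_ms < 0 or frame_ms <= 0:
        raise ValueError("duration must be >= 0 and frame length must be > 0")
    return math.ceil(duration_ms / frame_ms)


# Hypothetical predicted durations (ms) for the syllables of one utterance.
predicted_syllable_durations_ms = [120, 83]

counts = [frames_for_duration(d) for d in predicted_syllable_durations_ms]
print(counts)
```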

3.

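Granted invention patent
Training neural networks to generate structured embeddings (status: in force)

Publication (announcement) number: US11790274B2

Publication (announcement) date: 2023-10-17

Application number: US18049995

Application date: 2022-10-26

Applicant: Google LLC

Inventor: Robert Andrew James Clark , Chun-an Chan , Vincent Ping Leung Wan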

IPC: G06N20/00 , G06N3/084 , G06N3/045

CPC classification number: G06N20/00 , G06N3/045 , G06N3/084

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model to generate embeddings of inputs to the machine learning model, the machine learning model having an encoder that generates the embeddings from the inputs and a decoder that generates outputs from the generated embeddings, wherein the embedding is partitioned into a sequence of embedding partitions that each includes one or more dimensions of the embedding, the operations comprising: for a first embedding partition in the sequence of embedding partitions: performing initial training to train the encoder and a decoder replica corresponding to the first embedding partition; for each particular embedding partition that is after the first embedding partition in the sequence of embedding partitions: performing incremental training to train the encoder and a decoder replica corresponding to the particular partition.
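
Illustrative note (not part of the patent record): the abstract describes one initial training stage for the first embedding partition and one incremental stage for each later partition. The sketch below only lays out that schedule; it does not train anything. It assumes, hypothetically, that the decoder replica for a given partition sees the embedding dimensions up to and including that partition, with later dimensions zeroed out. The abstract does not state this, and all function names and partition sizes are invented for the example.

```python
def partition_slices(sizes):
    """Turn partition sizes into (start, stop) index pairs over the embedding."""
    slices, start = [], 0
    for size in sizes:
        if size < 1:
            raise ValueError("each partition needs at least one dimension")
        slices.append((start, start + size))
        start += size
    return slices


def training_schedule(sizes):
    """One stage per partition: 'initial' for the first, 'incremental' after."""
    stages = []
    for index, (start, stop) in enumerate(partition_slices(sizes)):
        stages.append({
            "stage": "initial" if index == 0 else "incremental",
            "partition": index,
            "dims": (start, stop),
            # Assumption: the replica for this stage sees dimensions [0, stop).
            "visible_dims": stop,
        })
    return stages


def mask_embedding(embedding, visible_dims):
    """Zero out dimensions the current decoder replica is assumed not to see."""
    return [v if i < visible_dims else 0.0 for i, v in enumerate(embedding)]


# Hypothetical 6-dimensional embedding split into partitions of 2, 3, and 1.
schedule = training_schedule([2, 3, 1])
for stage in schedule:
    print(stage)
```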

4.

Invention application
Two-Level Text-To-Speech Systems Using Synthetic Training Data (status: in force)

Publication (announcement) number: US20230018384A1

Publication (announcement) date: 2023-01-19

Application number: US17305809

Application date: 2021-07-14

Applicant: Google LLC

Inventor: Lev Finkelstein , Chun-an Chan , Byungha Chun , Norman Casagrande , Yu Zhang , Robert Andrew James Clark , Vincent Wan

IPC: G10L13/08 , G10L13/047

Abstract: A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect. The method also includes generating an output audio waveform corresponding to a synthesized speech representation of the input text sequence that clones the voice of the target speaker in the second accent/dialect.

5.

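Invention application
Clockwork Hierarchal Variational Encoder (status: in force)

Publication (announcement) number: US20220172705A1

Publication (announcement) date: 2022-06-02

Application number: US17650452

Application date: 2022-02-09

Applicant: Google LLC

Inventor: Robert Andrew James Clark , Chun-an Chan , Vincent Ping Leung Wan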

IPC: G10L15/06 , G10L15/22 , G10L15/16 , G10L25/18 , G10L25/24 , G10L15/02 , G06N3/04 , G06N3/08 , G10L25/21

Abstract: A method for providing a frame-based mel spectral representation of speech includes receiving a text utterance having at least one word, and selecting a mel spectral embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. For each phoneme, using the selected mel spectral embedding, the method also includes: predicting a duration of the corresponding phoneme by encoding linguistic features of the corresponding phoneme with a corresponding syllable embedding for the syllable that includes the corresponding phoneme; and generating a plurality of fixed-length predicted mel-frequency spectrogram frames based on the predicted duration for the corresponding phoneme. Each fixed-length predicted mel-frequency spectrogram frame represents mel-spectral information of the corresponding phoneme.

6.

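Granted invention patent
Attention-based clockwork hierarchical variational encoder (status: in force)

Publication (announcement) number: US12080272B2

Publication (announcement) date: 2024-09-03

Application number: US17756264

Application date: 2019-12-10

Applicant: Google LLC

Inventor: Robert Clark , Chun-an Chan , Vincent Wan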

IPC: G10L13/10 , G10L25/30

CPC classification number: G10L13/10 , G10L25/30 , G10L2013/105

Abstract: A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230) and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.

7.

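Granted invention patent
Clockwork hierarchal variational encoder (status: in force)

Publication (announcement) number: US11664011B2

Publication (announcement) date: 2023-05-30

Application number: US17650452

Application date: 2022-02-09

Applicant: Google LLC

Inventor: Robert Andrew James Clark , Chun-an Chan , Vincent Ping Leung Wan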

IPC: G10L15/06 , G10L15/22 , G10L15/16 , G10L25/18 , G10L25/24 , G10L15/02 , G06N3/084 , G10L25/21 , G06N3/044 , G06N3/045

CPC classification number: G10L15/063 , G06N3/044 , G06N3/045 , G06N3/084 , G10L15/02 , G10L15/16 , G10L15/22 , G10L25/18 , G10L25/21 , G10L25/24 , G10L2015/025 , G10L2015/027

Abstract: A method of providing a frame-based mel spectral representation of speech includes receiving a text utterance having at least one word and selecting a mel spectral embedding for the text utterance. Each word has at least one syllable and each syllable has at least one phoneme. For each phoneme, the method further includes using the selected mel spectral embedding to: (i) predict a duration of the corresponding phoneme based on corresponding linguistic features associated with the word that includes the corresponding phoneme and corresponding linguistic features associated with the syllable that includes the corresponding phoneme; and (ii) generate a plurality of fixed-length predicted mel-frequency spectrogram frames based on the predicted duration for the corresponding phoneme. Each fixed-length predicted mel-frequency spectrogram frame represents mel-spectral information of the corresponding phoneme.

8.

Invention application
TRAINING NEURAL NETWORKS TO GENERATE STRUCTURED EMBEDDINGS (status: in force)

Publication (announcement) number: US20230060886A1

Publication (announcement) date: 2023-03-02

Application number: US18049995

Application date: 2022-10-26

Applicant: Google LLC

Inventor: Robert Andrew James Clark , Chun-an Chan , Vincent Ping Leung Wan

IPC: G06N20/00 , G06N3/04 , G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model to generate embeddings of inputs to the machine learning model, the machine learning model having an encoder that generates the embeddings from the inputs and a decoder that generates outputs from the generated embeddings, wherein the embedding is partitioned into a sequence of embedding partitions that each includes one or more dimensions of the embedding, the operations comprising: for a first embedding partition in the sequence of embedding partitions: performing initial training to train the encoder and a decoder replica corresponding to the first embedding partition; for each particular embedding partition that is after the first embedding partition in the sequence of embedding partitions: performing incremental training to train the encoder and a decoder replica corresponding to the particular partition.

9.

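Invention application
Attention-Based Clockwork Hierarchical Variational Encoder (status: in force)

Publication (announcement) number: US20220415306A1

Publication (announcement) date: 2022-12-29

Application number: US17756264

Application date: 2019-12-10

Applicant: Google LLC

Inventor: Robert Clark , Chun-an Chan , Vincent Wan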

IPC: G10L13/10 , G10L25/30

Abstract: A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230) and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.

10.

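Invention application
Clockwork Hierarchical Variational Encoder (status: under examination, published)

Publication (announcement) number: US20200074985A1

Publication (announcement) date: 2020-03-05

Application number: US16678981

Application date: 2019-11-08

Applicant: Google LLC

Inventor: Robert Andrew James Clark , Chun-an Chan , Vincent Ping Leung Wan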

IPC: G10L15/06 , G10L15/22 , G10L15/16 , G10L25/18 , G10L25/24 , G10L25/21 , G10L15/02 , G06N3/04 , G06N3/08

Abstract: A method for providing a frame-based mel spectral representation of speech includes receiving a text utterance having at least one word, and selecting a mel spectral embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. For each phoneme, using the selected mel spectral embedding, the method also includes: predicting a duration of the corresponding phoneme by encoding linguistic features of the corresponding phoneme with a corresponding syllable embedding for the syllable that includes the corresponding phoneme; and generating a plurality of fixed-length predicted mel-frequency spectrogram frames based on the predicted duration for the corresponding phoneme. Each fixed-length predicted mel-frequency spectrogram frame represents mel-spectral information of the corresponding phoneme.
