Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Srikanth Ronanki"

1.

Granted invention patent
Redacting portions of text transcriptions generated from inverse text normalization (active)

Publication (announcement) number: US12182498B1

Publication (announcement) date: 2024-12-31

Application number: US17810302

Application date: 2022-06-30

Applicant: Amazon Technologies, Inc.

Inventor: Monica Lakshmi Sunkara, Deepthi Devaiah Devanira, Chaitanya Shivade, Sravan Babu Bodapati, Katrin Kirchhoff, Srikanth Ronanki

IPC: G06F40/166, G06F21/62, G06F40/279, G10L15/16, G10L15/22

Abstract: Portions of text data generated from inverse text normalization may be redacted. Text data for redaction may be obtained. One or more inverse text normalization models may be applied to the text data to generate normalized text data. A machine learning model, trained to recognize text for redaction, may be applied to identify portions of the normalized text data for redaction. The identified portions may be redacted and the redacted normalized text provided to a destination.
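The two-stage flow in this abstract (normalize first, then find and mask sensitive spans) can be illustrated with a minimal sketch. Everything named here is hypothetical: `inverse_normalize` is a toy digit-word lookup standing in for an inverse text normalization model, and `find_redaction_spans` is a regular expression standing in for the trained machine learning recognizer, which the patent does not reduce to a regex.

```python
import re

# Toy lookup standing in for an inverse text normalization (ITN) model.
_DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def inverse_normalize(text):
    """Collapse runs of spoken digit words into written digit strings."""
    out, run = [], []
    for token in text.split():
        digit = _DIGIT_WORDS.get(token.lower())
        if digit is not None:
            run.append(digit)
            continue
        if run:
            out.append("".join(run))
            run = []
        out.append(token)
    if run:
        out.append("".join(run))
    return " ".join(out)


def find_redaction_spans(text, min_digits=4):
    """Assumed stand-in for the trained recognizer: flag long digit runs."""
    pattern = re.compile(r"\d{%d,}" % min_digits)
    return [match.span() for match in pattern.finditer(text)]


def redact(text, spans, mask="[REDACTED]"):
    """Replace each (start, end) span, working right to left so offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask + text[end:]
    return text


def redact_transcript(raw):
    """Normalize first, then identify and mask spans in the normalized text."""
    normalized = inverse_normalize(raw)
    return redact(normalized, find_redaction_spans(normalized))
```

Running the recognizer on the normalized text matters in this sketch: the spoken form "one two three four" contains no digit characters, so a pattern written for written-form numbers would only match after normalization.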

2.

Granted invention patent
Text-to-speech (TTS) processing (active)

Publication (announcement) number: US12272350B2

Publication (announcement) date: 2025-04-08

Application number: US18664461

Application date: 2024-05-15

Applicant: Amazon Technologies, Inc.

Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

IPC: G10L13/10, G10L13/06, G10L25/18

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

3.

Granted invention patent
Text-to-speech (TTS) processing (active)

Publication (announcement) number: US11990118B2

Publication (announcement) date: 2024-05-21

Application number: US18206301

Application date: 2023-06-06

Applicant: Amazon Technologies, Inc.

Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

IPC: G10L13/10, G10L13/06, G10L25/18

CPC classification number: G10L13/10, G10L13/06, G10L25/18

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

4.

Granted invention patent
Text-to-speech (TTS) processing with transfer of vocal characteristics (active)

Publication (announcement) number: US11410684B1

Publication (announcement) date: 2022-08-09

Application number: US16430894

Application date: 2019-06-04

Applicant: Amazon Technologies, Inc.

Inventor: Viacheslav Klimkov, Thomas Renaud Drugman, Alexander Galkin, Srikanth Ronanki

IPC: G10L13/00, G10L25/78, G10L13/027, G10L15/16, G10L15/187, G06F16/38, G06N3/08, G06N20/20, G06F17/18, G06N3/04, G10L13/04, G10L13/033, G10L13/07

Abstract: Audio data from a first, source speaker is received and processed to determine linguistic units and vocal characteristics corresponding to those linguistic units. The linguistic units may either be determined from received text data or may be determined from the audio data using automatic speech recognition. A model is trained using training data from a second, target speaker. The trained model concatenates the linguistic units with the vocal characteristics to produce output speech that has the “voice” of the target speaker and the vocal characteristics of the source speaker.

5.

Granted invention patent
Personalized batch and streaming speech-to-text transcription of audio (active)

Publication (announcement) number: US12198681B1

Publication (announcement) date: 2025-01-14

Application number: US17937297

Application date: 2022-09-30

Applicant: Amazon Technologies, Inc.

Inventor: Monica Lakshmi Sunkara, Srikanth Ronanki, Sravan Babu Bodapati, Jeffrey John Farris, Katrin Kirchhoff, Vivek Govindan, Yide Zou, Mohit Narendra Gupta, Silviu Mihai Burz

IPC: G10L15/16, G10L15/30, G10L25/51

Abstract: Techniques for personalized batch and streaming speech-to-text transcription of audio reduce the error rate of automatic speech recognition (ASR) systems in transcribing rare and out-of-vocabulary words. The techniques achieve personalization of connectionist temporal classification (CTC) models by using adaptive boosting to perform biasing at the level of sub-words. In addition to boosting, the techniques encompass a phone alignment network to bias sub-word predictions towards rare long-tail words and out-of-vocabulary words. A technical benefit of the techniques is that the accuracy of speech-to-text transcription of rare and out-of-vocabulary words in a custom vocabulary by an automatic speech recognition (ASR) system can be improved without having to train the ASR system on the custom vocabulary. Instead, the techniques allow the same ASR system trained on a base vocabulary to realize the accuracy improvements for different custom vocabularies spanning different domains.
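The idea of biasing sub-word predictions at decoding time, without retraining the model, can be sketched roughly as follows. This is a simplified illustration and not the patented method: it applies a fixed additive bonus to the frame-level scores of sub-words from a custom vocabulary during greedy CTC-style decoding, whereas the abstract describes adaptive boosting plus a phone alignment network. The function name, the `BLANK` symbol, and the score values are all assumptions made for the example.

```python
from typing import AbstractSet, Dict, List, Sequence

BLANK = "_"  # assumed symbol for the CTC blank token


def ctc_greedy_decode(
    frames: Sequence[Dict[str, float]],
    boosted: AbstractSet[str] = frozenset(),
    boost: float = 0.0,
) -> List[str]:
    """Greedy CTC-style decoding with an additive bonus for boosted sub-words.

    Each frame maps candidate sub-word tokens to log-probability scores.
    """
    decoded: List[str] = []
    previous = None
    for scores in frames:
        # Pick the best token after adding the bonus to boosted sub-words.
        best = max(
            scores,
            key=lambda tok: scores[tok] + (boost if tok in boosted else 0.0),
        )
        # CTC collapse rule: drop blanks and immediate repeats.
        if best != previous and best != BLANK:
            decoded.append(best)
        previous = best
    return decoded


# Hypothetical frame scores for an utterance of the rare name "Katrin".
frames = [
    {"_": -2.0, "ka": -0.9, "ca": -0.7},
    {"_": -0.2, "ka": -3.0, "ca": -3.0},
    {"_": -2.0, "trin": -0.5, "tree": -0.4},
]
baseline = ctc_greedy_decode(frames)
biased = ctc_greedy_decode(frames, boosted={"ka", "trin"}, boost=0.5)
```

In this toy input the unboosted decoder prefers the more common sub-words, while the boosted decoder recovers the custom-vocabulary pieces; the model scores themselves are unchanged, which mirrors the abstract's point that the base system need not be retrained.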

6.

Published invention application
TEXT-TO-SPEECH (TTS) PROCESSING (under examination, published)

Publication (announcement) number: US20240296827A1

Publication (announcement) date: 2024-09-05

Application number: US18664461

Application date: 2024-05-15

Applicant: Amazon Technologies, Inc.

Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

IPC: G10L13/10, G10L13/06, G10L25/18

CPC classification number: G10L13/10, G10L13/06, G10L25/18

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

7.

Granted invention patent
Text-to-speech (TTS) processing (active)

Publication (announcement) number: US11735162B2

Publication (announcement) date: 2023-08-22

Application number: US17882691

Application date: 2022-08-08

Applicant: Amazon Technologies, Inc.

Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

IPC: G10L13/10, G10L25/18, G10L13/06

CPC classification number: G10L13/10, G10L13/06, G10L25/18

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

8.

Granted invention patent
Multimodal based punctuation and/or casing prediction (active)

Publication (announcement) number: US11580965B1

Publication (announcement) date: 2023-02-14

Application number: US16938783

Application date: 2020-07-24

Applicant: Amazon Technologies, Inc.

Inventor: Monica Lakshmi Sunkara, Srikanth Ronanki, Dhanush Bekal Kannangola, Sravan Babu Bodapati, Katrin Kirchhoff

IPC: G10L15/19, G06N3/049, G10L15/26, G10L15/06

Abstract: Techniques for predicting punctuation and casing using multimodal fusion are described. An exemplary method includes processing generated text by: tokenizing the generated text into sub-words, and generating a sequence of lexical features for the sub-words using a pre-trained lexical encoder; processing audio of the audio by: generating a sequence of frame level acoustic embeddings using a pre-trained acoustic encoder on the audio, and generating task specific embeddings from the frame level acoustic embeddings; performing multimodal fusion of the sub-word level acoustic embeddings and the sequence of lexical features by: aligning the task specific embeddings to the sequence of lexical features, and combining the sequence of lexical features and aligned acoustic sequence; predicting punctuation and casing from the combined sequence of lexical features and aligned acoustic sequence; concatenating the sub-words of the text, and applying the predicted punctuation and casing; and outputting text having the predicted punctuation and casing.
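The fusion step in this abstract (aligning acoustic embeddings to the sub-word sequence, then combining them with the lexical features) can be sketched in a few lines. This is an illustration under stated assumptions, not the method claimed in the patent: alignment here is mean pooling over frame spans that are assumed to be known for each sub-word, and combination is plain concatenation. The names `mean_pool` and `fuse` and the span input are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average a non-empty run of frame-level embeddings into one vector."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def fuse(
    lexical: Sequence[Vector],
    acoustic_frames: Sequence[Vector],
    spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Align frame-level acoustic embeddings to sub-words and concatenate.

    spans[i] is the assumed half-open (start, end) frame range for sub-word i.
    """
    if len(lexical) != len(spans):
        raise ValueError("need exactly one frame span per sub-word")
    fused: List[Vector] = []
    for lexical_vector, (start, end) in zip(lexical, spans):
        if not 0 <= start < end <= len(acoustic_frames):
            raise ValueError("frame span out of range")
        pooled = mean_pool(acoustic_frames[start:end])
        # One combined vector per sub-word: lexical features, then acoustic.
        fused.append(list(lexical_vector) + pooled)
    return fused
```

Each fused vector has one entry per sub-word, so a downstream classifier could predict a punctuation and casing label for every sub-word position. A learned alignment could replace the fixed spans without changing the shape of the output.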

9.

Invention application
REDACTING PORTIONS OF TEXT TRANSCRIPTIONS GENERATED FROM INVERSE TEXT NORMALIZATION (active)

Publication (announcement) number: US20250086380A1

Publication (announcement) date: 2025-03-13

Application number: US18957409

Application date: 2024-11-22

Applicant: Amazon Technologies, Inc.

Inventor: Monica Lakshmi Sunkara, Deepthi Devaiah Devanira, Chaitanya Shivade, Sravan Babu Bodapati, Katrin Kirchhoff, Srikanth Ronanki

IPC: G06F40/166, G06F21/62, G06F40/279, G10L15/16, G10L15/22

Abstract: Portions of text data generated from inverse text normalization may be redacted. Text data for redaction may be obtained. One or more inverse text normalization models may be applied to the text data to generate normalized text data. A machine learning model, trained to recognize text for redaction, may be applied to identify portions of the normalized text data for redaction. The identified portions may be redacted and the redacted normalized text provided to a destination.

10.

Published invention application
TEXT-TO-SPEECH (TTS) PROCESSING (under examination, published)

Publication (announcement) number: US20240013770A1

Publication (announcement) date: 2024-01-11

Application number: US18206301

Application date: 2023-06-06

Applicant: Amazon Technologies, Inc.

Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

IPC: G10L13/047

CPC classification number: G10L13/047

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
