-
Publication number: US12182498B1
Publication date: 2024-12-31
Application number: US17810302
Filing date: 2022-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Monica Lakshmi Sunkara, Deepthi Devaiah Devanira, Chaitanya Shivade, Sravan Babu Bodapati, Katrin Kirchhoff, Srikanth Ronanki
IPC: G06F40/166, G06F21/62, G06F40/279, G10L15/16, G10L15/22
Abstract: Portions of text data generated from inverse text normalization may be redacted. Text data for redaction may be obtained. One or more inverse text normalization models may be applied to the text data to generate normalized text data. A machine learning model, trained to recognize text for redaction, may be applied to identify portions of the normalized text data for redaction. The identified portions may be redacted and the redacted normalized text provided to a destination.
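The flow in this abstract (normalize first, then detect and redact on the normalized text) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the digit table stands in for an inverse text normalization model, and a regular expression stands in for the trained redaction model, which the abstract does not specify.

```python
import re

# Hypothetical spoken-to-written table; a real ITN model would be learned
# or grammar-based and cover far more than single digits.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def inverse_normalize(text: str) -> str:
    """Collapse runs of spoken digits into one written number."""
    out, run = [], []
    for tok in text.split():
        digit = DIGITS.get(tok.lower())
        if digit is not None:
            run.append(digit)
            continue
        if run:  # a run of digits just ended; emit it as one token
            out.append("".join(run))
            run = []
        out.append(tok)
    if run:
        out.append("".join(run))
    return " ".join(out)


def find_redaction_spans(text: str) -> list[tuple[int, int]]:
    """Stand-in for the trained model: flag digit strings of length >= 4."""
    return [m.span() for m in re.finditer(r"\d{4,}", text)]


def redact(text: str, spans: list[tuple[int, int]], mask: str = "[REDACTED]") -> str:
    # Replace from the right so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask + text[end:]
    return text


def redact_transcript(text: str) -> str:
    normalized = inverse_normalize(text)
    return redact(normalized, find_redaction_spans(normalized))
```

Running detection after normalization matters in this toy setup: the spoken form "one two three four" contains no digit string for the detector to match, while the normalized form "1234" does.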
-
Publication number: US12272350B2
Publication date: 2025-04-08
Application number: US18664461
Filing date: 2024-05-15
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
Publication number: US11990118B2
Publication date: 2024-05-21
Application number: US18206301
Filing date: 2023-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
Publication number: US11410684B1
Publication date: 2022-08-09
Application number: US16430894
Filing date: 2019-06-04
Applicant: Amazon Technologies, Inc.
Inventor: Viacheslav Klimkov, Thomas Renaud Drugman, Alexander Galkin, Srikanth Ronanki
IPC: G10L13/00, G10L25/78, G10L13/027, G10L15/16, G10L15/187, G06F16/38, G06N3/08, G06N20/20, G06F17/18, G06N3/04, G10L13/04, G10L13/033, G10L13/07
Abstract: Audio data from a first, source speaker is received and processed to determine linguistic units and vocal characteristics corresponding to those linguistic units. The linguistic units may either be determined from received text data or may be determined from the audio data using automatic speech recognition. A model is trained using training data from a second, target speaker. The trained model concatenates the linguistic units with the vocal characteristics to produce output speech that has the “voice” of the target speaker and the vocal characteristics of the source speaker.
-
Publication number: US12198681B1
Publication date: 2025-01-14
Application number: US17937297
Filing date: 2022-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Monica Lakshmi Sunkara, Srikanth Ronanki, Sravan Babu Bodapati, Jeffrey John Farris, Katrin Kirchhoff, Vivek Govindan, Yide Zou, Mohit Narendra Gupta, Silviu Mihai Burz
Abstract: Techniques for personalized batch and streaming speech-to-text transcription of audio reduce the error rate of automatic speech recognition (ASR) systems in transcribing rare and out-of-vocabulary words. The techniques achieve personalization of connectionist temporal classification (CTC) models by using adaptive boosting to perform biasing at the level of sub-words. In addition to boosting, the techniques encompass a phone alignment network to bias sub-word predictions towards rare long-tail words and out-of-vocabulary words. A technical benefit of the techniques is that the accuracy of speech-to-text transcription of rare and out-of-vocabulary words in a custom vocabulary by an ASR system can be improved without having to train the ASR system on the custom vocabulary. Instead, the techniques allow the same ASR system trained on a base vocabulary to realize the accuracy improvements for different custom vocabularies spanning different domains.
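As a rough illustration of sub-word biasing at decode time, the Python sketch below adds a fixed bonus to the log-probabilities of sub-words drawn from a custom vocabulary before a greedy CTC decode. This is a deliberate simplification and an assumption on our part: the abstract describes adaptive boosting and a phone alignment network, neither of which is modeled here, and the function name, the `boost` parameter, and the blank symbol are all hypothetical.

```python
BLANK = "_"  # hypothetical symbol for the CTC blank token


def ctc_greedy_decode(frames, boosted=frozenset(), boost=0.0):
    """Greedy CTC decode with a fixed log-probability bonus for chosen sub-words.

    frames: list of dicts mapping sub-word -> log-probability, one per audio frame.
    boosted: sub-words from the custom vocabulary that receive the bonus.
    """
    best_per_frame = []
    for scores in frames:
        # Bias the acoustic scores without retraining the model.
        adjusted = {tok: lp + (boost if tok in boosted else 0.0)
                    for tok, lp in scores.items()}
        best_per_frame.append(max(adjusted, key=adjusted.get))

    # Standard CTC collapse: merge consecutive repeats, then drop blanks.
    decoded, prev = [], None
    for tok in best_per_frame:
        if tok != prev and tok != BLANK:
            decoded.append(tok)
        prev = tok
    return decoded
```

Because the bonus is applied only at decode time, the same base model can be reused with a different `boosted` set per customer or domain, which is the property the abstract highlights.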
-
Publication number: US20240296827A1
Publication date: 2024-09-05
Application number: US18664461
Filing date: 2024-05-15
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
Publication number: US11735162B2
Publication date: 2023-08-22
Application number: US17882691
Filing date: 2022-08-08
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-
Publication number: US11580965B1
Publication date: 2023-02-14
Application number: US16938783
Filing date: 2020-07-24
Applicant: Amazon Technologies, Inc.
Inventor: Monica Lakshmi Sunkara, Srikanth Ronanki, Dhanush Bekal Kannangola, Sravan Babu Bodapati, Katrin Kirchhoff
Abstract: Techniques for predicting punctuation and casing using multimodal fusion are described. An exemplary method includes processing generated text by: tokenizing the generated text into sub-words, and generating a sequence of lexical features for the sub-words using a pre-trained lexical encoder; processing the audio by: generating a sequence of frame level acoustic embeddings using a pre-trained acoustic encoder on the audio, and generating task specific embeddings from the frame level acoustic embeddings; performing multimodal fusion of the sub-word level acoustic embeddings and the sequence of lexical features by: aligning the task specific embeddings to the sequence of lexical features, and combining the sequence of lexical features and aligned acoustic sequence; predicting punctuation and casing from the combined sequence of lexical features and aligned acoustic sequence; concatenating the sub-words of the text, and applying the predicted punctuation and casing; and outputting text having the predicted punctuation and casing.
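The align-and-combine step can be pictured with a small Python sketch. It assumes, hypothetically, that each sub-word comes with a frame span (for example from a forced alignment) and that alignment is done by mean-pooling the frames in that span. The abstract does not state how the alignment is performed, so the pooling choice and the function name are illustrative only.

```python
def align_and_fuse(lexical, acoustic_frames, spans):
    """Pool frame-level acoustic vectors per sub-word and concatenate them
    with that sub-word's lexical feature vector.

    lexical: one feature vector (list of floats) per sub-word.
    acoustic_frames: one embedding (list of floats) per audio frame.
    spans: one (start, end) frame range per sub-word, end exclusive.
    """
    if len(lexical) != len(spans):
        raise ValueError("need exactly one frame span per sub-word")
    dim = len(acoustic_frames[0])
    fused = []
    for lex, (start, end) in zip(lexical, spans):
        window = acoustic_frames[start:end]
        if not window:
            raise ValueError(f"empty frame span ({start}, {end})")
        # Mean-pool so the acoustic sequence matches the sub-word sequence length.
        pooled = [sum(frame[d] for frame in window) / len(window) for d in range(dim)]
        fused.append(list(lex) + pooled)
    return fused
```

After this step both modalities share one sequence length, so a per-sub-word classifier can predict punctuation and casing from each fused vector.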
-
Publication number: US20250086380A1
Publication date: 2025-03-13
Application number: US18957409
Filing date: 2024-11-22
Applicant: Amazon Technologies, Inc.
Inventor: Monica Lakshmi Sunkara, Deepthi Devaiah Devanira, Chaitanya Shivade, Sravan Babu Bodapati, Katrin Kirchhoff, Srikanth Ronanki
IPC: G06F40/166, G06F21/62, G06F40/279, G10L15/16, G10L15/22
Abstract: Portions of text data generated from inverse text normalization may be redacted. Text data for redaction may be obtained. One or more inverse text normalization models may be applied to the text data to generate normalized text data. A machine learning model, trained to recognize text for redaction, may be applied to identify portions of the normalized text data for redaction. The identified portions may be redacted and the redacted normalized text provided to a destination.
-
Publication number: US20240013770A1
Publication date: 2024-01-11
Application number: US18206301
Filing date: 2023-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
IPC: G10L13/047
CPC classification number: G10L13/047
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.