-
Publication No.: US11551695B1
Publication Date: 2023-01-10
Application No.: US15931455
Filing Date: 2020-05-13
Applicant: Amazon Technologies, Inc.
Inventor: Vivek Govindan, Varun Sembium Varadarajan, Christian Egon Berkhoff Dossow, Himalay Mohanlal Joriwal, Sai Madhuri Bhavirisetty, Abhinav Kumar, Orestis Lykouropoulos, Akshay Nalwaya, Rahul Gupta, Sravan Babu Bodapati, Liangwei Guo, Julian E. S. Salazar, Yibin Wang, K P N V D S Siva Rama, Calvin Xuan Li, Mohit Narendra Gupta, Asem Rustum, Katrin Kirchhoff, Pu Zhao
Abstract: A transcription service may receive a request from a developer to build a custom speech-to-text model for a specific domain of speech. The custom speech-to-text model for the specific domain may replace a general speech-to-text model or add to a set of one or more speech-to-text models available for transcribing speech. The transcription service may receive a training data set and instructions representing tasks. The transcription service may determine respective schedules for executing the instructions based at least in part on dependencies between the tasks. The transcription service may execute the instructions according to the respective schedules to train a speech-to-text model for the specific domain using the training data set. The transcription service may deploy the trained speech-to-text model as part of a network-accessible service for an end user to convert audio in the specific domain into text.
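The dependency-based scheduling mentioned in this abstract can be illustrated with a topological ordering. The following is a minimal sketch, not the patented method: the task names and the `schedule_tasks` helper are hypothetical, and the abstract does not say how schedules are actually computed. It groups tasks into batches so that each batch depends only on earlier batches.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_tasks(dependencies):
    """Group tasks into batches; each batch depends only on earlier batches.

    `dependencies` maps a task name to the set of tasks it must wait for.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a deterministic order
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


# Hypothetical training tasks (assumed names, not taken from the patent).
TRAINING_TASKS = {
    "validate_data": set(),
    "normalize_text": {"validate_data"},
    "extract_features": {"validate_data"},
    "train_model": {"normalize_text", "extract_features"},
    "deploy_model": {"train_model"},
}

batches = schedule_tasks(TRAINING_TASKS)
for step, batch in enumerate(batches, start=1):
    print(step, batch)
```

Tasks within one batch have no dependencies on each other, so a scheduler could run them in parallel.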
-
Publication No.: US20240331821A1
Publication Date: 2024-10-03
Application No.: US18194350
Filing Date: 2023-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Vijit Gupta, Matthew Chih-Hui Chiou, Amiya Kishor Chakraborty, Anuroop Arora, Varun Sembium Varadarajan, Sarthak Handa, Amit Vithal Sawant, Glen Herschel Carpenter, Jesse Deng, Mohit Narendra Gupta, Rohil Bhattarai, Samuel Benjamin Schiff, Shane Michael McGookey, Tianze Zhang
Abstract: Systems and methods for performing medical audio summarizing for medical conversations are disclosed. An audio file and metadata for a medical conversation are provided to a medical audio summarization system. A transcription machine learning model is used by the medical audio summarization system to generate a transcript, and a natural language processing service of the medical audio summarization system is used to generate a summary of the transcript. The natural language processing service may include at least four machine learning models that identify medical entities in the transcript, identify speaker roles in the transcript, determine sections of the transcript corresponding to the summary, and extract or abstract phrases for the summary. The identified medical entities and speaker roles, determined sections, and extracted or abstracted phrases may then be used to generate the summary.
-
Publication No.: US12198681B1
Publication Date: 2025-01-14
Application No.: US17937297
Filing Date: 2022-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Monica Lakshmi Sunkara, Srikanth Ronanki, Sravan Babu Bodapati, Jeffrey John Farris, Katrin Kirchhoff, Vivek Govindan, Yide Zou, Mohit Narendra Gupta, Silviu Mihai Burz
Abstract: Techniques for personalized batch and streaming speech-to-text transcription of audio reduce the error rate of automatic speech recognition (ASR) systems in transcribing rare and out-of-vocabulary words. The techniques achieve personalization of connectionist temporal classification (CTC) models by using adaptive boosting to perform biasing at the level of sub-words. In addition to boosting, the techniques encompass a phone alignment network to bias sub-word predictions towards rare long-tail words and out-of-vocabulary words. A technical benefit of the techniques is that the accuracy of speech-to-text transcription of rare and out-of-vocabulary words in a custom vocabulary by an ASR system can be improved without having to train the ASR system on the custom vocabulary. Instead, the techniques allow the same ASR system trained on a base vocabulary to realize the accuracy improvements for different custom vocabularies spanning different domains.
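The general idea of biasing CTC output at the sub-word level can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented technique: it applies a fixed additive boost rather than adaptive boosting, it omits the phone alignment network, and the frame scores, sub-word tokens, and the `greedy_ctc_decode` helper are all hypothetical.

```python
BLANK = "_"  # assumed symbol for the CTC blank token


def greedy_ctc_decode(frames, boosted=frozenset(), boost=0.0):
    """Greedy CTC decoding with an optional fixed boost for chosen sub-words.

    `frames` is a list of dicts mapping sub-word tokens to log scores.
    Tokens in `boosted` get `boost` added to their score in every frame.
    """
    path = []
    for scores in frames:
        adjusted = {
            tok: score + (boost if tok in boosted else 0.0)
            for tok, score in scores.items()
        }
        path.append(max(adjusted, key=adjusted.get))

    # Standard CTC collapse: merge repeated tokens, then drop blanks.
    decoded, previous = [], None
    for tok in path:
        if tok != previous and tok != BLANK:
            decoded.append(tok)
        previous = tok
    return decoded


# Hypothetical per-frame log scores for an utterance of a rare word.
FRAMES = [
    {"sol": -0.6, "zol": -0.9, BLANK: -3.0},
    {BLANK: -0.2, "pi": -2.0},
    {"pi": -0.3, BLANK: -1.5},
    {"them": -0.7, "dem": -0.8, BLANK: -2.5},
]

# Sub-words of a custom vocabulary entry (assumed tokenization).
CUSTOM_SUBWORDS = frozenset({"zol", "pi", "dem"})

baseline = greedy_ctc_decode(FRAMES)
biased = greedy_ctc_decode(FRAMES, boosted=CUSTOM_SUBWORDS, boost=0.5)
print(baseline)
print(biased)
```

In this toy input the boost tips two close decisions toward the custom sub-words without retraining the underlying scores, which mirrors the benefit the abstract claims for custom vocabularies.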
-