-
Publication No.: US11531846B1
Publication Date: 2022-12-20
Application No.: US16587471
Application Date: 2019-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Sravan Babu Bodapati, Rishita Rajal Anubhai, Pu Paul Zhao, Katrin Kirchhoff
Abstract: Techniques for extending sensitive data tagging without reannotating training data are described. A method for extending sensitive data tagging without reannotating training data may include hosting a plurality of models at a model endpoint in a machine learning service, each model trained to identify a different sensitive data type in a transcript of content, adding a new model to the model endpoint, the new model trained to identify a new sensitive data entity in the transcript of content, identifying sensitive entities in the transcript by each of the plurality of models and the new model, merging inference responses generated by each of the plurality of models and the new model using at least one inference policy, and returning a merged inference response identifying a plurality of sensitive entities in the transcript.
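Illustrative sketch (not taken from the patent): the abstract does not say what an "inference policy" looks like, so the example below assumes one simple possibility, namely that when two models tag overlapping character spans, the higher-scoring span wins. The `Entity` type, the `merge_responses` function, and the model names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Entity(NamedTuple):
    start: int    # character offset where the entity begins (inclusive)
    end: int      # character offset where the entity ends (exclusive)
    label: str    # sensitive data type, e.g. "SSN"
    score: float  # model confidence in [0, 1]


def merge_responses(responses: Dict[str, List[Entity]]) -> List[Entity]:
    """Merge per-model entity lists into one response.

    Assumed policy: consider candidates from highest to lowest score and
    keep a candidate only if it overlaps no span that was already kept.
    """
    candidates = sorted(
        (entity for entities in responses.values() for entity in entities),
        key=lambda entity: -entity.score,
    )
    kept: List[Entity] = []
    for cand in candidates:
        # Two half-open spans are disjoint if one ends before the other starts.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    # Return entities in transcript order.
    return sorted(kept, key=lambda entity: entity.start)


# Hypothetical responses from two models hosted at the same endpoint.
responses = {
    "ssn_model": [Entity(10, 21, "SSN", 0.95)],
    "phone_model": [Entity(8, 21, "PHONE", 0.60), Entity(30, 42, "PHONE", 0.90)],
}
merged = merge_responses(responses)
```

Under this assumed policy, adding a model for a new entity type only adds one more key to `responses`; the existing models and their training data are left unchanged.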
-
Publication No.: US10777186B1
Publication Date: 2020-09-15
Application No.: US16190047
Application Date: 2018-11-13
Applicant: Amazon Technologies, Inc.
Inventor: Stefano Stefani, Pramod Gurunath, Ashish Singh, Katrin Kirchhoff, Deepikaa Suresh, Varun Sembium Varadarajan, Vasanth Philomin, Vikram Sathyanarayana Anbazhagan, Pu Paul Zhao, Vijit Gupta, Ruoyu Huang
Abstract: Techniques for streaming real-time automated speech recognition (ASR) are described. A user can stream audio data to a frontend service of the ASR service. The frontend service can establish a bi-directional connection to an audio decoder host to perform ASR on the data stream. The audio decoder host may include a streaming ASR engine which can analyze chunks of the audio data stream using an acoustic model to divide the audio data into words, and a language model to identify sentences made of the words spoken in the audio file. The acoustic model can be trained using short audio sentence data (e.g., on the order of 30 seconds to a few minutes), enabling the transcription service to accurately transcribe short chunks of audio data. The results are then punctuated and normalized. The resulting transcript is then streamed back to the user over the bi-directional connection.
-
Publication No.: US12223259B1
Publication Date: 2025-02-11
Application No.: US16587800
Application Date: 2019-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Varun Sembium Varadarajan, Sravan Babu Bodapati, Deepthi Devaiah Devanira, Pu Paul Zhao, Katrin Kirchhoff, Yue Yang
IPC: G06F40/166, G06F18/214, G06F21/62, G06F40/279, G06F40/30, G06N20/00
Abstract: Techniques for managing access to sensitive data in transcriptions are described. A method for managing access to sensitive data in transcriptions may include receiving a request to generate a redacted transcript of content, obtaining a transcript of the content, sending at least a portion of the transcript to a model endpoint to identify sensitive entities in the transcript, receiving an inference response identifying one or more sensitive entities in the transcript, and generating the redacted transcript based at least on the transcript and the inference response.
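Illustrative sketch (not taken from the patent): the final step, generating a redacted transcript from the transcript and the inference response, could look roughly like the following. It assumes the inference response can be reduced to a list of `(start, end, label)` character spans and that each sensitive span is replaced with a bracketed label. The `redact` function and this span format are hypothetical.

```python
from typing import List, Tuple

# Assumed span format: (start offset inclusive, end offset exclusive, label).
Span = Tuple[int, int, str]


def redact(transcript: str, spans: List[Span]) -> str:
    """Replace each sensitive span in the transcript with "[LABEL]".

    If a span overlaps one that was already redacted, its text is removed
    as well but no second placeholder is emitted, so overlapping spans
    never leave sensitive characters in the output.
    """
    pieces: List[str] = []
    cursor = 0  # first character of the transcript not yet copied or redacted
    for start, end, label in sorted(spans):
        if start < cursor:
            # Overlaps the previous redaction: extend it instead of re-tagging.
            cursor = max(cursor, end)
            continue
        pieces.append(transcript[cursor:start])  # text before the entity
        pieces.append(f"[{label}]")              # placeholder for the entity
        cursor = end
    pieces.append(transcript[cursor:])           # text after the last entity
    return "".join(pieces)


transcript = "My name is Ana and my PIN is 1234."
redacted = redact(transcript, [(11, 14, "NAME"), (29, 33, "PIN")])
```

The original transcript is left untouched, so an implementation along these lines could store the unredacted and redacted versions separately and control access to each.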
-