-
Publication number: US20240029723A1
Publication date: 2024-01-25
Application number: US17937198
Application date: 2022-09-30
Applicant: Samsung Electronics Co., Ltd.
Inventor: Sivakumar Balasubramanian, Gowtham Srinivasan, Srinivasa Rao Ponakala, Vijendra Raj Apsingekar, Anil Sunder Yadav
IPC: G10L15/197, G10L15/06, G10L15/22
CPC classification number: G10L15/197, G10L15/063, G10L15/22, G10L2015/223
Abstract: A method comprises obtaining an audio input. The method also includes providing at least a portion of the audio input to a frame-level detector model. The method also includes obtaining a first output of the frame-level detector model including frame-level predictions associated with at least the portion of the audio input. The method also includes providing at least one chunked audio frame to a word-level verifier model. The method also includes obtaining a second output of the word-level verifier model including word-level probabilities associated with the at least one chunked audio frame. The method also includes instructing performance of automatic speech recognition on the audio input based on the word-level probabilities associated with the at least one chunked audio frame.
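The two-stage flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the thresholds, the chunking rule (a fixed-size window ending at each frame the detector flags), and the names `should_run_asr`, `frame_detector`, and `word_verifier` are all assumptions introduced here, and both models are passed in as plain callables.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one audio frame's feature vector (assumed representation)


def should_run_asr(
    frames: Sequence[Frame],
    frame_detector: Callable[[Sequence[Frame]], List[float]],
    word_verifier: Callable[[Sequence[Frame]], float],
    frame_threshold: float = 0.5,   # assumed value, not from the abstract
    word_threshold: float = 0.8,    # assumed value, not from the abstract
    chunk_size: int = 4,            # assumed chunk length in frames
) -> bool:
    """Return True if ASR should be triggered for this audio input."""
    # Stage 1: the frame-level detector yields one prediction per frame.
    frame_scores = frame_detector(frames)

    # Frames whose score clears the threshold are candidate trigger points.
    candidates = [i for i, s in enumerate(frame_scores) if s >= frame_threshold]

    # Stage 2: for each candidate, pass a chunk of frames ending at that
    # frame to the word-level verifier and check its word-level probability.
    for i in candidates:
        start = max(0, i - chunk_size + 1)
        chunk = frames[start:i + 1]
        if word_verifier(chunk) >= word_threshold:
            return True  # verifier confirms: instruct ASR on the input

    return False  # no candidate survived verification
```

The cheaper frame-level stage filters most audio, so in this sketch the verifier only runs on chunks the detector has already flagged.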
-
Publication number: US12272357B2
Publication date: 2025-04-08
Application number: US17929280
Application date: 2022-09-01
Applicant: Samsung Electronics Co., Ltd.
Inventor: Sivakumar Balasubramanian, Gowtham Srinivasan, Srinivasa Rao Ponakala, Vijendra Raj Apsingekar, Anil Sunder Yadav
Abstract: A method includes accessing, using at least one processor of an electronic device, a machine learning model. The machine learning model is a trained student model that is trained using audio samples in a plurality of accent types. The method also includes receiving, using the at least one processor, an audio input from an audio input device. The method further includes providing, using the at least one processor, the audio input to the trained student model. The method also includes receiving, using the at least one processor, an output from the trained student model including frame-level probabilities associated with the audio input. In addition, the method includes instructing, using the at least one processor, at least one action based on the frame-level probabilities associated with the audio input.
-
Publication number: US12236939B2
Publication date: 2025-02-25
Application number: US17499072
Application date: 2021-10-12
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Sivakumar Balasubramanian, Gowtham Srinivasan, Srinivasa Rao Ponakala, Anil Sunder Yadav, Aditya Jajodia
Abstract: A method of generating a trained trigger word detection model includes training an auxiliary model, based on an auxiliary task, to concentrate on one or more utterances and/or learn context of the one or more utterances using generic single word and/or phrase training data; and obtaining a trigger word detection model by retraining one or more final layers of the auxiliary model, which is weighted based on the auxiliary task, based on a trigger word detection task that detects one or more trigger words. The retraining uses training data specific to the one or more trigger words.
-
Publication number: US20230368786A1
Publication date: 2023-11-16
Application number: US17929280
Application date: 2022-09-01
Applicant: Samsung Electronics Co., Ltd.
Inventor: Sivakumar Balasubramanian, Gowtham Srinivasan, Srinivasa Rao Ponakala, Vijendra Raj Apsingekar, Anil Sunder Yadav
CPC classification number: G10L15/22, G10L15/063, G10L2015/223, G10L2015/0631
Abstract: A method includes accessing, using at least one processor of an electronic device, a machine learning model. The machine learning model is a trained student model that is trained using audio samples in a plurality of accent types. The method also includes receiving, using the at least one processor, an audio input from an audio input device. The method further includes providing, using the at least one processor, the audio input to the trained student model. The method also includes receiving, using the at least one processor, an output from the trained student model including frame-level probabilities associated with the audio input. In addition, the method includes instructing, using the at least one processor, at least one action based on the frame-level probabilities associated with the audio input.
-
Publication number: US20240056761A1
Publication date: 2024-02-15
Application number: US18335730
Application date: 2023-06-15
Applicant: Samsung Electronics Co., Ltd.
Inventor: Vijendra Raj Apsingekar, Akash Sahoo, Anil S. Yadav, Sivakumar Balasubramanian
IPC: H04S7/00, H04S3/00, G06F3/16, G10L19/008
CPC classification number: H04S7/304, H04S3/008, G06F3/165, G10L19/008, H04S2400/11
Abstract: A method includes obtaining video content and associated substantially mono audio content. The method also includes determining at least one of a position or a motion trajectory of each of one or more objects detected in the video content and classifying each of the one or more objects into one of multiple object classes. The method further includes separating audio streams within the audio content based on the video content. Each of the audio streams is associated with one of multiple audio sources. The method also includes classifying each of the audio sources into one of the object classes. In addition, the method includes, for each audio source classified into the same object class as one of the one or more objects, distributing the audio stream associated with that audio source into multiple audio channels based on at least one of the position or the motion trajectory of that object.
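The final step of this abstract, distributing a separated audio stream across channels according to an object's position or motion trajectory, might look like the sketch below. It makes several assumptions the abstract does not state: the output is two-channel stereo, the object's horizontal position is normalized to 0.0 (left edge of the frame) through 1.0 (right edge), the trajectory is a straight line between a start and an end position, and constant-power panning is used as a stand-in for whatever distribution rule the application actually claims. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(x: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for a horizontal position x in [0, 1]."""
    x = min(1.0, max(0.0, x))          # clamp positions outside the frame
    theta = x * math.pi / 2            # map 0..1 onto 0..90 degrees
    return math.cos(theta), math.sin(theta)


def distribute_to_stereo(
    samples: Sequence[float],
    start_x: float,
    end_x: float,
) -> Tuple[List[float], List[float]]:
    """Spread a mono stream over two channels along a linear trajectory.

    The object is assumed to move from start_x to end_x over the duration
    of the stream; a static object has start_x == end_x.
    """
    n = len(samples)
    left: List[float] = []
    right: List[float] = []
    for i, s in enumerate(samples):
        t = i / (n - 1) if n > 1 else 0.0          # progress along the trajectory
        gain_l, gain_r = pan_gains(start_x + t * (end_x - start_x))
        left.append(s * gain_l)
        right.append(s * gain_r)
    return left, right
```

Because the gains are a cosine and sine of the same angle, their squares sum to one, so the perceived loudness stays roughly constant as the source moves across the frame.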