-
Publication No.: US20250149031A1
Publication Date: 2025-05-08
Application No.: US18816659
Application Date: 2024-08-27
Applicant: Samsung Electronics Co., Ltd.
Inventor: Aditya Jajodia, Akash Sahoo, Patrick Hegarty, Divya Neelagiri, Vijendra Raj Apsingekar
IPC: G10L15/197, G10L13/02, G10L15/06
Abstract: A method includes identifying, using an automated speech recognition (ASR) system, at least one named entity hypothesis from at least one audio input. The method also can include providing, using the ASR system, the identified at least one named entity to a large language model (LLM). The method also can include generating a prompt using an automated prompt generator. The method also can include processing, using the LLM, the identified at least one named entity hypothesis and the prompt to generate updated named entity recognition data. The method also can include providing the updated named entity recognition data back to the ASR system.
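The loop described in this abstract (ASR hypotheses, generated prompt, LLM correction, results returned to the ASR system) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: `EntityHypothesis`, `generate_prompt`, `refine_entities`, the `original => corrected` reply format, and the stub `fake_llm` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EntityHypothesis:
    """Hypothetical container for one named entity hypothesis from the ASR system."""
    text: str          # entity string as recognized by ASR
    confidence: float  # assumed ASR confidence in [0, 1]


def generate_prompt(hypotheses: List[EntityHypothesis], domain: str) -> str:
    """Hypothetical automated prompt generator: builds the LLM prompt from hypotheses."""
    listed = "\n".join(f"- {h.text} ({h.confidence:.2f})" for h in hypotheses)
    return (
        f"Domain: {domain}\n"
        "Correct each named entity hypothesis below. "
        "Answer with one 'original => corrected' pair per line.\n"
        f"{listed}"
    )


def refine_entities(
    hypotheses: List[EntityHypothesis],
    domain: str,
    llm: Callable[[str], str],
) -> Dict[str, str]:
    """Send hypotheses plus prompt to the LLM and parse the updated entity data."""
    reply = llm(generate_prompt(hypotheses, domain))
    updated: Dict[str, str] = {}
    for line in reply.splitlines():
        if "=>" in line:
            original, corrected = (part.strip() for part in line.split("=>", 1))
            updated[original] = corrected
    return updated


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: the reply follows the requested format)."""
    return "tailor swift => Taylor Swift"


hyps = [EntityHypothesis("tailor swift", 0.41)]
print(refine_entities(hyps, "music", fake_llm))
```

In a real system, the returned mapping would be what is handed back to the ASR system, for example to bias later recognition toward the corrected entity strings.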
-
2.
Publication No.: US20240056761A1
Publication Date: 2024-02-15
Application No.: US18335730
Application Date: 2023-06-15
Applicant: Samsung Electronics Co., Ltd.
Inventor: Vijendra Raj Apsingekar, Akash Sahoo, Anil S. Yadav, Sivakumar Balasubramanian
IPC: H04S7/00, H04S3/00, G06F3/16, G10L19/008
CPC classification number: H04S7/304, H04S3/008, G06F3/165, G10L19/008, H04S2400/11
Abstract: A method includes obtaining video content and associated substantially mono audio content. The method also includes determining at least one of a position or a motion trajectory of each of one or more objects detected in the video content and classifying each of the one or more objects into one of multiple object classes. The method further includes separating audio streams within the audio content based on the video content. Each of the audio streams is associated with one of multiple audio sources. The method also includes classifying each of the audio sources into one of the object classes. In addition, the method includes, for each audio source classified into the same object class as one of the one or more objects, distributing the audio stream associated with that audio source into multiple audio channels based on at least one of the position or the motion trajectory of that object.
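The final step of this abstract, distributing a separated audio stream into multiple channels based on the position of a matching object, can be illustrated with a minimal Python sketch. The abstract does not say how the distribution is computed; constant-power stereo panning is used here purely as an assumed stand-in technique. The names `pan_mono_to_stereo` and `spatialize`, and the normalized horizontal position `x_norm`, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Stereo = Tuple[List[float], List[float]]


def pan_mono_to_stereo(samples: List[float], x_norm: float) -> Stereo:
    """Constant-power pan of a mono stream (assumed technique, not from the patent).

    x_norm is the object's horizontal position in the video frame:
    0.0 = far left, 1.0 = far right. Values outside that range are clamped.
    """
    x = min(max(x_norm, 0.0), 1.0)
    angle = x * math.pi / 2          # map position to [0, pi/2]
    left_gain = math.cos(angle)      # squared gains sum to 1 at every position
    right_gain = math.sin(angle)
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right


def spatialize(
    sources: Dict[str, List[float]],
    objects: Dict[str, float],
) -> Dict[str, Stereo]:
    """Pan only the audio sources whose class matches a detected object's class.

    sources maps an audio source class to its separated mono samples;
    objects maps an object class to that object's x_norm position.
    """
    return {
        cls: pan_mono_to_stereo(samples, objects[cls])
        for cls, samples in sources.items()
        if cls in objects
    }


sources = {"dog": [1.0, 0.5], "music": [0.2, 0.2]}
objects = {"dog": 0.0}  # a dog detected at the far left of the frame
print(spatialize(sources, objects))
```

A motion trajectory could be handled in the same spirit by recomputing the gains per audio frame from the object's position over time; that extension is not shown here.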
-