-
Publication No.: US20240282345A1
Publication date: 2024-08-22
Application No.: US18413991
Filing date: 2024-01-16
Inventor: Yoonhyung KIM, Byung Ok KANG, Hoon CHUNG
Abstract: Disclosed herein are an apparatus and method for audio-video sampling frequency ratio unification, including memory configured to store at least one program, and a processor configured to execute the program, wherein the program is configured to perform receiving an audio signal and a video signal, adjusting a ratio of a sampling frequency of the audio signal to a sampling frequency of the video signal so that the sampling frequency ratio is constant based on a deep learning network, and outputting an adjusted audio signal and the video signal.
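The ratio being unified can be illustrated with plain arithmetic. The sketch below is not the patented method: the abstract says the adjustment is based on a deep learning network, whereas this example substitutes ordinary linear interpolation purely to show what a constant audio-to-video sampling frequency ratio means. The function names, the 16 kHz and 25/30 fps figures, and the target ratio of 640 audio samples per video frame are all assumptions chosen for illustration.

```python
from __future__ import annotations


def target_audio_rate(video_fps: float, target_ratio: float) -> float:
    """Audio rate (Hz) that yields `target_ratio` audio samples per video frame."""
    return video_fps * target_ratio


def resample_linear(samples: list[float], src_rate: float, dst_rate: float) -> list[float]:
    """Resample by linear interpolation.

    Assumption: this is a simple stand-in for the learned adjustment
    mentioned in the abstract, not a reproduction of it.
    """
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in source-sample units
        lo = min(int(pos), len(samples) - 1)   # left neighbour
        hi = min(lo + 1, len(samples) - 1)     # right neighbour (clamped at the end)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# 16 kHz audio with 25 fps video already has 640 samples per frame.
# With 30 fps video, the audio must be brought to 19200 Hz to keep that ratio.
new_rate = target_audio_rate(video_fps=30.0, target_ratio=640.0)
adjusted = resample_linear([0.0] * 16000, src_rate=16000.0, dst_rate=new_rate)
```

After the adjustment, one second of audio holds 19200 samples against 30 video frames, so both the 25 fps and the 30 fps streams present the same 640 samples per frame to a downstream model.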
-
2.
Publication No.: US20240105166A1
Publication date: 2024-03-28
Application No.: US18350111
Filing date: 2023-07-11
Inventor: Hoon CHUNG, Byung Ok KANG, Yoonhyung KIM
IPC: G10L15/16, G10L15/06, G10L15/065
CPC classification number: G10L15/16, G10L15/063, G10L15/065
Abstract: Provided is a self-supervised learning method based on permutation invariant cross entropy. The method, performed by an electronic device, includes: defining a cross entropy loss function for pre-training of an end-to-end speech recognition model; configuring non-transcription speech corpus data composed only of speech as input data of the cross entropy loss function; setting all permutations of classes included in the non-transcription speech corpus data as an output target and calculating cross entropy losses for each class; and determining a minimum cross entropy loss among the calculated cross entropy losses for each class as a final loss.
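The loss described in this abstract can be sketched as follows. This is one possible reading under stated assumptions, not the patent's implementation: it assumes the model emits a probability distribution over a fixed number of classes for each item, and that each item carries a provisional class index whose naming is arbitrary (for example, a cluster ID). The names `cross_entropy`, `permutation_invariant_cross_entropy`, `probs`, and `class_ids` are hypothetical.

```python
import itertools
import math


def cross_entropy(probs, targets):
    """Mean negative log-probability assigned to the target classes."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def permutation_invariant_cross_entropy(probs, class_ids, num_classes):
    """Return the minimum cross entropy over all relabelings of the classes.

    probs:     per-item probability distributions (assumed model outputs)
    class_ids: provisional class index per item (assumed, e.g. cluster IDs)
    """
    best_loss = math.inf
    best_perm = None
    for perm in itertools.permutations(range(num_classes)):
        # Relabel every provisional class with this permutation.
        targets = [perm[c] for c in class_ids]
        loss = cross_entropy(probs, targets)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Two items, two classes: the model's outputs match the provisional labels
# only after swapping class 0 and class 1.
probs = [[0.9, 0.1], [0.2, 0.8]]
class_ids = [1, 0]
loss, perm = permutation_invariant_cross_entropy(probs, class_ids, num_classes=2)
```

Enumerating every permutation costs a factorial number of loss evaluations in the number of classes, so a brute-force sketch like this one is only practical for a small class inventory.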
-