SYSTEM AND METHOD FOR AUTOMATICALLY GENERATING A SIGN LANGUAGE VIDEO FROM AN INPUT SPEECH USING A MACHINE LEARNING MODEL
Abstract:
Embodiments herein provide a system and method for automatically generating a sign language video from an input speech using a machine learning model. The method includes (i) extracting a plurality of spectrograms from an input speech by (a) encoding, using an encoder, a time-domain series of the input speech into a frequency-domain series, and (b) decoding, using a decoder, a plurality of tokens for time steps of the frequency-domain series, (ii) generating a plurality of pose sequences for a current time step of the plurality of spectrograms using a first machine learning model, and (iii) automatically generating, using a discriminator of a second machine learning model, a sign language video for the input speech from the plurality of pose sequences and the plurality of spectrograms when the plurality of pose sequences match the corresponding plurality of spectrograms that are extracted.
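The following is a minimal, hypothetical sketch of the pipeline the abstract describes, written in a PyTorch style. The module names (SpeechEncoder, PoseGenerator, PoseSpectrogramDiscriminator), the network layers, and all hyperparameters are illustrative assumptions, not details taken from the patent; the sketch only mirrors the three stages: spectrogram extraction, pose-sequence generation by a first model, and a discriminator of a second model scoring whether poses match the spectrograms.

```python
# Hypothetical sketch of the speech-to-sign-language pipeline; all names and
# hyperparameters are assumptions for illustration, not the patented design.
import torch
import torch.nn as nn


class SpeechEncoder(nn.Module):
    """Encodes a time-domain speech signal into a frequency-domain (spectrogram) series."""
    def __init__(self, n_fft=512, hop_length=256):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length

    def forward(self, waveform):
        # waveform: (batch, samples) -> magnitude spectrogram: (batch, freq_bins, time_steps)
        window = torch.hann_window(self.n_fft, device=waveform.device)
        spec = torch.stft(waveform, n_fft=self.n_fft, hop_length=self.hop_length,
                          window=window, return_complex=True)
        return spec.abs()


class PoseGenerator(nn.Module):
    """First model: maps each spectrogram time step to body-pose keypoints."""
    def __init__(self, freq_bins=257, hidden=128, n_keypoints=137):
        super().__init__()
        self.rnn = nn.GRU(freq_bins, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_keypoints * 2)  # (x, y) per keypoint

    def forward(self, spectrogram):
        # spectrogram: (batch, freq_bins, time_steps) -> poses: (batch, time_steps, n_keypoints * 2)
        x = spectrogram.transpose(1, 2)
        h, _ = self.rnn(x)
        return self.head(h)


class PoseSpectrogramDiscriminator(nn.Module):
    """Second model's discriminator: scores whether pose sequences match the spectrograms."""
    def __init__(self, freq_bins=257, n_keypoints=137, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(freq_bins + n_keypoints * 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, spectrogram, poses):
        # Concatenate per-time-step spectral features and poses, then score the match.
        x = torch.cat([spectrogram.transpose(1, 2), poses], dim=-1)
        return torch.sigmoid(self.net(x)).mean(dim=1)  # per-sample match score in [0, 1]


if __name__ == "__main__":
    waveform = torch.randn(1, 16000)                  # 1 s of dummy speech at 16 kHz
    spec = SpeechEncoder()(waveform)
    poses = PoseGenerator()(spec)
    score = PoseSpectrogramDiscriminator()(spec, poses)
    print(spec.shape, poses.shape, score)             # rendering poses to video is omitted here
```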