SYSTEM AND METHOD FOR AUTOMATICALLY GENERATING A SIGN LANGUAGE VIDEO FROM AN INPUT SPEECH USING A MACHINE LEARNING MODEL
Abstract:
Embodiments herein provide a system and method for automatically generating a sign language video from an input speech using a machine learning model. The method includes (i) extracting a plurality of spectrograms from an input speech by (a) encoding, using an encoder, a time-domain series of the input speech into a frequency-domain series, and (b) decoding, using a decoder, a plurality of tokens for time steps of the frequency-domain series, (ii) generating a plurality of pose sequences for a current time step of the plurality of spectrograms using a first machine learning model, and (iii) automatically generating, using a discriminator of a second machine learning model, a sign language video for the input speech from the plurality of pose sequences and the plurality of spectrograms when the plurality of pose sequences match the corresponding plurality of spectrograms that are extracted.
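The following is a minimal, hypothetical sketch of the pipeline the abstract describes, written in a PyTorch style. The module names (SpeechEncoder, PoseGenerator, PoseSpectrogramDiscriminator), the network layers, and all hyperparameters are illustrative assumptions, not details taken from the patent; the sketch only mirrors the three stages: spectrogram extraction, pose-sequence generation by a first model, and a discriminator of a second model scoring whether poses match the spectrograms.

```python
# Hypothetical sketch of the speech-to-sign-language pipeline; all names and
# hyperparameters are assumptions for illustration, not the patented design.
import torch
import torch.nn as nn


class SpeechEncoder(nn.Module):
    """Encodes a time-domain speech signal into a frequency-domain (spectrogram) series."""
    def __init__(self, n_fft=512, hop_length=256):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length

    def forward(self, waveform):
        # waveform: (batch, samples) -> magnitude spectrogram: (batch, freq_bins, time_steps)
        window = torch.hann_window(self.n_fft, device=waveform.device)
        spec = torch.stft(waveform, n_fft=self.n_fft, hop_length=self.hop_length,
                          window=window, return_complex=True)
        return spec.abs()


class PoseGenerator(nn.Module):
    """First model: maps each spectrogram time step to body-pose keypoints."""
    def __init__(self, freq_bins=257, hidden=128, n_keypoints=137):
        super().__init__()
        self.rnn = nn.GRU(freq_bins, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_keypoints * 2)  # (x, y) per keypoint

    def forward(self, spectrogram):
        # spectrogram: (batch, freq_bins, time_steps) -> poses: (batch, time_steps, n_keypoints * 2)
        x = spectrogram.transpose(1, 2)
        h, _ = self.rnn(x)
        return self.head(h)


class PoseSpectrogramDiscriminator(nn.Module):
    """Second model's discriminator: scores whether pose sequences match the spectrograms."""
    def __init__(self, freq_bins=257, n_keypoints=137, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(freq_bins + n_keypoints * 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, spectrogram, poses):
        # Concatenate per-time-step spectral features and poses, then score the match.
        x = torch.cat([spectrogram.transpose(1, 2), poses], dim=-1)
        return torch.sigmoid(self.net(x)).mean(dim=1)  # per-sample match score in [0, 1]


if __name__ == "__main__":
    waveform = torch.randn(1, 16000)                  # 1 s of dummy speech at 16 kHz
    spec = SpeechEncoder()(waveform)
    poses = PoseGenerator()(spec)
    score = PoseSpectrogramDiscriminator()(spec, poses)
    print(spec.shape, poses.shape, score)             # rendering poses to video is omitted here
```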