-
1.
Publication No.: US20250139188A1
Publication Date: 2025-05-01
Application No.: US18889622
Application Date: 2024-09-19
Inventor: Jeong Heo, Oh Woog Kwon, Jihee Ryu, Young-Ae Seo, Jin Seong, Jong Hun Shin, Ki Young Lee, Yo Han Lee, Soojong Lim
IPC: G06F17/11 , G06F40/205 , G06F40/40 , G06V10/86
Abstract: A method for problem inference based on multi-modal generative artificial intelligence includes: receiving question information including an image and text; generating formal languages by respectively parsing the image and the text of the question information based on a pre-constructed problem-solving template; generating text-based intermediate inference information for the question information by inputting the generated formal languages to a formal language inference unit; generating image-based inference information by inputting the text-based intermediate inference information, the text included in the question information (hereinafter referred to as "text question information"), and the image included in the question information (hereinafter referred to as "image question information") to a multi-modal image generation model; and generating text-based inference information by inputting the text-based intermediate inference information, the image-based inference information, and the text question information to a multi-modal text generation model.
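The staged pipeline described in the abstract can be sketched as plain data flow. This is a minimal, heavily hedged sketch: every function name, the template string, and the stubbed return values are hypothetical stand-ins, not names or behavior taken from the patent, and the generative models are mocked with string formatting.

```python
# Hypothetical sketch of the claimed pipeline; all names are illustrative,
# and each model/unit is stubbed rather than implemented.

def parse_to_formal_language(text_q, image_q, template):
    """Parse the text and image parts of the question into formal-language
    clauses using a pre-constructed problem-solving template (stubbed)."""
    return [template.format(src="text", content=text_q),
            template.format(src="image", content=image_q)]

def formal_language_infer(formal_clauses):
    """Formal language inference unit: derives text-based intermediate
    inference information (stubbed as a conjunction of clauses)."""
    return " AND ".join(formal_clauses)

def multimodal_image_generate(intermediate, text_q, image_q):
    """Multi-modal image generation model producing image-based
    inference information (stubbed)."""
    return f"image-inference({image_q}|{intermediate})"

def multimodal_text_generate(intermediate, image_inference, text_q):
    """Multi-modal text generation model producing the final
    text-based inference information (stubbed)."""
    return f"answer({text_q}|{intermediate}|{image_inference})"

def infer(question):
    template = "{src}:{content}"  # placeholder problem-solving template
    clauses = parse_to_formal_language(question["text"], question["image"], template)
    intermediate = formal_language_infer(clauses)
    image_inf = multimodal_image_generate(intermediate, question["text"],
                                          question["image"])
    return multimodal_text_generate(intermediate, image_inf, question["text"])

result = infer({"text": "How many apples?", "image": "apples.png"})
print(result)
```

The point of the sketch is the ordering: the formal-language stage runs first, its output conditions the image-generation stage, and both condition the final text-generation stage.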
-
2.
Publication No.: US20240176959A1
Publication Date: 2024-05-30
Application No.: US18200778
Application Date: 2023-05-23
Inventor: Jeong Heo, Young-Ae Seo, Jin Seong, Jong Hun Shin, Ki Young Lee, Soojong Lim, Young Kil Kim, Jihee Ryu
Abstract: Provided is a method of generating a language model using crossmodal information. The method includes: receiving language-based first modality information and non-language-based second modality information; converting the first modality information into a first byte sequence; converting the second modality information into a second byte sequence; converting the first and second byte sequences into a first embedding vector and a second embedding vector by applying an embedding technique for each modality; generating semantic association information between first and second modality information by inputting the first and second embedding vectors to a crossmodal transformer; and learning the language model by setting the generated semantic association information as training data.
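The byte-sequence data flow in this abstract can also be sketched concretely. This is a toy illustration under loud assumptions: the per-byte embedding and the "crossmodal transformer" are stand-ins (a lookup-style vector and a mean-pooling pairing, respectively), and none of the names come from the patent.

```python
# Hypothetical sketch of crossmodal byte-sequence processing; the embedding
# and transformer stages are toy stand-ins, not the patented models.

def to_byte_sequence(modality_data):
    """Convert modality input to bytes: UTF-8 for language-based text,
    raw bytes for non-language data such as an image."""
    if isinstance(modality_data, str):
        return modality_data.encode("utf-8")
    return bytes(modality_data)

def embed(byte_seq, dim=4):
    """Per-modality embedding: map each byte to a small toy vector."""
    return [[(b + i) % 256 / 255.0 for i in range(dim)] for b in byte_seq]

def crossmodal_transformer(emb_first, emb_second):
    """Stand-in for the crossmodal transformer: pools each modality's
    embeddings and pairs them as 'semantic association information'."""
    mean = lambda emb: [sum(col) / len(emb) for col in zip(*emb)]
    return {"first_modality_repr": mean(emb_first),
            "second_modality_repr": mean(emb_second)}

text_info = "a red apple"        # language-based first modality information
image_info = b"\x89PNG..."       # non-language second modality (fake bytes)

assoc = crossmodal_transformer(embed(to_byte_sequence(text_info)),
                               embed(to_byte_sequence(image_info)))
training_example = (text_info, assoc)  # association info set as training data
```

The design point the abstract emphasizes is that both modalities are reduced to a common byte representation before modality-specific embedding, so the same downstream transformer can relate them.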
-