IDENTIFYING FACIAL LANDMARK LOCATIONS FOR AI SYSTEMS AND APPLICATIONS

    Publication No.: US20250118039A1

    Publication Date: 2025-04-10

    Application No.: US18483697

    Application Date: 2023-10-10

    Inventor: Yeongho Seol

    Abstract: In various examples, landmark identification and retargeting for AI systems and applications is described herein. Systems and methods are disclosed that use a first three-dimensional (3D) face (e.g., a morphable model mesh) that is already associated with locations of facial landmarks to determine the locations of corresponding facial landmarks on a second 3D face (e.g., a target face mesh). To determine the locations, one or more iterations of transformation processes and/or fitting processes may be performed on the first 3D face in order to morph the landmarks of the first 3D face to align with second landmarks on the second 3D face. After performing the iteration(s) of the transformation processes and/or the fitting processes, the locations (e.g., vertices) on the second 3D face that are closest to the landmark locations (e.g., vertices) on the first 3D face are identified and used as the locations of the corresponding facial landmarks on the second 3D face.
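
    The closest-vertex transfer described above can be illustrated with a short sketch. The Python code below is a minimal, hypothetical approximation and not the patented method: it stands in for the patent's unspecified transformation/fitting iterations with a simple rigid ICP loop (nearest-neighbor correspondences plus a Kabsch least-squares fit), then reads off the closest target vertex for each source landmark. All function names and parameters are illustrative assumptions.

        import numpy as np
        from scipy.spatial import cKDTree

        def rigid_fit(src, dst):
            # Least-squares rigid transform (Kabsch) mapping points src -> dst.
            src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
            H = (src - src_c).T @ (dst - dst_c)
            U, _, Vt = np.linalg.svd(H)
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:  # guard against reflections
                Vt[-1] *= -1
                R = Vt.T @ U.T
            t = dst_c - R @ src_c
            return R, t

        def transfer_landmarks(src_verts, landmark_idx, tgt_verts, iterations=10):
            # src_verts: (N, 3) vertices of the landmark-annotated source mesh.
            # landmark_idx: indices of the landmark vertices on the source mesh.
            # tgt_verts: (M, 3) vertices of the target mesh.
            tree = cKDTree(tgt_verts)  # spatial index over target vertices
            verts = src_verts.copy()
            for _ in range(iterations):
                # Correspondence step: nearest target vertex per source vertex.
                _, nn = tree.query(verts)
                # Transformation step: rigid fit toward those correspondences.
                R, t = rigid_fit(verts, tgt_verts[nn])
                verts = verts @ R.T + t
            # Final step: the target vertices closest to the morphed landmark
            # positions become the landmark locations on the target mesh.
            _, lm_on_target = tree.query(verts[landmark_idx])
            return lm_on_target

    In the patented approach the fitting may well be non-rigid (e.g., solving for morphable-model coefficients), but the final step, a closest-vertex query from the morphed landmark positions into the target mesh, is the same idea.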

    AUDIO-DRIVEN FACIAL ANIMATION USING MACHINE LEARNING

    Publication No.: US20250061634A1

    Publication Date: 2025-02-20

    Application No.: US18457251

    Application Date: 2023-08-28

    Abstract: Systems and methods of the present disclosure include animating virtual avatars or agents according to input audio and one or more selected or determined emotions and/or styles. For example, a deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can use a transformer-based audio encoder with locked parameters to train an associated decoder using a weighted feature vector. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
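
    As a rough illustration of the locked-encoder training setup, the sketch below (PyTorch, with hypothetical class names, dimensions, and emotion count) freezes a toy transformer audio encoder so that only the per-region decoder heads and an emotion weighting are trained; the emotion weighting stands in for the abstract's weighted feature vector and is an assumption, not the disclosed architecture.

        import torch
        import torch.nn as nn

        class AudioToFaceSketch(nn.Module):
            # Hypothetical shapes: audio features (batch, frames, feat_dim);
            # one deformation vector per facial region per frame.
            def __init__(self, feat_dim=128, d_model=256, n_emotions=8,
                         regions=("head", "skin", "eyes", "tongue"), region_dim=64):
                super().__init__()
                self.embed = nn.Linear(feat_dim, d_model)
                layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=4)
                # "Locked" encoder: exclude these parameters from training
                # (in practice this would be a pretrained speech encoder).
                for p in list(self.embed.parameters()) + list(self.encoder.parameters()):
                    p.requires_grad = False
                # Learned per-emotion weights standing in for the weighted feature vector.
                self.emotion_embed = nn.Embedding(n_emotions, d_model)
                # One trainable decoder head per facial component/region.
                self.decoders = nn.ModuleDict({
                    name: nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                        nn.Linear(d_model, region_dim))
                    for name in regions
                })

            def forward(self, audio_feats, emotion_id):
                x = self.encoder(self.embed(audio_feats))          # (B, T, d_model)
                w = torch.sigmoid(self.emotion_embed(emotion_id))  # (B, d_model)
                x = x * w.unsqueeze(1)  # weight features by the selected emotion/style
                # Per-region motion/deformation outputs, each (B, T, region_dim).
                return {name: head(x) for name, head in self.decoders.items()}

    Because the encoder parameters have requires_grad=False, an optimizer built over filter(lambda p: p.requires_grad, model.parameters()) updates only the decoder heads and emotion embedding; each head emits motion or deformation information for its facial component, which a renderer would consume downstream.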
