IDENTIFYING FACIAL LANDMARK LOCATIONS FOR AI SYSTEMS AND APPLICATIONS

    Publication No.: US20250118039A1

    Publication Date: 2025-04-10

    Application No.: US18483697

    Application Date: 2023-10-10

    Inventor: Yeongho Seol

    Abstract: In various examples, landmark identification and retargeting for AI systems and applications is described herein. Systems and methods are disclosed that use a first three-dimensional (3D) face (e.g., a morphable model mesh) that is already associated with locations of facial landmarks to determine the locations of corresponding facial landmarks on a second 3D face (e.g., a target face mesh). To determine the locations, one or more iterations of transformation processes and/or fitting processes may be performed on the first 3D face in order to morph the landmarks of the first 3D face to align with second landmarks on the second 3D face. After performing the iteration(s) of the transformation processes and/or the fitting processes, the locations (e.g., vertices) on the second 3D face that are closest to the landmark locations (e.g., vertices) on the first 3D face are identified and used as the locations of the corresponding facial landmarks on the second 3D face.
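
    The closest-vertex transfer described above can be illustrated with a short sketch. The Python code below is a minimal, hypothetical approximation and not the patented method: it stands in for the patent's unspecified transformation/fitting iterations with a simple rigid ICP loop (nearest-neighbor correspondences plus a Kabsch least-squares fit), then reads off the closest target vertex for each source landmark. All function names and parameters are illustrative assumptions.

        import numpy as np
        from scipy.spatial import cKDTree

        def rigid_fit(src, dst):
            # Least-squares rigid transform (Kabsch) mapping points src -> dst.
            src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
            H = (src - src_c).T @ (dst - dst_c)
            U, _, Vt = np.linalg.svd(H)
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:  # guard against reflections
                Vt[-1] *= -1
                R = Vt.T @ U.T
            t = dst_c - R @ src_c
            return R, t

        def transfer_landmarks(src_verts, landmark_idx, tgt_verts, iterations=10):
            # src_verts: (N, 3) vertices of the landmark-annotated source mesh.
            # landmark_idx: indices of the landmark vertices on the source mesh.
            # tgt_verts: (M, 3) vertices of the target mesh.
            tree = cKDTree(tgt_verts)  # spatial index over target vertices
            verts = src_verts.copy()
            for _ in range(iterations):
                # Correspondence step: nearest target vertex per source vertex.
                _, nn = tree.query(verts)
                # Transformation step: rigid fit toward those correspondences.
                R, t = rigid_fit(verts, tgt_verts[nn])
                verts = verts @ R.T + t
            # Final step: the target vertices closest to the morphed landmark
            # positions become the landmark locations on the target mesh.
            _, lm_on_target = tree.query(verts[landmark_idx])
            return lm_on_target

    In the patented approach the fitting may well be non-rigid (e.g., solving for morphable-model coefficients), but the final step, a closest-vertex query from the morphed landmark positions into the target mesh, is the same idea.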

    AUDIO-DRIVEN FACIAL ANIMATION USING MACHINE LEARNING

    Publication No.: US20250061634A1

    Publication Date: 2025-02-20

    Application No.: US18457251

    Application Date: 2023-08-28

    Abstract: Systems and methods of the present disclosure include animating virtual avatars or agents according to input audio and one or more selected or determined emotions and/or styles. For example, a deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can use a transformer-based audio encoder with locked parameters to train an associated decoder using a weighted feature vector. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
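
    As a rough illustration of the locked-encoder training setup, the sketch below (PyTorch, with hypothetical class names, dimensions, and emotion count) freezes a toy transformer audio encoder so that only the per-region decoder heads and an emotion weighting are trained; the emotion weighting stands in for the abstract's weighted feature vector and is an assumption, not the disclosed architecture.

        import torch
        import torch.nn as nn

        class AudioToFaceSketch(nn.Module):
            # Hypothetical shapes: audio features (batch, frames, feat_dim);
            # one deformation vector per facial region per frame.
            def __init__(self, feat_dim=128, d_model=256, n_emotions=8,
                         regions=("head", "skin", "eyes", "tongue"), region_dim=64):
                super().__init__()
                self.embed = nn.Linear(feat_dim, d_model)
                layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=4)
                # "Locked" encoder: exclude these parameters from training
                # (in practice this would be a pretrained speech encoder).
                for p in list(self.embed.parameters()) + list(self.encoder.parameters()):
                    p.requires_grad = False
                # Learned per-emotion weights standing in for the weighted feature vector.
                self.emotion_embed = nn.Embedding(n_emotions, d_model)
                # One trainable decoder head per facial component/region.
                self.decoders = nn.ModuleDict({
                    name: nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                        nn.Linear(d_model, region_dim))
                    for name in regions
                })

            def forward(self, audio_feats, emotion_id):
                x = self.encoder(self.embed(audio_feats))          # (B, T, d_model)
                w = torch.sigmoid(self.emotion_embed(emotion_id))  # (B, d_model)
                x = x * w.unsqueeze(1)  # weight features by the selected emotion/style
                # Per-region motion/deformation outputs, each (B, T, region_dim).
                return {name: head(x) for name, head in self.decoders.items()}

    Because the encoder parameters have requires_grad=False, an optimizer built over filter(lambda p: p.requires_grad, model.parameters()) updates only the decoder heads and emotion embedding; each head emits motion or deformation information for its facial component, which a renderer would consume downstream.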
