Abstract:
There is provided an emotion prediction method based on virtual facial expression image augmentation. The emotion prediction method may acquire a user facial image, may extract a facial expression feature from the acquired user facial image, and may predict a user emotion from the extracted facial expression feature. The emotion prediction method may extract the facial expression feature by using a facial expression recognition network, the facial expression recognition network being an AI model that is trained to receive a user facial image and to extract a facial expression feature. The facial expression recognition network is retrained with virtual facial images which are augmented from a facial image that causes a failure in emotion recognition. Accordingly, by augmenting features of a facial expression image that causes a failure in prediction through error feedback, facial expression recognition performance can be enhanced.
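The error-feedback idea described above can be sketched as follows. This is only an illustrative sketch: the abstract does not say how the virtual facial images are generated, so the simple pixel transforms (`flip_horizontal`, `shift_brightness`), the `toy_predict` classifier, and the sample data are all hypothetical stand-ins.

```python
def flip_horizontal(image):
    """Mirror each row of a grayscale image given as a list of rows."""
    return [row[::-1] for row in image]


def shift_brightness(image, delta):
    """Add delta to every pixel, clamped to the 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def build_retraining_set(samples, predict):
    """Return augmented (image, label) pairs for every sample predicted wrongly."""
    retrain = []
    for image, label in samples:
        if predict(image) != label:  # error feedback: keep only failures
            variants = (
                flip_horizontal(image),
                shift_brightness(image, 20),
                shift_brightness(image, -20),
            )
            for variant in variants:
                retrain.append((variant, label))
    return retrain


def toy_predict(image):
    """Hypothetical classifier: bright images are 'happy', dark ones 'sad'."""
    pixels = [p for row in image for p in row]
    return "happy" if sum(pixels) / len(pixels) >= 128 else "sad"


samples = [
    ([[200, 220]], "happy"),  # predicted correctly, not augmented
    ([[100, 250]], "sad"),    # predicted as 'happy', so it is augmented
]
retrain = build_retraining_set(samples, toy_predict)
```

In this sketch the augmented pairs in `retrain` would then be fed back into training of the recognition network; that retraining step is not shown.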
Abstract:
A method for separating audio sources and an audio system using the same are provided. The method introduces the concept of a residual signal when separating a mixed audio signal into audio sources: an audio signal corresponding to at least two of the audio sources is separated out as a residual signal and processed separately, so that audio separation performance can be improved. In addition, the method re-separates the separated residual signal and adds the re-separated signals to the corresponding audio sources, so that the audio sources can be separated more reliably.
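The residual re-separation step can be illustrated with a toy example. The sketch below makes several assumptions that are not in the abstract: the residual is computed as the part of the mixture the current estimates do not explain, and `toy_separator` is a hypothetical, deliberately imperfect separator that assigns even-indexed samples to source A and odd-indexed samples to source B while recovering only 80% of each sample.

```python
def toy_separator(signal):
    """Hypothetical imperfect separator (even samples -> A, odd samples -> B).

    Only 80% of each sample is assigned to a source; the rest is left over.
    """
    a = [0.8 * x if i % 2 == 0 else 0.0 for i, x in enumerate(signal)]
    b = [0.8 * x if i % 2 == 1 else 0.0 for i, x in enumerate(signal)]
    return a, b


def separate_with_residual(mixture, separator, passes=1):
    """Separate the mixture, then re-separate the residual and add it back."""
    est_a, est_b = separator(mixture)
    for _ in range(passes):
        # Residual: the part of the mixture not explained by the estimates.
        residual = [m - a - b for m, a, b in zip(mixture, est_a, est_b)]
        res_a, res_b = separator(residual)
        # Add each re-separated residual to its corresponding source.
        est_a = [a + r for a, r in zip(est_a, res_a)]
        est_b = [b + r for b, r in zip(est_b, res_b)]
    return est_a, est_b


mixture = [1.0, 2.0, 3.0, 4.0]
first_a, first_b = toy_separator(mixture)
final_a, final_b = separate_with_residual(mixture, toy_separator, passes=1)
```

With this toy separator, one residual pass raises the recovered portion of each sample from 80% to 96%, which is the kind of improvement the re-separation step is meant to provide.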
Abstract:
There is provided a depth estimation method for a small baseline-stereo camera through LiDAR sensor fusion. A depth map estimation method according to an embodiment may estimate a high-resolution depth map from a small baseline-stereo image based on deep learning, by using transfer learning from a deep learning network that is trained to estimate a depth map from a wide baseline-stereo image. Accordingly, 3D image quality can be enhanced in a device which has a small baseline-stereo camera installed therein due to structural constraints, such as a smartphone, a wearable AR/VR device, or a drone. In addition, according to embodiments, pseudo-LiDAR data may be generated by using a depth map estimated from a small baseline-stereo image, and may be used for replacing or reinforcing LiDAR data.
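One common way to turn a depth map into pseudo-LiDAR data is to back-project each pixel into a 3D point with a pinhole camera model. The abstract does not state how the pseudo-LiDAR data is generated, so the sketch below is an assumption; the function name `depth_to_points` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are illustrative.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (list of rows) to (x, y, z) camera-frame points.

    Assumes a pinhole camera with focal lengths fx, fy and principal point
    (cx, cy), all in pixels. Pixels with non-positive depth are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # missing or invalid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# 2x2 depth map in metres; the zero marks a pixel with no depth estimate.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
cloud = depth_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

The resulting point list has the same form as a LiDAR point cloud in the camera frame, so it could be used in place of, or merged with, real LiDAR returns.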
Abstract:
A method and an apparatus for detecting a lane are provided. The lane detection apparatus according to an embodiment includes: an acquisition unit configured to acquire a front image of a vehicle; and a processor configured to input the image acquired through the acquisition unit to an AI model and to detect information of a lane on a road. The AI model is trained to detect lane information that is expressed in a plane form from an input image. Accordingly, data imbalance between a lane area and a non-lane area can be resolved by using the AI model, which learns and predicts lane information expressed in a plane form rather than in a segment form such as a straight line or a curved line.
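The data-imbalance argument can be made concrete with a toy label mask. The sketch below assumes that "plane form" means labeling the lane as a filled region and that "segment form" means labeling only the boundary lines. The mask sizes and column indices are hypothetical.

```python
def line_mask(height, width, left, right):
    """Segment-style label: mark only the two lane-boundary columns."""
    return [[1 if c in (left, right) else 0 for c in range(width)]
            for _ in range(height)]


def plane_mask(height, width, left, right):
    """Plane-style label: mark the whole region between the boundaries."""
    return [[1 if left <= c <= right else 0 for c in range(width)]
            for _ in range(height)]


def positive_fraction(mask):
    """Fraction of pixels labeled as lane."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


H, W = 4, 10
line_ratio = positive_fraction(line_mask(H, W, left=3, right=6))
plane_ratio = positive_fraction(plane_mask(H, W, left=3, right=6))
```

In this toy case the plane-style label doubles the share of lane pixels, and the gap grows with the lane width, which is why a region label reduces the imbalance between lane and non-lane pixels.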
Abstract:
There are provided an apparatus and a method for reconstructing a 3D human object in real time based on a monocular color image. A 3D human object reconstruction apparatus according to an embodiment extracts a pixel-aligned feature from a monocular image, extracts a ray-invariant feature from the pixel-aligned feature, generates encoded position information by encoding position information of a point, predicts a signed distance (SD) of the point from the extracted ray-invariant feature and the encoded position information, and reconstructs a 3D human object by using the predicted SD. Accordingly, because the ray-invariant feature extracted from the pixel-aligned feature and the encoded position information are used, the amount of computation for predicting SDs of points in a 3D space can be noticeably reduced and the processing speed can be remarkably enhanced.
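The abstract does not specify how the position information is encoded. One widely used option is a sinusoidal encoding, sketched below as an assumption; the function name `encode_position` and the parameter `num_frequencies` are illustrative.

```python
import math


def encode_position(value, num_frequencies=4):
    """Sinusoidal encoding of a scalar position.

    Returns [sin(2^k * pi * v), cos(2^k * pi * v)] for k = 0..num_frequencies-1.
    """
    encoded = []
    for k in range(num_frequencies):
        angle = (2 ** k) * math.pi * value
        encoded.append(math.sin(angle))
        encoded.append(math.cos(angle))
    return encoded


# Encode a hypothetical normalized position of 0.5 with two frequency bands.
code = encode_position(0.5, num_frequencies=2)
```

Under this reading, the ray-invariant feature would be computed once per pixel and reused for every point sampled along that pixel's ray, so only a small encoding like this has to be recomputed per point.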
Abstract:
There are provided an apparatus and a method for reconstructing a 3D human object based on a monocular image through depth image-based implicit function learning. A 3D human object reconstruction method according to an embodiment includes: predicting a double-sided orthographic depth map from a front perspective color image of a human object; predicting a signed distance (SD) for points in a 3D space from the predicted double-sided orthographic depth map; and reconstructing a 3D human object by using the predicted SD. Accordingly, the human object and its details can be naturally reconstructed not only in the area visible in the front perspective color image of the human object but also in the invisible area.
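To illustrate how a signed distance describes a surface, the sketch below uses an analytic sphere in place of the learned predictor and locates the surface where the SD changes sign along a ray. The sign convention (negative inside, positive outside), the sphere, and the helper names `sphere_sd` and `first_crossing` are assumptions for illustration. The abstract does not say how the surface is extracted from the predicted SD.

```python
import math


def sphere_sd(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on it, positive outside."""
    return math.dist(point, center) - radius


def first_crossing(sd_fn, origin, direction, steps=8, step_size=0.5):
    """Walk along a ray and return the first sample where the SD stops being positive."""
    prev = sd_fn(origin)
    for i in range(1, steps + 1):
        point = tuple(o + i * step_size * d for o, d in zip(origin, direction))
        cur = sd_fn(point)
        if prev > 0 >= cur:  # moved from outside to on or inside the surface
            return point
        prev = cur
    return None  # the ray never reached the surface within the sampled range


hit = first_crossing(sphere_sd, origin=(0.0, 0.0, -3.0), direction=(0.0, 0.0, 1.0))
```

A learned SD predictor would take the place of `sphere_sd`; the reconstructed surface is then the set of points where the predicted SD is zero.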
Abstract:
The present invention relates to a method for selecting an appropriate mode when performing a new broadcast, such as a 3D stereo broadcast, a UHDTV broadcast, and a multi-view broadcast, among others, while maintaining compatibility with existing broadcasting channels in an MPEG-2-TS format for transmitting and receiving digital TV, and to a method for recognizing a descriptor. To this end, the present invention suggests providing the descriptor which is related to synthesizing left and right images using the type of stream, existence of the descriptor, and a frame-compatible mode flag.
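Checking whether a descriptor exists typically means walking a descriptor loop. In MPEG-2 systems, each descriptor starts with an 8-bit tag and an 8-bit length, followed by that many payload bytes. The sketch below parses such a loop. The tag values `0xA0` and `0xA1` and the payload bytes are arbitrary illustrative values, and the layout of the frame-compatible mode flag is not given in the abstract, so it is not decoded here.

```python
def parse_descriptors(data):
    """Split a descriptor loop into (tag, payload) pairs.

    Each descriptor is: 1 byte tag, 1 byte length, then `length` payload bytes.
    """
    descriptors = []
    pos = 0
    while pos + 2 <= len(data):
        tag = data[pos]
        length = data[pos + 1]
        payload = data[pos + 2 : pos + 2 + length]
        if len(payload) < length:
            raise ValueError("truncated descriptor")
        descriptors.append((tag, payload))
        pos += 2 + length
    return descriptors


def has_descriptor(descriptors, tag):
    """Return True if a descriptor with the given tag is present."""
    return any(t == tag for t, _ in descriptors)


# Hypothetical loop: tag 0xA0 with a 1-byte payload, tag 0xA1 with no payload.
loop = bytes([0xA0, 0x01, 0x80, 0xA1, 0x00])
found = parse_descriptors(loop)
```

A receiver could combine the stream type, the result of `has_descriptor`, and a flag read from the payload to choose the broadcast mode; how these three inputs map to a mode is defined by the invention and not reproduced here.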