Abstract:
A method and apparatus for generating/converting a digital hologram are provided. The method for generating the digital hologram includes: clustering points of a 3D object into a plurality of clusters according to their distance to a screen; generating diffraction patterns on a per-cluster basis; and generating a fringe pattern by superimposing the diffraction patterns on one another. Accordingly, computational complexity can be reduced, so a digital hologram can be generated at high speed. Because the range/size of a cluster, which has a trade-off relationship with the image quality of the digital hologram, is adjustable, a customized, flexible digital hologram can be generated.
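The three steps above can be sketched as follows. This is a minimal illustration, not the claimed implementation. It assumes that every point in a cluster reuses one spherical-wave pattern computed at the cluster's representative depth, and that the real part of the summed field stands in for the fringe pattern. The function names, grid size, wavelength, and pixel pitch are all hypothetical.

```python
import cmath
import math
from collections import defaultdict


def cluster_by_depth(points, cluster_size):
    """Group (x, y, z, amplitude) points into depth bins of width cluster_size.

    x and y are integer pixel indices on the screen; z is the distance to it.
    """
    clusters = defaultdict(list)
    for x, y, z, amplitude in points:
        clusters[int(z // cluster_size)].append((x, y, z, amplitude))
    return clusters


def fringe_pattern(points, cluster_size, size=16, wavelength=0.5, pitch=1.0):
    """Sum one shifted diffraction pattern per point, computed once per cluster."""
    k = 2 * math.pi / wavelength
    field = [[0j] * size for _ in range(size)]
    for index, members in cluster_by_depth(points, cluster_size).items():
        # Assumption: the bin centre represents the depth of the whole cluster.
        z_rep = (index + 0.5) * cluster_size
        # One kernel per cluster, indexed by the pixel offset from the point.
        kernel = {}
        for dx in range(-size + 1, size):
            for dy in range(-size + 1, size):
                r = math.sqrt((dx * pitch) ** 2 + (dy * pitch) ** 2 + z_rep ** 2)
                kernel[(dx, dy)] = cmath.exp(1j * k * r) / r
        for x, y, _z, amplitude in members:
            for u in range(size):
                for v in range(size):
                    field[u][v] += amplitude * kernel[(u - x, v - y)]
    # Assumption: the real part is used as a simple stand-in for the fringe.
    return [[value.real for value in row] for row in field]


if __name__ == "__main__":
    demo_points = [(3, 4, 5.0, 1.0), (8, 8, 27.0, 0.5)]
    fringe = fringe_pattern(demo_points, cluster_size=10.0)
    print(len(fringe), len(fringe[0]))
```

A larger `cluster_size` means fewer kernels to compute but a coarser depth approximation, which is the speed/quality trade-off the abstract refers to.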
Abstract:
There is provided a speech synthesis system and method with an adjustable utterance length. The speech synthesis method according to an embodiment predicts a duration of each phoneme corresponding to a speech mask from the speech mask and a text to be synthesized with the speech mask, encodes the text to be synthesized and extracts a text sequence which is expressed by feature information of the text, generates a speech frame sequence by regulating a length of each phoneme of the text sequence according to the predicted duration of each phoneme corresponding to the speech mask, and synthesizes a speech from the generated speech frame sequence. Accordingly, the length of the speech to be synthesized can be freely regulated as a user desires by regulating the length of the speech mask.
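The length-regulation step can be illustrated with a short sketch. It is not the disclosed model: the duration predictor is replaced here by a simple rescaling of nominal per-phoneme durations to the number of frames in the speech mask, and the names `scale_durations` and `regulate_length` are hypothetical.

```python
def scale_durations(durations, target_frames):
    """Rescale per-phoneme durations so that they sum to target_frames.

    Stand-in for a learned duration predictor conditioned on the speech mask.
    """
    total = sum(durations)
    scaled = [d * target_frames / total for d in durations]
    frames = [int(s) for s in scaled]
    # Hand the leftover frames to the phonemes with the largest fractional part.
    remainder = target_frames - sum(frames)
    order = sorted(range(len(scaled)),
                   key=lambda i: scaled[i] - frames[i],
                   reverse=True)
    for i in order[:remainder]:
        frames[i] += 1
    return frames


def regulate_length(text_sequence, durations):
    """Repeat each phoneme feature by its duration to build a frame sequence."""
    frame_sequence = []
    for feature, duration in zip(text_sequence, durations):
        frame_sequence.extend([feature] * duration)
    return frame_sequence


if __name__ == "__main__":
    phoneme_features = ["h", "e", "l", "o"]   # placeholder feature vectors
    nominal = [2, 3, 2, 5]                    # nominal durations in frames
    mask_length = 24                          # frames in the speech mask
    durations = scale_durations(nominal, mask_length)
    print(len(regulate_length(phoneme_features, durations)))
```

Changing `mask_length` changes the total number of speech frames while preserving the relative durations of the phonemes.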
Abstract:
There is provided a training method of a multi-task integrated deep learning model. The training method according to an embodiment may generate, in a batch, training data for a plurality of visual intelligence tasks from visual data, and may train a multi-task integrated deep learning model which performs the plurality of visual intelligence tasks by using the generated training data. Accordingly, because training data for an integrated deep learning model which performs various visual intelligence tasks is generated in a batch through multi-data conversion kernels, appropriate training data for multiple tasks can be obtained easily and the multi-task integrated deep learning model can be trained effectively.
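As a rough illustration of batch generation through conversion kernels, the sketch below applies one kernel per task to every annotated sample. The task set, the sample layout, and all function names are assumptions made for the example; the abstract does not specify them.

```python
def to_classification(sample):
    """Kernel: image-level label set."""
    labels = sorted({obj["label"] for obj in sample["objects"]})
    return {"input": sample["image"], "target": labels}


def to_detection(sample):
    """Kernel: (label, bounding box) pairs."""
    boxes = [(obj["label"], obj["box"]) for obj in sample["objects"]]
    return {"input": sample["image"], "target": boxes}


def to_counting(sample):
    """Kernel: number of annotated objects."""
    return {"input": sample["image"], "target": len(sample["objects"])}


# Hypothetical registry of multi-data conversion kernels, one per task.
CONVERSION_KERNELS = {
    "classification": to_classification,
    "detection": to_detection,
    "counting": to_counting,
}


def generate_training_data(visual_data, kernels=CONVERSION_KERNELS):
    """Convert every sample with every kernel in a single pass."""
    batch = {task: [] for task in kernels}
    for sample in visual_data:
        for task, kernel in kernels.items():
            batch[task].append(kernel(sample))
    return batch


if __name__ == "__main__":
    data = [{
        "image": "frame_0001",  # placeholder for pixel data
        "objects": [
            {"label": "car", "box": (10, 20, 50, 60)},
            {"label": "person", "box": (70, 15, 90, 80)},
        ],
    }]
    for task, examples in generate_training_data(data).items():
        print(task, examples[0]["target"])
```

Adding a task under this layout means registering one more kernel; the source data and the generation loop stay unchanged.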
Abstract:
There are provided a method and a system for acquiring visual explanation information independent of the purpose, type, and structure of a visual intelligence model. The visual explanation information acquisition system according to an embodiment may: input N transformed images, which are generated by diversifying an input image, to a deep learning-based visual intelligence model and acquire the output results; generate attributes of the visual intelligence model from the acquired results; derive, from losses of the visual intelligence model calculated from the generated attributes, basic data for generating a visual explanation map that visually explains the rationale by which the visual intelligence model derives its results; and generate the visual explanation map from the derived basic data. Accordingly, visual explanation information may be acquired from various visual intelligence models through one system, independently of the purpose, type, and structure of the visual intelligence model.
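A model-agnostic pipeline of this shape can be illustrated with occlusion sensitivity, a well-known technique that is swapped in here for illustration and is not necessarily the disclosed method. The transformed images are occluded copies of the input, the "loss" is the drop in the model's score, and the map is filled patch by patch. The model is treated as a black-box callable, and all names are hypothetical.

```python
def occlude(image, top, left, size):
    """Return a copy of the image with one square patch set to zero."""
    out = [row[:] for row in image]
    for r in range(top, min(top + size, len(image))):
        for c in range(left, min(left + size, len(image[0]))):
            out[r][c] = 0.0
    return out


def explanation_map(model, image, patch=2):
    """Score each patch by how much occluding it lowers the model output."""
    height, width = len(image), len(image[0])
    base_score = model(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # One transformed image per patch position.
            score = model(occlude(image, top, left, patch))
            loss = base_score - score
            for r in range(top, min(top + patch, height)):
                for c in range(left, min(left + patch, width)):
                    heat[r][c] = loss
    return heat


def toy_model(image):
    """Hypothetical black-box model that only looks at the top-left corner."""
    return image[0][0] + image[0][1]


if __name__ == "__main__":
    image = [[1.0] * 4 for _ in range(4)]
    for row in explanation_map(toy_model, image):
        print(row)
```

Because only the model's outputs are used, the same routine applies to any callable that maps an image to a score, regardless of its internal structure.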
Abstract:
An audio segmentation method based on an attention mechanism is provided. The audio segmentation method according to an embodiment obtains a mapping relationship between an “inputted text” and an “audio spectrum feature vector for generating an audio signal”, the audio spectrum feature vector being automatically synthesized by using the inputted text, and segments an inputted audio signal by using the mapping relationship. Accordingly, high quality can be guaranteed and the effort, time, and cost can be noticeably reduced through audio segmentation utilizing the attention mechanism.
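The use of the mapping relationship for segmentation can be sketched as follows. The sketch assumes the mapping is available as an attention matrix whose rows are spectrum frames and whose columns are text tokens, and that spectrum frames line up one to one with frames of the inputted audio. Both are assumptions, as is the function name.

```python
def segment_by_attention(attention, frame_seconds):
    """Turn an attention matrix into (token_index, start, end) segments.

    attention[t][j] is the weight of spectrum frame t on text token j.
    Each frame is assigned to its highest-weight token, and a boundary is
    placed wherever the assigned token changes.
    """
    segments = []
    current, start = None, 0
    for t, weights in enumerate(attention):
        token = max(range(len(weights)), key=weights.__getitem__)
        if token != current:
            if current is not None:
                segments.append((current, start * frame_seconds, t * frame_seconds))
            current, start = token, t
    if current is not None:
        segments.append((current, start * frame_seconds,
                         len(attention) * frame_seconds))
    return segments


if __name__ == "__main__":
    attention = [
        [0.9, 0.1, 0.0],
        [0.7, 0.3, 0.0],
        [0.2, 0.7, 0.1],
        [0.0, 0.2, 0.8],
    ]
    for token, begin, end in segment_by_attention(attention, frame_seconds=0.5):
        print(token, begin, end)
```

The returned time ranges can then be used to cut the inputted audio signal into one piece per text token.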
Abstract:
The present disclosure relates to a method and system for controlling the loudness of audio based on signal analysis and deep learning. The method includes analyzing an audio characteristic at the frame level based on signal analysis, analyzing the audio characteristic at the frame level based on learning, and controlling the loudness of the audio at the frame level by combining the analysis results. Accordingly, the reliability of audio characteristic analysis can be enhanced and audio loudness can be optimally controlled.
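One way to combine the two frame-level analyses is sketched below. It makes several assumptions: the signal analysis is a plain RMS level in dB (not a perceptual loudness measure), the learning-based analysis is supplied as a per-frame score between 0 and 1 from a hypothetical model, and the combination rule is simply to scale the gain toward the target level by that score.

```python
import math


def frame_rms_db(frame, floor_db=-100.0):
    """Signal-analysis branch: RMS level of one frame in dB."""
    mean_square = sum(sample * sample for sample in frame) / len(frame)
    return 10 * math.log10(mean_square) if mean_square > 0 else floor_db


def control_loudness(frames, learned_scores, target_db=-20.0):
    """Apply a per-frame gain that combines both analysis results.

    learned_scores[i] is a hypothetical model output in [0, 1]; a score of
    1 moves the frame fully to target_db, a score of 0 leaves it untouched.
    """
    controlled = []
    for frame, score in zip(frames, learned_scores):
        gain_db = score * (target_db - frame_rms_db(frame))
        gain = 10 ** (gain_db / 20)
        controlled.append([sample * gain for sample in frame])
    return controlled


if __name__ == "__main__":
    frames = [[0.5, -0.5, 0.5, -0.5], [0.01, -0.01, 0.01, -0.01]]
    scores = [1.0, 0.2]  # placeholder outputs of the learned analysis
    for frame in control_loudness(frames, scores):
        print(round(frame_rms_db(frame), 2))
```

Under this rule, frames that the learned analysis scores low are left nearly untouched, so the learned branch moderates what the signal-level branch alone would do.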