Abstract:
Disclosed herein is a method for measuring the weight of a discrete entity, performed in a neural network model having multiple layers. The method includes receiving data composed of the indices of discrete entities, converting the data into embedding vectors corresponding to the respective indices through an embedding layer, generating a masked vector through element-wise multiplication between a mask vector and the embedding vector, calculating a loss using an output based on the masked vector, and training the model based on the loss.
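A minimal sketch of the masking and training steps, in pure Python for illustration. It assumes a single learnable mask vector shared by all entities, a toy output head that sums the masked vector, and a mean-squared-error loss. The table sizes, the names `embed`, `apply_mask`, `predict`, and `train_step`, and the reading of the learned mask values as weights are hypothetical and not taken from the abstract.

```python
import random

# Hypothetical sizes, chosen only for illustration.
NUM_ENTITIES = 4
EMBED_DIM = 3

random.seed(0)
# Embedding layer: one fixed vector per discrete-entity index.
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
                   for _ in range(NUM_ENTITIES)]
# Assumption: a single learnable mask vector shared by all entities,
# initialised to ones so that nothing is masked at the start.
mask = [1.0] * EMBED_DIM


def embed(index):
    """Look up the embedding vector for a discrete-entity index."""
    return embedding_table[index]


def apply_mask(mask_vec, emb_vec):
    """Element-wise multiplication of the mask vector and an embedding vector."""
    return [m * e for m, e in zip(mask_vec, emb_vec)]


def predict(masked_vec):
    """Toy output head (hypothetical): the sum of the masked components."""
    return sum(masked_vec)


def train_step(indices, targets, lr=0.05):
    """One gradient-descent step on the mask; returns the MSE before the update."""
    grad = [0.0] * EMBED_DIM
    loss = 0.0
    for idx, target in zip(indices, targets):
        emb = embed(idx)
        err = predict(apply_mask(mask, emb)) - target
        loss += err * err
        # d(err^2)/d(mask[k]) = 2 * err * emb[k] for this linear head
        for k in range(EMBED_DIM):
            grad[k] += 2.0 * err * emb[k]
    n = len(indices)
    for k in range(EMBED_DIM):
        mask[k] -= lr * grad[k] / n
    return loss / n
```

Because the toy output is linear in the mask and the loss is squared error, repeated calls to `train_step` with this small learning rate lower the loss; in a full model the embedding layer and later layers would be trained as well.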
Abstract:
Provided is an apparatus and method for coding and decoding multi-object audio signals having various channels while providing backward compatibility with a conventional spatial audio coding (SAC) bitstream. The apparatus includes an audio object coding unit that codes the audio-object signals input to the coding apparatus based on spatial cues and creates rendering information for the coded audio-object signals. The rendering information includes spatial cue information for the audio-object signals, channel information of the audio-object signals, and identification information of the audio-object signals, and is used in coding and decoding the audio signals.
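A rough sketch of how the coding unit might assemble the three kinds of rendering information named above, in pure Python. It assumes each object's spatial cue is its mean power relative to the loudest object in dB (similar in spirit to an object level difference) and uses a plain sum of all object channels as the downmix. `RenderingInfo`, `make_rendering_info`, and `downmix` are hypothetical names, and the SAC-compatible bitstream itself is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class RenderingInfo:
    """Hypothetical container for the three kinds of rendering information."""
    spatial_cues_db: dict  # object id -> power relative to the loudest object (dB)
    channel_info: dict     # object id -> number of channels of that object
    object_ids: list       # identification information, in coding order


def power(channels):
    """Mean power over all samples of a (possibly multi-channel) object."""
    flat = [s for ch in channels for s in ch]
    return sum(s * s for s in flat) / len(flat)


def make_rendering_info(objects, eps=1e-12):
    """objects: dict of object id -> list of channels, each a list of samples."""
    powers = {oid: power(sig) for oid, sig in objects.items()}
    ref = max(powers.values()) + eps
    # Assumption: the cue is a level relative to the loudest object.
    cues = {oid: 10.0 * math.log10((p + eps) / ref) for oid, p in powers.items()}
    channels = {oid: len(sig) for oid, sig in objects.items()}
    return RenderingInfo(cues, channels, list(objects))


def downmix(objects):
    """Assumption: a mono downmix formed by summing every object channel."""
    length = max(len(ch) for sig in objects.values() for ch in sig)
    mix = [0.0] * length
    for sig in objects.values():
        for ch in sig:
            for i, s in enumerate(ch):
                mix[i] += s
    return mix
```

For a mono vocal at unit power and a stereo guitar at a quarter of that power, the guitar's cue comes out near -6 dB and its channel information records two channels.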
Abstract:
Disclosed herein is a method for organizing and merging scene understanding information of an AI agent. The method includes acquiring an image of a first space, recognizing objects in the image, structuring information about the relationships between the objects and about the states of the objects in the form of a graph, and merging the structured information with information received from a nearby AI agent.
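One way to picture the structuring and merging steps is as a small graph of object states plus (subject, predicate, object) relations. The sketch below assumes objects reported by different agents can be matched by name and that an incoming state value overrides the local one; the recognizer output passed to `build_graph` and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Scene understanding graph: object states as nodes, relations as edges."""
    states: dict = field(default_factory=dict)   # object name -> {attribute: value}
    relations: set = field(default_factory=set)  # (subject, predicate, object)

    def add_object(self, name, state=None):
        self.states.setdefault(name, {}).update(state or {})

    def add_relation(self, subj, pred, obj):
        self.relations.add((subj, pred, obj))

    def merge(self, other):
        """Return a new graph combining this graph with one from a nearby agent.

        Assumptions: objects are matched by name, and when both graphs give a
        value for the same attribute, the incoming value wins.
        """
        merged = SceneGraph({k: dict(v) for k, v in self.states.items()},
                            set(self.relations))
        for name, state in other.states.items():
            merged.add_object(name, state)
        merged.relations |= other.relations
        return merged


def build_graph(detections, relations):
    """detections: (name, state dict) pairs from a hypothetical object recognizer."""
    graph = SceneGraph()
    for name, state in detections:
        graph.add_object(name, state)
    for subj, pred, obj in relations:
        graph.add_relation(subj, pred, obj)
    return graph
```

Merging returns a new graph, so the agent's own view is left unchanged if the merge later needs to be revisited.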
Abstract:
Disclosed herein is a data stream processing apparatus and method using query partitioning, which allow data stream processing apparatuses to perform partitioned, parallel processing of sub-queries. The proposed apparatus receives a query from a user, partitions the query into a plurality of sub-queries, transmits the sub-queries to another data stream processing apparatus or to a sub-query processing unit, integrates the results of the sub-queries processed by the other data stream processing apparatus and the sub-query processing unit, generates a response to the query, and transmits the response to the user.
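The partition, process, and integrate cycle can be sketched with the standard library. This assumes the user query is a per-key sum over a finite batch of stream events, that keys are assigned to sub-queries by hash, and that worker threads stand in for the other apparatuses or sub-query processing units; the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_query(keys, num_parts):
    """Split one per-key aggregation query into sub-queries over disjoint key sets.

    Assumption: keys are assigned to sub-queries by hash.
    """
    parts = [set() for _ in range(num_parts)]
    for key in keys:
        parts[hash(key) % num_parts].add(key)
    return [part for part in parts if part]


def run_sub_query(key_subset, stream):
    """Sub-query: sum the values of events whose key belongs to this subset."""
    totals = Counter()
    for key, value in stream:
        if key in key_subset:
            totals[key] += value
    return totals


def process_query(stream, keys, num_parts=2):
    """Partition, process sub-queries in parallel, then integrate the results."""
    sub_queries = partition_query(keys, num_parts)
    if not sub_queries:
        return {}
    # Threads stand in for other apparatuses or sub-query processing units.
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        partials = list(pool.map(lambda subset: run_sub_query(subset, stream),
                                 sub_queries))
    # Integration: the key sets are disjoint, so merging partial sums is exact.
    response = Counter()
    for partial in partials:
        response.update(partial)
    return dict(response)
```

Because each key is handled by exactly one sub-query, integration reduces to combining disjoint partial results; a query such as a global average would instead need partial sums and counts merged together.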
Abstract:
Disclosed herein is a method for task planning for collaboration of artificial intelligence (AI) agents. The method includes generating a scene graph using an image acquired by an AI agent and a human instruction, and generating a machine instruction set for objects in the scene graph. The scene graph includes relevance information between each object in the scene graph and the human instruction.
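A toy sketch of attaching instruction relevance to scene graph nodes and turning the relevant objects into a machine instruction set. The keyword-overlap relevance score is a stand-in for whatever learned model the method uses, and the skill templates (`move_to`, `grasp`, `release`) are hypothetical primitives, not an established robot API.

```python
import re

# Hypothetical skill templates: verb -> (number of object slots, steps).
SKILL_TEMPLATES = {
    "bring": (2, ["move_to({0})", "grasp({0})", "move_to({1})", "release({0})"]),
    "fetch": (1, ["move_to({0})", "grasp({0})"]),
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_scene_graph(detections, relations, instruction):
    """Attach to each object a relevance score with respect to the instruction.

    Assumption: relevance is the fraction of the object's descriptive words
    (its name and attributes) that occur in the instruction.
    """
    words = set(tokenize(instruction))
    nodes = {}
    for name, attributes in detections:
        desc = {name.lower(), *(a.lower() for a in attributes)}
        nodes[name] = {"attributes": list(attributes),
                       "relevance": len(desc & words) / len(desc)}
    return {"nodes": nodes, "edges": list(relations)}


def plan(scene_graph, instruction, threshold=0.0):
    """Generate a machine instruction set for the relevant objects."""
    tokens = tokenize(instruction)
    verb = next((t for t in tokens if t in SKILL_TEMPLATES), None)
    if verb is None:
        return []
    arity, steps = SKILL_TEMPLATES[verb]
    relevant = [name for name, node in scene_graph["nodes"].items()
                if node["relevance"] > threshold]
    # Fill object slots in the order the objects are mentioned.
    relevant.sort(key=lambda n: tokens.index(n.lower()) if n.lower() in tokens
                  else len(tokens))
    if len(relevant) < arity:
        return []
    return [step.format(*relevant[:arity]) for step in steps]
```

For the instruction "Bring the red cup to the table." and detections of a red cup, a wooden table, and a lamp, the lamp scores zero relevance and the plan is `move_to(cup)`, `grasp(cup)`, `move_to(table)`, `release(cup)`.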
Abstract:
The present research relates to controlling the rendering of multi-object or multi-channel audio signals. It provides a method and apparatus for controlling the rendering of multi-object or multi-channel audio signals based on spatial cues while those signals are being decoded. To this end, the proposed method controls rendering in the spatial cue domain during decoding of the multi-object or multi-channel audio signals.
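To illustrate what controlling rendering in the spatial cue domain can mean, the sketch computes the channel level difference (CLD) that a user's rendering gains would produce for a stereo pair, so that this cue could replace the transmitted one before upmixing rather than remixing decoded samples. It assumes mutually uncorrelated objects within a single parameter band and uses a common energy-preserving one-to-two upmix gain rule; the function names and the single-band setting are illustrative simplifications, not the claimed method.

```python
import math


def rendered_cld(object_powers, render_gains, eps=1e-12):
    """CLD (dB) of a stereo pair after applying user rendering gains.

    object_powers: per-object power in one parameter band (from object cues)
    render_gains:  per-object (gain_left, gain_right) chosen by the user
    Assumes the objects are mutually uncorrelated, so their powers add.
    """
    p_left = sum(p * gl * gl for p, (gl, _) in zip(object_powers, render_gains))
    p_right = sum(p * gr * gr for p, (_, gr) in zip(object_powers, render_gains))
    return 10.0 * math.log10((p_left + eps) / (p_right + eps))


def upmix_gains(cld_db):
    """Energy-preserving left/right gains for a mono-to-stereo upmix from a CLD."""
    ratio = 10.0 ** (cld_db / 10.0)
    return math.sqrt(ratio / (1.0 + ratio)), math.sqrt(1.0 / (1.0 + ratio))
```

Hard-panning two equal-power objects to opposite sides gives a CLD of 0 dB, while making one object four times as powerful shifts the cue to about 6 dB, and the upmix gains follow without touching the decoded waveform.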
Abstract:
Disclosed herein is an apparatus and method for scene graph generation. The apparatus may include a backbone network for extracting a first feature map from an input image; an encoder for extracting, using the first feature map, a second feature map based on a mask for the shape of an object within a bounding box, and for generating a third feature map by combining the first feature map and the second feature map; and a decoder for generating a scene graph by predicting the relationships between objects from the third feature map.
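A pure-Python sketch of the encoder's feature-map arithmetic on a single-channel map. It assumes the second feature map keeps first-map features only where the object's shape mask is set inside its bounding box, and that "combining" means element-wise addition (concatenation along channels would be an equally plausible reading). The backbone is replaced by a given input map, and the relationship decoder is represented only by the pooled per-object feature it would consume; all names are hypothetical.

```python
def masked_feature_map(feature_map, bbox, shape_mask):
    """Second feature map: first-map features where the object's shape mask is on.

    feature_map: H x W list of floats (first feature map, one channel for brevity)
    bbox:        (top, left, bottom, right), with bottom/right exclusive
    shape_mask:  (bottom - top) x (right - left) list of 0/1 values inside the box
    """
    top, left, bottom, right = bbox
    height, width = len(feature_map), len(feature_map[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(top, bottom):
        for j in range(left, right):
            out[i][j] = feature_map[i][j] * shape_mask[i - top][j - left]
    return out


def combine(first, second):
    """Third feature map: element-wise sum of the first and second maps (assumption)."""
    return [[a + b for a, b in zip(row1, row2)] for row1, row2 in zip(first, second)]


def pooled_object_feature(feature_map, bbox):
    """Average a feature map inside a box; a relation decoder could consume this."""
    top, left, bottom, right = bbox
    vals = [feature_map[i][j] for i in range(top, bottom) for j in range(left, right)]
    return sum(vals) / len(vals)
```

Adding the masked map back onto the original emphasizes pixels on the object's shape while still keeping the surrounding context that relationship prediction may need.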
Abstract:
Provided is a method and apparatus for generating a side information bitstream of a multi-object audio signal. The apparatus for generating a side information bitstream of a multi-object audio signal includes a spatial cue information input unit configured to receive spatial cue information generated in an encoder of the multi-object audio signal, a preset information input unit configured to receive preset information for the multi-object audio signal, and a side information bitstream generator configured to generate the side information bitstream based on the spatial cue information and the preset information. The side information bitstream includes a header region and a frame region, and the preset information is included in the frame region.
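A toy serializer showing the header/frame split described above, with the preset information carried in each frame. The byte layout (an 8-bit object count and a 16-bit frame count in the header; one signed byte per quantized spatial cue plus an 8-bit preset index per frame) is invented for illustration and is not the syntax of any standard. It uses only Python's `struct` module.

```python
import struct

# Hypothetical header: number of objects (uint8), number of frames (uint16).
HEADER_FMT = ">BH"


def write_side_info(frames, num_objects):
    """frames: list of (quantized cue list, preset index) pairs, one per frame."""
    out = bytearray(struct.pack(HEADER_FMT, num_objects, len(frames)))
    for cues, preset in frames:
        if len(cues) != num_objects:
            raise ValueError("each frame needs one cue per object")
        # Frame region: quantized spatial cues, then the preset information.
        out += struct.pack(f">{num_objects}b", *cues)
        out += struct.pack(">B", preset)
    return bytes(out)


def read_side_info(data):
    """Parse the hypothetical layout back into (num_objects, frames)."""
    num_objects, num_frames = struct.unpack_from(HEADER_FMT, data, 0)
    offset = struct.calcsize(HEADER_FMT)
    frames = []
    for _ in range(num_frames):
        cues = list(struct.unpack_from(f">{num_objects}b", data, offset))
        offset += num_objects
        (preset,) = struct.unpack_from(">B", data, offset)
        offset += 1
        frames.append((cues, preset))
    return num_objects, frames
```

Keeping the preset index in the frame region rather than the header lets the active preset change from frame to frame without re-sending the header.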
Abstract:
Provided are an object-based three-dimensional (3-D) audio service system using preset audio scenes and a method thereof. The system and method enable a user to easily and conveniently watch and listen to an object-based 3-D audio service by eliminating the inconvenience of having to control each object audio signal of the sound sources individually. The system includes: audio input means for inputting an audio signal; preset audio scene generating means for extracting object audio signals from the audio signal input through the audio input means and generating one or more pieces of 3-D audio scene information by arranging the extracted object audio signals in a 3-D space and editing the features of each object; and encoding means for encoding and multiplexing the audio signal and the 3-D audio scene information for each object audio signal.
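A minimal sketch of what preset 3-D audio scene information and its multiplexing with the audio payload could look like. It assumes each preset stores a position (azimuth, elevation, distance) and a gain per object, serializes the presets as JSON, and uses a simple length-prefixed container; the already-encoded audio is treated as opaque bytes. `ObjectPlacement`, `make_preset`, `multiplex`, and `demultiplex` are hypothetical names, and the container is not any standard format.

```python
import json
import struct
from dataclasses import dataclass, asdict


@dataclass
class ObjectPlacement:
    """Hypothetical placement of one object audio signal in 3-D space."""
    azimuth_deg: float
    elevation_deg: float
    distance_m: float
    gain: float = 1.0


def make_preset(name, placements):
    """One preset audio scene: object id -> placement and edited features."""
    return {"name": name, "objects": {oid: asdict(p) for oid, p in placements.items()}}


def multiplex(audio_payload, presets):
    """Hypothetical container: [len][audio bytes][len][JSON list of presets]."""
    scene_bytes = json.dumps(presets).encode("utf-8")
    return (struct.pack(">I", len(audio_payload)) + audio_payload
            + struct.pack(">I", len(scene_bytes)) + scene_bytes)


def demultiplex(stream):
    """Split the container back into the audio payload and the preset scenes."""
    (n_audio,) = struct.unpack_from(">I", stream, 0)
    audio = stream[4:4 + n_audio]
    (n_scene,) = struct.unpack_from(">I", stream, 4 + n_audio)
    start = 8 + n_audio
    presets = json.loads(stream[start:start + n_scene].decode("utf-8"))
    return audio, presets
```

With several presets carried alongside the audio, a player only has to let the user pick a scene by name instead of adjusting every object signal by hand.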