Abstract:
A method comprising: causing analysis of a portion of a visual scene; causing modification of a first sound object to modify a spatial extent of the first sound object in dependence upon the analysis of the portion of the visual scene corresponding to the first sound object; and causing rendering of the visual scene and the corresponding sound scene including the modified first sound object with the modified spatial extent.
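A minimal sketch of how such a modification might look, assuming the analysis of the visual scene yields a bounding box for the object corresponding to the first sound object and that spatial extent is expressed as an angular width; the SoundObject class and function names are illustrative and not taken from the abstract:

```python
from dataclasses import dataclass

@dataclass
class SoundObject:
    name: str
    azimuth_deg: float   # direction of the sound object in the sound scene
    extent_deg: float    # spatial extent (angular width) of the sound object

def estimate_angular_extent(bbox_width_px: float, image_width_px: float,
                            camera_hfov_deg: float) -> float:
    """Map the analysed visual size of the object to an angular extent,
    assuming the fraction of the image width it occupies corresponds to the
    same fraction of the camera's horizontal field of view."""
    fraction = max(0.0, min(1.0, bbox_width_px / image_width_px))
    return fraction * camera_hfov_deg

def modify_spatial_extent(sound_object: SoundObject, bbox_width_px: float,
                          image_width_px: float, camera_hfov_deg: float) -> SoundObject:
    """Modify the first sound object so its spatial extent follows the analysed
    portion of the visual scene that corresponds to it."""
    sound_object.extent_deg = estimate_angular_extent(
        bbox_width_px, image_width_px, camera_hfov_deg)
    return sound_object

if __name__ == "__main__":
    waterfall = SoundObject(name="waterfall", azimuth_deg=-30.0, extent_deg=5.0)
    # Assumed visual analysis: the waterfall spans 640 of 1920 pixels in a
    # camera with a 90 degree horizontal field of view.
    modify_spatial_extent(waterfall, bbox_width_px=640, image_width_px=1920,
                          camera_hfov_deg=90.0)
    print(waterfall)  # extent_deg = 30.0; the renderer would then spread the
                      # sound object over roughly 30 degrees of the sound scene
```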
Abstract:
An apparatus for controlling a controllable position/orientation of at least one audio source within an audio scene, the audio scene including the at least one audio source and a capture device, the apparatus including a processor configured to: receive a physical position/orientation of the at least one audio source relative to a capture device capture orientation; receive an earlier physical position/orientation of the at least one audio source relative to the capture device capture orientation; receive at least one control parameter; and control a controllable position/orientation of the at least one audio source, the controllable position/orientation being between the physical position/orientation of the at least one audio source relative to the capture device capture orientation and the earlier physical position/orientation of the at least one audio source relative to the capture device capture orientation, and based on the control parameter.
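A minimal sketch of the controllable position/orientation, assuming it is obtained by blending the earlier and the current physical values with a single control parameter in [0, 1]; both function names are illustrative:

```python
def interpolate_angle_deg(earlier_deg: float, current_deg: float, control: float) -> float:
    """Return an orientation between the earlier and the current physical
    orientation (both relative to the capture orientation), blended by the
    control parameter: 0.0 keeps the earlier value, 1.0 follows the current one.
    Interpolation follows the shorter arc to avoid 359 -> 1 degree jumps."""
    diff = (current_deg - earlier_deg + 180.0) % 360.0 - 180.0
    return (earlier_deg + control * diff) % 360.0

def control_position(earlier_pos, current_pos, control: float):
    """Linearly blend a 3-D position between its earlier and current value."""
    return tuple(e + control * (c - e) for e, c in zip(earlier_pos, current_pos))

if __name__ == "__main__":
    # The source was at azimuth 350 deg and is now at 20 deg; a control
    # parameter of 0.5 places the rendered source halfway between them, at 5 deg.
    print(interpolate_angle_deg(350.0, 20.0, 0.5))
    print(control_position((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.25))
```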
Abstract:
The invention relates to a method, an apparatus and a computer program product for analyzing media content. The method comprises receiving media content; performing feature extraction of the media content at a plurality of convolution layers to produce a plurality of layer-specific feature maps; transmitting from the plurality of convolution layers a corresponding layer-specific feature map to a corresponding de-convolution layer of a plurality of de-convolution layers via a recurrent connection between the plurality of convolution layers and the plurality of de-convolution layers; and generating a reconstructed media content based on the plurality of feature maps.
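A minimal sketch in PyTorch (an assumption; the abstract does not name a framework) of convolution layers whose layer-specific feature maps reach the corresponding de-convolution layers through a recurrent connection maintained across time steps; layer sizes and class names are illustrative:

```python
import torch
import torch.nn as nn

class ConvRNNSkip(nn.Module):
    """Recurrent connection between a convolution layer and its de-convolution
    layer: the layer-specific feature map updates a hidden state, and that
    hidden state is what is transmitted to the decoder side."""
    def __init__(self, channels):
        super().__init__()
        self.x2h = nn.Conv2d(channels, channels, 3, padding=1)
        self.h2h = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat, hidden):
        if hidden is None:
            hidden = torch.zeros_like(feat)
        return torch.tanh(self.x2h(feat) + self.h2h(hidden))

class RecurrentConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 16, 3, stride=2, padding=1)    # convolution layer 1
        self.enc2 = nn.Conv2d(16, 32, 3, stride=2, padding=1)   # convolution layer 2
        self.skip1 = ConvRNNSkip(16)
        self.skip2 = ConvRNNSkip(32)
        self.dec2 = nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1)

    def forward(self, frames):
        h1 = h2 = None
        outputs = []
        for x in frames:                        # iterate over time steps
            f1 = torch.relu(self.enc1(x))       # layer-specific feature maps
            f2 = torch.relu(self.enc2(f1))
            h1 = self.skip1(f1, h1)             # recurrent connections to the
            h2 = self.skip2(f2, h2)             # de-convolution layers
            d2 = torch.relu(self.dec2(h2))
            outputs.append(torch.sigmoid(self.dec1(d2 + h1)))  # reconstruction
        return outputs

if __name__ == "__main__":
    model = RecurrentConvAutoencoder()
    frames = [torch.rand(1, 3, 64, 64) for _ in range(4)]
    print(model(frames)[0].shape)   # torch.Size([1, 3, 64, 64])
```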
Abstract:
A method for operating a computer graphic system, the method comprising: inputting a media content object (MCO) into a feature extractor comprising a plurality of semantic abstraction layers; extracting feature maps from the MCO at each of the semantic layers; selecting at least a portion of the MCO to be analyzed; determining, based on analysis of the feature maps from the portion of the MCO and analysis of a previous state of a recognition unit, one or more feature maps selected from the feature maps of the semantic layers; determining a weight for each selected feature map; repeating the determining steps N times, each time processing, based on the analysis, each selected feature map by applying the corresponding weight; inputting the processed feature maps to the recognition unit; and analyzing a number of the processed feature maps until a prediction about the portion of the MCO is output.
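A minimal sketch, assuming a two-layer convolutional feature extractor and a GRU cell as the recognition unit: the weights over the pooled feature maps are derived from the previous recognition state and the loop runs N times before the prediction is read out. Module names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveRecognizer(nn.Module):
    """Feature maps from two semantic layers are pooled, weighted according to
    the previous state of a recurrent recognition unit, and fed back into that
    unit for N steps until a prediction about the MCO portion is produced."""
    def __init__(self, num_classes=10, hidden=64):
        super().__init__()
        self.l1 = nn.Conv2d(3, 16, 3, padding=1)   # lower semantic layer
        self.l2 = nn.Conv2d(16, 32, 3, padding=1)  # higher semantic layer
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.attn = nn.Linear(hidden, 2)           # one weight per feature map
        self.proj = nn.Linear(16 + 32, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)      # recognition unit
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, mco_portion, steps=3):
        f1 = F.relu(self.l1(mco_portion))
        f2 = F.relu(self.l2(f1))
        v1 = self.pool(f1).flatten(1)              # pooled feature map, 16-dim
        v2 = self.pool(f2).flatten(1)              # pooled feature map, 32-dim
        state = torch.zeros(mco_portion.size(0), self.rnn.hidden_size)
        for _ in range(steps):                     # repeat the determining N times
            w = torch.softmax(self.attn(state), dim=1)        # weight per feature map
            weighted = torch.cat([w[:, :1] * v1, w[:, 1:] * v2], dim=1)
            state = self.rnn(self.proj(weighted), state)      # update recognition unit
        return self.out(state)                     # prediction about the portion

if __name__ == "__main__":
    model = AttentiveRecognizer()
    portion = torch.rand(1, 3, 32, 32)   # selected portion of the MCO
    print(model(portion).shape)          # torch.Size([1, 10])
```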
Abstract:
A method, an apparatus and computer program code are provided. The method comprises: responding to user input by making at least one alteration to a recording of a real scene in a first image content item; determining at least one altered characteristic of the recording of the real scene; determining whether one or more further image content items, different from the first image content item, have a recording of a real scene comprising the at least one determined altered characteristic; and causing at least one further image content item, having a recording of a real scene comprising the at least one determined altered characteristic, to be indicated to a user.
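A minimal sketch, assuming characteristics of a recording are stored as simple key/value metadata and that matching uses a numeric tolerance; the ImageContentItem class and the catalogue are illustrative stand-ins:

```python
from dataclasses import dataclass, field

@dataclass
class ImageContentItem:
    name: str
    characteristics: dict = field(default_factory=dict)  # e.g. {"brightness": 0.4}

def alter_recording(item: ImageContentItem, characteristic: str, new_value) -> str:
    """Apply a user-requested alteration and return the altered characteristic."""
    item.characteristics[characteristic] = new_value
    return characteristic

def find_matching_items(altered_item, altered_characteristic, catalogue, tolerance=0.1):
    """Return further image content items whose recording of a real scene
    already has (approximately) the altered characteristic."""
    target = altered_item.characteristics[altered_characteristic]
    return [other for other in catalogue
            if other is not altered_item
            and abs(other.characteristics.get(altered_characteristic, float("inf")) - target) <= tolerance]

if __name__ == "__main__":
    first = ImageContentItem("holiday_video", {"brightness": 0.30})
    catalogue = [first,
                 ImageContentItem("sunset_clip", {"brightness": 0.75}),
                 ImageContentItem("office_clip", {"brightness": 0.30})]
    changed = alter_recording(first, "brightness", 0.8)   # user brightens the scene
    for item in find_matching_items(first, changed, catalogue):
        print("indicate to user:", item.name)             # sunset_clip
```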
Abstract:
A method, apparatus and computer program product are provided for extracting spatio-temporal features with the aid of sensor information. An exemplary method comprises receiving video data and auxiliary sensor data and associating the two with timestamp information. The method may also include segmenting an input data stream into stable segments and extracting temporal features from the associated video data. The method may further include extracting temporal features either from the whole video or only from the video data where few or no stable segments are detected, and performing camera view motion compensation using information provided by the auxiliary sensors to modify the feature descriptors.
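A minimal sketch, assuming the auxiliary sensor is a gyroscope already aligned to the video timestamps, stable segments are detected by thresholding gyro magnitude, and the temporal feature is a per-transition frame difference; the threshold and the compensation model are illustrative:

```python
import numpy as np

def stable_segments(gyro, threshold=0.05):
    """Boolean mask over samples where camera motion (gyro magnitude) is low."""
    return np.linalg.norm(gyro, axis=1) < threshold

def temporal_features(frames):
    """Small temporal descriptor: mean absolute difference between consecutive
    frames (one value per frame transition)."""
    return np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))

def motion_compensate(descriptors, gyro, gain=1.0):
    """Subtract the sensor-estimated camera motion from the descriptors so that
    mostly scene motion remains (sketch of camera view motion compensation)."""
    cam_motion = gain * np.linalg.norm(gyro[1:], axis=1)
    return np.maximum(descriptors - cam_motion, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((20, 48, 64))            # video frames, timestamp-aligned
    gyro = rng.normal(scale=0.03, size=(20, 3))  # auxiliary sensor data (rad/s)
    mask = stable_segments(gyro)
    feats = temporal_features(frames)
    compensated = motion_compensate(feats, gyro)   # used where the camera moves
    stable_pair = mask[1:] & mask[:-1]             # both frames of a transition stable
    feats = np.where(stable_pair, feats, compensated)
    print(feats.shape)                             # (19,)
```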
Abstract:
The embodiments relate to a method comprising compressing input data (I) by means of at least a neural network (E, 310); determining a compression rate for data compression; running the neural network (E, 310) with the input data (I) to produce an output data (c); removing a number of elements from the output data (c) according to the compression rate to result in a reduced form of the output data (me); and providing the reduced form of the output data (me) and the compression rate to a decoder (D, 320). The embodiments also relate to a method comprising receiving input data (me) for decompression; decompressing the input data (me) by means of at least a neural network (D, 320); determining a decompression rate for decompressing the input data (me); padding a number of elements to the input data (me) according to the decompression rate; running the neural network (D, 320) with the padded input data to produce a decompressed output data (ĩ); and providing the output data (ĩ).
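A minimal sketch, assuming a toy linear map stands in for the encoder and decoder networks (E, D) and that removal and padding operate on the trailing elements of the latent vector; sizes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(16, 64))    # stand-in for the encoder neural network E
W_dec = rng.normal(size=(64, 16))    # stand-in for the decoder neural network D

def encode(x, compression_rate):
    """Run the encoder network, then remove elements according to the rate."""
    c = W_enc @ x                                    # full output data c
    keep = int(round(len(c) * (1.0 - compression_rate)))
    me = c[:keep]                                    # reduced form of the output
    return me, compression_rate

def decode(me, compression_rate):
    """Pad the removed elements back (here with zeros), then run the decoder."""
    padded = np.zeros(W_dec.shape[1])
    padded[:len(me)] = me
    return W_dec @ padded                            # decompressed output data

if __name__ == "__main__":
    x = rng.normal(size=64)                          # input data I
    me, rate = encode(x, compression_rate=0.5)       # transmit me and the rate
    x_hat = decode(me, rate)
    print(len(me), x_hat.shape)                      # 8 elements kept, (64,) output
```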
Abstract:
A method comprising: obtaining a configuration of at least one neural network comprising a plurality of intra-prediction mode agnostic layers and one or more intra-prediction mode specific layers, the one or more intra-prediction mode specific layers corresponding to different intra-prediction modes; obtaining at least one input video frame comprising a plurality of blocks; determining to encode one or more blocks using intra prediction; determining an intra-prediction mode for each of said one or more blocks; grouping blocks having the same intra-prediction mode into groups, each group being assigned a computation path among the plurality of intra-prediction mode agnostic layers and the one or more intra-prediction mode specific layers; training the plurality of intra-prediction mode agnostic layers and/or the one or more intra-prediction mode specific layers of the at least one neural network based on a training loss between an output of the at least one neural network relating to a group of blocks and ground-truth blocks, wherein the ground-truth blocks are either blocks of the input video frame or reconstructed blocks; and encoding a block using the computation path assigned to the intra-prediction mode of the block.
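A minimal sketch in PyTorch (assumed), with one mode-agnostic trunk, one mode-specific head per intra-prediction mode, blocks grouped by mode, and the input blocks used as ground truth; network sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class IntraPredNet(nn.Module):
    """One intra-prediction mode agnostic trunk shared by all blocks plus one
    mode-specific layer per mode; a block's computation path is trunk -> heads[mode]."""
    def __init__(self, block_size=8, num_modes=3):
        super().__init__()
        d = block_size * block_size
        self.trunk = nn.Sequential(nn.Linear(d, 128), nn.ReLU())          # mode agnostic
        self.heads = nn.ModuleList([nn.Linear(128, d) for _ in range(num_modes)])

    def forward(self, blocks, mode):
        return self.heads[mode](self.trunk(blocks.flatten(1))).view_as(blocks)

if __name__ == "__main__":
    net, loss_fn = IntraPredNet(), nn.MSELoss()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    blocks = torch.rand(32, 8, 8)                  # blocks of the input frame
    modes = torch.randint(0, 3, (32,))             # intra-prediction mode per block
    opt.zero_grad()
    total = 0.0
    for m in modes.unique():                       # group blocks by mode
        group = blocks[modes == m]
        pred = net(group, int(m))                  # computation path for this mode
        total = total + loss_fn(pred, group)       # ground truth: the input blocks
    total.backward()                               # training loss over all groups
    opt.step()
    print(float(total))
```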
Abstract:
An apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: maintain a first parameter update tree that tracks residuals of weight updates of a machine learning model; maintain a second parameter update tree that tracks the weight updates of the machine learning model; pass the first parameter update tree and the residuals to an encoder; receive a first bitstream generated for the residuals from the encoder; pass the second parameter update tree and the weight updates to the encoder; receive a second bitstream generated for the weight updates from the encoder; and determine whether to signal to a decoder the first bitstream generated for the residuals or the second bitstream generated for the weight updates.
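A rough sketch, assuming the parameter update trees can be represented as simple per-layer containers and that pickle plus zlib stand in for the encoder; the actual tree structure and entropy coder are not specified by the abstract:

```python
import pickle
import zlib
import numpy as np

def make_update_tree(arrays):
    """Toy 'parameter update tree': one node per layer holding its tensor."""
    return {f"layer_{i}": a for i, a in enumerate(arrays)}

def encode(tree):
    """Stand-in encoder: serialise the tree and its payload into a bitstream."""
    return zlib.compress(pickle.dumps(tree))

rng = np.random.default_rng(0)
previous_update = [rng.normal(size=(64, 64)).astype(np.float32) for _ in range(2)]
weight_updates = [u + rng.normal(scale=0.01, size=u.shape).astype(np.float32)
                  for u in previous_update]
# Residuals of the weight updates with respect to the previously signalled updates.
residuals = [w - p for w, p in zip(weight_updates, previous_update)]

first_tree = make_update_tree(residuals)         # tracks residuals of the weight updates
second_tree = make_update_tree(weight_updates)   # tracks the weight updates themselves
first_bitstream = encode(first_tree)
second_bitstream = encode(second_tree)

# Determine which bitstream to signal to the decoder (here: the cheaper one).
chosen = "residuals" if len(first_bitstream) < len(second_bitstream) else "weight updates"
print(len(first_bitstream), len(second_bitstream), "->", chosen)
```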
Abstract:
A method includes maintaining a set of parameters or weights derived through online learning for a neural net; transmitting an update of the parameters or weights to a decoder; deriving a first prediction block based on an output of the neural net using the parameters or weights; deriving a first encoded prediction error block through encoding a difference of the first prediction block and a first input block; encoding the first encoded prediction error block into a bitstream; deriving a reconstructed prediction error block based on the first encoded prediction error block; deriving a second prediction block based on an output of the neural net using the parameters or weights and the reconstructed prediction error block; deriving a second encoded prediction error block through encoding a difference of the second prediction block and a second input block; and encoding the second encoded prediction error block into a bitstream.
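A minimal sketch, assuming a linear map stands in for the neural net and uniform quantisation stands in for prediction-error encoding; it only illustrates how the reconstructed prediction error of the first block feeds the prediction of the second block. All names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
params = rng.normal(scale=0.1, size=(64, 64))  # parameters derived by online learning
# (an update of these parameters would be transmitted to the decoder)

def predict(prev_reconstructed_error, weights):
    """Stand-in for the neural net: a linear map of the previous reconstructed
    prediction error (zeros when predicting the first block)."""
    return weights @ prev_reconstructed_error

def encode_error(error, step=0.1):
    """Toy prediction-error encoding: uniform quantisation indices."""
    return np.round(error / step).astype(np.int32)

def reconstruct_error(indices, step=0.1):
    return indices.astype(np.float32) * step

bitstream = []
prev_recon_err = np.zeros(64, dtype=np.float32)
for input_block in [rng.random(64).astype(np.float32) for _ in range(2)]:
    prediction = predict(prev_recon_err, params)          # first/second prediction block
    encoded_err = encode_error(input_block - prediction)  # encoded prediction error block
    bitstream.append(encoded_err)                         # written into the bitstream
    prev_recon_err = reconstruct_error(encoded_err)       # feeds the next prediction
print(len(bitstream), bitstream[0].shape)
```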