Abstract:
Example embodiments disclosed herein relate to perception-based multimedia processing. A method for processing multimedia data is provided. The method includes automatically determining user perception of a segment of the multimedia data based on a plurality of clusters, the plurality of clusters being obtained in association with predefined user perceptions, and processing the segment of the multimedia data based at least in part on the determined user perception of the segment. Corresponding systems and computer program products are disclosed as well.
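As a minimal sketch of the idea, assume each cluster is summarized by a centroid in some feature space and carries one predefined perception label, and that the perception of a segment is taken from the nearest centroid. The feature values, the labels, the gain profiles, and the names `CLUSTERS`, `PROFILES`, `determine_perception`, and `process_segment` are all hypothetical; the abstract does not specify how clusters are represented or which processing is applied.

```python
import math

# Hypothetical clusters: a feature-space centroid plus the predefined user
# perception associated with that cluster (all values are illustrative).
CLUSTERS = [
    {"centroid": (0.9, 0.1), "perception": "exciting"},
    {"centroid": (0.2, 0.8), "perception": "calm"},
    {"centroid": (0.5, 0.5), "perception": "neutral"},
]

# Hypothetical processing settings chosen per perception.
PROFILES = {
    "exciting": {"gain_db": 3.0},
    "calm": {"gain_db": -2.0},
    "neutral": {"gain_db": 0.0},
}


def determine_perception(features, clusters=CLUSTERS):
    """Return the perception label of the cluster nearest to the features."""
    nearest = min(clusters, key=lambda c: math.dist(features, c["centroid"]))
    return nearest["perception"]


def process_segment(samples, features):
    """Apply the gain associated with the segment's determined perception."""
    perception = determine_perception(features)
    gain = 10 ** (PROFILES[perception]["gain_db"] / 20)  # dB to linear
    return perception, [s * gain for s in samples]
```

A nearest-centroid rule is only one way to use the clusters; a soft assignment over several clusters would fit the same description.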
Abstract:
An equalizer controller and a controlling method are disclosed. In one embodiment, the equalizer controller includes an audio classifier for identifying the audio type of an audio signal in real time, and an adjusting unit for adjusting an equalizer in a continuous manner based on the confidence value of the identified audio type.
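One way to picture confidence-driven, continuous adjustment is sketched below. It assumes the classifier emits a confidence in [0, 1] per frame, that the confidence is smoothed with a one-pole filter, and that the smoothed value scales a preset of per-band gains. The smoothing rule, the preset, and the function names are assumptions for illustration, not details taken from the abstract.

```python
def smooth(previous, current, alpha=0.9):
    """One-pole smoothing so the control value changes gradually."""
    return alpha * previous + (1 - alpha) * current


def equalizer_gains(preset_db, confidence):
    """Scale a per-band gain preset (in dB) by a confidence clamped to [0, 1]."""
    c = min(max(confidence, 0.0), 1.0)
    return [c * g for g in preset_db]


def run(confidences, preset_db, alpha=0.9):
    """Return the per-frame band gains for a stream of classifier confidences."""
    level = 0.0  # smoothed confidence, starts with the equalizer flat
    frames = []
    for confidence in confidences:
        level = smooth(level, confidence, alpha)
        frames.append(equalizer_gains(preset_db, level))
    return frames
```

Scaling by the smoothed confidence avoids abrupt on/off switching when the classifier output flickers between audio types.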
Abstract:
Embodiments of identifying multimedia objects based on multimedia fingerprints are provided. Query fingerprints are derived from a multimedia object according to differing fingerprint algorithms. For each fingerprint algorithm, decisions are calculated through at least one classifier corresponding to the fingerprint algorithm based on the query fingerprint and reference fingerprints, the reference fingerprints being derived from reference multimedia objects according to the same fingerprint algorithm. Each of the decisions indicates a possibility that the query fingerprint and the reference fingerprint are not derived from the same multimedia content. For each of the reference multimedia objects, a distance is calculated as a weighted sum of the decisions relating to the reference fingerprints. The multimedia object is identified as matching the reference multimedia object with the smallest distance less than a threshold.
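The matching rule can be sketched as follows. In place of trained classifiers, this example uses the fraction of differing bits between two fingerprints as a stand-in "decision" that the fingerprints do not come from the same content; that substitution, the bit-string fingerprints, the weights, and the names `mismatch_decision` and `identify` are assumptions made for illustration.

```python
def mismatch_decision(query_bits, reference_bits):
    """Stand-in classifier: fraction of differing bits, in [0, 1].

    Higher values mean the two fingerprints are less likely to come
    from the same content.
    """
    differing = sum(q != r for q, r in zip(query_bits, reference_bits))
    return differing / len(query_bits)


def identify(query, references, weights, threshold):
    """Return the best-matching reference name, or None if nothing qualifies.

    query:      {algorithm: fingerprint}
    references: {name: {algorithm: fingerprint}}
    weights:    {algorithm: weight of that algorithm's decision}
    """
    best_name, best_distance = None, threshold
    for name, fingerprints in references.items():
        # Distance is the weighted sum of the per-algorithm decisions.
        distance = sum(
            weight * mismatch_decision(query[algo], fingerprints[algo])
            for algo, weight in weights.items()
        )
        if distance < best_distance:  # smallest distance below the threshold
            best_name, best_distance = name, distance
    return best_name
```

For example, with two hypothetical algorithms `"a"` and `"b"` weighted equally, a reference that differs from the query in one bit out of four under `"b"` only has distance 0.125 and matches under a threshold of 0.3.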
Abstract:
Example embodiments disclosed herein relate to audio source separation with source direction determined based on iterative weighted component analysis. A method of separating audio sources in audio content is disclosed. The audio content includes a plurality of channels. The method includes obtaining multiple data samples from multiple time-frequency tiles of the audio content. The method also includes analyzing the data samples to generate multiple components in a plurality of iterations, wherein each of the components indicates a direction with a variance of the data samples, and wherein in each of the plurality of iterations, each of the data samples is weighted with a weight that is determined based on a selected component from the multiple components. The method further includes determining a source direction of the audio content based on the selected component for separating an audio source from the audio content. Corresponding system and computer program product of separating audio sources in audio content are also disclosed.
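A simplified two-channel sketch of iterative weighted component analysis is given below. It assumes zero-mean (left, right) samples, takes the principal component of the weighted second-moment matrix as the selected component, and re-weights each sample by how closely it aligns with that component. The weighting function, the exponent `power`, and the name `estimate_direction` are assumptions; the abstract does not specify them.

```python
import math


def estimate_direction(samples, iterations=10, power=4):
    """Estimate a dominant source direction (radians) from 2-D samples.

    samples: list of (left, right) values taken from time-frequency tiles.
    Each iteration computes the principal axis of the weighted second-moment
    matrix, then re-weights samples by their alignment with that axis.
    """
    weights = [1.0] * len(samples)
    theta = 0.0
    for _ in range(iterations):
        cxx = sum(w * x * x for w, (x, y) in zip(weights, samples))
        cyy = sum(w * y * y for w, (x, y) in zip(weights, samples))
        cxy = sum(w * x * y for w, (x, y) in zip(weights, samples))
        # Principal axis of the symmetric 2x2 matrix [[cxx, cxy], [cxy, cyy]].
        theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
        dx, dy = math.cos(theta), math.sin(theta)
        weights = []
        for x, y in samples:
            norm = math.hypot(x, y)
            # |cos| of the angle between the sample and the selected component.
            alignment = abs(x * dx + y * dy) / norm if norm else 0.0
            weights.append(alignment ** power)
    return theta
```

With samples dominated by a source at 45 degrees plus a few samples from a source at 0 degrees, the unweighted estimate is pulled toward the weaker source, while the re-weighted iterations move the estimate back toward 45 degrees.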
Abstract:
Embodiments of the present invention relate to audio object extraction. A method for audio object extraction from audio content of a format based on a plurality of channels is disclosed. The method comprises applying audio object extraction on individual frames of the audio content at least partially based on frequency spectral similarities among the plurality of channels. The method further comprises performing audio object composition across the frames of the audio content, based on the audio object extraction on the individual frames, to generate a track of at least one audio object. Corresponding system and computer program product are also disclosed.
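The per-frame step that relies on spectral similarity among channels might look like the sketch below. It assumes each channel of a frame is described by a magnitude spectrum, measures similarity with cosine similarity, and greedily groups channels whose spectra are similar to the first member of an existing group. The similarity measure, the threshold, and the names `cosine_similarity` and `group_channels` are assumptions rather than details from the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two magnitude spectra (0.0 if either is silent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_channels(spectra, threshold=0.9):
    """Group channels of one frame whose spectra look alike.

    spectra: {channel_name: magnitude spectrum}. A channel joins the first
    group whose first member it resembles; otherwise it starts a new group.
    """
    groups = []
    for name, spectrum in spectra.items():
        for group in groups:
            if cosine_similarity(spectrum, spectra[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

Channels that land in the same group are candidates for carrying the same audio object in that frame; linking such groups across frames would then form the object's track.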
Abstract:
Example embodiments disclosed herein relate to audio object processing. A method for processing audio content, which includes at least one audio object of a multi-channel format, is disclosed. The method includes generating metadata associated with the audio object, the metadata including at least one of an estimated trajectory of the audio object and an estimated perceptual size of the audio object, the perceptual size being a perceived area of a phantom of the audio object produced by at least two transducers. Corresponding system and computer program product are also disclosed.
Abstract:
Example embodiments disclosed herein relate to upmixing of audio signals. A method of upmixing an audio signal is described. The method includes decomposing the audio signal into a diffuse signal and a direct signal, generating an audio bed at least in part based on the diffuse signal, the audio bed including a height channel, extracting an audio object from the direct signal, estimating metadata of the audio object, the metadata including height information of the audio object, and rendering the audio bed and the audio object as an upmixed audio signal, wherein the audio bed is rendered to a predefined position and the audio object is rendered according to the metadata. Corresponding system and computer program product are described as well.