Abstract:
An audio encoding apparatus and method that encodes hybrid contents including an object sound, a background sound, and metadata, and an audio decoding apparatus and method that decodes the encoded hybrid contents are provided. The audio encoding apparatus may include a mixing unit to generate an intermediate channel signal by mixing a background sound and an object sound, a matrix information encoding unit to encode matrix information used for the mixing, an audio encoding unit to encode the intermediate channel signal, and a metadata encoding unit to encode metadata including control information of the object sound.
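The mixing step can be illustrated with a minimal sketch. It assumes the intermediate channel signal is the background channels plus a matrix-weighted sum of the object signals; the abstract does not specify the mixing rule, so the function name `mix_to_intermediate` and the matrix layout (`matrix[c][k]` is the gain of object `k` in channel `c`) are hypothetical.

```python
def mix_to_intermediate(background, objects, matrix):
    """Mix object signals into the background channels.

    background: list of channels, each a list of samples
    objects:    list of object signals, each a list of samples
    matrix:     matrix[c][k] = gain of object k in channel c (assumed layout)
    """
    n_samples = len(background[0])
    mixed = []
    for c, channel in enumerate(background):
        out = list(channel)  # start from the background sound
        for k, obj in enumerate(objects):
            gain = matrix[c][k]
            for n in range(n_samples):
                out[n] += gain * obj[n]  # add the weighted object sound
        mixed.append(out)
    return mixed


# Two background channels, one object, illustrative gains only.
background = [[0.1, 0.2], [0.0, 0.1]]
objects = [[1.0, -1.0]]
matrix = [[0.5], [0.25]]
intermediate = mix_to_intermediate(background, objects, matrix)
```

In such a scheme the matrix would be what the matrix information encoding unit transmits, so that a decoder can separate or re-render the object sound.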
Abstract:
An audio encoding apparatus to encode an audio signal using lossless coding or lossy coding and an audio decoding apparatus to decode an encoded audio signal are disclosed. An audio encoding apparatus according to an exemplary embodiment may include an input signal type determination unit to determine a type of an input signal based on characteristics of the input signal, a residual signal generation unit to generate a residual signal based on an output signal from the input signal type determination unit, and a coding unit to perform lossless coding or lossy coding using the residual signal.
Abstract:
An encoding method for a multi-channel audio signal, an encoding apparatus for performing the encoding method, a decoding method for a multi-channel audio signal, and a decoding apparatus for performing the decoding method are disclosed. Also disclosed are a method and apparatus for bypassing the MPEG Surround (MPS) standard operation and using an arbitrary tree when the number of channels N of the audio signal exceeds the number of channels defined in the MPS standard.
Abstract:
Provided are an audio signal processing apparatus and method that may convert a speech and audio signal to a spectrogram image, calculate a local gradient from the spectrogram image using a mask matrix, divide the local gradient into blocks of a preset size, generate a weighted histogram for each block, generate an audio feature vector by concatenating the weighted histograms of the blocks, generate a transformed feature set by performing a discrete cosine transform (DCT) on a feature set of the audio feature vector, and generate an optimized feature set by eliminating an unnecessary region from the transformed feature set, thereby reducing its size.
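The pipeline above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the spectrogram image is taken as a ready-made 2-D list, the mask is a central-difference mask [-1, 0, 1], the histogram bins gradient orientation weighted by gradient magnitude, and the "unnecessary region" is assumed to be the higher-order DCT coefficients. All function names and parameters (`block`, `bins`, `keep`) are hypothetical.

```python
import math


def local_gradient(image):
    """Central differences ([-1, 0, 1] mask) with edge replication."""
    rows, cols = len(image), len(image[0])
    mag = [[0.0] * cols for _ in range(rows)]
    ang = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
            gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
            mag[r][c] = math.hypot(gx, gy)
            ang[r][c] = math.atan2(gy, gx) % math.pi  # unsigned orientation
    return mag, ang


def block_histograms(mag, ang, block=2, bins=4):
    """Concatenate magnitude-weighted orientation histograms per block."""
    rows, cols = len(mag), len(mag[0])
    features = []
    for r0 in range(0, rows - block + 1, block):
        for c0 in range(0, cols - block + 1, block):
            hist = [0.0] * bins
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    b = min(int(ang[r][c] / math.pi * bins), bins - 1)
                    hist[b] += mag[r][c]
            features.extend(hist)
    return features


def dct2(x):
    """Unnormalized DCT-II of a list."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def audio_features(image, keep=8):
    mag, ang = local_gradient(image)
    vector = block_histograms(mag, ang)
    return dct2(vector)[:keep]  # assumed reduction: keep low-order terms


# A 4x4 "spectrogram" that ramps along the time axis.
image = [[0, 1, 2, 3] for _ in range(4)]
features = audio_features(image)
```

Truncating the DCT output is only one way to shrink the feature set; the abstract leaves the choice of region open.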
Abstract:
A frequency spectrum processing apparatus and method using a source filter are disclosed. The frequency spectrum processing apparatus may include a first excitation spectrum generation unit to generate a first excitation spectrum using a tonal excitation spectrum according to an input signal and a gain of the tonal excitation spectrum, a second excitation spectrum generation unit to generate a second excitation spectrum using a non-tonal excitation spectrum according to the input signal and a gain of the non-tonal excitation spectrum, and an output spectrum generation unit to generate an output spectrum using the first excitation spectrum and the second excitation spectrum.
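As a rough sketch of the data flow, assume each excitation spectrum is the corresponding component scaled by its gain and that the output spectrum is their bin-wise sum. The abstract only says the output is generated "using" the two excitation spectra, so the summation and the function name `output_spectrum` are assumptions.

```python
def output_spectrum(tonal, tonal_gain, non_tonal, non_tonal_gain):
    """Combine gain-scaled tonal and non-tonal excitation spectra."""
    first = [tonal_gain * x for x in tonal]           # first excitation spectrum
    second = [non_tonal_gain * x for x in non_tonal]  # second excitation spectrum
    return [a + b for a, b in zip(first, second)]     # assumed: bin-wise sum


# Illustrative magnitudes for four frequency bins.
tonal = [0.0, 1.0, 0.0, 0.5]
non_tonal = [0.2, 0.2, 0.2, 0.2]
spectrum = output_spectrum(tonal, 2.0, non_tonal, 0.5)
```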
Abstract:
An audio signal processing method, which is executed by a processor electronically communicating with a deep neural network within a computing system, may comprise: acquiring, by the processor, an input signal before encoding and an output signal after quantization and decoding; calculating, by the processor, a perceptual global loss for a frame corresponding to the input and output signals; acquiring, by the processor, a plurality of subframes corresponding to the input and output signals by applying a windowing function to the frame of the input and output signals; calculating, by the processor, perceptual local losses for the plurality of subframes corresponding to the input and output signals; and acquiring, by the processor, a multi-time-scale perceptual loss based on the perceptual global and local losses.
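The structure of the loss can be sketched in a few lines. Mean squared error stands in for the perceptual measure here, which is a deliberate simplification; the Hann window, the subframe size and hop, and the weighted-sum combination are likewise assumptions, and every name is hypothetical.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def mse(x, y):
    """Stand-in for a perceptual loss (assumption: plain MSE)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def subframes(frame, size, hop):
    """Split a frame into windowed, overlapping subframes."""
    window = hann(size)
    return [
        [w * s for w, s in zip(window, frame[start:start + size])]
        for start in range(0, len(frame) - size + 1, hop)
    ]


def multi_time_scale_loss(reference, decoded, size=4, hop=2, local_weight=1.0):
    global_loss = mse(reference, decoded)  # loss over the whole frame
    local_losses = [
        mse(r, d)
        for r, d in zip(subframes(reference, size, hop),
                        subframes(decoded, size, hop))
    ]
    local_loss = sum(local_losses) / len(local_losses)
    return global_loss + local_weight * local_loss  # assumed combination


reference = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
decoded = [0.0, 0.4, 0.9, 0.5, 0.1, -0.5, -0.9, -0.6]
loss = multi_time_scale_loss(reference, decoded)
```

In training, a differentiable perceptual measure would replace `mse`, and the combined value would be backpropagated through the neural codec.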
Abstract:
An encoder and an encoding method for a multi-channel signal, and a decoder and a decoding method for a multi-channel signal are disclosed. A multi-channel signal may be efficiently processed by consecutive downmixing or upmixing.
Abstract:
Provided are an encoding method for a multichannel signal, an encoding apparatus to perform the encoding method, a decoding method for a multichannel signal, and a decoding apparatus to perform the decoding method. The decoding method may include identifying an N/2-channel downmix signal derived from an N-channel input signal, and generating an N-channel output signal from the identified N/2-channel downmix signal using a plurality of one-to-two (OTT) boxes. If a low frequency effect (LFE) channel is absent from the output signal, the number of OTT boxes may be equal to N/2, where N/2 denotes the number of channels of the downmix signal.
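The channel count can be made concrete with a simplified sketch in which each downmix channel feeds one OTT box, so N/2 boxes yield N output channels. The box below only splits energy according to a channel level difference (CLD) in dB and omits decorrelation and subband processing; the function names and the one-CLD-per-box parameterization are assumptions.

```python
import math


def ott_upmix(downmix_channel, cld_db):
    """Simplified one-to-two box: split one channel into two by CLD."""
    ratio = 10.0 ** (cld_db / 10.0)        # power ratio first/second
    g1 = math.sqrt(ratio / (1.0 + ratio))  # gain of first output
    g2 = math.sqrt(1.0 / (1.0 + ratio))    # gain of second output
    first = [g1 * s for s in downmix_channel]
    second = [g2 * s for s in downmix_channel]
    return first, second


def decode(downmix, clds_db):
    """Apply one OTT box per downmix channel: N/2 inputs -> N outputs."""
    if len(downmix) != len(clds_db):
        raise ValueError("one CLD per downmix channel expected")
    output = []
    for channel, cld in zip(downmix, clds_db):
        output.extend(ott_upmix(channel, cld))
    return output


# N/2 = 3 downmix channels, no LFE, so 3 OTT boxes give N = 6 channels.
downmix = [[1.0, 0.5], [0.2, -0.2], [0.0, 0.3]]
output = decode(downmix, [0.0, 3.0, -3.0])
```

Because the two gains satisfy g1² + g2² = 1, each box preserves the energy of its input channel in this simplified model.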