Abstract:
Methods of encoding and decoding a speech signal using a neural network model that recognizes sound sources, and encoding and decoding apparatuses for performing the methods are provided. A method of encoding a speech signal includes identifying an input signal for a plurality of sound sources; generating a latent signal by encoding the input signal; obtaining a plurality of sound source signals by separating the latent signal for each of the plurality of sound sources; determining a number of bits used for quantization of each of the plurality of sound source signals according to a type of each of the plurality of sound sources; quantizing each of the plurality of sound source signals based on the determined number of bits; and generating a bitstream by combining the plurality of quantized sound source signals.
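A minimal sketch of the per-source bit allocation and quantization described above, assuming a toy random-projection "encoder" and "separator" and an illustrative bit table; the names BITS_PER_SOURCE, quantize, and encode_frame are hypothetical and not taken from the patent:

    import numpy as np

    BITS_PER_SOURCE = {"speech": 8, "music": 6, "noise": 4}  # hypothetical allocation by source type

    def quantize(signal, num_bits):
        """Uniform quantization to 2**num_bits levels; returns integer codes."""
        levels = 2 ** num_bits
        lo, hi = signal.min(), signal.max()
        step = (hi - lo) / (levels - 1) if hi > lo else 1.0
        return np.round((signal - lo) / step).astype(np.int32)

    def encode_frame(frame, source_types):
        latent = frame @ np.random.randn(frame.size, 64)         # stand-in "encoder"
        per_source = np.array_split(latent, len(source_types))   # stand-in "separator"
        quantized = [quantize(sig, BITS_PER_SOURCE[kind])
                     for sig, kind in zip(per_source, source_types)]
        return np.concatenate(quantized)                         # combined "bitstream"

    frame = np.random.randn(160)                                 # one 10 ms frame at 16 kHz
    print(encode_frame(frame, ["speech", "music"]).shape)        # (64,)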
Abstract:
A method and apparatus for processing an audio signal are disclosed. According to an example embodiment, a method of processing an audio signal may include acquiring a final audio signal for an initial audio signal using a plurality of neural network models that generate output audio signals by encoding and decoding input audio signals, calculating a difference between the initial audio signal and the final audio signal in a time domain, converting the initial audio signal and the final audio signal into Mel-spectra, calculating a difference between the Mel-spectra of the initial audio signal and the final audio signal in a frequency domain, training the plurality of neural network models based on the results calculated in the time domain and the frequency domain, and generating, from the initial audio signal, a new final audio signal distinct from the final audio signal using the trained neural network models.
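A minimal sketch of the combined time-domain and Mel-spectral training objective described above, assuming a PyTorch setup; the single small convolutional autoencoder below is only a stand-in for the plurality of neural network models, and all sizes are illustrative:

    import torch
    import torchaudio

    # Toy 1-D convolutional autoencoder standing in for the cascade of codec models.
    codec = torch.nn.Sequential(
        torch.nn.Conv1d(1, 16, 9, padding=4), torch.nn.ReLU(),
        torch.nn.Conv1d(16, 1, 9, padding=4),
    )
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_fft=512, n_mels=64)
    opt = torch.optim.Adam(codec.parameters(), lr=1e-4)

    initial = torch.randn(1, 1, 16000)                  # one second of "audio"
    for _ in range(10):
        final = codec(initial)                          # encoded-then-decoded signal
        time_loss = torch.nn.functional.l1_loss(final, initial)            # time-domain difference
        freq_loss = torch.nn.functional.l1_loss(mel(final), mel(initial))  # Mel-spectral difference
        loss = time_loss + freq_loss                    # train on both domains
        opt.zero_grad()
        loss.backward()
        opt.step()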
Abstract:
Disclosed are a method of coding a residual signal of LPC coefficients based on collaborative quantization and a computing device for performing the method. The residual signal coding method may include: generating encoded LPC coefficients and an LPC residual signal by performing LPC analysis and quantization on an input speech; determining a predicted LPC residual signal by applying the LPC residual signal to cross-module residual learning; performing LPC synthesis using the encoded LPC coefficients and the predicted LPC residual signal; and determining an output speech, which is the synthesized output, according to a result of performing the LPC synthesis.
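A minimal sketch of the LPC analysis/synthesis path described above; the cross-module residual learning network is replaced here by an identity stand-in, so the output simply reconstructs the input, and the order and frame length are illustrative:

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    def lpc_analysis(x, order=16):
        """Autocorrelation-method LPC; returns a with A(z) = 1 - sum_k a_k z^-k."""
        r = np.correlate(x, x, mode="full")[len(x) - 1:]
        return solve_toeplitz(r[:order], r[1:order + 1])

    def encode_decode(speech, order=16):
        a = lpc_analysis(speech, order)
        analysis = np.concatenate(([1.0], -a))
        residual = lfilter(analysis, [1.0], speech)          # LPC residual via A(z)
        predicted_residual = residual                        # identity stand-in for the residual network
        return lfilter([1.0], analysis, predicted_residual)  # LPC synthesis via 1/A(z)

    speech = np.random.randn(320)
    print(np.allclose(encode_decode(speech), speech))        # True: exact reconstruction in this sketch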
Abstract:
Disclosed are a method of encoding a high band of an audio signal, a method of decoding a high band of an audio signal, and an encoder and a decoder for performing the methods. The method of decoding a high band of an audio signal, performed by a decoder, includes identifying a parameter extracted through a first neural network, identifying side information extracted through a second neural network, and restoring the high band of the audio signal by applying the parameter and the side information to a third neural network.
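A minimal sketch of the three-network decoder structure described above, using toy PyTorch modules; the feature dimensions and the low-band input are illustrative assumptions, not values from the patent:

    import torch

    param_net = torch.nn.Linear(64, 32)       # first network: extracts the parameter
    side_net = torch.nn.Linear(64, 16)        # second network: extracts the side information
    restore_net = torch.nn.Sequential(        # third network: restores the high band
        torch.nn.Linear(32 + 16, 128), torch.nn.ReLU(), torch.nn.Linear(128, 256),
    )

    low_band_features = torch.randn(1, 64)    # stand-in for decoded low-band features
    parameter = param_net(low_band_features)
    side_info = side_net(low_band_features)
    high_band = restore_net(torch.cat([parameter, side_info], dim=-1))
    print(high_band.shape)                    # torch.Size([1, 256]): restored high-band frame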
Abstract:
Provided are an apparatus and a method for block-based audio encoding/decoding. A method of encoding an audio signal may include dividing each of the frames of an input signal that constitute an audio signal into a plurality of subframes; transforming the subframes to a frequency domain; determining a two-dimensional (2D) intra block using the subframes transformed to the frequency domain; and encoding the 2D intra block. The 2D intra block may be a block in which the frequency coefficients of the subframes transformed to the frequency domain are arranged two-dimensionally along time and frequency.
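A minimal sketch of forming the 2D intra block described above, with a DCT standing in for the frequency-domain transform; the frame length and subframe count are illustrative assumptions:

    import numpy as np
    from scipy.fft import dct

    def intra_block(frame, num_subframes=4):
        subframes = np.array_split(frame, num_subframes)       # split the frame in time
        coeffs = [dct(sf, norm="ortho") for sf in subframes]   # per-subframe frequency coefficients
        return np.stack(coeffs)                                # rows: time (subframes), columns: frequency

    frame = np.random.randn(1024)             # one frame of the input signal
    block = intra_block(frame)
    print(block.shape)                        # (4, 256): 4 subframes x 256 coefficients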
Abstract:
Disclosed is a unified speech and audio coding (USAC) audio signal encoding/decoding apparatus and method for digital radio services. An audio signal encoding method may include receiving an audio signal, determining a coding method for the received audio signal, encoding the audio signal based on the determined coding method, and configuring, as an audio superframe of a fixed size, an audio stream generated as a result of encoding the audio signal, wherein the coding method may include a first coding method associated with extended high-efficiency advanced audio coding (xHE-AAC) and a second coding method associated with existing advanced audio coding (AAC).
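A minimal sketch of the coding-method selection and fixed-size superframe packing described above; the bitrate threshold, superframe size, and frame sizes are illustrative assumptions, not values from the standard or the patent:

    SUPERFRAME_BYTES = 1000                   # hypothetical fixed superframe size

    def choose_coding_method(bitrate_kbps):
        # First coding method (xHE-AAC) at low rates, second (AAC) otherwise; threshold is illustrative.
        return "xHE-AAC" if bitrate_kbps <= 32 else "AAC"

    def pack_superframe(encoded_frames):
        payload = b"".join(encoded_frames)
        if len(payload) > SUPERFRAME_BYTES:
            raise ValueError("encoded frames exceed the fixed superframe size")
        return payload + bytes(SUPERFRAME_BYTES - len(payload))   # zero-pad to the fixed size

    frames = [bytes([i]) * 120 for i in range(8)]                  # stand-ins for encoded audio frames
    print(choose_coding_method(24), len(pack_superframe(frames)))  # xHE-AAC 1000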
Abstract:
The present invention relates to a method and an apparatus for providing a natural eye-contact function to attendees when two or more remote attendees are present at one site during a video conference using a video conference system. The method uses a stereo image and a depth image to estimate a precise depth value of the occlusion region, thereby improving the quality of the composite eye-contact image.
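A minimal sketch of the occlusion-filling idea described above: depth values missing in the occlusion region of the depth image are filled from a stereo-derived depth estimate before the eye-contact view is composited; all arrays are toy inputs and the function name is hypothetical:

    import numpy as np

    def fill_occlusions(depth_image, stereo_depth, occlusion_mask):
        """Replace depth values in the occlusion region with stereo-estimated depth."""
        filled = depth_image.copy()
        filled[occlusion_mask] = stereo_depth[occlusion_mask]
        return filled

    depth_image = np.full((4, 4), 2.0)        # depth image in meters
    depth_image[1:3, 1:3] = 0.0               # missing values: the occlusion region
    stereo_depth = np.full((4, 4), 1.8)       # depth estimated from the stereo image pair
    print(fill_occlusions(depth_image, stereo_depth, depth_image == 0.0))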