Abstract:
A system and method code an object-based audio signal comprising audio objects in response to audio streams with associated metadata. In the system and method, a metadata processor codes the metadata and generates information about bit-budgets for the coding of the metadata of the audio objects. An encoder codes the audio streams while a bit-budget allocator is responsive to the information about the bit-budgets for the coding of the metadata of the audio objects from the metadata processor to allocate bitrates for the coding of the audio streams by the encoder.
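The allocation step described above can be illustrated with a minimal sketch. It assumes a fixed overall bit budget per frame and an equal split of the remaining bits among the audio streams; the function name `allocate_audio_bits`, the `total_bits` parameter, and the equal-split rule are illustrative assumptions and are not stated in the abstract.

```python
def allocate_audio_bits(total_bits, metadata_bits):
    """Split what is left of a frame's bit budget among the audio streams.

    total_bits    -- hypothetical overall bit budget for one frame (assumption)
    metadata_bits -- bits reported by the metadata processor, one per object
    Returns one audio bit budget per object (equal split is an assumption).
    """
    n = len(metadata_bits)
    if n == 0:
        return []
    remaining = total_bits - sum(metadata_bits)
    if remaining < 0:
        raise ValueError("metadata alone exceeds the frame budget")
    base, extra = divmod(remaining, n)
    # Hand out the leftover bits one at a time so the total is exact.
    return [base + (1 if i < extra else 0) for i in range(n)]
```

For example, `allocate_audio_bits(1000, [30, 50, 20])` leaves 900 bits for three streams. A real allocator would likely weight the split by signal content rather than dividing equally.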
Abstract:
A method and device for detecting an attack in a sound signal to be coded, wherein the sound signal is processed in successive frames each including a number of sub-frames. The device comprises a first-stage attack detector for detecting the attack in a last sub-frame of a current frame, and a second-stage attack detector for detecting the attack in one of the sub-frames of the current frame, including the sub-frames preceding the last sub-frame. No attack is detected when the current frame is not an active frame previously classified to be coded using a generic coding mode. A method and device for coding an attack in a sound signal are also provided. The coding device comprises the above-mentioned attack detecting device and an encoder of the sub-frame comprising the detected attack using a transition coding mode using a glottal-shape codebook populated with glottal impulse shapes.
Abstract:
A stereo sound signal encoding method and system for time-domain down-mixing of right and left channels of an input stereo sound signal into primary and secondary channels determine normalized correlations of the left channel and the right channel in relation to a monophonic signal version of the sound. A long-term correlation difference is determined on the basis of the normalized correlation of the left channel and the normalized correlation of the right channel. The long-term correlation difference is converted into a factor β, and the left and right channels are mixed to produce the primary and secondary channels using the factor β, wherein the factor β determines respective contributions of the left and right channels upon production of the primary and secondary channels.
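The chain from correlations to down-mixed channels can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: the mono signal is taken as the average of the two channels, the long-term difference is a first-order recursive average with a hypothetical smoothing constant `alpha`, the mapping to β is a clamped linear function, and the mixing weights are one plausible choice. None of these specifics appear in the abstract.

```python
import math

def normalized_correlation(channel, mono):
    """Normalized correlation of one channel with the mono signal."""
    num = sum(c * m for c, m in zip(channel, mono))
    den = math.sqrt(sum(c * c for c in channel) * sum(m * m for m in mono))
    return num / den if den > 0.0 else 0.0

def update_long_term_difference(prev_diff, left, right, alpha=0.9):
    """Smooth the left/right correlation difference over frames.

    alpha is a hypothetical smoothing constant (assumption).
    """
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]  # assumed mono version
    diff = normalized_correlation(left, mono) - normalized_correlation(right, mono)
    return alpha * prev_diff + (1.0 - alpha) * diff

def difference_to_beta(diff):
    """Map the long-term difference to beta in [0, 1] (assumed linear map)."""
    return min(1.0, max(0.0, 0.5 + 0.5 * diff))

def downmix(left, right, beta):
    """Mix left/right into primary/secondary channels using beta.

    The weights below are an illustrative assumption: beta sets how much
    of each input channel goes into each output channel.
    """
    primary = [beta * l + (1.0 - beta) * r for l, r in zip(left, right)]
    secondary = [(1.0 - beta) * l - beta * r for l, r in zip(left, right)]
    return primary, secondary
```

With identical left and right channels the difference is zero, β is 0.5, the primary channel equals the input, and the secondary channel is silent.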
Abstract:
A device and method for quantizing a gain of a fixed contribution of an excitation in a frame, including sub-frames, of a coded sound signal, wherein the gain of the fixed excitation contribution is estimated in a sub-frame using a parameter representative of a classification of the frame. The gain of the fixed excitation contribution is then quantized in the sub-frame using the estimated gain. The device and method are used in jointly quantizing gains of adaptive and fixed contributions of an excitation in a frame of a coded sound signal. For retrieving a quantized gain of a fixed contribution of an excitation in a sub-frame of a frame, the gain of the fixed excitation contribution is estimated using a parameter representative of a classification of the frame, a gain codebook supplies a correction factor in response to a received gain codebook index, and a multiplier multiplies the estimated gain by the correction factor to provide a quantized gain of the fixed excitation contribution.
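The estimate-then-correct structure can be sketched as follows. This is a minimal illustration under assumptions: the codebook values, the coefficients `a0` and `a1`, and the log-domain linear model with an innovation-energy term are hypothetical placeholders. Only the final step (estimated gain multiplied by a correction factor looked up from a received index) is taken from the abstract.

```python
import math

# Hypothetical correction-factor codebook; the values are placeholders.
GAIN_CODEBOOK = [0.5, 0.8, 1.0, 1.25, 2.0]

def estimate_fixed_gain(frame_class, innovation_energy, a0=1.0, a1=0.1):
    """Estimate the fixed-contribution gain from the frame classification.

    The linear log-domain model and the coefficients a0, a1 are assumptions.
    """
    log_gain = a0 + a1 * frame_class - 0.5 * math.log10(innovation_energy)
    return 10.0 ** log_gain

def quantize_fixed_gain(target_gain, estimated_gain):
    """Encoder side: choose the index whose correction factor fits best."""
    ratio = target_gain / estimated_gain
    return min(range(len(GAIN_CODEBOOK)),
               key=lambda i: abs(GAIN_CODEBOOK[i] - ratio))

def dequantize_fixed_gain(index, estimated_gain):
    """Decoder side: estimated gain times the looked-up correction factor."""
    return estimated_gain * GAIN_CODEBOOK[index]
```

Because the decoder can recompute the same estimate from the frame classification, only the codebook index needs to be transmitted.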
Abstract:
The present disclosure relates to a device and method for reducing quantization noise in a sound signal contained in a time-domain excitation decoded by a time-domain decoder. A future frame time-domain excitation is evaluated based on the decoded time-domain excitation. A concatenated time-domain excitation is produced from the decoded time-domain excitation and the time-domain excitation of the future frame, and is converted into a frequency-domain excitation. A weighting mask is produced for retrieving spectral information lost in the quantization noise. The frequency-domain excitation is modified to increase spectral dynamics by application of the weighting mask. The modified frequency-domain excitation is converted into a modified time-domain excitation. The latter conversion is delay-less. In an embodiment, the weighting mask may be produced using time averaging or frequency averaging or a combination of time and frequency averaging of the frequency-domain excitation. The method and device can be used for improving music content rendering of linear-prediction (LP) based codecs.
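The frequency-averaging variant of the weighting mask can be illustrated with a minimal sketch. It assumes the frequency-domain excitation is already available as a list of bins, and it builds the mask by comparing each bin magnitude with a local average over neighboring bins. The window `half_width` and the clamping limits `floor` and `ceil` are hypothetical parameters, and the exact mask rule is an assumption rather than the disclosed one.

```python
def frequency_averaged_mask(magnitudes, half_width=2, floor=0.5, ceil=2.0):
    """Build a per-bin gain from the ratio of each bin to its local average.

    Bins above the local average get a gain above 1 and bins below it a
    gain below 1, which increases spectral dynamics. All parameter values
    are illustrative assumptions.
    """
    n = len(magnitudes)
    mask = []
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
        local_mean = sum(magnitudes[lo:hi]) / (hi - lo)
        gain = magnitudes[k] / local_mean if local_mean > 0.0 else 1.0
        mask.append(min(ceil, max(floor, gain)))
    return mask

def apply_mask(spectrum, mask):
    """Scale each frequency bin (real or complex) by its mask gain."""
    return [x * g for x, g in zip(spectrum, mask)]
```

A flat spectrum yields a mask of all ones, so it passes through unchanged, while an isolated peak is emphasized and its neighbors are attenuated.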
Abstract:
The present disclosure relates to a device and method for reducing quantization noise in a signal contained in a time-domain excitation decoded by a time-domain decoder. The decoded time-domain excitation is converted into a frequency-domain excitation. A weighting mask is produced for retrieving spectral information lost in the quantization noise. The frequency-domain excitation is modified to increase spectral dynamics by application of the weighting mask. The modified frequency-domain excitation is converted into a modified time-domain excitation. The method and device can be used for improving music content rendering of linear-prediction (LP) based codecs. Optionally, a synthesis of the decoded time-domain excitation may be classified into one of a first set of excitation categories and a second set of excitation categories, the second set including INACTIVE or UNVOICED categories, the first set including an OTHER category.
Abstract:
An audio encoder has a first information sink oriented encoding branch, a second information source or SNR oriented encoding branch, and a switch for switching between the first encoding branch and the second encoding branch, wherein the second encoding branch has a converter into a specific domain different from the spectral domain, and wherein the second encoding branch furthermore has a specific domain coding branch, and a specific spectral domain coding branch, and an additional switch for switching between the specific domain coding branch and the specific spectral domain coding branch. An audio decoder has a first domain decoder, a second domain decoder for decoding a signal, and a third domain decoder and two cascaded switches for switching between the decoders.
Abstract:
An audio encoder for encoding an audio signal has a first coding branch, the first coding branch comprising a first converter for converting a signal from a time domain into a frequency domain. Furthermore, the audio encoder has a second coding branch comprising a second time/frequency converter. Additionally, a signal analyzer for analyzing the audio signal is provided. The signal analyzer, on the one hand, determines whether an audio portion is effective in the encoder output signal as a first encoded signal from the first coding branch or as a second encoded signal from the second coding branch. On the other hand, the signal analyzer determines a time/frequency resolution to be applied by the converters when generating the encoded signals. An output interface includes, in addition to the first encoded signal and the second encoded signal, resolution information identifying the resolution used by the first time/frequency converter and by the second time/frequency converter.
Abstract:
A method and device for encoding a stereo sound signal comprise stereo encoders using stereo modes operating in time domain (TD), in frequency domain (FD) or in modified discrete cosine transform (MDCT) domain. A controller controls switching between the TD, FD and MDCT stereo modes. Upon switching from one stereo mode to the other, the switching controller may (a) recalculate at least one length of down-processed/mixed signal in a current frame of the stereo sound signal, (b) reconstruct a down-processed/mixed signal and also other signals related to the other stereo mode in the current frame, (c) adapt data structures and/or memories for coding the stereo sound signal in the current frame using the other stereo mode, and/or (d) alter a TD stereo channel down-mixing to maintain a correct phase of left and right channels of the stereo sound signal. A corresponding stereo sound signal decoding method and device are also described.