摘要:
A system and method of blind bandwidth extension. The system selects a prediction model from a number of stored prediction models that were generated using an unsupervised clustering method (e.g., a k-means method) and a supervised regression process (e.g., a support vector machine), and extends the bandwidth of an input musical audio signal.
摘要:
The present disclosure relates to reverberation generation for headphone virtualization. A method of generating one or more components of a binaural room impulse response (BRIR) for headphone virtualization is described. In the method, directionally-controlled reflections are generated, wherein directionally-controlled reflections impart a desired perceptual cue to an audio input signal corresponding to a sound source location. Then at least the generated reflections are combined to obtain the one or more components of the BRIR. Corresponding system and computer program products are described as well.
摘要:
A method for performing inter-channel encoding of a multi-channel audio signal comprising channel signals for N channels, with N being an integer, with N>1, is described. The method comprises determining a basic graph comprising the N channels as nodes and comprising directed edges between at least some of the N channels. Furthermore, the method comprises determining an inter-channel coding graph from the basic graph, such that the inter-channel coding graph is a directed acyclic graph, and such that a cumulated a cumulated cost of the signals of the nodes of the inter-channel coding graph is reduced.
摘要:
The present disclosure relates to reverberation generation for headphone virtualization. A method of generating one or more components of a binaural room impulse response (BRIR) for headphone virtualization is described. In the method, directionally-controlled reflections are generated, wherein directionally-controlled reflections impart a desired perceptual cue to an audio input signal corresponding to a sound source location. Then at least the generated reflections are combined to obtain the one or more components of the BRIR. Corresponding system and computer program products are described as well.
摘要:
The present invention relates to a method for predicting transform coefficients representing frequency content of an adaptive block length media signal, by receiving a frame and receiving block length information indicating a number of quantized transform coefficients for each block in the frame, the number of quantized transform coefficients being one of a first or second number, wherein the first number is greater than the second number, determining a first block has the second number of quantized transform coefficients, converting the first block into a converted block having the first number of quantized transform coefficients, conditioning a main neural network trained to predict at least one output variable given at least one conditioning variable, the at least one conditioning variable being based on information regarding the converted block and block length information for the first block, providing at least one predicted transform coefficients from an output stage of the main neural network.
摘要:
An audio processing system, such as an upmixer, may be capable of separating diffuse and non-diffuse portions of N input audio signals. The upmixer may be capable of detecting instances of transient audio signal conditions. During instances of transient audio signal conditions, the up-mixer may be capable of adding a signal-adaptive control to a diffuse signal expansion process in which M audio signals are output. The upmixer may vary the diffuse signal expansion process over time such that during instances of transient audio signal conditions the diffuse portions of audio signals may be distributed substantially only to output channels spatially close to the input channels. During instances of non-transient audio signal conditions, the diffuse portions of diffuse portions of audio signals may be distributed in a substantially uniform manner.
摘要:
A method for determining mantissa bit allocation of audio data values of frequency domain audio data to be encoded. The allocation method includes a step of determining masking values for the audio data values, including by performing adaptive low frequency compensation on the audio data of each frequency band of a set of low frequency bands of the audio data. The adaptive low frequency compensation includes steps of: performing tonality detection on the audio data to generate compensation control data indicative of whether each frequency band in the set of low frequency bands has prominent tonal content; and performing low frequency compensation on the audio data in each frequency band in the set of low frequency bands having prominent tonal content as indicated by the compensation control data, but not performing low frequency compensation on the audio data in any other frequency band in the set of low frequency bands.
摘要:
Described herein is a method of processing an audio signal using a neural network or using a first and a second neural network. Described is further a method of training said neural network or of jointly training a set of said first and said second neural network. Moreover, described is a method of obtaining and transmitting a latent feature space representation of a perceptual domain audio signal using a neural network and a method of obtaining an audio signal from a latent feature space representation of a perceptual domain audio signal using a neural network. Described are also respective apparatuses and computer program products.
摘要:
The present disclosure relates to reverberation generation for headphone virtualization. A method of generating one or more components of a binaural room impulse response (BRIR) for headphone virtualization is described. In the method, directionally-controlled reflections are generated, wherein directionally-controlled reflections impart a desired perceptual cue to an audio input signal corresponding to a sound source location. Then at least the generated reflections are combined to obtain the one or more components of the BRIR. Corresponding system and computer program products are described as well.
摘要:
Audio processing methods may involve receiving audio data corresponding to a plurality of audio channels. The audio data may include a frequency domain representation corresponding to filterbank coefficients of an audio encoding or processing system. A decorrelation process may be performed with the same filterbank coefficients used by the audio encoding or processing system. The decorrelation process may be performed without converting coefficients of the frequency domain representation to another frequency domain or time domain representation. The decorrelation process may involve selective or signal-adaptive decorrelation of specific channels and/or specific frequency bands. The decorrelation process may involve applying a decorrelation filter to a portion of the received audio data to produce filtered audio data. The decorrelation process may involve using a non-hierarchal mixer to combine a direct portion of the received audio data with the filtered audio data according to spatial parameters.