Abstract:
The present invention relates to an apparatus for improving perception of a sound signal, the apparatus comprising: a separation unit configured to separate the sound signal into at least one speech component and at least one noise component; and a spatial rendering unit configured to generate, when output via a transducer unit, an auditory impression of the at least one speech component at a first virtual position with respect to a user and of the at least one noise component at a second virtual position with respect to the user.
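The spatial rendering step can be illustrated with a minimal sketch. It assumes the separation unit has already produced one speech component and one noise component as sample arrays, and it uses constant-power stereo panning as a simple stand-in for the rendering technique, which the abstract does not specify. The function names, the azimuth convention and the default positions are hypothetical.

```python
import numpy as np

def pan_gains(azimuth_deg):
    """Constant-power stereo gains for an azimuth in [-90, 90] degrees.

    Assumed convention: -90 is fully left, 0 is centre, +90 is fully right.
    """
    theta = (np.clip(azimuth_deg, -90.0, 90.0) + 90.0) / 180.0 * (np.pi / 2.0)
    return np.cos(theta), np.sin(theta)  # (left gain, right gain)

def render_two_positions(speech, noise, speech_az_deg=0.0, noise_az_deg=90.0):
    """Place the speech and noise components at two different virtual azimuths.

    Returns a (2, n_samples) array holding the left and right channels.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    s_left, s_right = pan_gains(speech_az_deg)
    n_left, n_right = pan_gains(noise_az_deg)
    left = s_left * speech + n_left * noise
    right = s_right * speech + n_right * noise
    return np.stack([left, right])
```

With the defaults, the speech component is heard in the centre and the noise component on the right, so the two are perceived at distinct virtual positions.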
Abstract:
A method for reconstructing at least one target signal comprises determining a first set of feature vectors from an input signal, the first set of feature vectors forming a non-negative input matrix; determining a second set of feature vectors, the second set of feature vectors forming a non-negative noise matrix; decomposing the input matrix into a sum of a first matrix and a second matrix, the first matrix representing a product of a non-negative bases matrix and a non-negative weight matrix, and the second matrix representing a combination of the noise matrix and a noise weight vector; and reconstructing the at least one target signal based on the non-negative bases matrix and the non-negative weight matrix.
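The decomposition can be sketched as a non-negative matrix factorization with an extra noise term. The abstract does not say how the noise matrix and the noise weight vector are combined, so this sketch assumes the noise matrix N has the same shape as the input matrix V and that each of its columns is scaled by one entry of the weight vector a, giving V ≈ W H + N diag(a). It also assumes a Euclidean cost with multiplicative updates. All names and parameters are hypothetical.

```python
import numpy as np

def nmf_with_noise_term(V, N, rank=4, n_iter=200, eps=1e-12, seed=0):
    """Fit V ~= W @ H + N * a with W, H and a kept non-negative.

    V: non-negative input matrix, shape (n_features, n_frames)
    N: non-negative noise matrix, assumed to have the same shape as V
    a: noise weight vector, one weight per frame (column)
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    a = np.ones(n_frames)
    for _ in range(n_iter):
        # Update the bases with the current model estimate.
        V_hat = W @ H + N * a
        W *= (V @ H.T) / (V_hat @ H.T + eps)
        # Update the weights and the noise weights from the same estimate.
        V_hat = W @ H + N * a
        H_new = H * (W.T @ V) / (W.T @ V_hat + eps)
        a = a * (N * V).sum(axis=0) / ((N * V_hat).sum(axis=0) + eps)
        H = H_new
    return W, H, a

def reconstruct_target(W, H):
    """Target estimate built only from the bases and weight matrices."""
    return W @ H
```

Because the noise term absorbs the part of V explained by N, the product W @ H serves as the estimate of the target features. A practical system would still need to map these features back to a time-domain signal, which is outside this sketch.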
Abstract:
The disclosure relates to an audio processing apparatus, comprising: a plurality of audio sensors, each audio sensor configured to receive a respective plurality of audio frames of an audio signal from an audio source, wherein the respective plurality of audio frames defines an audio channel of the audio signal; and processing circuitry configured to: determine a respective feature set having at least one feature for each audio frame of each respective plurality of audio frames, wherein the features of the feature sets define a three-dimensional feature array; process the three-dimensional feature array using a neural network, wherein the neural network comprises a self-attention layer configured to process a plurality of two-dimensional sub-arrays of the three-dimensional feature array; and generate an output signal on the basis of the plurality of processed two-dimensional sub-arrays. Moreover, the disclosure relates to a corresponding audio processing method.
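The idea of applying self-attention to two-dimensional sub-arrays can be sketched as follows. The abstract does not state how the three-dimensional array is sliced, so this sketch assumes an axis order of (channels, frames, features) and takes one (frames, features) slice per channel. It uses single-head scaled dot-product attention with projection matrices shared across slices. The function names and these choices are hypothetical, not the disclosed network.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_2d(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the rows (frames) of X.

    X: one 2D sub-array, shape (n_frames, n_features)
    Wq, Wk, Wv: projection matrices, shape (n_features, d)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

def attend_over_subarrays(features, Wq, Wk, Wv):
    """Apply the same attention layer to each 2D slice of a 3D feature array.

    features: shape (n_channels, n_frames, n_features), an assumed axis order.
    Returns an array of shape (n_channels, n_frames, d).
    """
    return np.stack([self_attention_2d(sub, Wq, Wk, Wv) for sub in features])
```

Slicing along a different axis, for example one (channels, features) sub-array per frame, would let the layer attend across sensors instead of across time; the same helper applies after transposing the array.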