Abstract:
A multi-party control unit (MCU) generates, based on mono audio data streams that represent sounds associated with terminal devices, a mixed mono audio data stream. In addition, the MCU modifies the mixed mono audio data stream to steganographically embed sub-streams that include representations of the mono audio data streams. A terminal device receives the modified mixed audio data stream. When the terminal device is configured for stereo playback, the terminal device performs an inverse steganographic process to extract, from the mixed audio data stream, the sub-streams. The terminal device generates and outputs multi-channel audio data based on the extracted sub-streams and the mixed audio data stream. When the terminal device is not configured for stereo playback, the terminal device outputs sound based on the mixed audio data stream without extracting the embedded sub-streams.
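A minimal sketch of the embed-and-extract round trip described above, assuming 16-bit PCM samples and least-significant-bit replacement as the steganographic method (the abstract does not specify which embedding technique the MCU uses; all names below are illustrative):

    import numpy as np

    def embed_substream(mixed_pcm, sub_bits):
        # Overwrite the least significant bit of each mixed sample with one bit of
        # the sub-stream (LSB replacement is only one possible embedding method).
        out = np.asarray(mixed_pcm, dtype=np.int16).copy()
        n = min(out.size, len(sub_bits))
        out[:n] = (out[:n] & ~1) | np.asarray(sub_bits[:n], dtype=np.int16)
        return out

    def extract_substream(modified_pcm, length):
        # Inverse step performed by a stereo-capable terminal device; a terminal
        # without stereo playback simply plays `modified_pcm` as-is.
        samples = np.asarray(modified_pcm, dtype=np.int16)
        return (samples[:length] & 1).astype(np.uint8)
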
Abstract:
Methods, systems, and devices for encoding are described. A device, which may be otherwise known as user equipment (UE), may support standards-compatible audio encoding (e.g., speech encoding) using a pre-encoded database. The device may receive a digital representation of an audio signal and identify, based on receiving the digital representation of the audio signal, a database that is pre-encoded according to a coding standard and that includes a quantity of digital representations of other audio signals. The device may encode the digital representation of the audio signal using a machine learning scheme and information from the database pre-encoded according to the coding standard. The device may generate a bitstream of the digital representation that is compatible with the coding standard based on encoding the digital representation of the audio signal, and output a representation of the bitstream.
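A rough sketch of the flow described above, with every name a placeholder: `db_features` holds features of the pre-encoded signals, `db_params` their standard-compliant encoder parameters, and `ml_model` is any callable that maps a frame and retrieved parameters to parameters in the standard's format. This is a sketch under those assumptions, not the claimed implementation:

    import numpy as np

    def encode_frame(frame, db_features, db_params, ml_model):
        # Retrieve the closest pre-encoded database entry using a simple spectral feature.
        feature = np.abs(np.fft.rfft(frame))
        idx = int(np.argmin(np.linalg.norm(db_features - feature, axis=1)))
        retrieved = db_params[idx]

        # Let the learned model refine the retrieved parameters for this frame.
        refined = ml_model(frame, retrieved)

        # Packing `refined` with the coding standard's bitstream syntax (not shown)
        # keeps the output decodable by any compliant decoder.
        return refined
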
Abstract:
A method includes extracting a voicing classification parameter of an audio signal and determining a filter coefficient of a low pass filter based on the voicing classification parameter. The method also includes filtering a low-band portion of the audio signal to generate a low-band audio signal and controlling an amplitude of a temporal envelope of the low-band audio signal based on the filter coefficient. The method also includes modulating a white noise signal based on the amplitude of the temporal envelope to generate a modulated white noise signal and scaling the modulated white noise signal based on a noise gain to generate a scaled modulated white noise signal. The method also includes mixing a scaled version of the low-band audio signal with the scaled modulated white noise signal to generate a high-band excitation signal that is used to generate a decoded version of the audio signal.
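A minimal numerical sketch of the noise-mixing steps described above, assuming a one-pole smoother for the temporal envelope, a voicing value in [0, 1], and an energy-preserving split between the low-band copy and the shaped noise (these conventions are illustrative, not taken from the abstract):

    import numpy as np
    from scipy.signal import lfilter

    def highband_excitation(low_band, voicing, noise_gain, rng=None):
        rng = np.random.default_rng() if rng is None else rng

        # Low-pass filter coefficient derived from the voicing classification
        # (assumed mapping), then the temporal envelope of the low-band signal.
        alpha = 0.1 + 0.8 * voicing
        envelope = lfilter([1.0 - alpha], [1.0, -alpha], np.abs(low_band))

        # Modulate white noise by the envelope, then scale by the noise gain.
        noise = rng.standard_normal(len(low_band))
        scaled_noise = noise_gain * envelope * noise

        # Mix a scaled copy of the low-band signal with the scaled, modulated noise.
        harmonic_gain = np.sqrt(max(1.0 - noise_gain ** 2, 0.0))
        return harmonic_gain * low_band + scaled_noise
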
Abstract:
A method for determining pitch pulse period signal boundaries by an electronic device is described. The method includes obtaining a signal. The method also includes determining a first averaged curve based on the signal. The method further includes determining at least one first averaged curve peak position based on the first averaged curve and a threshold. The method additionally includes determining pitch pulse period signal boundaries based on the at least one first averaged curve peak position. The method also includes synthesizing a speech signal.
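A small sketch of the averaged-curve peak picking described above; the window length, threshold rule, and boundary placement are illustrative choices the abstract does not fix:

    import numpy as np

    def pitch_pulse_boundaries(signal, window=16, threshold_ratio=0.6):
        # First averaged curve: moving average of the signal magnitude.
        kernel = np.ones(window) / window
        avg_curve = np.convolve(np.abs(signal), kernel, mode="same")

        # Peak positions: local maxima of the averaged curve above a threshold.
        threshold = threshold_ratio * avg_curve.max()
        peaks = [i for i in range(1, len(avg_curve) - 1)
                 if avg_curve[i] > threshold
                 and avg_curve[i] >= avg_curve[i - 1]
                 and avg_curve[i] > avg_curve[i + 1]]

        # Boundaries placed midway between consecutive peak positions (one simple choice).
        return [(a + b) // 2 for a, b in zip(peaks, peaks[1:])]
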
Abstract:
A method for quantizing phase information on an electronic device is described. The method includes obtaining a speech signal. The method also includes determining a prototype pitch period signal based on the speech signal and transforming the prototype pitch period signal into a first frequency-domain signal. The method additionally includes mapping the first frequency-domain signal into a plurality of subbands. The method also includes determining a global alignment based on the first frequency-domain signal and quantizing the global alignment utilizing scalar quantization to obtain a quantized global alignment. The method additionally includes determining a plurality of band alignments corresponding to the plurality of subbands. The method also includes quantizing the plurality of band alignments utilizing vector quantization to obtain a quantized plurality of band alignments. The method further includes transmitting the quantized global alignment and the quantized plurality of band alignments.
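A sketch of the two quantization steps described above. How the global and band alignments are computed is not specified in the abstract, so the definitions below (phase of the first harmonic, mean residual phase per subband) and the inputs `band_edges` and `codebook` are placeholders:

    import numpy as np

    def quantize_alignments(prototype, band_edges, codebook, global_levels=64):
        # Transform the prototype pitch period signal to the frequency domain.
        spectrum = np.fft.rfft(prototype)
        phases = np.angle(spectrum)

        # Global alignment: a single phase offset, scalar-quantized on a uniform grid.
        global_alignment = phases[1] if phases.size > 1 else 0.0
        step = 2.0 * np.pi / global_levels
        global_q = np.round(global_alignment / step) * step

        # Band alignments: one value per subband, vector-quantized by nearest-neighbour
        # search in `codebook` (shape: entries x subbands).
        residual = phases - global_q
        band_alignments = np.array([residual[lo:hi].mean()
                                    for lo, hi in zip(band_edges[:-1], band_edges[1:])])
        vq_index = int(np.argmin(np.sum((codebook - band_alignments) ** 2, axis=1)))
        return global_q, vq_index
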
Abstract:
A particular method includes determining, based on spectral information corresponding to an audio signal that includes a low-band portion and a high-band portion, that the audio signal includes a component corresponding to an artifact-generating condition. The method also includes filtering the high-band portion of the audio signal and generating an encoded signal. Generating the encoded signal includes determining gain information based on a ratio of a first energy corresponding to filtered high-band output to a second energy corresponding to the low-band portion to reduce an audible effect of the artifact-generating condition.
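The gain computation reduces to an energy ratio; a minimal sketch, assuming frame energies are plain sums of squares and that the gain is taken in the amplitude domain via a square root (the abstract only names the ratio):

    import numpy as np

    def highband_gain(filtered_highband, low_band, eps=1e-12):
        # First energy: filtered high-band output; second energy: low-band portion.
        e_high = float(np.sum(np.square(filtered_highband)))
        e_low = float(np.sum(np.square(low_band)))
        # Gain derived from the energy ratio; eps guards against a silent low band.
        return np.sqrt(e_high / (e_low + eps))
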
Abstract:
A particular method includes determining, at a device, a voicing classification of an input signal. The input signal corresponds to an audio signal. The method also includes controlling an amount of an envelope of a representation of the input signal based on the voicing classification. The method further includes modulating a white noise signal based on the controlled amount of the envelope. The method also includes generating a high band excitation signal based on the modulated white noise signal.
Abstract:
Systems and methods of performing blind bandwidth extension are disclosed. In an embodiment, a method includes determining, based on a set of low-band parameters of an audio signal, a first set of high-band parameters and a second set of high-band parameters. The method further includes generating a predicted set of high-band parameters based on a weighted combination of the first set of high-band parameters and the second set of high-band parameters.
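The prediction step is a weighted combination of the two candidate high-band parameter sets; a minimal sketch, assuming a single scalar weight forming a convex combination (how the weight is derived from the low-band parameters is not specified in the abstract):

    import numpy as np

    def predict_highband(params_a, params_b, weight):
        # Weighted combination of the first and second sets of high-band parameters.
        a = np.asarray(params_a, dtype=float)
        b = np.asarray(params_b, dtype=float)
        return weight * a + (1.0 - weight) * b
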