Abstract:
In general, disclosed is a device that includes a memory and one or more processors, coupled to the memory, configured to perform an energy analysis with respect to one or more audio objects, in the ambisonics domain, in a first time segment. The one or more processors are also configured to perform a similarity measure between the one or more audio objects, in the ambisonics domain, in the first time segment, and one or more audio objects, in the ambisonics domain, in a second time segment. In addition, the one or more processors are configured to reorder the one or more audio objects, in the ambisonics domain, in the first time segment with the one or more audio objects, in the ambisonics domain, in the second time segment, to generate one or more reordered audio objects in the first time segment.
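A minimal sketch of the three steps named above, under the assumption that each audio object in the ambisonics domain is a matrix of ambisonic coefficients over the samples of a time segment and that the similarity measure is a normalized cross-correlation; the greedy matching used for the reorder is illustrative, not the claimed method:

```python
# Hypothetical sketch: per-object energy analysis, a cross-segment similarity
# measure, and a reorder of the first-segment objects to align with the
# second-segment objects. The representation and metric are assumptions.
import numpy as np

def object_energy(obj):
    """Energy of one audio object: sum of squared ambisonic coefficients."""
    return float(np.sum(obj ** 2))

def similarity(obj_a, obj_b):
    """Normalized correlation between two objects in the ambisonics domain."""
    num = np.sum(obj_a * obj_b)
    den = np.linalg.norm(obj_a) * np.linalg.norm(obj_b) + 1e-12
    return float(num / den)

def reorder_objects(segment1, segment2):
    """Greedily reorder segment-1 objects so each best matches a segment-2 object."""
    # Energy analysis; the abstract does not say how the energies feed the
    # reorder, so they are simply returned alongside the reordered objects.
    energies = [object_energy(o) for o in segment1]
    sim = np.array([[similarity(a, b) for b in segment2] for a in segment1])
    order = [-1] * len(segment2)
    used = set()
    for j in range(len(segment2)):
        i = max((k for k in range(len(segment1)) if k not in used),
                key=lambda k: sim[k, j])
        order[j] = i
        used.add(i)
    return [segment1[i] for i in order], energies

# Example: three objects, four ambisonic channels (first order), eight samples.
rng = np.random.default_rng(0)
seg2 = [rng.standard_normal((4, 8)) for _ in range(3)]
seg1 = [seg2[2], seg2[0], seg2[1]]          # same objects, permuted
reordered, energies = reorder_objects(seg1, seg2)
```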
Abstract:
In general, techniques are described for determining quantization step sizes for compression of spatial components of a sound field. A device comprising one or more processors may be configured to perform the techniques. For example, the one or more processors may be configured to determine a quantization step size to be used when compressing a spatial component of a sound field, where the spatial component is generated by performing a vector-based synthesis with respect to a plurality of spherical harmonic coefficients.
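A minimal sketch of the determination described above, assuming the vector-based synthesis is a singular value decomposition of a frame of spherical harmonic coefficients and that the step size follows from a target bit depth; both choices are illustrative assumptions, not details from the abstract:

```python
# Hypothetical sketch: derive spatial components via an SVD of a frame of
# spherical harmonic coefficients, then choose a uniform quantization step
# size that maps the component's range onto 2**target_bits levels.
import numpy as np

def spatial_components(sh_coeffs):
    """Vector-based synthesis (assumed SVD) of a (channels x samples) frame."""
    u, s, vt = np.linalg.svd(sh_coeffs, full_matrices=False)
    return u, s, vt

def quantization_step_size(spatial_component, target_bits):
    """Uniform step size for the component's peak range at the target bit depth."""
    peak = np.max(np.abs(spatial_component))
    return 2.0 * peak / (2 ** target_bits)

def quantize(spatial_component, step):
    """Apply the uniform quantizer implied by the chosen step size."""
    return np.round(spatial_component / step) * step

# Example: fourth-order ambisonics frame, (4 + 1)**2 = 25 channels, 1024 samples.
rng = np.random.default_rng(1)
frame = rng.standard_normal((25, 1024))
u, s, vt = spatial_components(frame)
step = quantization_step_size(u[:, 0], target_bits=8)
v_hat = quantize(u[:, 0], step)
```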
Abstract:
A personalized (i.e., speaker-derivable) bandwidth extension is provided in which the model used for bandwidth extension is personalized (e.g., tailored) to each specific user. A training phase is performed to generate a bandwidth extension model that is personalized to a user. The model may be subsequently used in a bandwidth extension phase during a phone call involving the user. The bandwidth extension phase, using the personalized bandwidth extension model, will be activated when a higher band (e.g., wideband) is not available and the call is taking place on a lower band (e.g., narrowband).
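A minimal sketch of the two phases described above, assuming the personalized model is a least-squares map from narrowband features to highband features learned from the user's own recordings; the feature and model choices are illustrative assumptions:

```python
# Hypothetical sketch: a training phase fits a per-user narrowband-to-highband
# mapping, and the call-time phase applies it only when the higher band
# (e.g., wideband) is unavailable.
import numpy as np

class PersonalizedBWEModel:
    def fit(self, nb_features, hb_features):
        """Training phase: least-squares map from narrowband to highband features."""
        X = np.hstack([nb_features, np.ones((nb_features.shape[0], 1))])
        self.W, *_ = np.linalg.lstsq(X, hb_features, rcond=None)
        return self

    def extend(self, nb_frame):
        """Bandwidth-extension phase: predict highband features for one frame."""
        return np.append(nb_frame, 1.0) @ self.W

def process_call_frame(nb_frame, model, wideband_available):
    """Activate the personalized extension only on a lower-band (narrowband) call."""
    if wideband_available:
        return None                      # higher band already present; no extension
    return model.extend(nb_frame)

# Example: fit on toy per-user training pairs, then extend one narrowband frame.
rng = np.random.default_rng(2)
nb_train = rng.standard_normal((100, 16))
hb_train = rng.standard_normal((100, 8))
model = PersonalizedBWEModel().fit(nb_train, hb_train)
hb_estimate = process_call_frame(nb_train[0], model, wideband_available=False)
```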
Abstract:
Aspects of the disclosure relate to using a display as a sound emitter and may relate to an electronic device including a display. In particular, a vibration sensor such as an accelerometer is physically coupled to the display and senses display vibration to provide a high-accuracy feedback loop representing the actual audio output from the display. The electronic device includes an actuator physically coupled to the display and configured to cause vibration of the display in response to an audio signal. The electronic device further includes a vibration sensor physically coupled to the display and configured to output a vibration sensor signal proportional to the vibration of the display due to the actuator. The electronic device further includes a processor operably coupled to the vibration sensor. The processor is configured to adjust the audio signal based on the vibration sensor signal from the vibration sensor.
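A minimal sketch of the feedback loop described above, assuming the adjustment is a per-block gain correction driven by the ratio of the intended audio level to the level sensed by the vibration sensor; this simple rule is an illustrative assumption, not the claimed processing:

```python
# Hypothetical sketch: the processor compares the intended audio level with the
# vibration sensor signal (proportional to the actual display vibration) and
# adjusts the drive signal sent to the actuator accordingly.
import numpy as np

def adjust_audio(audio_block, sensor_block, gain, step=0.1):
    """Update a drive gain so the sensed vibration level tracks the intended level."""
    target_rms = np.sqrt(np.mean(audio_block ** 2)) + 1e-12
    sensed_rms = np.sqrt(np.mean(sensor_block ** 2)) + 1e-12
    error = target_rms / sensed_rms          # > 1 means the display under-delivers
    gain *= (1.0 - step) + step * error      # smoothed correction
    return gain * audio_block, gain

# Example: the sensor reports only half of the intended vibration level, so the
# loop drives the gain toward roughly 2.0 to compensate.
rng = np.random.default_rng(3)
audio = rng.standard_normal(256)
gain = 1.0
for _ in range(20):
    driven, gain = adjust_audio(audio, 0.5 * gain * audio, gain)
```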
Abstract:
In general, techniques are described for compressing decomposed representations of a sound field. A device comprising a memory and processing circuitry may be configured to perform the techniques. The memory may be configured to store a bitstream representative of scene-based audio data, the scene-based audio data comprising ambisonic coefficients representative of a soundfield. The processing circuitry may be configured to process the bitstream to extract foreground components and corresponding foreground directional information, dequantize the corresponding foreground directional information to obtain corresponding dequantized foreground directional information, and obtain, based on the foreground components and the corresponding dequantized foreground directional information, a reconstructed version of the scene-based audio data.
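A minimal sketch of the decode path described above, assuming the parsed bitstream is a simple record of foreground components, uniformly quantized directional information, and a step size; the layout and the dequantizer are illustrative assumptions:

```python
# Hypothetical sketch: extract foreground components and quantized directional
# information, dequantize the directional information, and recombine the two
# into a reconstructed frame of ambisonic coefficients.
import numpy as np

def dequantize(indices, step):
    """Inverse of a uniform quantizer: quantization index times step size."""
    return indices.astype(np.float64) * step

def reconstruct_scene(bitstream):
    fg = bitstream["foreground_components"]      # (num_fg x samples) audio
    q_dirs = bitstream["quantized_directions"]   # (channels x num_fg) indices
    dirs = dequantize(q_dirs, bitstream["step_size"])
    return dirs @ fg                             # (channels x samples) ambisonics

# Example: two foreground components, first-order ambisonics (four channels).
rng = np.random.default_rng(4)
stream = {
    "foreground_components": rng.standard_normal((2, 64)),
    "quantized_directions": rng.integers(-128, 128, size=(4, 2)),
    "step_size": 1.0 / 128,
}
hoa_frame = reconstruct_scene(stream)
```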
Abstract:
In general, techniques are described for compressing decomposed representations of a sound field. A device comprising one or more processors may be configured to perform the techniques. The one or more processors may be configured to obtain a bitstream comprising a compressed version of a spatial component of a sound field, the spatial component generated by performing a vector-based synthesis with respect to a plurality of spherical harmonic coefficients.
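A minimal sketch of obtaining such a bitstream, assuming a toy serialization in which a quantization step size and element count precede the quantized coefficients of one spatial component; the byte layout is an illustrative assumption, not a format from the abstract:

```python
# Hypothetical sketch: serialize and then recover the compressed version of one
# spatial component (e.g., a vector obtained from a vector-based synthesis).
import struct
import numpy as np

def pack_spatial_component(component, step):
    """Encoder side: uniform-quantize one spatial component and serialize it."""
    idx = np.round(component / step).astype(np.int16)
    return struct.pack("<fI", step, len(idx)) + idx.tobytes()

def unpack_spatial_component(payload):
    """Decoder side: obtain the compressed spatial component from the bitstream."""
    step, count = struct.unpack_from("<fI", payload, 0)
    idx = np.frombuffer(payload, dtype=np.int16, offset=8, count=count)
    return idx * step

# Example round trip for a 25-element spatial component (fourth order).
v = np.random.default_rng(5).standard_normal(25)
payload = pack_spatial_component(v, step=1.0 / 64)
v_hat = unpack_spatial_component(payload)
```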
Abstract:
In general, techniques are described for identifying a codebook to be used when compressing spatial components of a sound field. A device comprising one or more processors may be configured to perform the techniques. The one or more processors may be configured to identify a Huffman codebook to use when compressing a spatial component of a plurality of spatial components based on an order of the spatial component relative to remaining ones of the plurality of spatial components, the spatial component generated by performing a vector-based synthesis with respect to a plurality of spherical harmonic coefficients.
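A minimal sketch of the codebook identification, assuming the spatial components are ranked by energy and that the rank of the component being coded selects one of a small set of prefix codebooks; the placeholder codebooks below merely stand in for actual Huffman tables:

```python
# Hypothetical sketch: the codebook used for a spatial component depends on its
# order (here, energy rank) relative to the remaining spatial components.
import numpy as np

# Placeholder prefix codebooks: one tuned for dominant components, one for the rest.
CODEBOOKS = {
    "dominant": {0: "0", 1: "10", -1: "110", 2: "1110", -2: "1111"},
    "residual": {0: "0", 1: "10", -1: "11"},
}

def identify_codebook(component_index, components):
    """Pick a codebook from the component's rank among all spatial components."""
    energies = [float(np.sum(c ** 2)) for c in components]
    rank = sorted(range(len(components)),
                  key=lambda i: -energies[i]).index(component_index)
    return CODEBOOKS["dominant"] if rank == 0 else CODEBOOKS["residual"]

# Example: three spatial components; the most energetic one (index 0) gets the
# "dominant" codebook.
rng = np.random.default_rng(6)
components = [3.0 * rng.standard_normal(25),
              rng.standard_normal(25),
              rng.standard_normal(25)]
book = identify_codebook(0, components)
```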