Abstract:
A personalized (i.e., speaker-derivable) bandwidth extension is provided in which the model used for bandwidth extension is personalized (e.g., tailored) to each specific user. A training phase is performed to generate a bandwidth extension model that is personalized to a user. The model may be subsequently used in a bandwidth extension phase during a phone call involving the user. The bandwidth extension phase, using the personalized bandwidth extension model, will be activated when a higher band (e.g., wideband) is not available and the call is taking place on a lower band (e.g., narrowband).
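The abstract does not specify the model architecture or the extension algorithm. The following is a minimal Python sketch of how a per-user model might be applied at call time, assuming a simple spectral-folding scheme in which the hypothetical `user_model` (the output of the training phase) maps the low-band magnitude envelope to gains for the regenerated high band; names and parameters are illustrative.

```python
import numpy as np

def extend_bandwidth(narrowband, user_model, frame=256):
    """Toy personalized bandwidth extension (hypothetical model interface).

    `narrowband` is assumed to be 8 kHz speech; the output is 16 kHz.
    `user_model` stands in for the per-user model from the training phase and
    maps a low-band magnitude envelope to gains for the regenerated high band.
    """
    # Upsample by zero-stuffing; spectral images of the 0-4 kHz content
    # appear in the 4-8 kHz region and serve as raw high-band material.
    up = np.zeros(2 * len(narrowband))
    up[::2] = narrowband

    out = np.copy(up)
    for start in range(0, len(up) - frame, frame):
        spec = np.fft.rfft(up[start:start + frame])
        low_env = np.abs(spec[: frame // 4])        # 0-4 kHz magnitude envelope
        high_gain = user_model(low_env)             # personalized prediction
        spec[frame // 4:] *= high_gain              # shape the 4-8 kHz images
        out[start:start + frame] = np.fft.irfft(spec, n=frame)
    return out

# Example: a stand-in "personalized model" that applies a flat gain.
def toy_model(low_env):
    return np.full(len(low_env) + 1, 0.3)

wideband = extend_bandwidth(np.random.randn(8000), toy_model)
```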
Abstract:
A method for encoding three-dimensional audio by a wireless communication device is disclosed. The wireless communication device detects an indication of a plurality of localizable audio sources. The wireless communication device also records a plurality of audio signals associated with the plurality of localizable audio sources. The wireless communication device also encodes the plurality of audio signals.
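As one illustration of the detect/record/encode flow, the sketch below pairs a classic GCC-PHAT delay estimate (a stand-in; the abstract does not name a localization method) with a hypothetical `codec_encode` function representing whatever audio codec compresses each recorded source signal.

```python
import numpy as np

def gcc_phat_delay(x, y, fs):
    """Classic GCC-PHAT time-difference-of-arrival estimate, used here only
    as a plausible way to detect an indication of a localizable source."""
    n = len(x) + len(y)
    spec = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(y, n=n))
    spec /= np.abs(spec) + 1e-12                 # phase transform weighting
    cc = np.fft.irfft(spec, n=n)
    shift = int(np.argmax(np.abs(cc)))
    if shift > n // 2:                           # wrap to a signed lag
        shift -= n
    return shift / fs

def encode_three_d_audio(source_signals, source_delays, codec_encode):
    """Bundle each localizable source's recorded signal with its location
    metadata; `codec_encode` is a hypothetical codec stand-in."""
    return [{"tdoa_s": d, "payload": codec_encode(s)}
            for s, d in zip(source_signals, source_delays)]
```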
Abstract:
Systems, devices, and methods are described for recognizing and focusing on at least one source of an audio communication, as part of a communication that includes a video image and an audio communication derived from two or more microphones whose relative positions are known. In certain embodiments, linked audio and video focus areas providing location information for one or more sound sources may each be associated with different user inputs, and an input that adjusts the focus in either the audio or video domain may automatically adjust the focus in the other domain.
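A minimal sketch of one way the linked focus areas could be realized, assuming a pinhole camera and a two-microphone array with known spacing; the field of view, the pixel-to-azimuth mapping, and the delay-and-sum beamformer are illustrative choices, not details from the abstract.

```python
import numpy as np

def pixel_to_azimuth(x_pixel, image_width, fov_deg=60.0):
    """Map a video-focus pixel column to an azimuth angle (pinhole-camera
    assumption; the 60-degree field of view is illustrative)."""
    return (x_pixel / image_width - 0.5) * fov_deg

def focus_audio_on_azimuth(left, right, azimuth_deg, mic_spacing_m, fs, c=343.0):
    """Delay-and-sum two known-geometry microphones toward the azimuth
    implied by the video focus, so adjusting the video focus also adjusts
    the audio focus."""
    delay_s = mic_spacing_m * np.sin(np.radians(azimuth_deg)) / c
    shift = int(round(delay_s * fs))
    return 0.5 * (left + np.roll(right, -shift))
```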
Abstract:
In general, techniques are described for performing constrained dynamic amplitude panning in collaborative sound systems. A headend device comprising one or more processors may perform the techniques. The processors may be configured to identify, for a mobile device participating in a collaborative surround sound system, a specified location of a virtual speaker of the collaborative surround sound system and determine a constraint that impacts playback of audio signals rendered from an audio source by the mobile device. The processors may be further configured to perform dynamic spatial rendering of the audio source with the determined constraint to render audio signals in a manner that reduces the impact of the determined constraint during playback of the audio signals by the mobile device.
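The sketch below illustrates constrained amplitude panning in the simplest two-speaker case, using a constant-power pan law with a gain cap standing in for the per-device constraint (e.g., limited output power of the mobile device). The law, the cap, and all names are assumptions, not the patented rendering.

```python
import numpy as np

def constrained_pan_gains(pan, max_gain_mobile=1.0):
    """Constant-power panning between a fixed speaker and a mobile device
    acting as a virtual speaker; `pan` runs from 0 (all fixed) to 1 (all
    mobile). The gain cap stands in for the per-device constraint."""
    g_fixed = np.cos(0.5 * np.pi * pan)
    g_mobile = np.sin(0.5 * np.pi * pan)
    if g_mobile > max_gain_mobile:                        # constraint binds
        g_mobile = max_gain_mobile
        g_fixed = np.sqrt(max(0.0, 1.0 - g_mobile ** 2))  # keep total power
    return g_fixed, g_mobile

# Example: the mobile device is limited to 70% of full-scale output.
print(constrained_pan_gains(0.9, max_gain_mobile=0.7))
```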
Abstract:
A method includes, while operating an audio processing device in a use mode, retrieving first direction of arrival (DOA) data corresponding to a first audio output device from a memory of the audio processing device and generating a first null beam directed toward the first audio output device based on the first DOA data. The method also includes retrieving second DOA data corresponding to a second audio output device from the memory of the audio processing device and generating a second null beam directed toward the second audio output device based on the second DOA data. The first DOA data and the second DOA data are stored in the memory during operation of the audio processing device in a calibration mode.
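A minimal two-microphone sketch of generating a null beam from stored DOA data, assuming a simple delay-and-subtract null former; the microphone geometry, sign conventions, and placeholder DOA values are illustrative and not taken from the abstract.

```python
import numpy as np

def null_toward(mic1, mic2, doa_deg, mic_spacing_m, fs, c=343.0):
    """Delay-and-subtract null former: aligns the second microphone to the
    first for a source at the stored DOA and subtracts, cancelling sound
    arriving from that direction (simplified geometry)."""
    delay_s = mic_spacing_m * np.sin(np.radians(doa_deg)) / c
    shift = int(round(delay_s * fs))                # samples mic 2 lags mic 1
    return mic1 - np.roll(mic2, -shift)

# DOA data stored in memory during calibration mode (placeholder values).
stored_doas = {"output_device_1": 35.0, "output_device_2": -50.0}

fs, spacing = 16000, 0.04
mic1, mic2 = np.random.randn(fs), np.random.randn(fs)
first_null = null_toward(mic1, mic2, stored_doas["output_device_1"], spacing, fs)
second_null = null_toward(mic1, mic2, stored_doas["output_device_2"], spacing, fs)
```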
Abstract:
A method for echo reduction by an electronic device is described. The method includes nulling at least one speaker. The method also includes mixing a set of runtime audio signals based on a set of acoustic paths to determine a reference signal. The method also includes receiving at least one composite audio signal that is based on the set of runtime audio signals. The method further includes reducing echo in the at least one composite audio signal based on the reference signal.
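The following sketch shows one plausible realization: the reference signal is formed by mixing the runtime signals through estimated acoustic-path impulse responses, and echo in the composite signal is then reduced with an NLMS adaptive filter (a common choice, not one mandated by the abstract).

```python
import numpy as np

def build_echo_reference(runtime_signals, acoustic_paths):
    """Mix the runtime (loudspeaker) signals through estimated acoustic-path
    impulse responses and sum them to form the reference signal."""
    ref = None
    for sig, path in zip(runtime_signals, acoustic_paths):
        contrib = np.convolve(sig, path)[: len(sig)]
        ref = contrib if ref is None else ref + contrib
    return ref

def nlms_echo_cancel(composite, reference, taps=128, mu=0.5, eps=1e-6):
    """Reduce echo in the composite microphone signal against the reference
    using a normalized LMS adaptive filter."""
    w = np.zeros(taps)
    out = np.zeros(len(composite))
    for n in range(taps, len(composite)):
        x = reference[n - taps:n][::-1]             # most recent reference taps
        e = composite[n] - w @ x                    # error = echo-reduced sample
        w += mu * e * x / (x @ x + eps)             # NLMS weight update
        out[n] = e
    return out
```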
Abstract:
Disclosed is an application interface that takes into account the user's gaze direction relative to who is speaking in an interactive multi-participant environment where audio-based contextual information and/or visual-based semantic information is being presented. Among the various implementations, two different types of microphone array devices (MADs) may be used. The first type of MAD is a steerable microphone array (a.k.a. a steerable array), which is worn by a user in a known orientation with regard to the user's eyes; multiple users may each wear a steerable array. The second type of MAD is a fixed-location microphone array (a.k.a. a fixed array), which is placed in the same acoustic space as the users (one or more of whom are using steerable arrays).
Abstract:
A method for signal level matching by an electronic device is described. The method includes capturing a plurality of audio signals from a plurality of microphones. The method also includes determining a difference signal based on an inter-microphone subtraction. The difference signal includes multiple harmonics. The method also includes determining whether a harmonicity of the difference signal exceeds a harmonicity threshold. The method also includes preserving the harmonics to determine an envelope. The method further includes applying the envelope to a noise-suppressed signal.
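A compact sketch of the described flow, with assumptions noted in the comments: the harmonicity metric (normalized autocorrelation peak) and the rectify-and-smooth envelope are stand-ins for whatever measures the method actually uses.

```python
import numpy as np

def harmonicity(x):
    """Crude harmonicity measure: peak of the normalized autocorrelation
    excluding very short lags (assumes frames longer than 20 samples)."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    if ac[0] <= 0:
        return 0.0
    return float(np.max(ac[20:]) / ac[0])

def match_levels(mic_a, mic_b, noise_suppressed, threshold=0.5):
    """Inter-microphone subtraction, harmonicity check, then re-application
    of the harmonic envelope to the noise-suppressed signal. Assumes all
    inputs are equal-length frames."""
    diff = mic_a - mic_b                            # difference signal
    if harmonicity(diff) <= threshold:
        return noise_suppressed                     # no reliable harmonics
    kernel = np.ones(64) / 64.0
    env = np.convolve(np.abs(diff), kernel, mode="same")  # preserve envelope
    env /= np.max(env) + 1e-12
    return noise_suppressed * env                   # apply envelope
```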
Abstract:
A wearable device may include a processor configured to detect a self-voice signal based on one or more transducers. The processor may be configured to separate the self-voice signal from a background signal in an external audio signal using a multi-microphone speech generative network. The processor may also be configured to apply a first filter to the external audio signal, detected by at least one external microphone on the wearable device, during a listen-through operation based on activation of an audio zoom feature, to generate a first listen-through signal that includes the external audio signal. The processor may be configured to produce an output audio signal that is based on at least the first listen-through signal, which includes the external audio signal, and on the detected self-voice signal.
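A minimal sketch of the output path, assuming the audio-zoom state is available as a flag and the first filter is an arbitrary FIR; the mixing weights are illustrative, since the abstract only states that the output is based on both components.

```python
import numpy as np

def produce_output(external_mic, self_voice, zoom_active, listen_filter):
    """Combine a listen-through version of the external signal with the
    separated self-voice signal. `listen_filter` is an arbitrary FIR
    stand-in for the first filter."""
    if zoom_active:
        listen_through = np.convolve(external_mic, listen_filter)[: len(external_mic)]
    else:
        listen_through = external_mic
    # Illustrative mix: the output is based on both components.
    return 0.7 * listen_through + 0.3 * self_voice
```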
Abstract:
A device includes a memory configured to store untransformed ambisonic coefficients at different time segments. The device also includes one or more processors configured to obtain the untransformed ambisonic coefficients at the different time segments, where the untransformed ambisonic coefficients at the different time segments represent a soundfield at the different time segments. The one or more processors are configured to apply one adaptive network, based on a constraint that includes preservation of a spatial direction of one or more audio sources in the soundfield at the different time segments, to the untransformed ambisonic coefficients at the different time segments to generate transformed ambisonic coefficients at the different time segments, wherein the transformed ambisonic coefficients at the different time segments represent a modified soundfield at the different time segments that was modified based on the constraint. The one or more processors are also configured to apply an additional adaptive network.
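One plausible way to express the direction-preservation constraint is as a training penalty on the adaptive network, comparing pseudo-intensity directions of the soundfield before and after transformation. The formulation below is an assumption for first-order ambisonics, not the patented constraint, and all names are illustrative.

```python
import numpy as np

def dominant_direction(foa):
    """Estimate the dominant source direction from first-order ambisonic
    coefficients (channels ordered W, Y, Z, X) via the pseudo-intensity
    vector; returns a unit vector."""
    w, y, z, x = foa
    v = np.array([np.mean(w * x), np.mean(w * y), np.mean(w * z)])
    return v / (np.linalg.norm(v) + 1e-12)

def direction_preservation_loss(untransformed, transformed):
    """Penalty of 1 minus the cosine similarity between the dominant
    directions of the soundfield before and after the adaptive network."""
    d0 = dominant_direction(untransformed)
    d1 = dominant_direction(transformed)
    return 1.0 - float(np.dot(d0, d1))
```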