Abstract:
A computer-implemented system for rendering captured audio soundfields to a listener comprises apparatus to deliver the audio soundfields to the listener. The delivery apparatus delivers the audio soundfields to the listener with first and second audio elements perceived by the listener as emanating from first and second virtual source locations, respectively, and with the first audio element and/or the second audio element delivered to the listener from a third virtual source location. The first virtual source location and the second virtual source location are perceived by the listener as being located to the front of the listener, and the third virtual source location is located to the rear or the side of the listener.
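To make the virtual-source geometry concrete, here is a minimal Python sketch using constant-power amplitude panning. The panning function, azimuth convention, and test signals are illustrative assumptions, not the claimed rendering method; a genuine front/rear distinction would require HRTF-based binaural rendering rather than simple stereo panning.

```python
import numpy as np

def pan_gains(azimuth_deg):
    """Constant-power stereo gains for a virtual source azimuth.

    0 deg = straight ahead, positive = listener's right. This sketch
    only pans left/right; front vs. rear would need HRTFs.
    """
    theta = (np.clip(azimuth_deg, -90.0, 90.0) + 90.0) / 180.0 * np.pi / 2.0
    return np.cos(theta), np.sin(theta)

def render(elements, n):
    """Mix mono audio elements at their virtual azimuths into a stereo buffer."""
    out = np.zeros((n, 2))
    for signal, azimuth in elements:
        g_left, g_right = pan_gains(azimuth)
        out[:, 0] += g_left * signal[:n]
        out[:, 1] += g_right * signal[:n]
    return out

# Two talkers perceived to the listener's front-left and front-right,
# plus a copy of the first talker delivered from a third, side location.
fs, n = 16000, 16000
t = np.arange(n) / fs
talker_a = 0.3 * np.sin(2 * np.pi * 220 * t)
talker_b = 0.3 * np.sin(2 * np.pi * 330 * t)
stereo = render([(talker_a, -30.0), (talker_b, 30.0), (talker_a, 90.0)], n)
```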
Abstract:
Teleconference audio data, including a plurality of individual uplink data packet streams, may be received during a teleconference. Each uplink data packet stream may correspond to a telephone endpoint used by one or more teleconference participants. The teleconference audio data may be analyzed to determine a plurality of suppressive gain coefficients, which may be applied to first instances of the teleconference audio data during the teleconference to produce first gain-suppressed audio data provided to the telephone endpoints during the teleconference. Second instances of the teleconference audio data, as well as gain coefficient data corresponding to the plurality of suppressive gain coefficients, may be sent to a memory system as individual uplink data packet streams. The second instances of the teleconference audio data may be less gain-suppressed than the first gain-suppressed audio data.
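A minimal Python sketch of this dataflow, assuming a deliberately simple energy-based suppression rule; `suppressive_gain`, the frame layout, and the `archive` list are illustrative stand-ins for the claimed analysis and memory system, not the patented processing.

```python
import numpy as np

def suppressive_gain(frame, noise_floor=1e-3):
    """Illustrative per-frame suppressive gain: attenuate low-energy frames.

    A hypothetical stand-in for whatever analysis the system performs
    (e.g. noise or crosstalk suppression).
    """
    rms = float(np.sqrt(np.mean(frame ** 2)))
    return 1.0 if rms > noise_floor else 0.1

archive = []  # stands in for the memory system

def process_uplink_frame(stream_id, frame):
    g = suppressive_gain(frame)
    first_instance = g * frame  # gain-suppressed audio sent to the endpoints
    # Second instance: less suppressed (here: unsuppressed) audio plus the
    # gain coefficient data, archived per individual uplink stream.
    archive.append({"stream": stream_id, "audio": frame.copy(), "gain": g})
    return first_instance

sent = process_uplink_frame(0, np.random.randn(160) * 0.01)
```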
Abstract:
The present document relates to audio conference systems. In particular, the present document relates to the mapping of soundfields within an audio conference system. A conference multiplexer (110, 175, 210, 400) configured to place a first input soundfield signal (402) originating from a first soundfield endpoint (120, 170) within a 2D or 3D conference scene (300) to be rendered to a listener (301) is described. The first input soundfield signal (402) is indicative of a soundfield captured by the first soundfield endpoint (120, 170). The conference multiplexer (110, 175, 210, 400) is configured to set up the conference scene (300) comprising a plurality of talker locations (321, 322, 332, 331) at different angles (323, 333) with respect to the listener (301); provide a first sector (325) having a first angular width (324) that is greater than zero; and transform the first input soundfield signal (402) into a first output soundfield signal (403), such that for the listener (301) the first output soundfield signal (403) appears to emanate from one or more virtual talker locations (321, 322) within the first sector (325).
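One way to picture the sector placement is as an angular warp of the captured soundfield. The sketch below uses a simple linear remapping of source azimuths into the sector; this is an assumption for illustration, and a real multiplexer might instead rotate and compress the soundfield in the ambisonics domain.

```python
def map_to_sector(source_azimuth_deg, sector_center_deg, sector_width_deg):
    """Warp an azimuth from the captured soundfield (assumed in [-180, 180))
    into a sector of the conference scene, preserving relative ordering."""
    normalized = (source_azimuth_deg + 180.0) / 360.0  # -> [0, 1)
    half_width = sector_width_deg / 2.0
    return sector_center_deg - half_width + normalized * sector_width_deg

# Talkers spread around the capturing endpoint land inside a 60-degree-wide
# sector placed 30 degrees to the listener's left (hypothetical values).
for azimuth in (-120.0, 0.0, 45.0):
    print(azimuth, "->", round(map_to_sector(azimuth, -30.0, 60.0), 1))
```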
Abstract:
The present document relates to audio conference systems. In particular, the present document relates to improving the perceptual continuity within an audio conference system. According to an aspect, a method is described for multiplexing first and second continuous input audio signals to yield a multiplexed output audio signal which is to be rendered to a listener. The first and second input audio signals (123) are indicative of sounds captured by a first and a second endpoint (120, 170), respectively. The method comprises determining a talk activity (201, 202) in the first and second input audio signals (123), respectively; and determining the multiplexed output audio signal based on the first and/or second input audio signals (123), subject to one or more multiplexing conditions. The one or more multiplexing conditions comprise: at a time instant when there is talk activity (201) in the first input audio signal (123), determining the multiplexed output audio signal at least based on the first input audio signal (123); at a time instant when there is talk activity (202) in the second input audio signal (123), determining the multiplexed output audio signal at least based on the second input audio signal (123); and at a silence time instant, when there is no talk activity (201, 202) in either the first or the second input audio signal (123), determining the multiplexed output audio signal based on only one of the first and second input audio signals (123).
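The multiplexing conditions translate almost directly into per-frame logic. A minimal sketch, assuming time-aligned NumPy frames and talk-activity flags that have already been detected upstream:

```python
import numpy as np

def multiplex(frame_1, frame_2, talk_1, talk_2):
    """Per-frame multiplexing under the conditions in the abstract."""
    if talk_1 and talk_2:
        return frame_1 + frame_2  # both active: output based on both
    if talk_1:
        return frame_1            # only endpoint 1 is talking
    if talk_2:
        return frame_2            # only endpoint 2 is talking
    # Silence: forward exactly one signal (here, always the first) so the
    # listener still hears a single endpoint's ambience, which preserves
    # perceptual continuity instead of cutting to digital silence.
    return frame_1

out = multiplex(np.zeros(160), np.ones(160), talk_1=False, talk_2=True)
```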
Abstract:
The present disclosure relates to the field of audio enhancement, and in particular to methods, devices and software for supervised training of a machine learning model (MLM), the MLM being trained to enhance a degraded audio signal by calculating gains to be applied to frequency bands of the degraded audio signal. The present disclosure further relates to methods, devices and software for using such a trained MLM.
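A sketch of the inference side of such a scheme, assuming the MLM is a single dense layer with sigmoid outputs operating on log band powers. The weights here are random placeholders where a trained model would go; supervised training would fit them against reference per-band gains (for example, an ideal ratio mask) computed from clean/degraded signal pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands = 32

# Placeholder "trained" parameters: one dense layer mapping log band
# powers to per-band gains in (0, 1).
W = rng.standard_normal((n_bands, n_bands)) * 0.1
b = np.zeros(n_bands)

def predict_gains(band_powers):
    """Gains in (0, 1), one per frequency band of the degraded frame."""
    x = np.log10(band_powers + 1e-12)
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))  # sigmoid activation

def enhance(band_spectrum):
    """Apply the predicted gains to the banded spectrum of a frame."""
    gains = predict_gains(np.abs(band_spectrum) ** 2)
    return gains * band_spectrum

frame_bands = np.abs(np.fft.rfft(np.random.randn(64)))[:n_bands]
enhanced = enhance(frame_bands)
```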
Abstract:
An audio processing method may involve receiving output signals from each microphone of a plurality of microphones in an audio environment, the output signals corresponding to a current utterance of a person, and determining, based on the output signals, one or more aspects of context information relating to the person, including an estimated current proximity of the person to one or more microphone locations. The method may involve selecting two or more loudspeaker-equipped audio devices based, at least in part, on the one or more aspects of the context information, determining one or more types of audio processing changes to apply to audio data being rendered to loudspeaker feed signals for the audio devices, and causing the one or more types of audio processing changes to be applied. In some examples, the audio processing changes have the effect of increasing a speech-to-echo ratio at one or more microphones.
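A minimal sketch of one such flow, assuming proximity estimates are already available and that the applied change is simple attenuation (ducking) of the rendered feed; the device names, distances, and decibel value are hypothetical.

```python
import numpy as np

def select_devices(proximities, k=2):
    """Pick the k loudspeaker-equipped devices estimated closest to the talker."""
    return sorted(proximities, key=proximities.get)[:k]

def duck(loudspeaker_feed, reduction_db=12.0):
    """One candidate audio processing change: attenuate the rendered feed,
    which raises the speech-to-echo ratio at that device's microphones."""
    return loudspeaker_feed * 10.0 ** (-reduction_db / 20.0)

# Estimated talker-to-device distances in metres (hypothetical values).
proximities = {"kitchen_speaker": 0.8, "tv_soundbar": 3.5, "hallway_puck": 2.1}
feed = np.zeros(480)  # one 10 ms frame at 48 kHz, placeholder content
for device in select_devices(proximities):
    processed = duck(feed)  # apply the change to that device's feed
```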
Abstract:
An audio session management method may involve: determining, by an audio session manager, one or more first media engine capabilities of a first media engine of a first smart audio device, the first media engine being configured for managing one or more audio media streams received by the first smart audio device and for performing first smart audio device signal processing for the one or more audio media streams according to a first media engine sample clock; receiving, by the audio session manager and via a first application communication link, first application control signals from a first application; and controlling the first smart audio device according to the first media engine capabilities, by the audio session manager, via first audio session management control signals transmitted to the first smart audio device via a first smart audio device communication link and without reference to the first media engine sample clock.
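A structural sketch of this control-plane relationship. The `device_link` object and its `query_capabilities` and `send_control` methods are hypothetical interfaces invented for illustration; the point the sketch makes is that the manager acts purely through control messages constrained by the discovered capabilities and never touches the media engine's sample clock.

```python
from dataclasses import dataclass, field

@dataclass
class MediaEngineCapabilities:
    codecs: list = field(default_factory=list)
    max_streams: int = 1

class AudioSessionManager:
    """Control-plane-only manager: exchanges control messages with the
    smart audio device over a device communication link, without reference
    to the media engine's sample clock."""

    def __init__(self, device_link):
        self.device_link = device_link  # hypothetical transport object
        self.capabilities = None

    def discover(self):
        # Determine the media engine capabilities of the device.
        self.capabilities = self.device_link.query_capabilities()

    def on_application_control(self, message):
        # Translate an application control signal into a device control
        # message, constrained by the discovered capabilities.
        if self.capabilities is None:
            self.discover()
        if message.get("streams", 1) <= self.capabilities.max_streams:
            self.device_link.send_control(message)
```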
Abstract:
A feature vector may be extracted from each frame of input digitized microphone audio data. The feature vector may include a power value for each frequency band of a plurality of frequency bands. A feature history data structure, including a plurality of feature vectors, may be formed. A normalized feature set that includes a normalized feature data structure may be produced by determining normalized power values for a plurality of frequency bands of each feature vector of the feature history data structure. A signal recognition or modification process may be based, at least in part, on the normalized feature data structure.
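This pipeline maps naturally onto a few lines of Python. A sketch assuming uniform FFT-derived bands and per-band z-score normalization over the history; the abstract does not specify the banding scheme or the normalization formula, so both are assumptions.

```python
import numpy as np
from collections import deque

n_bands, history_len = 40, 100
history = deque(maxlen=history_len)  # the feature history data structure

def band_powers(frame):
    """Feature vector: one power value per frequency band (uniform bands here)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    bands = np.array_split(spectrum, n_bands)
    return np.array([band.mean() for band in bands])

def normalized_feature_set():
    """Normalize each band's power across the history (zero mean, unit
    variance per band) to form the normalized feature data structure."""
    h = np.stack(list(history))  # shape: (frames, n_bands)
    mean = h.mean(axis=0)
    std = h.std(axis=0) + 1e-9
    return (h - mean) / std

# Per input frame: extract the feature vector, push it into the history,
# then produce the normalized feature set for the recognizer/modifier.
for _ in range(history_len):
    history.append(band_powers(np.random.randn(512)))
features = normalized_feature_set()
```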