Abstract:
Described herein is a method for creating an object-based audio signal from an audio input, the audio input including one or more audio channels that are recorded to collectively define an audio scene. The one or more audio channels are captured from a respective one or more spatially separated microphones disposed in a stable spatial configuration. The method includes the steps of: a) receiving the audio input; b) performing spatial analysis on the one or more audio channels to identify one or more audio objects within the audio scene; c) determining contextual information relating to the one or more audio objects; d) defining respective audio streams including audio data relating to at least one of the identified one or more audio objects; and e) outputting an object-based audio signal including the audio streams and the contextual information.
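Steps a) through e) can be sketched as a simple pipeline. All names below (`AudioObject`, `create_object_based_signal`, the energy-threshold "analysis") are hypothetical stand-ins, not the patented spatial-analysis method:

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    object_id: int
    samples: list   # audio data for this object's stream
    context: dict   # contextual information (e.g. source channel, energy)

def create_object_based_signal(channels):
    """Toy sketch of steps a)-e): treat each sufficiently energetic
    channel as one audio object (a crude stand-in for spatial analysis)."""
    objects = []
    for i, ch in enumerate(channels):                      # a) receive input
        energy = sum(s * s for s in ch) / max(len(ch), 1)  # b) crude "analysis"
        if energy > 1e-6:                                  #    identify an object
            ctx = {"source_channel": i, "energy": energy}  # c) contextual info
            objects.append(AudioObject(i, list(ch), ctx))  # d) audio stream
    # e) object-based output signal: streams plus their contextual information
    return {"streams": [o.samples for o in objects],
            "context": [o.context for o in objects]}
```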
Abstract:
A method for processing audio data, the method comprising: receiving audio data corresponding to a plurality of instances of audio, including at least one of: (a) audio data from multiple endpoints, recorded separately or (b) audio data from a single endpoint corresponding to multiple talkers and including spatial information for each of the multiple talkers; rendering the audio data in a virtual acoustic space such that each of the instances of audio has a respective different virtual position in the virtual acoustic space; and scheduling the instances of audio to be played back with a playback overlap between at least two of the instances of audio, wherein the scheduling is performed, at least in part, according to a set of perceptually-motivated rules.
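A minimal sketch of the rendering-plus-scheduling idea, assuming a fixed azimuth spread and a constant overlap as the "perceptually-motivated rule" (both are illustrative assumptions, not the patent's rule set):

```python
def schedule_playback(durations, overlap=0.5, azimuth_spread=60.0):
    """Spread N audio instances across distinct virtual azimuths and start
    each instance `overlap` seconds before the previous one ends, so that
    adjacent instances overlap at different virtual positions."""
    n = len(durations)
    schedule, t = [], 0.0
    for i, dur in enumerate(durations):
        # distinct virtual position per instance, spread across the front stage
        az = -azimuth_spread / 2 + azimuth_spread * i / max(n - 1, 1)
        schedule.append({"start": t, "duration": dur, "azimuth_deg": az})
        t += dur - overlap   # next instance overlaps the tail of this one
    return schedule
```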
Abstract:
The present document relates to audio conference systems. In particular, the present document relates to improving the perceptual continuity within an audio conference system. According to an aspect, a method for multiplexing first and second continuous input audio signals is described, to yield a multiplexed output audio signal which is to be rendered to a listener. The first and second input audio signals (123) are indicative of sounds captured by a first and a second endpoint (120, 170), respectively. The method comprises determining a talk activity (201, 202) in the first and second input audio signals (123), respectively; and determining the multiplexed output audio signal based on the first and/or second input audio signals (123) and subject to one or more multiplexing conditions. The one or more multiplexing conditions comprise: at a time instant, when there is talk activity (201) in the first input audio signal (123), determining the multiplexed output audio signal at least based on the first input audio signal (123); at a time instant, when there is talk activity (202) in the second input audio signal (123), determining the multiplexed output audio signal at least based on the second input audio signal (123); and at a silence time instant, when there is no talk activity (201, 202) in the first and in the second input audio signals (123), determining the multiplexed output audio signal based on only one of the first and second input audio signals (123).
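The three multiplexing conditions map directly onto a per-frame selection rule. A sketch, where the simple summing mix for double-talk and the frame/flag interface are illustrative assumptions:

```python
def multiplex(frame1, frame2, talk1, talk2, prefer_first=True):
    """One frame of the multiplexed output from the two endpoint frames,
    given their talk-activity flags."""
    if talk1 and talk2:
        return [a + b for a, b in zip(frame1, frame2)]  # both active: mix
    if talk1:
        return list(frame1)       # talk activity in the first input signal
    if talk2:
        return list(frame2)       # talk activity in the second input signal
    # silence instant: output based on only ONE of the two input signals,
    # which preserves perceptual continuity without mixing two noise floors
    return list(frame1 if prefer_first else frame2)
```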
Abstract:
In some embodiments, a method for modifying noise captured at endpoints of a teleconferencing system includes steps of capturing noise at each endpoint, and modifying the captured noise to generate modified noise having a frequency-amplitude spectrum which matches a target spectrum and a spatial property set which matches a target spatial property set. In other embodiments, a teleconferencing method includes steps of: at endpoints of a teleconferencing system, determining audio frames indicative of audio captured at each endpoint, each of a subset of the frames indicative of noise but not a significant level of speech; and at each endpoint, generating modified frames indicative of modified noise having a frequency-amplitude spectrum which matches a target spectrum and a spatial property set which matches a target spatial property set, and generating encoded audio including by encoding the modified frames. Other aspects are systems configured to perform any embodiment of the method.
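The spectral half of the modification can be sketched as per-band gains that reshape the captured noise's magnitude spectrum toward the target. The band layout and gain formula are illustrative assumptions (the spatial-property matching is omitted):

```python
def match_target_spectrum(band_magnitudes, target_magnitudes):
    """Per-band gains that map captured noise magnitudes onto a target
    spectrum; applying each gain to its band yields the modified noise."""
    gains = [t / max(m, 1e-12)                 # guard against silent bands
             for m, t in zip(band_magnitudes, target_magnitudes)]
    modified = [m * g for m, g in zip(band_magnitudes, gains)]
    return modified, gains
```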
Abstract:
A conferencing server (100) receives incoming bitstreams (I1, I2, I3, I4, I5) carrying media data from respective conferencing endpoints (110, 120, 130, 140, 150); receives a mixing strategy (M) specifying properties of at least one outgoing bitstream (O1, O2, O3, O4, O5) and requiring at least one additive media mixing step; and supplies at least one outgoing bitstream by executing, in a processor (103) and a memory (102) with a plurality of memory spaces, a run list of operations selected from a predefined collection of primitives and realizing the received mixing strategy. A pre-processor (104) in the server derives said run list repeatedly and dynamically while taking into consideration determined momentary activity in each incoming bitstream. In embodiments, the run list may be derived by (a) pruning of an initial run list, (b) constrained or non-constrained minimization of a cost function, or (c) automatic code generation.
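Derivation option (a), pruning an initial run list, can be sketched as dropping primitives whose inputs are all momentarily inactive. The `(name, inputs, output)` operation format is an illustrative assumption:

```python
def prune_run_list(run_list, active_streams):
    """Keep only operations that consume at least one momentarily active
    incoming stream; inactive branches of the mixing strategy are pruned.
    Each op is a (primitive_name, input_stream_ids, output_id) tuple."""
    pruned = []
    for name, inputs, output in run_list:
        if any(s in active_streams for s in inputs):
            pruned.append((name, inputs, output))
    return pruned
```

Re-running this per frame against the current activity set gives the "repeated and dynamic" derivation the abstract describes.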
Abstract:
An apparatus and method relating to use of a physical writing surface (132) during a videoconference or presentation. Snapshots of a whiteboard (132) are identified by applying a difference measure to the video data (e.g., as a way of comparing frames at different times). Audio captured by a microphone may be processed to generate textual data, wherein a portion of the textual data is associated with each snapshot. The writing surface may be identified (enrolled) using gestures. Image processing techniques may be used to transform views of a writing surface.
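The snapshot-identification step can be sketched with a mean-absolute-difference measure between consecutive frames: writing activity raises the difference, and a snapshot is taken once the board has been stable for a few frames afterwards. Threshold and settle count are illustrative assumptions:

```python
def detect_snapshots(frames, threshold=0.1, settle=2):
    """Return indices of frames to keep as whiteboard snapshots.
    Each frame is a flat list of pixel values."""
    snapshots, stable, changed = [], 0, False
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))
        diff /= len(frames[i])                 # mean absolute difference
        if diff > threshold:
            changed, stable = True, 0          # writing in progress
        else:
            stable += 1
            if changed and stable >= settle:
                snapshots.append(i)            # content settled: snapshot
                changed = False
    return snapshots
```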
Abstract:
Some noise compensation methods involve receiving microphone signals corresponding to ambient noise from a noise source location in or near an audio environment, determining or estimating a listener position in the audio environment and estimating at least one critical distance, which is a distance from the noise source location at which directly propagated sound pressure is equal to diffuse field sound pressure. Some examples involve estimating whether the listener position is within the at least one critical distance and implementing a noise compensation method for the ambient noise based, at least in part, on an estimate of whether the listener position is within the critical distance.
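The critical distance defined here has a classical closed form: with source directivity factor Q and total room absorption A (Sabine absorption area, in m²), the direct and diffuse pressures are equal at d_c = sqrt(Q·A / 16π). A sketch using that textbook formula (the patent's own estimator may differ):

```python
import math

def critical_distance(directivity_q, total_absorption_m2):
    """Classical critical distance: the distance from the source at which
    directly propagated sound pressure equals diffuse-field pressure."""
    return math.sqrt(directivity_q * total_absorption_m2 / (16.0 * math.pi))

def within_critical_distance(listener_to_source_m, d_c):
    """Decide the compensation branch: True if the direct field dominates
    at the listener position."""
    return listener_to_source_m < d_c
```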
Abstract:
Teleconference audio data, including a plurality of individual uplink data packet streams, may be received during a teleconference. Each uplink data packet stream may correspond to a telephone endpoint used by one or more teleconference participants. The teleconference audio data may be analyzed to determine a plurality of suppressive gain coefficients, which may be applied to first instances of the teleconference audio data during the teleconference, to produce first gain-suppressed audio data provided to the telephone endpoints during the teleconference. Second instances of the teleconference audio data, as well as gain coefficient data corresponding to the plurality of suppressive gain coefficients, may be sent to a memory system as individual uplink data packet streams. The second instances of the teleconference audio data may be less gain-suppressed than the first gain-suppressed audio data.
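The two paths per uplink frame can be sketched as follows: the live conference receives the gain-suppressed copy, while the less-suppressed original and its gain coefficient go to the memory system. The frame/record structure is an illustrative assumption:

```python
def process_uplink_frame(samples, gain):
    """Split one uplink frame into (a) the gain-suppressed copy sent to the
    live conference and (b) an archive record holding the less-suppressed
    samples together with the gain coefficient that was applied live."""
    live = [s * gain for s in samples]            # first, gain-suppressed copy
    archive = {"samples": list(samples),          # second, less-suppressed copy
               "gain_coefficient": gain}          # gain data stored alongside
    return live, archive
```

Keeping the coefficient with the archived frame lets the suppression be reapplied (or a different gain applied) on later playback.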
Abstract:
A computer implemented system for rendering captured audio soundfields to a listener comprises apparatus to deliver the audio soundfields to the listener. The delivery apparatus delivers the audio soundfields to the listener with first and second audio elements perceived by the listener as emanating from first and second virtual source locations, respectively, and with the first audio element and/or the second audio element delivered to the listener from a third virtual source location. The first virtual source location and the second virtual source location are perceived by the listener as being located to the front of the listener, and the third virtual source location is located to the rear or the side of the listener.
Abstract:
Embodiments are described for a soundfield system that receives a transmitting soundfield, wherein the transmitting soundfield includes a sound source at a location in the transmitting soundfield. The system determines a rotation angle for rotating the transmitting soundfield based on a desired location for the sound source. The transmitting soundfield is rotated by the determined angle and the system obtains a listener's soundfield based on the rotated transmitting soundfield. The listener's soundfield is transmitted for rendering to a listener.
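Soundfield rotation by a determined angle is straightforward in a first-order (B-format) representation: W and Z are invariant under a rotation about the vertical axis, while X and Y rotate like a 2-D vector. Using first-order ambisonics here is an illustrative choice; the abstract does not fix a soundfield representation:

```python
import math

def rotate_soundfield_yaw(w, x, y, z, angle_rad):
    """Rotate a first-order B-format soundfield about the vertical axis
    so a sound source moves to the desired azimuth."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    x_r = c * x - s * y   # X and Y transform as a standard 2-D rotation
    y_r = s * x + c * y
    return w, x_r, y_r, z  # W (omni) and Z (vertical) are unchanged by yaw
```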