Abstract:
A user interface, methods, and an article of manufacture, each for selecting an audio cue presented in three-dimensional (3D) space, are disclosed. The audio cues are audibly perceivable in a space about a user, where each audio cue may be perceived by the user as a directional sound at a location distinct from the other audio cues in the space. Selection of a specific audio cue is made based on one or more user gestures. A portable electronic device may be configured to present the audio cues to the user and to detect certain user gestures that select among them. The audio cue selection can be used to control operation of the portable device and/or other associated devices.
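The selection step can be pictured with a minimal sketch (the function name and the azimuth-only representation are illustrative assumptions, not taken from the abstract): each cue occupies a distinct direction around the user, and a pointing gesture selects the angularly nearest cue.

```python
def select_audio_cue(cue_azimuths_deg, gesture_azimuth_deg):
    """Return the index of the audio cue whose azimuth is angularly
    closest to the direction of the user's pointing gesture.
    All angles are in degrees; this is an illustrative sketch only."""
    def angular_diff(a, b):
        # Shortest angular distance, accounting for wrap-around at 360
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(range(len(cue_azimuths_deg)),
               key=lambda i: angular_diff(cue_azimuths_deg[i],
                                          gesture_azimuth_deg))

# Four cues spaced around the listener; a gesture toward ~100 degrees
cues = [0.0, 90.0, 180.0, 270.0]
print(select_audio_cue(cues, 100.0))  # → 1 (the cue at 90 degrees)
```

A fuller implementation would also account for elevation and distance, but the nearest-direction rule above captures the core of gesture-based cue selection.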
Abstract:
In general, techniques are described for grouping audio objects into clusters. In some examples, a device for audio signal processing comprises a cluster analysis module configured to, based on a plurality of audio objects, produce a first grouping of the plurality of audio objects into L clusters, wherein the first grouping is based on spatial information from at least N among the plurality of audio objects and L is less than N. The device also includes an error calculator configured to calculate an error of the first grouping relative to the plurality of audio objects, wherein the error calculator is further configured to, based on the calculated error, produce a plurality L of audio streams according to a second grouping of the plurality of audio objects into L clusters that is different from the first grouping.
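One way to picture the two-stage grouping (a hedged sketch; the abstract does not specify the clustering algorithm or the error metric) is a Lloyd-style grouping of object positions into L clusters, with a total-squared-distance error that can trigger a different grouping when it is too large:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two position tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of position tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def group_objects(positions, L, seed=0, passes=10):
    """Group N object positions into L clusters (L < N) with a few
    Lloyd-style refinement passes; returns (assignments, centroids)."""
    rng = random.Random(seed)
    centers = rng.sample(positions, L)
    assign = [0] * len(positions)
    for _ in range(passes):
        # Assign each object to its nearest cluster centroid
        assign = [min(range(L), key=lambda k: dist2(p, centers[k]))
                  for p in positions]
        # Move each centroid to the mean of its members
        for k in range(L):
            members = [p for p, a in zip(positions, assign) if a == k]
            if members:
                centers[k] = centroid(members)
    return assign, centers

def grouping_error(positions, assign, centers):
    """Total squared distance of each object to its cluster centroid."""
    return sum(dist2(p, centers[a]) for p, a in zip(positions, assign))

# If the first grouping's error is too large, produce a second,
# different grouping (here simply by reseeding the initialization).
positions = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0), (10, 0.2)]
assign, centers = group_objects(positions, L=3, seed=0)
if grouping_error(positions, assign, centers) > 1.0:  # assumed threshold
    assign, centers = group_objects(positions, L=3, seed=1)
```

The error-then-regroup flow mirrors the abstract's cluster analysis module and error calculator; real systems would weigh spatial information per object rather than raw positions alone.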
Abstract:
Systems, methods, and apparatus for backward-compatible coding of a set of basis function coefficients that describe a sound field are presented.
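The abstract does not detail the coding scheme, but a common backward-compatible arrangement for spherical-harmonic (higher-order ambisonic) coefficients is to keep the lowest orders as a base layer that a legacy decoder can play, with the higher orders carried as an enhancement layer. A hypothetical sketch:

```python
def split_backward_compatible(hoa_coeffs, base_order=1):
    """Split a list of HOA coefficient channels into a backward-compatible
    base layer (orders 0..base_order) and an enhancement layer (higher
    orders). A legacy decoder uses only the base layer; a newer decoder
    recombines both. Channels are assumed ordered by ambisonic order,
    with (n + 1)**2 channels through order n (illustrative assumption)."""
    n_base = (base_order + 1) ** 2
    return hoa_coeffs[:n_base], hoa_coeffs[n_base:]

# Third-order HOA has 16 coefficient channels; a first-order base keeps 4.
coeffs = list(range(16))
base, enhancement = split_backward_compatible(coeffs)
# base carries channels 0-3; the enhancement layer carries the other 12
```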
Abstract:
Systems, methods, and apparatus for pitch trajectory analysis are described. Such techniques may be used to remove vocals and/or vibrato from an audio mixture signal. For example, such a technique may be used to pre-process the signal before an operation to decompose the mixture signal into individual instrument components.
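As an illustrative pre-processing sketch (the actual analysis is not specified in the abstract), a moving-median filter over a frame-wise pitch trajectory suppresses fast periodic deviation such as vibrato while preserving the slower underlying contour:

```python
import math
import statistics

def remove_vibrato(pitch_hz, win=5):
    """Moving-median smoothing of a frame-by-frame pitch trajectory:
    fast periodic deviation (vibrato) is suppressed while the slower
    underlying pitch contour is kept. Window edges are truncated."""
    half = win // 2
    out = []
    for i in range(len(pitch_hz)):
        lo, hi = max(0, i - half), min(len(pitch_hz), i + half + 1)
        out.append(statistics.median(pitch_hz[lo:hi]))
    return out

# A 200 Hz tone with fast +/-5 Hz vibrato (period of 4 frames)
traj = [200.0 + 5.0 * math.sin(2 * math.pi * i / 4) for i in range(32)]
smooth = remove_vibrato(traj, win=5)
# Away from the edges, the median recovers the flat 200 Hz contour
```

In the abstract's pipeline, a trajectory cleaned this way would feed the later step that decomposes the mixture into individual instrument components.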
Abstract:
In general, techniques are described for image generation for a collaborative sound system. A headend device comprising a processor may perform these techniques. The processor may be configured to determine a location of a mobile device participating in a collaborative surround sound system as one of a plurality of speakers of that system. The processor may further be configured to generate an image that depicts the location of the participating mobile device relative to the other speakers of the collaborative surround sound system.
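A toy rendering of such an image (purely illustrative; the abstract does not describe the drawing method) could place each detected device on a character grid around the headend:

```python
def render_speaker_map(positions, width=21, height=11):
    """Render a crude text 'image' of speaker locations around the
    headend device. positions maps a name to (x, y) in [-1, 1];
    the headend sits at the center, marked 'H'. Each speaker is
    marked with the first letter of its name (illustrative only)."""
    grid = [[' '] * width for _ in range(height)]
    grid[height // 2][width // 2] = 'H'  # the headend device
    for name, (x, y) in positions.items():
        col = round((x + 1) / 2 * (width - 1))
        row = round((1 - (y + 1) / 2) * (height - 1))
        grid[row][col] = name[0]
    return '\n'.join(''.join(r) for r in grid)

speakers = {'front-left': (-0.7, 0.8), 'front-right': (0.7, 0.8),
            'mobile': (0.9, -0.5)}
print(render_speaker_map(speakers))
```

A real implementation would draw a graphical image, but the same mapping from physical coordinates to image coordinates underlies it.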
Abstract:
In general, techniques are described for forming a collaborative sound system. A headend device comprising one or more processors may perform the techniques. The processors may be configured to identify mobile devices that each include a speaker and that are available to participate in a collaborative surround sound system. The processors may configure the collaborative surround sound system to utilize the speaker of each of the mobile devices as one or more virtual speakers of the system, and then render audio signals from an audio source such that, when the audio signals are played by the speakers of the mobile devices, the audio playback appears to originate from the one or more virtual speakers of the collaborative surround sound system. The processors may then transmit the rendered audio signals to the mobile devices participating in the collaborative surround sound system.
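The rendering step can be hinted at with constant-power amplitude panning between two real speakers that bracket a virtual speaker direction (a simplified sketch; the abstract does not name the rendering algorithm):

```python
import math

def pan_to_virtual_speaker(source_az, spk_az_left, spk_az_right):
    """Constant-power panning gains for the two real speakers that
    bracket a virtual speaker direction, so playback appears to
    originate from that direction. Angles are in degrees, with
    source_az between spk_az_left and spk_az_right (sketch only)."""
    span = spk_az_right - spk_az_left
    frac = (source_az - spk_az_left) / span  # 0 at left, 1 at right
    theta = frac * math.pi / 2
    return math.cos(theta), math.sin(theta)  # (gain_left, gain_right)

# A virtual speaker at 15 degrees between real speakers at -30 and +30
gl, gr = pan_to_virtual_speaker(15.0, -30.0, 30.0)
# Constant power: gl**2 + gr**2 == 1, so loudness is direction-independent
```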
Abstract:
Methods and apparatuses for providing tangible control of sound are described, embodied in a system that includes a sound transducer array along with a touch-surface-enabled display table. The array may include a group of transducers (multiple speakers and/or microphones) configured to perform spatial processing of the transducer signals, so that sound rendering (in configurations where the array includes multiple speakers) or sound pick-up (in configurations where the array includes multiple microphones) has spatial patterns (sound projection patterns) that are focused in certain directions while reducing disturbances from other directions. Users may directly adjust parameters related to the sound projection patterns by exercising one or more commands on the touch surface, and may refine those commands according to the visual feedback produced by the resulting change of the display.
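The focusing of a spatial pattern can be hinted at with delay-and-sum steering for a uniform linear array (a sketch under simplifying assumptions; the abstract specifies neither the array geometry nor the spatial-processing algorithm):

```python
import math

def steering_delays(n_mics, spacing_m, steer_deg, c=343.0):
    """Per-element delays (in seconds) that focus a delay-and-sum
    beam of a uniform linear array toward steer_deg, measured from
    broadside (0 degrees). c is the speed of sound in m/s.
    Applying these delays before summing the element signals makes
    sound from steer_deg add coherently (illustrative sketch)."""
    s = math.sin(math.radians(steer_deg))
    return [i * spacing_m * s / c for i in range(n_mics)]

# Steer a 4-element array with 5 cm spacing toward 30 degrees
delays = steering_delays(n_mics=4, spacing_m=0.05, steer_deg=30.0)
```

In the system described, a drag gesture on the touch surface would update `steer_deg`, and the redrawn pattern on the display provides the visual feedback.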
Abstract:
Some implementations provide a method for identifying a speaker. The method determines the position and orientation of a second device based on data from a first device that captures the position and orientation of the second device. The second device includes several microphones for capturing sound, and its position and orientation are movable. The method assigns an object as a representation of a known user; the object has a movable position. The method receives a position of the object, which corresponds to a position of the known user. The method processes the captured sound to identify a sound originating from the direction of the object, where that direction is taken relative to the position and orientation of the second device. The method identifies the sound originating from the direction of the object as belonging to the known user.
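The geometry of the attribution step can be sketched in two dimensions (function names, the tolerance parameter, and the heading convention are illustrative assumptions): compute the object's bearing relative to the capturing device, then match an estimated direction of arrival to the nearest known user's object.

```python
import math

def direction_to_object(device_pos, device_heading_deg, object_pos):
    """Bearing of the object relative to the device's own heading,
    wrapped to [-180, 180) degrees. 2-D sketch: positions are (x, y),
    heading 0 points along +x, counterclockwise positive."""
    dx = object_pos[0] - device_pos[0]
    dy = object_pos[1] - device_pos[1]
    bearing = math.degrees(math.atan2(dy, dx)) - device_heading_deg
    return (bearing + 180.0) % 360.0 - 180.0

def attribute_sound(doa_deg, objects, tolerance_deg=15.0):
    """Match an estimated direction of arrival (relative to the device)
    to the known user whose object lies within tolerance of that
    direction; return None if no object matches."""
    for user, rel_dir in objects.items():
        if abs(((doa_deg - rel_dir) + 180.0) % 360.0 - 180.0) <= tolerance_deg:
            return user
    return None

# An object straight ahead of the device has relative bearing 0
rel = direction_to_object((0, 0), 90.0, (0, 1))
objects = {'alice': rel, 'bob': 120.0}
speaker = attribute_sound(5.0, objects)  # → 'alice'
```

The direction-of-arrival estimate itself would come from the second device's microphone array; the sketch covers only the matching of that estimate to a known user's object.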