Abstract:
Disclosed is a feature extraction and classification methodology wherein audio data is gathered in a target environment under varying conditions. From this collected data, corresponding features are extracted, labeled with appropriate descriptors (e.g., audio event descriptions), and used to train deep neural networks (DNNs) to extract underlying target audio events from unlabeled data. Once trained, these DNNs are used to predict underlying events in noisy audio and to extract features that enable separation of the underlying audio events from the noisy components of the signal.
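A minimal sketch of this pipeline, assuming log-magnitude spectral frame features and per-frame binary event labels; the feature function and the small two-layer network are illustrative stand-ins for the disclosed DNNs, not the patented architecture:

```python
# Sketch only: log-spectral features + a tiny two-layer event detector.
# Feature choice, network size, and all names are illustrative assumptions.
import numpy as np

def log_spectral_features(x, frame=512, hop=256):
    """Frame a waveform and return per-frame log-magnitude spectra."""
    n = 1 + (len(x) - frame) // hop
    w = np.hanning(frame)
    frames = np.stack([x[i * hop:i * hop + frame] * w for i in range(n)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

class TinyEventDNN:
    """Two-layer network predicting a per-frame event probability."""
    def __init__(self, d_in, d_hid=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((d_in, d_hid)) * 0.01
        self.b1 = np.zeros(d_hid)
        self.W2 = rng.standard_normal(d_hid) * 0.01
        self.b2 = 0.0

    def forward(self, X):
        self.h = np.maximum(0.0, X @ self.W1 + self.b1)  # ReLU hidden layer
        return 1.0 / (1.0 + np.exp(-(self.h @ self.W2 + self.b2)))

    def train_step(self, X, y, lr=0.1):
        """One SGD step on binary cross-entropy over labeled frames."""
        p = self.forward(X)
        g = (p - y) / len(y)                  # dBCE/dlogit for sigmoid
        gh = np.outer(g, self.W2) * (self.h > 0)
        self.W2 -= lr * (self.h.T @ g); self.b2 -= lr * g.sum()
        self.W1 -= lr * (X.T @ gh);     self.b1 -= lr * gh.sum(axis=0)

# After training on labeled frames, forward() scores frames of noisy,
# unlabeled audio; high-probability frames localize the target event.
```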
Abstract:
A method for speech restoration by an electronic device is described. The method includes obtaining a noisy speech signal. The method also includes suppressing noise in the noisy speech signal to produce a noise-suppressed speech signal. The noise-suppressed speech signal has a bandwidth that includes at least three subbands. The method further includes iteratively restoring each of the at least three subbands. Each of the at least three subbands is restored based on all previously restored subbands of the at least three subbands.
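A compact sketch of the iterative restoration loop, assuming a magnitude-spectrogram representation split at arbitrary bin edges; the per-band "restorer" functions below are hypothetical placeholders for the trained restoration models:

```python
# Sketch only: band edges and the per-band "restorers" are illustrative
# stand-ins for the trained restoration models.
import numpy as np

def restore_subbands(mag, edges, restorers):
    """mag: (frames, bins) noise-suppressed magnitude spectrogram.
    edges: bin boundaries splitting the bandwidth into >= 3 subbands.
    restorers[k]: maps all previously restored bands to a gain for band k."""
    bands = [mag[:, lo:hi].copy() for lo, hi in zip(edges[:-1], edges[1:])]
    restored = [bands[0]]                            # lowest band as anchor
    for k in range(1, len(bands)):
        context = np.concatenate(restored, axis=1)   # ALL earlier bands
        gain = restorers[k](context)                 # (frames, 1)
        restored.append(bands[k] * gain)             # restore band k
    return np.concatenate(restored, axis=1)

# Trivial stand-in restorers that derive a gain from the context energy:
rng = np.random.default_rng(0)
mag = np.abs(rng.standard_normal((100, 257)))
edges = [0, 64, 160, 257]                            # three subbands
restorers = {k: (lambda c: 1.0 + 0.1 * c.mean(axis=1, keepdims=True))
             for k in (1, 2)}
print(restore_subbands(mag, edges, restorers).shape)  # (100, 257)
```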
Abstract:
One example device includes a camera; a display device; a memory; and a processor in communication with the memory to receive audio signals from two or more microphones or a far-end device; receive first location information and second location information, the first location information for a visual identification of an audio source of the received audio signals and the second location information identifying a direction of arrival from the audio source; receive a first adjustment to a first portion of a UI to change either a visual identification or a coordinate direction of a direction focus; in response to the first adjustment, automatically perform a second adjustment to a second portion of the UI to change the other of the visual identification or the coordinate direction of the direction focus; and process the audio signals to filter sounds outside the direction focus, or emphasize sounds within the direction focus.
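One way to picture the linked two-part UI is a small state object in which changing either the visual identification or the coordinate direction automatically updates the other; the pinhole-style pixel-to-azimuth mapping, field of view, and names below are assumptions, not the device's actual calibration:

```python
# Sketch only: a pinhole-style pixel/azimuth mapping stands in for the
# device's real camera calibration; field of view and names are assumed.
from dataclasses import dataclass

@dataclass
class DirectionFocus:
    azimuth_deg: float          # coordinate direction of the focus
    pixel_x: float              # visual identification on the display
    image_width: int = 1920
    fov_deg: float = 70.0       # assumed horizontal field of view

    def adjust_visual(self, new_pixel_x):
        """First adjustment to the visual part; the coordinate
        direction follows automatically (the second adjustment)."""
        self.pixel_x = new_pixel_x
        self.azimuth_deg = (new_pixel_x / self.image_width - 0.5) * self.fov_deg

    def adjust_direction(self, new_azimuth_deg):
        """First adjustment to the coordinate direction; the visual
        identification follows automatically."""
        self.azimuth_deg = new_azimuth_deg
        self.pixel_x = (new_azimuth_deg / self.fov_deg + 0.5) * self.image_width

focus = DirectionFocus(azimuth_deg=0.0, pixel_x=960.0)
focus.adjust_visual(1440.0)         # user drags the on-screen marker...
print(focus.azimuth_deg)            # ...and the direction follows: 17.5
```

Sounds arriving outside the resulting direction focus would then be attenuated, or sounds within it emphasized, by the device's spatial filtering stage.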
Abstract:
A method of selectively authorizing access includes obtaining, at an authentication device, first information corresponding to first synthetic biometric data. The method also includes obtaining, at the authentication device, first common synthetic data and second biometric data. The method further includes generating, at the authentication device, second common synthetic data based on the first information and the second biometric data. The method also includes selectively authorizing, by the authentication device, access based on a comparison of the first common synthetic data and the second common synthetic data.
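Purely to illustrate the shape of the protocol (enroll once, then verify by regenerating and comparing the common synthetic data), here is a sketch in which a hash of coarsely quantized biometric features stands in for the patent's combining function; every function, parameter, and tolerance below is an invented stand-in:

```python
# Protocol-shaped sketch only: the quantizer and hash below are invented
# stand-ins for the patent's synthetic-data generation, not its method.
import hashlib
import numpy as np

def combine(info: bytes, biometric: np.ndarray) -> bytes:
    """Derive 'common synthetic data' from shared info plus a biometric.
    Coarse quantization lets small sensor noise map to the same bytes."""
    q = np.round(biometric * 4).astype(np.int8).tobytes()
    return hashlib.sha256(info + q).digest()

def authorize(first_common: bytes, info: bytes,
              second_biometric: np.ndarray) -> bool:
    """Regenerate the common synthetic data and compare (selective access)."""
    return combine(info, second_biometric) == first_common

# Enrollment, then verification with a slightly noisy second reading:
template = np.array([0.2, 0.7, 0.4])        # stand-in biometric features
info = b"first-information"
first_common = combine(info, template)
print(authorize(first_common, info, template + 0.01))   # True
```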
Abstract:
A crosstalk cancellation technique reduces feedback in a shared acoustic space by canceling out some or all parts of sound signals that would otherwise be produced by a loudspeaker only to be captured by a microphone that, recursively, would cause these sound signals to be reproduced again on the loudspeaker as feedback. Crosstalk cancellation can be used in a multichannel acoustic system (MAS) comprising an arrangement of microphones, loudspeakers, and a processor that together enhance conversational speech in a shared acoustic space. To achieve crosstalk cancellation, the processor analyzes the input of each microphone, compares it to the output of the far loudspeaker(s) relative to each such microphone, cancels out any portion of a sound signal received by the microphone that matches signals just produced by the far loudspeaker(s), and sends only the remaining sound signal (if any) to such far loudspeakers.
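A standard normalized-LMS adaptive filter captures the core idea: subtract whatever part of a microphone signal is predictable from the far loudspeaker's recent output, and forward only the residual. The filter length and step size below are arbitrary choices, so this is a conventional sketch rather than the patent's specific method:

```python
# Conventional NLMS sketch: subtract whatever part of the microphone
# signal is predictable from the far loudspeaker's recent output.
import numpy as np

def cancel_crosstalk(mic, far_out, taps=128, mu=0.5, eps=1e-6):
    """mic, far_out: float arrays of equal length.
    Returns the residual signal to forward to the far loudspeaker."""
    w = np.zeros(taps)                         # adaptive echo-path estimate
    x = np.zeros(taps)                         # recent loudspeaker samples
    residual = np.zeros_like(mic)
    for n in range(len(mic)):
        x = np.roll(x, 1)
        x[0] = far_out[n]
        e = mic[n] - w @ x                     # mic minus predicted echo
        w += mu * e * x / (x @ x + eps)        # normalized LMS update
        residual[n] = e                        # only this is sent onward
    return residual
```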
Abstract:
Systems, devices, and methods are described for recognizing and focusing on at least one source of an audio communication as part of a communication including a video image and an audio communication derived from two or more microphones when a relative position between the microphones is known. In certain embodiments, linked audio and video focus areas providing location information for one or more sound sources may each be associated with different user inputs, and an input to adjust a focus in either the audio or video domain may automatically adjust the focus in the other domain.
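The location information that links the two focus domains can come from the known microphone geometry. As a sketch, the cross-correlation peak between two microphones with known spacing yields a direction of arrival; the spacing, sample rate, and far-field assumption below are all illustrative:

```python
# Sketch only: far-field model, mic spacing, and sample rate are assumed.
import numpy as np

def doa_from_pair(m1, m2, fs=48000, spacing_m=0.10, c=343.0):
    """Direction of arrival (degrees off broadside) for two microphones
    with known spacing, from the cross-correlation peak delay."""
    corr = np.correlate(m1, m2, mode="full")
    lag = np.argmax(corr) - (len(m2) - 1)      # samples by which m1 lags m2
    s = np.clip((lag / fs) * c / spacing_m, -1.0, 1.0)
    return np.degrees(np.arcsin(s))
```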
Abstract:
Systems, methods, and apparatus for matching pair-wise differences (e.g., phase delay measurements) to an inventory of source direction candidates, and application of pair-wise source direction estimates, are described.
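As a sketch of inventory matching, each candidate direction predicts a pair-wise delay for every microphone pair from that pair's axis orientation, and the candidate with the smallest squared error against the measured delays is selected; the geometry and candidate grid here are assumptions:

```python
# Sketch only: pair geometry, spacing, and the candidate grid are assumed.
import numpy as np

def match_direction(obs_delays, pair_axes_deg, candidates_deg,
                    spacing_m=0.10, c=343.0):
    """obs_delays[i]: measured time delay (s) for microphone pair i.
    pair_axes_deg[i]: orientation of pair i's axis (degrees).
    Returns the inventory candidate whose predicted pair-wise delays
    best match the measurements (least squares)."""
    best, best_err = None, np.inf
    for cand in candidates_deg:
        err = 0.0
        for d_obs, axis in zip(obs_delays, pair_axes_deg):
            # Delay is maximal endfire (source on the pair axis) and
            # zero broadside: d = (spacing / c) * cos(angle to axis).
            d_pred = spacing_m / c * np.cos(np.radians(cand - axis))
            err += (d_obs - d_pred) ** 2
        if err < best_err:
            best, best_err = cand, err
    return best
```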
Abstract:
Systems, methods, and apparatus for pitch trajectory analysis are described. Such techniques may be used to remove vocals and/or vibrato from an audio mixture signal. For example, such a technique may be used to pre-process the signal before an operation to decompose the mixture signal into individual instrument components.
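A rough sketch of the idea, assuming an autocorrelation-based per-frame pitch tracker and a harmonic gain mask applied to the magnitude spectrogram before decomposition; a practical tracker would add smoothing and voicing decisions along the trajectory, so treat all parameters as illustrative:

```python
# Sketch only: autocorrelation pitch tracker plus a harmonic gain mask.
import numpy as np

def pitch_track(frames, fs, fmin=80.0, fmax=400.0):
    """Per-frame F0 (Hz) from the autocorrelation peak in the voice range."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    f0 = []
    for fr in frames:
        ac = np.correlate(fr, fr, mode="full")[len(fr) - 1:]
        f0.append(fs / (lo + np.argmax(ac[lo:hi])))
    return np.array(f0)

def harmonic_mask(f0, n_bins, fs, n_fft, width_hz=40.0, atten=0.1):
    """Gain mask: `atten` near each harmonic of the trajectory, 1 elsewhere.
    Multiply it into the magnitude spectrogram before decomposition."""
    freqs = np.arange(n_bins) * fs / n_fft
    mask = np.ones((len(f0), n_bins))
    for t, f in enumerate(f0):
        harmonics = np.arange(1, int(freqs[-1] / f) + 1) * f
        near = np.min(np.abs(freqs[:, None] - harmonics[None, :]), axis=1)
        mask[t, near < width_hz / 2] = atten
    return mask
```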
Abstract:
Systems, methods, and apparatus are described for applying, based on angles of arrival of source components relative to the axes of different microphone pairs, a spatially directive filter to a multichannel audio signal to produce an output signal.
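For a uniform linear array, a delay-and-sum beamformer is the simplest spatially directive filter of this kind: it phase-aligns the channels for a chosen angle of arrival so the steered direction is emphasized in the output. The geometry and steering angle below are assumptions, and the patent's filter design is more general:

```python
# Delay-and-sum sketch for a uniform linear array; spacing, steering
# angle, and the far-field model are assumptions.
import numpy as np

def steer_delays(n_mics, spacing_m, angle_deg, c=343.0):
    """Per-mic arrival delays of a plane wave from `angle_deg` (broadside=0)."""
    d = spacing_m * np.arange(n_mics) * np.sin(np.radians(angle_deg)) / c
    return d - d.min()

def delay_and_sum(channels, fs, spacing_m=0.05, angle_deg=30.0):
    """channels: (n_mics, n_samples). Emphasizes the steered direction
    by phase-aligning each channel before averaging."""
    n_mics, n = channels.shape
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    acc = np.zeros(len(freqs), dtype=complex)
    for m, tau in enumerate(steer_delays(n_mics, spacing_m, angle_deg)):
        # Advance channel m by its arrival delay to align the wavefront.
        acc += np.fft.rfft(channels[m]) * np.exp(2j * np.pi * freqs * tau)
    return np.fft.irfft(acc / n_mics, n)
```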
Abstract:
In general, techniques are described for image generation for a collaborative surround sound system. A headend device comprising a processor may perform these techniques. The processor may be configured to determine a location of a mobile device participating in the collaborative surround sound system as a speaker of a plurality of speakers of the collaborative surround sound system. The processor may further be configured to generate an image that depicts the location of the participating mobile device relative to the other speakers of the collaborative surround sound system.
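A sketch of generating such an image with matplotlib, where the headend plots the mobile device's location among the other speakers; all positions and labels below are invented for illustration, since a real headend would use the device locations it actually determined:

```python
# Sketch only: positions and labels are invented stand-ins.
import matplotlib.pyplot as plt

speakers = {"front-left": (-1.0, 1.0), "front-right": (1.0, 1.0),
            "surround-left": (-1.2, -1.0), "surround-right": (1.2, -1.0)}
mobile_name, (mx, my) = "mobile device (center)", (0.0, 1.1)

fig, ax = plt.subplots()
for name, (x, y) in speakers.items():
    ax.plot(x, y, "ks")                     # fixed speakers as squares
    ax.annotate(name, (x, y))
ax.plot(mx, my, "ro")                       # the participating mobile device
ax.annotate(mobile_name, (mx, my))
ax.plot(0.0, 0.0, "b^")
ax.annotate("listener", (0.0, 0.0))
ax.set_aspect("equal")
ax.set_title("Collaborative surround sound layout")
fig.savefig("layout.png")
```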