Abstract:
Disclosed is an application interface that takes into account the user's gaze direction relative to the current speaker in an interactive multi-participant environment where audio-based contextual information and/or visual-based semantic information is being presented. Among the various implementations, two different types of microphone array devices (MADs) may be used. The first type of MAD is a steerable microphone array (a.k.a. a steerable array), which is worn by a user in a known orientation with regard to the user's eyes; multiple users may each wear a steerable array. The second type of MAD is a fixed-location microphone array (a.k.a. a fixed array), which is placed in the same acoustic space as the users (one or more of whom are using steerable arrays).
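As a rough sketch only (the abstract specifies no coordinate conventions, thresholds, or function names, so everything below is assumed for illustration), the gaze-versus-speaker comparison the interface relies on can be reduced to an angular offset between the wearer's heading, known from the steerable array's fixed orientation relative to the eyes, and the speaker's direction of arrival reported by the fixed array:

```python
def gaze_offset_deg(user_heading_deg: float, speaker_doa_deg: float) -> float:
    """Smallest signed angle (degrees) between the wearer's gaze heading
    and the direction of arrival (DOA) of the active speaker."""
    return (speaker_doa_deg - user_heading_deg + 180.0) % 360.0 - 180.0

def is_looking_at_speaker(user_heading_deg: float, speaker_doa_deg: float,
                          tolerance_deg: float = 15.0) -> bool:
    """True if the gaze is within a (hypothetical) tolerance of the speaker."""
    return abs(gaze_offset_deg(user_heading_deg, speaker_doa_deg)) <= tolerance_deg
```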
Abstract:
A system includes a memory configured to store data associated with a service that is available. The system also includes a microphone associated with an acoustic space and configured to receive an audio input produced by a person. The system further includes a sensor located within the acoustic space and configured to detect vibrations produced by the person. The system includes a processor coupled to the memory, to the microphone, and to the sensor. The processor is configured to conditionally authorize execution of the service requested by the person, the service conditionally authorized based on the audio input and the vibrations.
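A minimal sketch of the conditional authorization described above, assuming (since the abstract does not say how the two inputs are combined) that the audio input is scored against an enrolled voice and that the detected vibrations act as a liveness gate; the names and the 0.8 threshold are invented for illustration:

```python
def authorize_service(voice_match_score: float, vibration_detected: bool,
                      threshold: float = 0.8) -> bool:
    """Conditionally authorize the requested service: the audio input must
    match an enrolled user AND the sensor must have detected vibrations
    produced by the person (e.g., as a liveness check)."""
    return vibration_detected and voice_match_score >= threshold
```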
Abstract:
A system includes a feedback filter configured to receive an audio signal from a microphone and to process the audio signal based on a feedback path between a loudspeaker and the microphone. The system also includes an adaptive filter coupled to the feedback filter. The system further includes a signal combiner coupled to the microphone and to the adaptive filter.
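One common way to realize such an adaptive-filter-plus-combiner arrangement, offered here as an assumption since the abstract names no algorithm, is a normalized least-mean-squares (NLMS) filter that estimates the loudspeaker-to-microphone feedback path while the signal combiner subtracts that estimate from the microphone signal:

```python
import numpy as np

def nlms_feedback_canceller(mic, loudspeaker, filt_len=64, mu=0.1, eps=1e-8):
    """NLMS sketch: estimate the loudspeaker-to-microphone feedback path and
    subtract the predicted feedback from the microphone signal.
    Assumes equal-length float arrays for `mic` and `loudspeaker`."""
    w = np.zeros(filt_len)      # adaptive estimate of the feedback path
    x = np.zeros(filt_len)      # recent loudspeaker samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x = np.roll(x, 1)
        x[0] = loudspeaker[n]
        y_hat = w @ x           # predicted feedback at the microphone
        e = mic[n] - y_hat      # combiner output: feedback removed
        w += (mu / (x @ x + eps)) * e * x   # NLMS coefficient update
        out[n] = e
    return out
```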
Abstract:
Disclosed is a feature extraction and classification methodology wherein audio data is gathered in a target environment under varying conditions. From this collected data, corresponding features are extracted, labeled with appropriate descriptors (e.g., audio event descriptions), and used for training deep neural networks (DNNs) to extract underlying target audio events from unlabeled training data. Once trained, these DNNs are used to predict underlying events in noisy audio and to extract therefrom features that enable the separation of the underlying audio events from the noisy components thereof.
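A minimal supervised training step consistent with this pipeline, sketched in PyTorch; the feature dimensionality, network shape, and multi-label loss are all assumptions, since the abstract specifies no architecture:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: feature frames -> per-frame event probabilities.
N_FEATURES, N_EVENTS = 64, 10

model = nn.Sequential(
    nn.Linear(N_FEATURES, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_EVENTS), nn.Sigmoid(),   # multi-label event presence
)
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised step on labeled features extracted from the target
    environment, as the abstract describes."""
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```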
Abstract:
A method for speech restoration by an electronic device is described. The method includes obtaining a noisy speech signal. The method also includes suppressing noise in the noisy speech signal to produce a noise-suppressed speech signal. The noise-suppressed speech signal has a bandwidth that includes at least three subbands. The method further includes iteratively restoring each of the at least three subbands. Each of the at least three subbands is restored based on all previously restored subbands of the at least three subbands.
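The iterative dependency structure (each subband restored given all previously restored subbands) can be sketched as follows; `restore` stands in for an unspecified per-band restoration model and is hypothetical:

```python
def restore_subbands(subbands, restore):
    """Iteratively restore subbands; each restoration is conditioned on
    every previously restored subband, per the abstract."""
    restored = []
    for band in subbands:
        # Pass a copy of all prior results as conditioning context.
        restored.append(restore(band, restored[:]))
    return restored
```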
Abstract:
One example device includes a camera; a display device; a memory; and a processor in communication with the memory to receive audio signals from two or more microphones or a far-end device; receive first location information and second location information, the first location information for a visual identification of an audio source of the received audio signals and the second location information identifying a direction of arrival from the audio source; receive a first adjustment to a first portion of a UI to change either a visual identification or a coordinate direction of a direction focus; in response to the first adjustment, automatically perform a second adjustment to a second portion of the UI to change the other of the visual identification or the coordinate direction of the direction focus; and process the audio signals to filter sounds outside the direction focus, or emphasize sounds within the direction focus.
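A toy sketch of the linked first/second UI adjustment, assuming a fixed linear mapping between on-screen pixels and direction-of-arrival angles (the abstract does not define the mapping, and all names here are invented):

```python
class DirectionFocusUI:
    """Keeps the visual focus marker and the DOA-based coordinate focus in
    sync: adjusting either portion of the UI updates the other."""
    def __init__(self, pixels_per_degree: float, center_px: float):
        self.ppd = pixels_per_degree   # assumed fixed camera mapping
        self.center = center_px
        self.focus_deg = 0.0

    def set_visual_focus(self, x_px: float) -> None:
        # First adjustment: user moved the on-screen marker; derive the DOA.
        self.focus_deg = (x_px - self.center) / self.ppd

    def set_coordinate_focus(self, doa_deg: float) -> None:
        # First adjustment: user changed the DOA; the marker follows.
        self.focus_deg = doa_deg

    def visual_focus_px(self) -> float:
        # Second (automatic) adjustment: marker position implied by the DOA.
        return self.center + self.focus_deg * self.ppd
```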
Abstract:
A method of selectively authorizing access includes obtaining, at an authentication device, first information corresponding to first synthetic biometric data. The method also includes obtaining, at the authentication device, first common synthetic data and second biometric data. The method further includes generating, at the authentication device, second common synthetic data based on the first information and the second biometric data. The method also includes selectively authorizing, by the authentication device, access based on a comparison of the first common synthetic data and the second common synthetic data.
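A heavily simplified sketch of the final comparison step; how common synthetic data is actually generated is not specified in the abstract, so the SHA-256 combiner below is purely a placeholder, and only the match-then-authorize pattern is the point:

```python
import hashlib
import hmac

def generate_common_synthetic(info: bytes, biometric: bytes) -> bytes:
    """Placeholder combiner: derive common synthetic data from the received
    first information and the locally captured second biometric data.
    The real derivation is unspecified; SHA-256 is illustrative only."""
    return hashlib.sha256(info + biometric).digest()

def selectively_authorize(first_common: bytes, info: bytes,
                          biometric: bytes) -> bool:
    """Authorize only if the locally generated second common synthetic data
    matches the first, compared in constant time."""
    second_common = generate_common_synthetic(info, biometric)
    return hmac.compare_digest(first_common, second_common)
```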
Abstract:
A crosstalk cancellation technique reduces feedback in a shared acoustic space by canceling out some or all parts of sound signals that would otherwise be produced by a loudspeaker only to be captured by a microphone that, recursively, would cause these sound signals to be reproduced again by the loudspeaker as feedback. Crosstalk cancellation can be used in a multichannel acoustic system (MAS) comprising an arrangement of microphones, loudspeakers, and a processor that together enhance conversational speech in a shared acoustic space. To achieve crosstalk cancellation, the processor analyzes the input of each microphone, compares it to the output of the far loudspeaker(s) relative to that microphone, cancels out any portion of a sound signal received by the microphone that matches signals just produced by the far loudspeaker(s), and sends only the remaining sound signal (if any) to such far loudspeakers.
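Under the simplifying assumption that the loudspeaker-to-microphone path is available as a measured impulse response (the abstract does not say how the matching is performed), the per-microphone cancellation step might look like:

```python
import numpy as np

def cancel_crosstalk(mic, far_out, path_ir):
    """Remove from the microphone input the estimated contribution of the
    far loudspeaker: its recent output convolved with the (assumed known)
    loudspeaker-to-microphone impulse response. Assumes `mic` and
    `far_out` are equal-length float arrays."""
    predicted = np.convolve(far_out, path_ir)[: len(mic)]
    return mic - predicted
```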
Abstract:
Systems, devices, and methods are described for recognizing and focusing on at least one source of an audio communication as part of a communication including a video image and an audio communication derived from two or more microphones when a relative position between the microphones is known. In certain embodiments, linked audio and video focus areas providing location information for one or more sound sources may each be associated with different user inputs, and an input to adjust a focus in either the audio or video domain may automatically adjust the focus in the other domain.
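Because the relative position between the microphones is known, a direction of arrival can be estimated from the inter-microphone time difference; a far-field, two-microphone sketch (all parameter names assumed):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def doa_from_tdoa(sig_a, sig_b, mic_spacing_m, fs):
    """Estimate direction of arrival from the time difference between two
    microphones whose spacing is known: cross-correlation peak -> lag ->
    angle relative to the microphone axis (far-field assumption)."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)   # samples of delay
    tdoa = lag / fs
    cos_theta = np.clip(tdoa * SPEED_OF_SOUND / mic_spacing_m, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))
```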
Abstract:
Systems, methods, and apparatus for matching pair-wise differences (e.g., phase delay measurements) to an inventory of source direction candidates, and application of pair-wise source direction estimates, are described.
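A minimal illustration of matching a measured pair-wise phase difference against an inventory of candidate source directions, using the far-field relation Δφ = 2πf·d·cos(θ)/c and ignoring phase wrapping for simplicity; every name below is assumed for illustration:

```python
import numpy as np

def match_direction(phase_delay_rad, candidate_dirs_deg, mic_spacing_m,
                    freq_hz, c=343.0):
    """Return the candidate direction whose predicted inter-microphone
    phase delay best matches the measured one (single pair, far field)."""
    cands = np.radians(np.asarray(candidate_dirs_deg))
    predicted = 2 * np.pi * freq_hz * mic_spacing_m * np.cos(cands) / c
    best = int(np.argmin(np.abs(predicted - phase_delay_rad)))
    return candidate_dirs_deg[best]
```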