Abstract:
A device includes one or more processors configured to receive sensor data from one or more sensor devices. The one or more processors are also configured to determine a context of the device based on the sensor data. The one or more processors are further configured to select a model based on the context. The one or more processors are also configured to process an input signal using the model to generate a context-specific output.
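A minimal Python sketch of this select-then-process flow; the sensor fields, context labels, and per-context "models" (reduced here to simple gain functions) are illustrative placeholders, not the claimed implementation:

from dataclasses import dataclass

@dataclass
class SensorData:
    ambient_noise_db: float  # e.g., from a microphone-level sensor
    speed_mps: float         # e.g., from an accelerometer or GPS sensor

def determine_context(sensors: SensorData) -> str:
    # Map raw sensor readings to a coarse context label.
    if sensors.speed_mps > 5.0:
        return "in_vehicle"
    if sensors.ambient_noise_db > 70.0:
        return "noisy"
    return "quiet"

# One model per context; each "model" here is just a gain on the input signal.
MODELS = {
    "in_vehicle": lambda x: [s * 0.5 for s in x],
    "noisy": lambda x: [s * 0.8 for s in x],
    "quiet": lambda x: x,
}

def process(input_signal, sensors):
    context = determine_context(sensors)  # context from sensor data
    model = MODELS[context]               # select a model based on the context
    return model(input_signal)            # context-specific output

print(process([0.1, 0.2], SensorData(ambient_noise_db=75.0, speed_mps=0.0)))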
Abstract:
A device to provide information to a visual interface device that is mountable to a vehicle dashboard includes a memory configured to store device information indicative of controllable devices of a building and occupant data indicative of one or more occupants of the building. The device includes a processor configured to receive, in real time, status information associated with the one or more occupants of the building. The status information includes at least one of dynamic location information or dynamic activity information. The processor is configured to generate an output to provide, at the visual interface device, a visual representation of at least a portion of the building and the status information associated with the one or more occupants. The processor is also configured to generate an instruction to adjust an operation of one or more of the controllable devices based on user input.
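A toy Python sketch of the two outputs named above (the occupant-status visualization and the device-adjustment instruction); the data layout and function names are invented for illustration:

building_devices = {"living_room_lamp": {"power": "off"}}
occupants = {
    "alice": {"location": "kitchen", "activity": "cooking"},  # dynamic status
}

def render_status():
    # Stand-in for the dashboard-mounted visual interface output.
    for name, status in occupants.items():
        print(f"{name}: {status['location']} ({status['activity']})")

def adjust_device(device_id: str, setting: str, value: str):
    # Generate an instruction to adjust a controllable device per user input.
    building_devices[device_id][setting] = value
    print(f"instruction -> {device_id}: {setting}={value}")

render_status()
adjust_device("living_room_lamp", "power", "on")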
Abstract:
A device includes a memory, a receiver, a processor, and a display. The memory is configured to store a speaker model. The receiver is configured to receive an input audio signal. The processor is configured to determine a first confidence level associated with a first portion of the input audio signal based on the speaker model. The processor is also configured to determine a second confidence level associated with a second portion of the input audio signal based on the speaker model. The display is configured to present a graphical user interface associated with the first confidence level or associated with the second confidence level.
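A rough Python sketch of per-portion confidence scoring against a stored speaker model; the embedding and the model vector are placeholders (a real system would use trained speaker embeddings):

import numpy as np

speaker_model = np.array([0.6, 0.8])  # stored speaker model (placeholder vector)

def embed(audio_frame: np.ndarray) -> np.ndarray:
    # Placeholder embedding: normalized low-frequency FFT magnitudes.
    spec = np.abs(np.fft.rfft(audio_frame))[:2]
    return spec / (np.linalg.norm(spec) + 1e-9)

def confidence(audio_frame: np.ndarray) -> float:
    # Cosine-style similarity between the frame and the speaker model.
    return float(embed(audio_frame) @ speaker_model)

signal = np.random.default_rng(0).standard_normal(400)
first, second = signal[:200], signal[200:]  # two portions of the input signal
print(f"portion 1 confidence: {confidence(first):.2f}")  # could drive a GUI meter
print(f"portion 2 confidence: {confidence(second):.2f}")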
Abstract:
An apparatus includes a processor configured to receive one or more media signals associated with a scene. The processor is also configured to identify a spatial location in the scene for each source of the one or more media signals. The processor is further configured to identify audio content for each media signal of the one or more media signals. The processor is also configured to determine one or more candidate spatial locations in the scene based on the identified spatial locations. The processor is further configured to generate audio to playback as virtual sounds that originate from the one or more candidate spatial locations.
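One way to picture the candidate-location step, in toy Python; the midpoint heuristic and the scene layout are assumptions, not the patented method:

sources = {
    "tv": {"position": (1.0, 0.0), "content": "dialogue"},
    "speaker": {"position": (3.0, 2.0), "content": "music"},
}

def candidate_locations(srcs):
    # Example heuristic: the midpoint between the two identified sources,
    # i.e., a spot in the scene not already occupied by real audio content.
    (x1, y1), (x2, y2) = [s["position"] for s in srcs.values()]
    return [((x1 + x2) / 2, (y1 + y2) / 2)]

for loc in candidate_locations(sources):
    # A spatial renderer would pan/convolve audio so it seems to come from here.
    print(f"render virtual sound at {loc}")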
Abstract:
Disclosed is a feature extraction and classification methodology wherein audio data is gathered in a target environment under varying conditions. From this collected data, corresponding features are extracted, labeled with appropriate descriptors (e.g., audio event descriptions), and used to train deep neural networks (DNNs) to extract underlying target audio events from unlabeled training data. Once trained, these DNNs are used to predict underlying events in noisy audio and to extract from it features that enable separation of the underlying audio events from the noisy components.
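The train-then-separate flow in miniature, with a one-layer logistic model standing in for the DNN and random toy frames standing in for the gathered audio; everything here is illustrative scaffolding:

import numpy as np

rng = np.random.default_rng(0)

def features(audio: np.ndarray) -> np.ndarray:
    # Stand-in feature extraction: log-magnitude spectrum of one frame.
    return np.log(np.abs(np.fft.rfft(audio)) + 1e-6)

# Toy "training set": frames labeled 1 (target event) or 0 (background).
X = np.stack([features(rng.standard_normal(64)) for _ in range(100)])
y = rng.integers(0, 2, size=100)

# Single-layer stand-in for the DNN, trained by gradient descent.
w = np.zeros(X.shape[1])
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

def predict_event(frame: np.ndarray) -> float:
    # Probability that a frame contains the target event; a mask built from
    # such predictions can separate the event from the noisy mixture.
    return float(1.0 / (1.0 + np.exp(-(features(frame) @ w))))

print(predict_event(rng.standard_normal(64)))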
Abstract:
A device includes a memory configured to store category labels associated with categories of a natural language processing library. A processor is configured to analyze input audio data to generate a text string and to perform natural language processing on at least the text string to generate an output text string including an action associated with a first device, a speaker, a location, or a combination thereof. The processor is configured to compare the input audio data to audio data of the categories to determine whether the input audio data matches any of the categories and, in response to determining that the input audio data does not match any of the categories: create a new category label, associate the new category label with at least a portion of the output text string, update the categories with the new category label, and generate a notification indicating the new category label.
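The match-or-create branch, sketched in Python; the fingerprinting, the match threshold, and the label scheme are all invented for illustration:

nlp_library = {
    "turn_on_lights": {"audio_fingerprint": 0.91},  # stored category audio data
}

def fingerprint(audio_data: bytes) -> float:
    # Placeholder for comparing input audio against stored category audio.
    return (sum(audio_data) % 100) / 100.0

def classify(audio_data: bytes, output_text: str) -> str:
    fp = fingerprint(audio_data)
    for label, cat in nlp_library.items():
        if abs(cat["audio_fingerprint"] - fp) < 0.05:  # input matches a category
            return label
    # No match: create a new category label from the output text string,
    # update the categories, and generate a notification.
    new_label = output_text.split()[0] + "_" + str(len(nlp_library))
    nlp_library[new_label] = {"audio_fingerprint": fp}
    print(f"notification: new category '{new_label}' created")
    return new_label

print(classify(b"\x10\x20\x30", "play music in kitchen"))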
Abstract:
In a particular aspect, an apparatus includes an audio sensor configured to receive an input audio signal. The apparatus also includes speech generative circuitry configured to generate a synthesized audio signal based at least partly on automatic speech recognition (ASR) data associated with the input audio signal and based on one or more parameters indicative of state information associated with the input audio signal.
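A toy Python stand-in for the speech generative circuitry: it turns recognized text plus two state parameters (pitch and speaking rate, both assumed here) into a placeholder waveform:

import numpy as np

def synthesize(asr_text: str, pitch_hz: float, rate: float, sr: int = 8000):
    # Duration scales with the recognized text and the speaking-rate state;
    # the tone frequency stands in for prosodic state (e.g., pitch).
    duration_s = len(asr_text) * 0.05 / rate
    t = np.arange(int(sr * duration_s)) / sr
    return 0.3 * np.sin(2 * np.pi * pitch_hz * t)  # placeholder "speech"

# In the described apparatus, pitch/rate would be estimated from the input
# audio signal; fixed values are used here for illustration.
audio = synthesize("hello world", pitch_hz=180.0, rate=1.0)
print(audio.shape)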
Abstract:
A method for speech restoration by an electronic device is described. The method includes obtaining a noisy speech signal. The method also includes suppressing noise in the noisy speech signal to produce a noise-suppressed speech signal. The noise-suppressed speech signal has a bandwidth that includes at least three subbands. The method further includes iteratively restoring each of the at least three subbands. Each of the at least three subbands is restored based on all previously restored subbands of the at least three subbands.
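A toy Python sketch of the iterative dependency described above (each band restored using all previously restored bands); the energy-matching rule is a made-up placeholder for the actual restoration:

import numpy as np

def restore_subbands(noise_suppressed):
    restored = []
    for band in noise_suppressed:
        if restored:
            # Condition on all previously restored subbands, here by matching
            # this band's energy to their mean energy (placeholder rule).
            target = np.mean([np.linalg.norm(b) for b in restored])
            band = band * (target / (np.linalg.norm(band) + 1e-9))
        restored.append(band)
    return restored

rng = np.random.default_rng(1)
bands = [rng.standard_normal(32) * g for g in (1.0, 0.3, 0.1)]  # >= 3 subbands
print([round(float(np.linalg.norm(b)), 2) for b in restore_subbands(bands)])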