Robust neural network acoustic model with side task prediction of reference signals

    Publication No.: US10147442B1

    Publication date: 2018-12-04

    Application No.: US14869803

    Filing date: 2015-09-29

    Abstract: A neural network acoustic model is trained to be robust and produce accurate output when used to process speech signals having acoustic interference. The neural network acoustic model can be trained using a source-separation process by which, in addition to producing the main acoustic model output for a given input, the neural network generates predictions of the separate speech and interference portions of the input. The parameters of the neural network can be adjusted to jointly optimize all three outputs (e.g., the main acoustic model output, the speech signal prediction, and the interference signal prediction), rather than only optimizing the main acoustic model output. Once trained, output layers for the speech and interference signal predictions can be removed from the neural network or otherwise disabled.
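The joint optimization described in this abstract can be sketched as a single training loss that combines the main acoustic model output with the two side predictions. This is a minimal illustration only: the abstract does not state the loss functions, so the cross-entropy and mean-squared-error terms, the `side_weight` parameter, and all function names here are assumptions.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability of the correct class (main acoustic output)."""
    return -math.log(probs[label])


def mean_squared_error(pred, ref):
    """Average squared difference between a predicted and a reference signal."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def joint_loss(main_probs, label,
               speech_pred, speech_ref,
               interference_pred, interference_ref,
               side_weight=0.5):
    """Combine the main loss with both side-task losses.

    side_weight is a hypothetical hyperparameter that scales the two
    side-task terms relative to the main acoustic model loss.
    """
    main = cross_entropy(main_probs, label)
    side = (mean_squared_error(speech_pred, speech_ref)
            + mean_squared_error(interference_pred, interference_ref))
    return main + side_weight * side


if __name__ == "__main__":
    loss = joint_loss([0.7, 0.3], 0, [0.2, 0.4], [0.25, 0.35], [0.1], [0.0])
    print(f"joint loss: {loss:.4f}")
```

Under this sketch, removing the side output layers after training corresponds to evaluating only the `cross_entropy` branch at inference time.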

    Audio triggered commands
    Granted invention patent (in force)

    Publication No.: US09484030B1

    Publication date: 2016-11-01

    Application No.: US14956874

    Filing date: 2015-12-02

    Abstract: A system is configured to execute audio-initiated commands. The system detects audio and determines if a first sound is included in the audio. The system then processes further incoming audio to detect a second sound. If the second sound is not detected within a time threshold, the system executes a command. The command may include delivering a message, outputting audio corresponding to synthesized speech, or some other executable command.
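The decision logic in this abstract (detect a first sound, then execute a command only if a second sound does not follow within a time threshold) can be sketched as below. The event representation, the labels, and the function name are hypothetical; the sketch also assumes the event list covers at least the full threshold window after the first sound.

```python
def should_execute(events, first_sound, second_sound, time_threshold):
    """Return True when the command should run.

    events: list of (timestamp_seconds, label) tuples sorted by time,
    standing in for the output of a sound detector (an assumption).
    The command runs if first_sound was detected and second_sound was
    not detected within time_threshold seconds after it.
    """
    first_time = None
    for timestamp, label in events:
        if first_time is None:
            if label == first_sound:
                first_time = timestamp
        elif label == second_sound and timestamp - first_time <= time_threshold:
            # The second sound arrived in time, so the command is suppressed.
            return False
    return first_time is not None


if __name__ == "__main__":
    detected = [(0.0, "alarm"), (8.0, "voice")]
    print(should_execute(detected, "alarm", "voice", time_threshold=5.0))
```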

    Neural network based beam selection

    Publication No.: US09972339B1

    Publication date: 2018-05-15

    Application No.: US15228617

    Filing date: 2016-08-04

    Abstract: A neural network model, such as a deep neural network (DNN), is trained using many speech examples to perform beam selection in a microphone array-based speech processing system. The DNN is trained using many different speech examples that are labeled with position or direction information relative to a training microphone array. The DNN may then be trained to recognize a direction of incoming speech so that at runtime the trained DNN may process input audio data from a microphone array and may output to a beam selector an indicator of the desired beam that may be selected for further processing. The DNN may be configured to output a beam index and/or coordinates (or other position data) corresponding to an estimated location of the detected speech. The DNN may also be configured to output acoustic unit data corresponding to speech units (for example, phonemes, senones, etc., such as those of a detected wakeword or other word).
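The runtime step in which the network's output becomes a beam index for the beam selector can be sketched as follows. The abstract does not say how the index is derived, so the per-beam scores, the softmax normalization, and the function names are assumptions for illustration.

```python
import math


def softmax(scores):
    """Convert raw per-beam scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_beam(beam_scores):
    """Return (beam_index, probability) for the highest-scoring beam.

    beam_scores stands in for the output layer of a trained DNN,
    one score per candidate beam (a hypothetical layout).
    """
    probs = softmax(beam_scores)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs[index]


if __name__ == "__main__":
    index, confidence = select_beam([0.1, 2.0, -1.0, 0.5])
    print(f"selected beam {index} with probability {confidence:.3f}")
```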

    Neural network based beam selection

    Publication No.: US10134421B1

    Publication date: 2018-11-20

    Application No.: US15967185

    Filing date: 2018-04-30

    Abstract: A neural network model, such as a deep neural network (DNN), is trained using many speech examples to perform beam selection in a microphone array-based speech processing system. The DNN is trained using many different speech examples that are labeled with position or direction information relative to a training microphone array. The DNN may then be trained to recognize a direction of incoming speech so that at runtime the trained DNN may process input audio data from a microphone array and may output to a beam selector an indicator of the desired beam that may be selected for further processing. The DNN may be configured to output a beam index and/or coordinates (or other position data) corresponding to an estimated location of the detected speech. The DNN may also be configured to output acoustic unit data corresponding to speech units (for example, phonemes, senones, etc., such as those of a detected wakeword or other word).

    User presence detection
    Granted invention patent

    Publication No.: US10121494B1

    Publication date: 2018-11-06

    Application No.: US15474603

    Filing date: 2017-03-30

    Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present near the device, even if no wakeword is spoken. Audio such as speech, human-originating sounds (e.g., coughing, sneezing), or other human-related noises (e.g., footsteps, doors closing) can be used to detect presence. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
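The per-frame scoring and smoothing step can be sketched as a centered moving average followed by a threshold. The abstract does not name a smoothing method, so the moving average, the `radius` and `threshold` parameters, and the function names are assumptions.

```python
def smooth_scores(scores, radius=1):
    """Average each frame score with its neighbors within `radius` frames."""
    smoothed = []
    for i in range(len(scores)):
        start = max(0, i - radius)
        end = min(len(scores), i + radius + 1)
        window = scores[start:end]
        smoothed.append(sum(window) / len(window))
    return smoothed


def presence_decisions(scores, radius=1, threshold=0.5):
    """Return one True/False presence decision per frame.

    scores are hypothetical per-frame presence scores in [0, 1];
    a frame counts as "present" when its smoothed score reaches threshold.
    """
    return [s >= threshold for s in smooth_scores(scores, radius)]


if __name__ == "__main__":
    frame_scores = [0.9, 0.2, 0.9, 0.8]
    print(presence_decisions(frame_scores))
```

Smoothing keeps a single low-scoring frame between high-scoring neighbors from flipping the decision, and likewise suppresses an isolated spike. A periodic heartbeat could then report, for example, whether any recent frame was decided as present.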

    ARBITRATION BETWEEN VOICE-ENABLED DEVICES
    Invention application (in force)

    Publication No.: US20170076720A1

    Publication date: 2017-03-16

    Application No.: US14852022

    Filing date: 2015-09-11

    CPC classification number: G10L15/22 G06F3/167 G10L2015/223 G10L2021/02166

    Abstract: Architectures and techniques for selecting a voice-enabled device to handle audio input that is detected by multiple voice-enabled devices are described herein. In some instances, multiple voice-enabled devices may detect audio input from a user at substantially the same time, due to the voice-enabled devices being located within proximity to the user. The architectures and techniques may analyze a variety of audio signal metric values for the voice-enabled devices to designate a voice-enabled device to handle the audio input.
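The arbitration idea (compare audio signal metric values across the devices that heard the same input and designate one to handle it) can be sketched as a weighted score per device. The abstract does not list the metrics or how they are combined, so the metric names, the weights, and the weighted-sum rule are hypothetical.

```python
def choose_device(metrics, weights):
    """Pick the device with the highest weighted sum of its metric values.

    metrics: dict mapping device id -> dict of metric name -> value.
    weights: dict mapping metric name -> weight (illustrative values).
    Ties are broken by device id order so the result is deterministic.
    """
    def score(device_id):
        values = metrics[device_id]
        return sum(weight * values.get(name, 0.0)
                   for name, weight in weights.items())

    return max(sorted(metrics), key=score)


if __name__ == "__main__":
    reported = {
        "kitchen": {"snr_db": 12.0, "energy": 0.4},
        "living_room": {"snr_db": 20.0, "energy": 0.7},
    }
    print(choose_device(reported, {"snr_db": 1.0, "energy": 10.0}))
```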
