Self-supervised federated learning

    公开(公告)号:US12039998B1

    公开(公告)日:2024-07-16

    申请号:US17665129

    申请日:2022-02-04

    CPC classification number: G10L25/78 G06N3/045 G06N3/08 G10L25/21

    Abstract: An acoustic event detection system may employ self-supervised federated learning to update encoder and/or classifier machine learning models. In an example operation, an encoder may be pre-trained to extract audio feature data from an audio signal. A decoder may be pre-trained to predict a subsequent portion of audio data (e.g., a subsequent frame of audio data represented by log filterbank energies). The encoder and decoder may be trained using self-supervised learning to improve the decoder's predictions and, by extension, the quality of the audio feature data generated by the encoder. The system may apply federated learning to share encoder updates across user devices. The system may fine-tune the classifier to improve inferences based on the improved audio feature data. The system may distribute classifier updates to the user device(s) to update the on-device classifier.

    Streaming self-attention in a neural network

    公开(公告)号:US11961514B1

    公开(公告)日:2024-04-16

    申请号:US17547610

    申请日:2021-12-10

    CPC classification number: G10L15/16

    Abstract: An acoustic event detection system may employ one or more recurrent neural networks (RNNs) to extract features from audio data, and use the extracted features to determine the presence of an acoustic event. The system may use self-attention to emphasize features extracted from portions of audio data that may include features more useful for detecting acoustic events. The system may perform self-attention in an iterative manner to reduce the amount of memory used to store hidden states of the RNN while processing successive portions of the audio data. The system may process the portions of the audio data using the RNN to generate a hidden state for each portion. The system may calculate an interim embedding for each hidden state. An interim embedding calculated for the last hidden state may be normalized to determine a final embedding representing features extracted from the input data by the RNN.

    Wakeword and acoustic event detection

    公开(公告)号:US11670299B2

    公开(公告)日:2023-06-06

    申请号:US17321999

    申请日:2021-05-17

    CPC classification number: G10L15/22 G10L15/16

    Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may be also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.

    WAKEWORD AND ACOUSTIC EVENT DETECTION

    公开(公告)号:US20210358497A1

    公开(公告)日:2021-11-18

    申请号:US17321999

    申请日:2021-05-17

    Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may be also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.

    Media presence detection
    8.
    发明授权

    公开(公告)号:US11069352B1

    公开(公告)日:2021-07-20

    申请号:US16278440

    申请日:2019-02-18

    Abstract: Described herein is a system for media presence detection in audio. The system analyzes audio data to recognize whether a given audio segment contains sounds from a media source as a way of differentiating recorded media source sounds from other live sounds. In exemplary embodiments, the system includes a hierarchical model architecture for processing audio data segments, where individual audio data segments are processed by a trained machine learning model operating locally, and another trained machine learning model provides historical and contextual information to determine a score indicating the likelihood that the audio data segment contains sounds from a media source.

Patent Agency Ranking