Event-based semantic search and retrieval
    Patent application publication

    Publication No.: US20240233715A1

    Publication date: 2024-07-11

    Application No.: US18118282

    Filing date: 2023-03-07

    Applicant: Drift.com, Inc.

    Abstract: A technique for semantic search and retrieval that is event-based, wherein an event is composed of a sequence of observations that are user speech or physical actions. Using a first set of conversations, a machine learning model is trained against groupings of utterances therein to generate a speech act classifier. Observation sequences therein are organized into groupings of events and configured for subsequent event recognition. A set of second (unannotated) conversations is then received. The set of second conversations is evaluated using the speech act classifier and information retrieved from the event recognition to generate event-level metadata that comprises, for each utterance or physical action within an event, one or more associated tags. In response to a query, a search is performed against the metadata. Because the metadata is derived from event recognition, the search is performed against events learned from the set of first conversations. One or more conversation fragments that, from an event-based perspective, are semantically relevant to the query are returned.
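
    The pipeline described in this abstract (tag each utterance, recognize events as sequences of tagged observations, index event-level metadata, then search that metadata) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in, not the patented implementation: the keyword table `SPEECH_ACT_RULES` replaces the trained speech act classifier, `EVENT_PATTERNS` replaces the learned event groupings, and the sample conversations are invented for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword rules standing in for the trained speech act classifier.
SPEECH_ACT_RULES = {
    "demo": "request_demo",
    "price": "ask_pricing",
    "pricing": "ask_pricing",
    "tomorrow": "propose_time",
    "confirmed": "confirm",
}

# Hypothetical event definitions: an event is an ordered sequence of tags.
EVENT_PATTERNS = {
    "book_demo": ["request_demo", "propose_time", "confirm"],
    "pricing_inquiry": ["ask_pricing"],
}


@dataclass
class EventRecord:
    """Event-level metadata: the event name plus each utterance and its tag."""
    event: str
    conversation_id: str
    utterances: list  # list of (text, tag) pairs


def tag_utterance(text):
    """Return the first matching speech act tag, or 'other' if none matches."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in SPEECH_ACT_RULES:
            return SPEECH_ACT_RULES[word]
    return "other"


def recognize_events(conversation_id, utterances):
    """Find contiguous runs of tagged utterances that match an event pattern."""
    tagged = [(u, tag_utterance(u)) for u in utterances]
    tags = [tag for _, tag in tagged]
    records = []
    for name, pattern in EVENT_PATTERNS.items():
        n = len(pattern)
        for start in range(len(tags) - n + 1):
            if tags[start:start + n] == pattern:
                records.append(
                    EventRecord(name, conversation_id, tagged[start:start + n])
                )
    return records


def search(index, query):
    """Tag the query, then return fragments of events containing that tag."""
    query_tag = tag_utterance(query)
    hits = []
    for rec in index:
        if query_tag in {tag for _, tag in rec.utterances}:
            fragment = [text for text, _ in rec.utterances]
            hits.append((rec.conversation_id, rec.event, fragment))
    return hits


# Invented, unannotated "second" conversations used only for this sketch.
conversations = {
    "c1": ["Hi there", "Can I get a demo?",
           "How about tomorrow at 3pm?", "Confirmed, see you then"],
    "c2": ["What is the price of the pro plan?", "Thanks"],
}
index = [rec for cid, utts in conversations.items()
         for rec in recognize_events(cid, utts)]
results = search(index, "I would like a demo")
```

    In this sketch the query matches on a single tag but the returned fragment spans the whole recognized event, which is the sense in which the search runs against events rather than against isolated utterances.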

    Noise robust representations for keyword spotting systems

    Publication No.: US12027156B2

    Publication date: 2024-07-02

    Application No.: US17677921

    Filing date: 2022-02-22

    Abstract: Described are techniques for noise-robust and speaker-independent keyword spotting (KWS) in an input audio signal that contains keywords used to activate voice-based human-computer interactions. A KWS system may combine the latent representation generated by a denoising autoencoder (DAE) with audio features extracted from the audio signal using a machine learning approach. The DAE may be a discriminative DAE trained with a quadruplet loss metric learning approach to create a highly separable latent representation of the audio signal in the audio input feature space. In one aspect, spectral characteristics of the audio signal such as Log-Mel features are combined with the latent representation generated by a quadruplet loss variational DAE (QVDAE) as input to a DNN KWS classifier. The KWS system improves keyword classification accuracy versus using extracted spectral features alone, non-discriminative DAE latent representations alone, or the extracted spectral features combined with the non-discriminative DAE latent representations in a KWS classifier.
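
    The abstract does not give the form of the quadruplet loss or say how the two feature sets are combined. The sketch below assumes one commonly used quadruplet loss formulation (two hinge terms over squared Euclidean distances) and assumes simple concatenation of the Log-Mel features with the latent vector. The function names, margin values, and sample vectors are illustrative assumptions, not taken from the patent.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quadruplet_loss(anchor, positive, neg1, neg2, margin1=1.0, margin2=0.5):
    """One common quadruplet loss (assumed form, not the patent's definition).

    anchor and positive share a class; neg1 and neg2 come from two other,
    mutually different classes. The first term pulls the positive closer to
    the anchor than neg1 is; the second pushes the anchor-positive distance
    below the distance between the two negatives.
    """
    d_ap = sq_dist(anchor, positive)
    d_an = sq_dist(anchor, neg1)
    d_nn = sq_dist(neg1, neg2)
    return max(0.0, d_ap - d_an + margin1) + max(0.0, d_ap - d_nn + margin2)


def classifier_input(log_mel, latent):
    """Concatenate spectral features with the DAE latent vector (assumed)."""
    return list(log_mel) + list(latent)


# Well-separated latent vectors incur no loss; crowded ones are penalized.
separated = quadruplet_loss([0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [0.0, 2.0])
crowded = quadruplet_loss([0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0])

# Hypothetical feature values, purely to show the combined input shape.
features = classifier_input([1.0, 2.0], [0.5])
```

    A loss of zero for the well-separated case is the training signal that the latent space already keeps classes apart by at least the chosen margins.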