Endpointing in speech processing
    1.
    Granted invention patent

    Publication No.: US12211517B1

    Publication date: 2025-01-28

    Application No.: US17475699

    Filing date: 2021-09-15

    Abstract: A speech-processing system may determine potential endpoints in a user's speech. Such endpoint prediction may include determining a potential endpoint in a stream of audio data, and may additionally include determining an endpoint score representing a likelihood that the potential endpoint marks the end of speech representing a complete user input. When the potential endpoint has been determined, the system may publish a transcript of the speech that preceded the potential endpoint and send it to downstream components. The system may continue to transcribe audio data and determine additional potential endpoints while the downstream components process the transcript. The downstream components may determine whether the transcript is complete (e.g., whether it represents the entirety of the user input). Final endpoint determinations may be made based on the results of the downstream processing, including automatic speech recognition, natural language understanding, etc.
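The flow in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `PotentialEndpoint` type, the pause-based trigger, the `pause / max_pause` score, and the `is_complete` callback standing in for downstream components. The sketch is also synchronous, whereas the abstract describes transcription continuing while downstream processing runs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class PotentialEndpoint:
    word_count: int    # number of words transcribed before this endpoint
    score: float       # hypothetical likelihood that the input is complete
    transcript: str    # transcript published to downstream components


def find_final_endpoint(
    words: Iterable[Tuple[str, float]],
    is_complete: Callable[[str], bool],
    pause_threshold: float = 0.3,
    max_pause: float = 1.0,
) -> Tuple[Optional[PotentialEndpoint], List[PotentialEndpoint]]:
    """Scan (word, trailing_pause_seconds) pairs for endpoints.

    A pause of at least `pause_threshold` creates a potential endpoint.
    Each potential endpoint is published; the first one that the
    downstream check accepts becomes the final endpoint.
    """
    published: List[PotentialEndpoint] = []
    so_far: List[str] = []
    for word, pause in words:
        so_far.append(word)
        if pause < pause_threshold:
            continue  # speaker is still talking; no potential endpoint yet
        candidate = PotentialEndpoint(
            word_count=len(so_far),
            score=min(pause / max_pause, 1.0),  # assumed scoring rule
            transcript=" ".join(so_far),
        )
        published.append(candidate)
        if is_complete(candidate.transcript):
            return candidate, published
    return None, published
```

Calling `find_final_endpoint` with a downstream check that only accepts a full command returns the first accepted endpoint together with every transcript that was published along the way.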

    Generation of automated message responses

    Publication No.: US11496582B2

    Publication date: 2022-11-08

    Application No.: US16455604

    Filing date: 2019-06-27

    Abstract: Systems, methods, and devices are disclosed for computer-generating responses to communications, and sending those responses, when the recipient of the communication is unavailable. An individual may send a message (either audio or text) to a recipient. The recipient may be unavailable to contemporaneously respond to the message (e.g., the recipient may be performing an action that makes it difficult or impractical for the recipient to contemporaneously respond to the audio message). When the recipient is unavailable, a response to the message is generated and sent without receiving an instruction from the recipient to do so. The response may be sent to the message-originating individual, and content of the response may thereafter be sent to the recipient to receive feedback regarding the correctness of the response. Alternatively, the response content may first be sent to the recipient to receive the feedback, and thereafter the response may be sent to the message-originating individual.

    Language model adaptation
    4.
    Granted invention patent

    Publication No.: US11302310B1

    Publication date: 2022-04-12

    Application No.: US16426557

    Filing date: 2019-05-30

    Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine whether the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.
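The periodic rescoring decision described here might look roughly like the sketch below. The function name, the `check_every` interval, the linear interpolation with `weight`, and the scorer callbacks are all hypothetical; the abstract does not specify how the generic and domain-specific scores are combined.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def rescoring_events(
    partial_hypotheses: Iterable[str],
    classify_domain: Callable[[str], Optional[str]],
    domain_scorers: Dict[str, Callable[[str], float]],
    generic_scorer: Callable[[str], float],
    check_every: int = 3,
    weight: float = 0.5,
) -> List[Tuple[int, str, float]]:
    """Return (frame_index, domain, rescored_value) for each rescoring.

    `partial_hypotheses` yields the partial ASR hypothesis available
    after each audio frame. Every `check_every` frames a classifier
    inspects the partial hypothesis; if it names a known domain, the
    hypothesis is rescored by interpolating generic and domain scores.
    """
    events: List[Tuple[int, str, float]] = []
    for frame_index, hypothesis in enumerate(partial_hypotheses, start=1):
        if frame_index % check_every != 0:
            continue  # only consult the classifier every few frames
        domain = classify_domain(hypothesis)
        if domain is None or domain not in domain_scorers:
            continue  # no domain detected: keep the generic score
        rescored = (
            (1.0 - weight) * generic_scorer(hypothesis)
            + weight * domain_scorers[domain](hypothesis)  # assumed interpolation
        )
        events.append((frame_index, domain, rescored))
    return events
```

Checking only every few frames keeps the classifier cost low, at the price of reacting to a domain cue a few frames late.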

    Compressed finite state transducers for automatic speech recognition

    Publication No.: US10381000B1

    Publication date: 2019-08-13

    Application No.: US15864689

    Filing date: 2018-01-08

    Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime, portions of the FSTs may be decompressed for processing by an ASR engine.
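Weight binning can be illustrated with uniform quantization, shown below as a minimal sketch. Uniform bins, the function names, and the `(codes, lowest, step)` layout are assumptions chosen for illustration; the patent may use a different binning scheme, and the next-state estimation it mentions is not covered here.

```python
from typing import List, Sequence, Tuple


def compress_weights(
    weights: Sequence[float], num_bits: int
) -> Tuple[List[int], float, float]:
    """Map each weight to the index of its nearest uniform bin.

    Returns (codes, lowest, step). Each code fits in `num_bits` bits,
    so an arc stores a small integer instead of a full float.
    """
    lowest, highest = min(weights), max(weights)
    levels = (1 << num_bits) - 1           # highest code value
    step = (highest - lowest) / levels if highest > lowest else 1.0
    codes = [round((w - lowest) / step) for w in weights]
    return codes, lowest, step


def decompress_weights(codes: Sequence[int], lowest: float, step: float) -> List[float]:
    """Recover approximate weights from bin indices at runtime."""
    return [lowest + code * step for code in codes]
```

With this scheme each recovered weight differs from the original by at most half a bin width, so fewer bits per weight trade directly against weight precision.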

    Speech processing techniques
    10.
    Granted invention patent

    Publication No.: US12205574B1

    Publication date: 2025-01-21

    Application No.: US17208615

    Filing date: 2021-03-22

    Abstract: Techniques for using multiple machine learning (ML) models, with varying compute costs, for ASR processing are described. The system may include an arbitrator component configured to determine which ML model is to be used to process an audio frame from a sequence of audio frames representing a spoken natural language input. The arbitrator component may switch between the ML models, on a frame-by-frame basis, to reduce an overall compute cost for the entire spoken natural language input. The outputs of the different ML models may be combined to determine the final output for the entire spoken natural language input.
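A frame-by-frame arbitrator of this kind could be sketched as follows. The two-model setup, the `needs_costly` decision callback, and the fixed per-frame costs are hypothetical stand-ins; the abstract does not say how the arbitrator decides or how the outputs are combined, so the sketch simply keeps the per-frame outputs in order.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Output = TypeVar("Output")


def process_frames(
    frames: Sequence[Frame],
    cheap_model: Callable[[Frame], Output],
    costly_model: Callable[[Frame], Output],
    needs_costly: Callable[[Frame], bool],
    cheap_cost: float = 1.0,
    costly_cost: float = 5.0,
) -> Tuple[List[Output], float]:
    """Route each frame to one of two models and track compute cost.

    `needs_costly` plays the role of the arbitrator: it decides, per
    frame, whether the cheaper model is good enough. Outputs from both
    models are combined by keeping them in frame order.
    """
    outputs: List[Output] = []
    total_cost = 0.0
    for frame in frames:
        if needs_costly(frame):
            outputs.append(costly_model(frame))
            total_cost += costly_cost
        else:
            outputs.append(cheap_model(frame))
            total_cost += cheap_cost
    return outputs, total_cost
```

Under these assumed costs, sending one frame in three to the costly model costs 7 units, compared with 15 if every frame used it.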
