Compressed finite state transducers for automatic speech recognition

    Publication No.: US10381000B1

    Publication date: 2019-08-13

    Application No.: US15864689

    Application date: 2018-01-08

    Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST used at runtime. The compact FSTs may be significantly smaller in memory (e.g., 50% smaller), which reduces the computing resources needed to operate them at runtime. The individual arcs and states of each FST may be compacted by binning individual weights, reducing the number of bits needed for each weight. Further, certain fields, such as a next state ID, may be left out of a compact FST if an estimation technique can reproduce the next state at runtime. At runtime, portions of the FSTs may be decompressed for processing by an ASR engine.
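
    The weight-binning idea in this abstract can be illustrated with a minimal Python sketch. It assumes uniform bins between the smallest and largest weight and a fixed bit budget per weight; the patent abstract does not specify the binning scheme, and the function names (`build_bins`, `compress`, `decompress`) are hypothetical.

```python
def build_bins(weights, num_bits):
    """Return (lowest weight, bin step) for uniform binning.

    Assumption: bins are evenly spaced; the abstract only says
    weights are binned, not how the bins are chosen.
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1          # highest bin index
    step = (hi - lo) / levels if hi > lo else 1.0
    return lo, step


def compress(weights, lo, step):
    """Map each float weight to the index of its nearest bin."""
    return [round((w - lo) / step) for w in weights]


def decompress(codes, lo, step):
    """Recover an approximate weight from each bin index."""
    return [lo + c * step for c in codes]


weights = [0.0, 0.5, 1.0, 2.3, 7.9]
lo, step = build_bins(weights, num_bits=4)   # 16 bins, 4 bits per weight
codes = compress(weights, lo, step)
restored = decompress(codes, lo, step)
```

    Each stored index needs only `num_bits` bits instead of a full 32-bit float, at the cost of a rounding error of at most half a bin width per weight.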

    Adaptive beam pruning for automatic speech recognition

    Publication No.: US10199037B1

    Publication date: 2019-02-05

    Application No.: US15196184

    Application date: 2016-06-29

    Abstract: A reduced-latency system for automatic speech recognition (ASR). The system can use certain feature values describing the state of ASR processing to estimate how far a lowest scoring node for an audio frame is from a potential node likely to be part of the Viterbi path. The system can then adjust its beam width in a manner likely to encompass the node likely to be on the Viterbi path, thus pruning unnecessary nodes and reducing latency. The feature values and estimated distances may be based on a set of training data, where the system identifies specific nodes on the Viterbi path and determines what feature values correspond to what desired beam widths. Trained models or other data may be created at training time and used at runtime to dynamically adjust the beam width, as well as other settings such as a threshold number of active nodes.
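
    A rough Python sketch of the beam adjustment described in this abstract follows. It assumes a linear model that maps per-frame feature values to a beam width and a cost convention where lower is better; the model form, the clamping limits, and the names `predict_beam` and `prune` are illustrative assumptions, not details taken from the patent.

```python
def predict_beam(features, coeffs, bias, min_beam=2.0, max_beam=16.0):
    """Estimate a beam width from feature values (hypothetical linear model).

    The coefficients would come from training data in which the nodes
    on the Viterbi path are known; here they are simply passed in.
    """
    estimate = bias + sum(c * f for c, f in zip(coeffs, features))
    return max(min_beam, min(max_beam, estimate))


def prune(active, beam, max_active=None):
    """Keep nodes whose cost is within `beam` of the best (lowest) cost.

    `active` maps node id -> accumulated cost for the current frame.
    `max_active` optionally caps the number of surviving nodes.
    """
    best = min(active.values())
    kept = {n: c for n, c in active.items() if c <= best + beam}
    if max_active is not None and len(kept) > max_active:
        cheapest = sorted(kept.items(), key=lambda kv: kv[1])[:max_active]
        kept = dict(cheapest)
    return kept


active = {"a": 1.0, "b": 3.5, "c": 9.0, "d": 4.0}
beam = predict_beam(features=[0.5, 2.0], coeffs=[1.0, 1.0], bias=0.5)
survivors = prune(active, beam)
capped = prune(active, beam, max_active=2)
```

    A narrower predicted beam discards more nodes per frame, which is where the latency saving would come from; the clamp keeps a poor prediction from collapsing or exploding the search.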

    Dynamic arc weights in speech recognition models

    Publication No.: US10140981B1

    Publication date: 2018-11-27

    Application No.: US14301245

    Application date: 2014-06-10

    Abstract: Features are disclosed for performing speech recognition on utterances using dynamic weights with speech recognition models. An automatic speech recognition system may use a general speech recognition model, such as a large finite state transducer-based language model, to generate speech recognition results for various utterances. The general speech recognition model may include sub-models or other portions that are customized for particular tasks, such as speech recognition on utterances regarding particular topics. Individual weights within the general speech recognition model can be dynamically replaced based on the context in which an utterance is made or received, thereby providing a further degree of customization without requiring additional speech recognition models to be generated, maintained, or loaded.
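
    One way to picture dynamic weight replacement is sketched below in Python. It assumes that arcs leading into topic-specific portions of the model carry a tag, and that a per-context table supplies substitute weights at lookup time; the `Arc` class, the tag names, and the `CONTEXT_WEIGHTS` table are all hypothetical and only illustrate the general idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Arc:
    label: str
    weight: float               # default cost, lower is better
    tag: Optional[str] = None   # marks arcs whose weight may be swapped


# Hypothetical per-context substitute weights, keyed by arc tag.
CONTEXT_WEIGHTS = {
    "music_player": {"music": 0.5},
    "kitchen": {"cooking": 0.4},
}


def effective_weight(arc, context=None):
    """Return the arc weight, replaced if the context overrides its tag."""
    overrides = CONTEXT_WEIGHTS.get(context, {})
    if arc.tag is not None and arc.tag in overrides:
        return overrides[arc.tag]
    return arc.weight


def best_label(arcs, context=None):
    """Pick the label of the cheapest arc under the given context."""
    return min(arcs, key=lambda a: effective_weight(a, context)).label


arcs = [
    Arc("play", 2.0, tag="music"),
    Arc("pay", 1.5),
    Arc("bake", 2.5, tag="cooking"),
]
```

    The same arc list serves every context: `best_label(arcs)` prefers "pay", while `best_label(arcs, "music_player")` prefers "play", without building or loading a second model.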
