-
Publication No.: US10381000B1
Publication Date: 2019-08-13
Application No.: US15864689
Filing Date: 2018-01-08
Applicant: Amazon Technologies, Inc.
Inventor: Denis Sergeyevich Filimonov , Gautam Tiwari , Shaun Nidhiri Joseph , Ariya Rastrow
IPC: G10L15/00 , G10L15/193 , G10L15/18 , G10L15/06 , G10L15/02
Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime, portions of the FSTs may be decompressed for processing by an ASR engine.
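The weight-binning idea in the abstract above can be sketched roughly as follows. This is a hypothetical illustration, not the patented implementation: quantizing arc weights into a fixed number of bins lets each weight be stored as a one-byte bin index into a shared lookup table instead of a full 32-bit float.

```python
import numpy as np

def bin_weights(weights, num_bins=256):
    """Quantize float weights into num_bins equal-width bins.

    Returns (bin_ids, bin_centers): each weight is replaced by an
    8-bit index into a shared table of bin-center values, cutting
    per-weight storage from 32 bits to 8.
    """
    lo, hi = min(weights), max(weights)
    edges = np.linspace(lo, hi, num_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    bin_ids = np.clip(np.digitize(weights, edges) - 1, 0, num_bins - 1)
    return bin_ids.astype(np.uint8), centers

weights = [0.1, 2.5, 2.6, 7.9, 0.11]
ids, centers = bin_weights(weights)
approx = centers[ids]  # reconstructed (approximate) weights at runtime
```

The reconstruction error is bounded by half the bin width, which is the trade-off the abstract's "binning individual weights" step accepts in exchange for the smaller memory footprint.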
-
Publication No.: US10013974B1
Publication Date: 2018-07-03
Application No.: US15187177
Filing Date: 2016-06-20
Applicant: Amazon Technologies, Inc.
Inventor: Denis Sergeyevich Filimonov , Gautam Tiwari , Shaun Nidhiri Joseph , Ariya Rastrow
IPC: G10L15/19 , G10L15/193 , G10L15/06 , G10L15/02 , G10L15/18
CPC classification number: G10L15/193 , G10L15/02 , G10L15/063 , G10L15/1815 , G10L15/1822 , G10L2015/0635
Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime, portions of the FSTs may be decompressed for processing by an ASR engine.
-
Publication No.: US10121467B1
Publication Date: 2018-11-06
Application No.: US15197923
Filing Date: 2016-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Ankur Gandhe , Denis Sergeyevich Filimonov , Ariya Rastrow , Björn Hoffmeister
IPC: G10L15/06 , G10L15/183 , G10L15/16 , G10L15/197
Abstract: A language model for automatic speech processing, such as a finite state transducer (FST), may be configured to incorporate information indicating that a particular word sequence (N-gram) is used in a manner similar to another N-gram. A score of a component of the FST (such as an arc or state) relating to the first N-gram may be based on information of the second N-gram. Further, the FST may be configured to have an arc between a state of the first N-gram and a state of the second N-gram to allow for cross N-gram backoff, rather than backoff from a larger N-gram to a smaller N-gram, during traversal of the FST during speech processing.
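The cross N-gram backoff arc described above can be illustrated with a toy FST. The states, words, and arc labels here are invented for illustration; only the shape of the idea (a same-order history borrowing arcs from a similar history, instead of always backing off to a shorter history) comes from the abstract.

```python
# Toy n-gram FST: states are n-gram histories, arcs map a word (or a
# special label) to (next_state, weight).  "<backoff>" is the conventional
# larger-to-smaller backoff; "<cross>" links two same-order histories that
# are used in a similar manner.
fst = {
    ("play", "some"): {
        "music": (("some", "music"), 0.9),
        "<backoff>": (("some",), 0.3),       # trigram state -> bigram state
    },
    ("play", "the"): {
        "<cross>": (("play", "some"), 0.8),  # cross n-gram arc to similar history
    },
    ("some",): {"music": (("music",), 0.5)},
}

def next_state(state, word):
    """Follow a word arc, trying cross then backoff arcs when the word is absent."""
    arcs = fst.get(state, {})
    if word in arcs:
        return arcs[word][0]
    for special in ("<cross>", "<backoff>"):
        if special in arcs:
            return next_state(arcs[special][0], word)
    return None

next_state(("play", "the"), "music")  # reaches ("some", "music") via the cross arc
```

The history ("play", "the") has no "music" arc of its own, but the cross arc lets it reuse the arcs of the similar history ("play", "some") without first collapsing to a unigram or bigram state.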
-
Publication No.: US09865254B1
Publication Date: 2018-01-09
Application No.: US15187102
Filing Date: 2016-06-20
Applicant: Amazon Technologies, Inc.
Inventor: Denis Sergeyevich Filimonov , Gautam Tiwari , Shaun Nidhiri Joseph , Ariya Rastrow
IPC: G10L15/00 , G10L15/193 , G10L15/06 , G10L15/02 , G10L15/18
CPC classification number: G10L15/193 , G10L15/02 , G10L15/063 , G10L15/1815 , G10L15/1822 , G10L2015/0635
Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime, portions of the FSTs may be decompressed for processing by an ASR engine.
-
Publication No.: US10199037B1
Publication Date: 2019-02-05
Application No.: US15196184
Filing Date: 2016-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Denis Sergeyevich Filimonov , Yuan Shangguan
Abstract: A reduced-latency system for automatic speech recognition (ASR). The system can use certain feature values describing the state of ASR processing to estimate how far the lowest-scoring node for an audio frame is from a node likely to be part of the Viterbi path. The system can then adjust its beam width in a manner likely to encompass the node likely to be on the Viterbi path, thus pruning unnecessary nodes and reducing latency. The feature values and estimated distances may be based on a set of training data, where the system identifies specific nodes on the Viterbi path and determines which feature values correspond to which desired beam widths. Trained models or other data may be created at training time and used at runtime to dynamically adjust the beam width, as well as other settings such as a threshold number of active nodes.
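The dynamic beam idea above can be sketched as follows. This is a minimal stand-in: in the patented system the gap estimate would come from a model trained on the feature values the abstract mentions, whereas here it is simply passed in as a parameter.

```python
def prune(active_nodes, base_beam, predicted_gap):
    """Prune decoding hypotheses outside a dynamically chosen beam.

    active_nodes: dict of node_id -> score (lower = better, Viterbi-style).
    predicted_gap: estimated distance from the best score to a node
    likely to be on the Viterbi path (assumed to come from a trained
    model at runtime).
    """
    # Widen the beam just enough to keep the likely-Viterbi node,
    # while still discarding nodes beyond it.
    beam = max(base_beam, predicted_gap)
    best = min(active_nodes.values())
    return {n: s for n, s in active_nodes.items() if s - best <= beam}

nodes = {"a": 1.0, "b": 3.5, "c": 9.0}
kept = prune(nodes, base_beam=2.0, predicted_gap=3.0)  # keeps "a" and "b"
```

With a fixed beam of 2.0 the node "b" (gap 2.5) would be pruned; the predicted gap of 3.0 widens the beam enough to retain it, which is the latency/accuracy balance the abstract describes.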
-
Publication No.: US10140981B1
Publication Date: 2018-11-27
Application No.: US14301245
Filing Date: 2014-06-10
Applicant: Amazon Technologies, Inc.
Inventor: Denis Sergeyevich Filimonov , Ariya Rastrow
IPC: G10L15/183 , G10L15/16
Abstract: Features are disclosed for performing speech recognition on utterances using dynamic weights with speech recognition models. An automatic speech recognition system may use a general speech recognition model, such as a large finite state transducer-based language model, to generate speech recognition results for various utterances. The general speech recognition model may include sub-models or other portions that are customized for particular tasks, such as speech recognition on utterances regarding particular topics. Individual weights within the general speech recognition model can be dynamically replaced based on the context in which an utterance is made or received, thereby providing a further degree of customization without requiring additional speech recognition models to be generated, maintained, or loaded.
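The dynamic weight replacement described above can be sketched with a toy model. The slot names, contexts, and values here are invented for illustration; the point is that one general model is kept loaded while only a small per-context weight table is swapped in.

```python
# One general FST whose arcs reference named weight slots rather than
# hard-coded values; a context-specific table supplies the actual weights
# at query time, so no additional full model needs to be generated,
# maintained, or loaded.
general_fst = {
    ("s0", "play"): ("s1", "w_play"),
    ("s1", "music"): ("s2", "w_music_topic"),
}

context_weights = {
    "music_app": {"w_play": 0.2, "w_music_topic": 0.1},
    "shopping_app": {"w_play": 0.6, "w_music_topic": 0.9},
}

def arc_weight(state, word, context):
    """Resolve an arc's weight slot against the active context's table."""
    next_st, slot = general_fst[(state, word)]
    return next_st, context_weights[context][slot]

arc_weight("s1", "music", "music_app")  # same arc, lower cost in a music context
```

The same arc yields a low cost (0.1) when the utterance arrives from a music context and a high cost (0.9) from a shopping context, without duplicating the model.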
-
Publication No.: US10032463B1
Publication Date: 2018-07-24
Application No.: US14982587
Filing Date: 2015-12-29
Applicant: Amazon Technologies, Inc.
Inventor: Ariya Rastrow , Nikko Ström , Spyridon Matsoukas , Markus Dreyer , Ankur Gandhe , Denis Sergeyevich Filimonov , Julian Chan , Rohit Prasad
IPC: G10L15/183 , G10L15/197 , G10L15/16 , G10L25/30 , G10L15/26 , G10L15/06 , G10L15/22
Abstract: An automatic speech recognition (“ASR”) system produces, for particular users, customized speech recognition results by using data regarding prior interactions of the users with the system. A portion of the ASR system (e.g., a neural-network-based language model) can be trained to produce an encoded representation of a user's interactions with the system based on, e.g., transcriptions of prior utterances made by the user. This user-specific encoded representation of interaction history is then used by the language model to customize ASR processing for the user.
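The shape of the user-history encoding above can be sketched very loosely. The patent describes a trained neural-network encoder; the hashed bag-of-words below is only a stand-in showing the interface: a variable-length interaction history in, a fixed-length user representation out.

```python
import hashlib
import math

def encode_history(transcriptions, dim=8):
    """Hash a user's prior utterance transcriptions into a fixed-size vector.

    Stand-in for a trained neural encoder: maps any number of prior
    transcriptions to a single unit-norm vector of length `dim` that a
    language model could condition on.
    """
    vec = [0.0] * dim
    for text in transcriptions:
        for word in text.split():
            h = int(hashlib.md5(word.encode()).hexdigest(), 16)
            vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

user_vec = encode_history(["play jazz", "play my jazz playlist"])
```

Whatever encoder is used, the fixed dimensionality is what lets the same downstream language model consume any user's history without per-user retraining.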
-