-
Publication number: US20240233732A1
Publication date: 2024-07-11
Application number: US18615621
Filing date: 2024-03-25
Applicant: GOOGLE LLC
Inventor: Benjamin Haynor , Petar Aleksic
IPC: G10L15/26 , G10L15/16 , G10L15/193 , G10L15/22 , G10L15/30
CPC classification number: G10L15/26 , G10L15/16 , G10L15/193 , G10L15/22 , G10L15/30
Abstract: Speech processing techniques are disclosed that enable determining a text representation of alphanumeric sequences in captured audio data. Various implementations include determining a contextual biasing finite state transducer (FST) based on contextual information corresponding to the captured audio data. Additional or alternative implementations include modifying probabilities of one or more candidate recognitions of the alphanumeric sequence using the contextual biasing FST.
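The biasing step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the pattern alphabet (`A` for a letter, `D` for a digit), the function names, the candidate list, and the bonus value are all assumptions made for illustration, and a simple linear acceptor stands in for a full contextual biasing FST.

```python
import math


def make_pattern_acceptor(pattern):
    """Build a linear finite-state acceptor from a pattern string.

    Hypothetical alphabet: 'A' matches any letter, 'D' matches any digit.
    State i means "i symbols matched so far"; the only final state is
    len(pattern).
    """
    def accepts(text):
        state = 0
        for ch in text:
            if state >= len(pattern):
                return False  # input is longer than the pattern
            want = pattern[state]
            ok = (want == "A" and ch.isalpha()) or (want == "D" and ch.isdigit())
            if not ok:
                return False  # no outgoing arc for this symbol
            state += 1
        return state == len(pattern)

    return accepts


def rescore(candidates, accepts, bonus=2.0):
    """Add a log-space bonus to accepted candidates, then renormalize."""
    boosted = {
        text: logp + (bonus if accepts(text) else 0.0)
        for text, logp in candidates.items()
    }
    log_z = math.log(sum(math.exp(lp) for lp in boosted.values()))
    return {text: lp - log_z for text, lp in boosted.items()}


# Assumed context: the user is expected to say a 3-letter, 3-digit code.
candidates = {
    "A B C one two three": math.log(0.5),
    "ABC123": math.log(0.3),
    "ABC12E": math.log(0.2),
}
accepts = make_pattern_acceptor("AAADDD")
rescored = rescore(candidates, accepts)
best = max(rescored, key=rescored.get)
print(best)
```

In this toy setup the candidate that matches the expected alphanumeric format overtakes a candidate that the recognizer originally ranked higher, which is the effect the abstract attributes to the biasing FST.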
-
Publication number: US11869491B2
Publication date: 2024-01-09
Application number: US17425696
Filing date: 2020-01-16
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Tsutomu Hirao , Atsunori Ogawa , Tomohiro Nakatani , Masaaki Nagata
IPC: G10L15/193 , G10L15/08 , G10L15/22
CPC classification number: G10L15/193 , G10L15/083 , G10L15/22
Abstract: A speech recognition unit converts an input utterance sequence into a confusion network sequence constituted by the k-best candidate words of speech recognition results; a lattice generating unit generates, from the confusion network sequence, a lattice sequence having the candidate words as internal nodes and a combination of k words among the candidate words for an identical speech as an external node, in which edges are extended between internal nodes other than internal nodes included in an identical external node; an integer programming problem generating unit generates an integer programming problem for selecting, from among the paths that follow the internal nodes along the extended edges in the lattice sequence, a path that maximizes an objective function including at least a coverage score of important words; and a summary generating unit generates a high-quality summary having fewer speech recognition errors and low redundancy using the candidate words indicated by the internal nodes included in the path selected by solving the integer programming problem, under a constraint on the length of the summary to be generated.
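The selection problem in this abstract can be sketched on a toy scale. The code below is an illustration only: it replaces the integer programming solver with exhaustive search, flattens the lattice into a list of confusion sets, and uses an assumed objective (sum of word confidences plus an importance weight for each distinct important word covered). The function name, data, and weights are hypothetical.

```python
from itertools import product


def summarize(confusion_sets, importance, max_len):
    """Pick at most one candidate per confusion set, at most max_len words.

    Exhaustive search stands in for the integer program. Each important
    word is credited once, so repeating it adds no coverage (low redundancy).
    """
    best_score, best_words = float("-inf"), []
    # None means "skip this confusion set".
    options = [[None] + cands for cands in confusion_sets]
    for choice in product(*options):
        picked = [c for c in choice if c is not None]
        if len(picked) > max_len:
            continue  # violates the summary length constraint
        confidence = sum(p for _, p in picked)
        covered = {w for w, _ in picked}
        coverage = sum(importance.get(w, 0.0) for w in covered)
        score = confidence + coverage
        if score > best_score:
            best_score, best_words = score, [w for w, _ in picked]
    return best_words, best_score


# Each inner list holds (candidate word, recognizer confidence) pairs.
confusion_sets = [
    [("budget", 0.6), ("gadget", 0.4)],
    [("meeting", 0.5), ("meting", 0.5)],
    [("budget", 0.9)],
    [("friday", 0.7), ("fried", 0.3)],
]
importance = {"budget": 1.0, "meeting": 0.8, "friday": 0.5}
words, score = summarize(confusion_sets, importance, max_len=3)
print(words, round(score, 2))
```

With a length limit of three words, the search keeps the higher-confidence occurrence of "budget" and spends the remaining slots on other important words rather than repeating it. Exhaustive search only works for tiny inputs; a real system would need a solver, as the abstract indicates.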
-
Publication number: US11676585B1
Publication date: 2023-06-13
Application number: US17401141
Filing date: 2021-08-12
Applicant: Amazon Technologies, Inc.
Inventor: Pushkaraksha Gejji
IPC: G10L15/00 , G10L15/18 , G10L15/02 , G10L15/14 , G10L15/19 , G10L15/16 , G10L15/193 , G06F40/289 , G10L15/08
CPC classification number: G10L15/1822 , G10L15/02 , G10L15/14 , G10L15/19 , G06F40/289 , G10L15/083 , G10L15/16 , G10L15/18 , G10L15/193
Abstract: Embodiments describe a method for decoding speech including receiving speech input at an audio input device, generating speech data that is a digital representation of the speech input; extracting acoustic features of the speech data, assigning acoustic scores to the acoustic features, receiving data representing the acoustic features and the acoustic scores, decoding the data representing the acoustic features into a word, having a word score, by referencing a WFST language model, modifying the word score into a new word score based on a personalized grammar model stored in the external memory device, the processor is separate from and external to the WFST accelerator, and determining an intent represented by a plurality of words outputted by the WFST accelerator, where the plurality of words include the word and the new word score.
-
Publication number: US20230154458A1
Publication date: 2023-05-18
Application number: US18098206
Filing date: 2023-01-18
Applicant: VeriSign, Inc.
Inventor: Andrew FREGLY , Burton S. KALISKI, JR. , Swapneel SHETH
IPC: G10L15/18 , G06F3/16 , G10L15/193 , G06F16/955
CPC classification number: G10L15/1822 , G06F3/167 , G10L15/193 , G06F16/955
Abstract: In one embodiment, a domain-name based framework implemented in a digital assistant ecosystem uses domain names as unique identifiers for request types, requesting entities, responders, and target entities embedded in a natural language request. Further, the framework enables interpreting natural language requests according to domain ontologies associated with different responders. A domain ontology operates as a keyword dictionary for a given responder and defines the keywords and corresponding allowable values to be used for request types and request parameters. The domain-name based framework thus enables the digital assistant to interact with any responder that supports a domain ontology to generate precise and complete responses to natural language based requests.
-
Publication number: US20180293977A1
Publication date: 2018-10-11
Application number: US15483977
Filing date: 2017-04-10
Applicant: Microsoft Technology Licensing, LLC
Inventor: Christian Liensberger
IPC: G10L15/187 , G06F17/27 , G10L15/06 , G10L15/18 , G10L15/22
CPC classification number: G10L15/22 , G06F17/2775 , G10L15/183 , G10L15/193 , G10L2015/228
Abstract: Techniques and systems are disclosed for context-dependent speech recognition. The techniques and systems described enable accurate recognition of speech by accessing sub-libraries associated with the context of the speech to be recognized. These techniques translate audible input into audio data at a smart device and determine context for the speech, such as location-based, temporal-based, recipient-based, and application-based context. The smart device then accesses a context-dependent library to compare the audio data with phrase-associated translation data in one or more sub-libraries of the context-dependent library to determine a match. In this way, by organizing the phrases into context-dependent sub-libraries, the techniques allow access to a large quantity of phrases while reducing incorrect matching of the audio data to translation data.
-
Publication number: US10013974B1
Publication date: 2018-07-03
Application number: US15187177
Filing date: 2016-06-20
Applicant: Amazon Technologies, Inc.
Inventor: Denis Sergeyevich Filimonov , Gautam Tiwari , Shaun Nidhiri Joseph , Ariya Rastrow
IPC: G10L15/19 , G10L15/193 , G10L15/06 , G10L15/02 , G10L15/18
CPC classification number: G10L15/193 , G10L15/02 , G10L15/063 , G10L15/1815 , G10L15/1822 , G10L2015/0635
Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime, portions of the FSTs may be decompressed for processing by an ASR engine.
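The weight-binning idea in this abstract can be shown with a short sketch. It assumes uniform bins over the observed weight range and a 4-bit code; the abstract does not specify the binning scheme, so the function names, bin layout, and sample weights here are illustrative assumptions rather than the patented method.

```python
def build_bins(weights, num_bins):
    """Return evenly spaced bin centers covering the observed weight range."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (num_bins - 1) if hi > lo else 1.0
    centers = [lo + i * step for i in range(num_bins)]
    return centers, lo, step


def quantize(weight, lo, step, num_bins):
    """Map a weight to the index of its nearest bin center."""
    idx = round((weight - lo) / step)
    return max(0, min(num_bins - 1, idx))


# Hypothetical arc weights (e.g. negative log probabilities).
weights = [0.0, 0.12, 0.5, 0.98, 1.5, 3.0]
num_bins = 16  # each stored code fits in 4 bits instead of a 32-bit float

centers, lo, step = build_bins(weights, num_bins)
codes = [quantize(w, lo, step, num_bins) for w in weights]

# "Decompression" at runtime: look the code up in the shared table.
restored = [centers[c] for c in codes]
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

The trade-off is that each arc stores a small code plus one shared table of bin centers, at the cost of a bounded rounding error (at most half a bin width with uniform bins).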
-
Publication number: US09934777B1
Publication date: 2018-04-03
Application number: US15248211
Filing date: 2016-08-26
Applicant: Amazon Technologies, Inc.
Inventor: Shaun Nidhiri Joseph , Sonal Pareek , Ariya Rastrow , Gautam Tiwari , Alexander David Rosen
CPC classification number: G10L15/063 , G10L15/02 , G10L15/08 , G10L15/1815 , G10L15/193 , G10L15/22 , G10L15/30 , G10L2015/025 , G10L2015/0635
Abstract: User-specific language models (LMs) that include internal word indices to a word table specific to the user-specific LM rather than a word table specific to a system-wide LM. When the system-wide LM is updated, the word table of the user-specific LM may be updated to translate the user-specific indices to system-wide indices. This prevents having to update the internal indices of the user-specific LM every time the system-wide LM is updated.
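The indirection described in this abstract can be sketched as follows. The class, its fields, and the vocabularies are hypothetical; the point is only that the user model's internal data is keyed by user-local indices, so a system-wide vocabulary update requires rebuilding one translation table and nothing else.

```python
class UserLM:
    """Toy user-specific LM keyed by user-local word indices (illustrative)."""

    def __init__(self, words, system_vocab):
        self.words = list(words)   # user-local index -> word; never renumbered
        self.bigram_counts = {}    # keys are pairs of user-local indices
        self.to_system = {}        # user-local index -> system-wide index
        self.refresh_word_table(system_vocab)

    def refresh_word_table(self, system_vocab):
        """Rebuild only the local-to-system translation table."""
        position = {word: i for i, word in enumerate(system_vocab)}
        # Words missing from the system vocabulary map to None.
        self.to_system = {
            local: position.get(word) for local, word in enumerate(self.words)
        }

    def add_bigram(self, first, second):
        key = (self.words.index(first), self.words.index(second))
        self.bigram_counts[key] = self.bigram_counts.get(key, 0) + 1


system_vocab_v1 = ["call", "play", "stop"]
system_vocab_v2 = ["call", "pause", "play", "stop"]  # update shifts indices

user_lm = UserLM(["play", "stop", "zydeco"], system_vocab_v1)
user_lm.add_bigram("play", "zydeco")
before = dict(user_lm.bigram_counts)
old_index = user_lm.to_system[0]

user_lm.refresh_word_table(system_vocab_v2)
print(old_index, user_lm.to_system[0], user_lm.bigram_counts == before)
```

After the system vocabulary changes, the system-wide index for "play" moves, but the stored bigram counts are untouched because they never referenced system-wide indices directly.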
-
Publication number: US20170270925A1
Publication date: 2017-09-21
Application number: US15477179
Filing date: 2017-04-03
Applicant: VoiceBox Technologies Corporation
Inventor: Michael R. KENNEWICK , Catherine CHEUNG , Larry BALDWIN , Ari SALOMON , Michael TJALVE , Sheetal GUTTIGOLI , Lynn ARMSTRONG , Philippe DI CRISTO , Bernie ZIMMERMAN , Sam MENAKER
CPC classification number: G10L15/22 , G01C21/3608 , G06Q30/0261 , G10L15/00 , G10L15/04 , G10L15/08 , G10L15/19 , G10L15/193
Abstract: A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment.
-
Publication number: US09674328B2
Publication date: 2017-06-06
Application number: US13402678
Filing date: 2012-02-22
Applicant: Ajay Juneja
Inventor: Ajay Juneja
CPC classification number: G10L15/22 , G10L15/005 , G10L15/02 , G10L15/04 , G10L15/08 , G10L15/193 , G10L15/30 , G10L15/32 , G10L25/00 , G10L25/60 , H04L43/08 , H04L43/0894 , H04L67/327 , H04L67/42 , H04M1/271 , H04M1/7253 , H04M1/72552 , H04M2250/14 , H04M2250/74
Abstract: A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be assigned to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network. A first segment can be processed by the first recognizer and the results of the processing returned to the recipient computing device, and a second segment can be processed by a second recognizer implemented at the recipient computing device.
-
Publication number: US09620113B2
Publication date: 2017-04-11
Application number: US14269545
Filing date: 2014-05-05
Applicant: VoiceBox Technologies Corporation
Inventor: Michael R. Kennewick , Catherine Cheung , Larry Baldwin , Ari Salomon , Michael Tjalve , Sheetal Guttigoli , Lynn Armstrong , Philippe Di Cristo , Bernie Zimmerman , Sam Menaker
CPC classification number: G10L15/22 , G01C21/3608 , G06Q30/0261 , G10L15/00 , G10L15/04 , G10L15/08 , G10L15/19 , G10L15/193
Abstract: A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment.
-