Abstract:
Systems and processes are disclosed for recognizing speech using a weighted finite state transducer (WFST) approach. Dynamic grammars can be supported by constructing the final recognition cascade during runtime using difference grammars. In a first grammar, non-terminals can be replaced with a, weighted phone loop that produces sequences of mono-phone words. In a second grammar, at runtime, non-terminals can be replaced with sub-grammars derived from user-specific usage data including contact, media, and application lists. Interaction frequencies associated with these entities can be used to weight certain words over others. With all non-terminals replaced, a static recognition cascade with the first grammar can be composed with the personalized second grammar to produce a user-specific WEST. User speech can then be processed to generate candidate words having associated probabilities, and the likeliest result can be output.
Abstract:
Systems and processes for providing user-specific acoustic models are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, receiving a plurality of speech inputs, each of the speech inputs associated with a same user of the electronic device; providing each of the plurality of speech inputs to a user-independent acoustic model, the user-independent acoustic model providing a plurality of speech results based on the plurality of speech inputs; initiating a user-specific acoustic model on the electronic device; and adjusting the user-specific acoustic model based on the plurality of speech inputs and the plurality of speech results.
Abstract:
Aspects of subject technology provide systems and methods for simultaneously spell-correcting and completing partial search queries being entered by a user on the user's electronic device. An apparatus such as a computing device may receive partial search queries from the user's electronic device as each character of the partial search query is entered by the user. The apparatus may utilize a machine-learning model to generate suggested queries that include spelling-corrected versions of the received partial query, query completion suggestions for the partial query, and/or spelling-corrected completion suggestions for the partial query.
Abstract:
Systems and processes for operating an electronic device to train a machine-learning translation system are described. In one process, a first set of training data is obtained. The first set of training data includes at least one payload in a first language and a translation of the at least one payload in a second language. The process further includes obtaining one or more templates for adapting the at least one payload; adapting the at least one payload using the one or more templates to generate at least one adapted payload formulated as a translation request; generating a second set of training data based on the at least one adapted payload; and training the machine-learning translation system using the second set of training data.
Abstract:
Systems and processes for processing speech in a digital assistant are provided. In one example process, a first speech input can be received from a user. The first speech input can be processed using a first automatic speech recognition system to produce a first recognition result. An input indicative of a potential error in the first recognition result can be received. The input can be used to improve the first recognition result. For example, the input can include a second speech input that is a repetition of the first speech input. The second speech input can be processed using a second automatic speech recognition system to produce a second recognition result.