Abstract:
Methods, systems, and computer-readable media related to a technique for providing handwriting input functionality on a user device are disclosed. A handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and is capable of recognizing tens of thousands of characters using a single handwriting recognition model. The handwriting input module provides real-time handwriting recognition that is independent of stroke order and stroke direction. User interfaces for providing the handwriting input functionality are also disclosed.
Abstract:
Systems and processes for natural language processing are provided. In accordance with one example, a method includes, at a first electronic device with one or more processors and memory, receiving a plurality of words, mapping each of the plurality of words to a word representation, and associating the mapped words to provide a plurality of phrases. In some examples, each of the plurality of phrases has a representation of a first type. The method further includes encoding each of the plurality of phrases to provide a respective plurality of encoded phrases. In some examples, each of the plurality of encoded phrases has a representation of a second type different than the first type. The method further includes determining a value of each of the plurality of encoded phrases and identifying one or more phrases of the plurality of encoded phrases having a value exceeding a threshold.
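The steps in this abstract (map words, associate them into phrases, encode the phrases, score them, and keep those above a threshold) can be sketched as follows. This is only an illustrative sketch: the toy vectors in `WORD_VECTORS`, the adjacent-pair association, the mean-pooling encoder, and the sum-based value are all assumptions, since the abstract does not specify any of them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy word representations (not from the source).
WORD_VECTORS: Dict[str, List[float]] = {
    "new": [0.9, 0.1],
    "york": [0.8, 0.3],
    "is": [0.1, 0.1],
    "big": [0.2, 0.7],
}


def map_words(words: List[str]) -> List[Tuple[str, List[float]]]:
    """Map each word to a word representation (unknown words get zeros)."""
    return [(w, WORD_VECTORS.get(w, [0.0, 0.0])) for w in words]


def associate(mapped: List[Tuple[str, List[float]]]) -> List[Tuple[str, List[List[float]]]]:
    """Associate adjacent words into phrases.

    First representation type (assumed): a list of word vectors.
    """
    return [(f"{a} {b}", [va, vb]) for (a, va), (b, vb) in zip(mapped, mapped[1:])]


def encode(phrase_vectors: List[List[float]]) -> List[float]:
    """Encode a phrase into a second representation type (assumed):
    one fixed-size vector, the element-wise mean of its word vectors."""
    n = len(phrase_vectors)
    return [sum(column) / n for column in zip(*phrase_vectors)]


def value(encoded: List[float]) -> float:
    """Assumed scoring function: sum of the encoded components."""
    return sum(encoded)


def select_phrases(words: List[str], threshold: float) -> List[str]:
    """Return the phrases whose encoded value exceeds the threshold."""
    phrases = associate(map_words(words))
    return [text for text, vectors in phrases if value(encode(vectors)) > threshold]


if __name__ == "__main__":
    print(select_phrases(["new", "york", "is", "big"], threshold=1.0))
```

In a real system the encoder and the value function would presumably be learned; the fixed arithmetic here only makes the data flow between the two representation types visible.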
Abstract:
Systems and processes for prediction using generative adversarial network and distillation technology are provided. For example, an input is received at a first portion of a language model. A first output distribution is obtained, based on the input, from the first portion of the language model. Using a first training model, the language model is adjusted based on the first output distribution. The first output distribution is received at a second portion of the language model. A first representation of the input is obtained, based on the first output distribution, from the second portion of the language model. The language model is adjusted, using a second training model, based on the first representation of the input. Using the adjusted language model, an output is provided based on a received user input.
Abstract:
Systems and methods for updating a language model are provided. One example method includes, at an electronic device with one or more processors and memory, training a first language model using a training data set comprising user-generated and user-relevant data, and storing a reference version of the first language model including a first overall probability distribution. Based on the reference version of the first language model, a second language model including a second overall probability distribution is updated (i.e., adapted) using the first overall probability distribution as a constraint on the second overall probability distribution.
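One way to read "using the first overall probability distribution as a constraint" is as a bound on how far the adapted distribution may drift from the stored reference. The sketch below assumes that reading: it interpolates between a reference distribution and a target distribution and keeps the largest interpolation weight whose KL divergence from the reference stays within a budget. The function names, the interpolation scheme, and the KL budget are assumptions, not details from the abstract.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) over a shared vocabulary; q must be positive wherever p is."""
    return sum(pi * math.log(pi / q[k]) for k, pi in p.items() if pi > 0)


def adapt(reference: Dict[str, float], target: Dict[str, float],
          max_kl: float, steps: int = 100) -> Dict[str, float]:
    """Move from `reference` toward `target` while KL(adapted || reference) <= max_kl.

    Assumes both distributions share the reference vocabulary and that every
    reference probability is positive.
    """
    best = dict(reference)
    for i in range(steps + 1):
        a = i / steps
        mixed = {k: (1 - a) * reference[k] + a * target.get(k, 0.0) for k in reference}
        if kl_divergence(mixed, reference) <= max_kl:
            best = mixed
        else:
            # The divergence does not decrease as `a` grows, so stop here.
            break
    return best


if __name__ == "__main__":
    reference = {"a": 0.5, "b": 0.5}
    target = {"a": 0.9, "b": 0.1}
    print(adapt(reference, target, max_kl=0.02))
```

A grid search is used only for clarity; a trained model would more plausibly apply the constraint as a regularization term during optimization.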
Abstract:
Systems and methods for analysis and validation of language models trained using data that is unavailable or inaccessible are provided. One example method includes, at an electronic device with one or more processors and memory, obtaining a first set of data corresponding to one or more tokens predicted based on one or more previous tokens. The method determines a probability that the first set of data corresponds to a prediction generated by a first language model trained using a user privacy preserving training process. In accordance with a determination that the probability is within a predetermined range, the method determines that the one or more tokens correspond to a prediction associated with the user privacy preserving training process and outputs a predicted token sequence including the one or more tokens and the one or more previous tokens.
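The range check and the resulting output sequence described here can be sketched as below. The scoring callable is an assumption: the abstract does not say how the probability is computed, so `toy_score` is a hypothetical stand-in backed by a small bigram table, and the range bounds are arbitrary.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical bigram probabilities standing in for a real scoring model.
BIGRAM_PROB: Dict[Tuple[str, str], float] = {
    ("see", "you"): 0.4,
    ("you", "soon"): 0.5,
}


def toy_score(previous: Sequence[str], predicted: Sequence[str]) -> float:
    """Assumed stand-in for the probability described in the abstract."""
    prob = 1.0
    history = list(previous)
    for token in predicted:
        prob *= BIGRAM_PROB.get((history[-1], token), 0.0) if history else 0.0
        history.append(token)
    return prob


def validate_prediction(
    previous: Sequence[str],
    predicted: Sequence[str],
    score: Callable[[Sequence[str], Sequence[str]], float],
    prob_range: Tuple[float, float],
) -> Optional[List[str]]:
    """Return previous + predicted tokens if the probability is within range."""
    low, high = prob_range
    if low <= score(previous, predicted) <= high:
        return list(previous) + list(predicted)
    return None  # outside the predetermined range: nothing is output


if __name__ == "__main__":
    print(validate_prediction(["see"], ["you", "soon"], toy_score, (0.1, 1.0)))
```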
Abstract:
Systems and processes for operating an intelligent automated assistant to provide a set of predicted responses are provided. An example method includes, at an electronic device having one or more processors, receiving one or more messages and analyzing the unstructured natural language information of the one or more messages. The method also includes determining, based on the analysis of the unstructured natural language information, whether one or more predicted responses are to be provided. The method further includes, in accordance with a determination that one or more predicted responses are to be provided, determining, from a plurality of sets of candidate predicted responses, one set of predicted responses to be provided to the user based on context information. The method further includes providing the determined set of one or more predicted responses to the user.
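The two decisions in this abstract (whether to offer predicted responses at all, and which candidate set to offer given context) can be sketched minimally as follows. The question-mark heuristic, the context labels, and the candidate sets are hypothetical placeholders; the abstract does not describe how the analysis or the selection is performed.

```python
from typing import Dict, List

# Hypothetical candidate sets keyed by a context label.
CANDIDATE_SETS: Dict[str, List[str]] = {
    "work": ["Yes, I can.", "No, I can't.", "Let me check."],
    "personal": ["Sure!", "Nope", "Maybe later"],
}


def needs_response(message: str) -> bool:
    """Assumed heuristic: offer responses only when the message is a question."""
    return message.strip().endswith("?")


def predicted_responses(messages: List[str], context: str) -> List[str]:
    """Pick one candidate set based on context, or return nothing."""
    if not messages or not needs_response(messages[-1]):
        return []
    # Fall back to the "personal" set for unknown context labels (assumption).
    return CANDIDATE_SETS.get(context, CANDIDATE_SETS["personal"])


if __name__ == "__main__":
    print(predicted_responses(["Can you join the call at 3?"], context="work"))
```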
Abstract:
Systems and processes for multilingual word prediction are provided. In accordance with one example, a method includes, at an electronic device having one or more processors and memory, identifying context information of the electronic device and generating, with the one or more processors, a plurality of candidate words based on the context information, wherein a first candidate word of the plurality of candidate words corresponds to a first language of a plurality of languages and a second candidate word of the plurality of candidate words corresponds to a second language of the plurality of languages different than the first language.
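A small sketch can show how a single candidate list may mix words from different languages. It assumes the context information consists of a typed prefix and a prior over enabled languages, and that candidates are ranked by the prior multiplied by a per-language unigram probability. The lexicons, the priors, and the scoring rule are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language unigram probabilities.
UNIGRAMS: Dict[str, Dict[str, float]] = {
    "en": {"but": 0.5, "bus": 0.1},
    "es": {"bueno": 0.4, "burro": 0.05},
}


def candidates(prefix: str, language_priors: Dict[str, float],
               top_k: int = 3) -> List[Tuple[str, str]]:
    """Return up to top_k (word, language) pairs matching the prefix."""
    scored = []
    for lang, prior in language_priors.items():
        for word, prob in UNIGRAMS.get(lang, {}).items():
            if word.startswith(prefix):
                scored.append((prior * prob, word, lang))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(word, lang) for _, word, lang in scored[:top_k]]


if __name__ == "__main__":
    print(candidates("bu", {"en": 0.6, "es": 0.4}))
```

With these toy numbers the top candidates come from both English and Spanish, which matches the abstract's requirement that candidates in one list correspond to different languages.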
Abstract:
Systems and processes for word encoding are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a user input, determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, and determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models. The method further includes determining whether the first similarity is greater than the second similarity. In accordance with a determination that the first similarity is greater than the second similarity, the first acoustic model may be selected; and in accordance with a determination that the first similarity is not greater than the second similarity, the second acoustic model may be selected.
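The comparison-and-selection logic can be sketched as below. Cosine similarity and the representation of each acoustic model as a single vector are assumptions; the abstract only states that two similarities are computed and compared.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_model(input_repr: List[float], first_model: List[float],
                 second_model: List[float]) -> str:
    """Select the first model only if its similarity is strictly greater."""
    first_similarity = cosine(input_repr, first_model)
    second_similarity = cosine(input_repr, second_model)
    if first_similarity > second_similarity:
        return "first"
    # Not greater (including ties): the second model is selected.
    return "second"


if __name__ == "__main__":
    print(select_model([1.0, 0.0], [1.0, 0.1], [0.0, 1.0]))
```

Note that a tie selects the second model, which follows the abstract's wording "not greater than".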
Abstract:
Systems and processes for unified language modeling are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a character of a sequence of characters and determining a current character context based on the received character of the sequence of characters and a previous character context. The method further includes determining a current word representation based on the current character context and determining a current word context based on the current word representation and a previous word context. The method further includes determining a next word representation based on the current word context and providing the next word representation.
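The chain of dependencies in this abstract (character context, then word representation, then word context, then next word representation) can be traced with a toy sketch. Everything numeric here is an assumption: the bit-based character embedding, the fixed-weight `tanh` recurrence, and the linear maps are placeholders for whatever trained components an actual system would use.

```python
import math
from typing import List, Tuple

DIM = 4  # assumed size of every toy vector


def embed_char(ch: str) -> List[float]:
    """Hypothetical character embedding: the low DIM bits of the code point."""
    return [float((ord(ch) >> i) & 1) for i in range(DIM)]


def recur(previous: List[float], current: List[float]) -> List[float]:
    """Toy fixed-weight recurrence standing in for a trained recurrent layer."""
    return [math.tanh(0.5 * p + 0.5 * c) for p, c in zip(previous, current)]


def step(ch: str, char_ctx: List[float],
         word_ctx: List[float]) -> Tuple[List[float], List[float], List[float]]:
    """Process one character and return updated contexts plus the prediction."""
    char_ctx = recur(char_ctx, embed_char(ch))    # current character context
    word_repr = [2.0 * v for v in char_ctx]       # current word representation (assumed map)
    word_ctx = recur(word_ctx, word_repr)         # current word context
    next_word_repr = [0.5 * v for v in word_ctx]  # next word representation (assumed map)
    return char_ctx, word_ctx, next_word_repr


def run(text: str) -> List[float]:
    """Feed a character sequence and return the last next-word representation."""
    char_ctx = [0.0] * DIM
    word_ctx = [0.0] * DIM
    next_word_repr = [0.0] * DIM
    for ch in text:
        char_ctx, word_ctx, next_word_repr = step(ch, char_ctx, word_ctx)
    return next_word_repr


if __name__ == "__main__":
    print(run("hi"))
```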