Abstract:
Systems and methods are provided for associating a phonetic pronunciation with a name by receiving the name, mapping the name to a plurality of monosyllabic components that are combinable to construct the phonetic pronunciation of the name, receiving a user input to select one or more of the plurality, and combining the selected one or more of the plurality of monosyllabic components to construct the phonetic pronunciation of the name.
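The map, select, and combine steps can be illustrated with a minimal Python sketch. The lookup table, the function names, and the hyphen separator are all hypothetical and are not taken from the abstract; a real system would presumably derive the candidate components from a pronunciation model rather than a fixed table.

```python
# Hypothetical table: each name maps to candidate monosyllabic components.
CANDIDATE_SYLLABLES = {
    "Siobhan": ["shi", "see", "vawn", "o", "ban"],
}


def map_name_to_components(name):
    """Return the candidate monosyllabic components for a name (empty if unknown)."""
    return list(CANDIDATE_SYLLABLES.get(name, []))


def combine_selected(components, selected_indices, separator="-"):
    """Join the user-selected components, in selection order, into one pronunciation."""
    return separator.join(components[i] for i in selected_indices)


components = map_name_to_components("Siobhan")
# Simulated user input: the user picks "shi" and then "vawn".
pronunciation = combine_selected(components, [0, 2])
print(pronunciation)
```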
Abstract:
Audio signals produced by microphones can be processed to remove echo and reverberation. The processed signals can be mapped to each other with adaptively estimated impulse responses. One or more of the processed signals, one or more of the mapped signals, and one or more of the impulse responses can be fed to an automatic speech recognizer (ASR) having a deep neural network (DNN), to train the DNN or recognize speech in the input audio signals. Other aspects are described and claimed.
Abstract:
Systems and processes are disclosed for virtual assistant request recognition using live usage data and data relating to future events. User requests that are received but not recognized can be used to generate candidate request templates. A count can be associated with each candidate request template and can be incremented each time a matching candidate request template is received. When a count reaches a threshold level, the corresponding candidate request template can be used to train a virtual assistant to recognize and respond to similar user requests in the future. In addition, data relating to future events can be mined to extract relevant information that can be used to populate both recognized user request templates and candidate user request templates. Populated user request templates (e.g., whole expected utterances) can then be used to recognize user requests and disambiguate user intent as future events become relevant.
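The counting mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, the normalization rule (lowercasing and replacing digit runs with a placeholder), and the default threshold are hypothetical, since the abstract does not say how candidate templates are derived or matched.

```python
import re
from collections import Counter


class CandidateTemplateTracker:
    """Counts unrecognized requests per candidate template (hypothetical design)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = Counter()
        self.promoted = set()

    @staticmethod
    def to_template(request):
        # Assumed normalization: lowercase, digits become <NUM>, whitespace collapsed.
        text = re.sub(r"\d+", "<NUM>", request.lower())
        return " ".join(text.split())

    def record_unrecognized(self, request):
        """Increment the matching template's count.

        Returns the template the first time its count reaches the threshold,
        signalling that it can be used for training; otherwise returns None.
        """
        template = self.to_template(request)
        self.counts[template] += 1
        if self.counts[template] >= self.threshold and template not in self.promoted:
            self.promoted.add(template)
            return template
        return None


tracker = CandidateTemplateTracker(threshold=2)
first = tracker.record_unrecognized("Set timer 5 min")
second = tracker.record_unrecognized("set timer 10  min")
```

Tracking promoted templates separately keeps a template from being handed to training more than once after it crosses the threshold.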
Abstract:
A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
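One way to picture the extended dictionary is as a mapping from each contact word to the pronunciations proposed by every locale's guesser. The sketch below assumes this structure; the function name and the toy guessers (which merely spell words letter by letter) are placeholders and do not represent real pronunciation guessers.

```python
def build_extended_dictionary(contact_names, guessers):
    """Map each lowercase contact word to a set of (locale, pronunciation) pairs.

    `guessers` maps a locale identifier to a function from word to pronunciation.
    """
    extended = {}
    for name in contact_names:
        for word in name.split():
            entry = extended.setdefault(word.lower(), set())
            for locale, guess in guessers.items():
                entry.add((locale, guess(word)))
    return extended


# Toy stand-ins for per-locale pronunciation guessers (assumptions, not real models).
toy_guessers = {
    "en_US": lambda w: " ".join(w.upper()),
    "de_DE": lambda w: " ".join(w.upper().replace("W", "V")),
}

extended = build_extended_dictionary(["Walter White"], toy_guessers)

# When the contacts database changes, the simplest update is to rebuild.
extended = build_extended_dictionary(["Walter White", "Wanda"], toy_guessers)
```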
Abstract:
A method of performing automatic speech recognition (ASR) using end-pointing markers generated using an accelerometer-based voice activity detector starts with a voice activity detector (VAD) generating an accelerometer VAD output (VADa) based on data output by at least one accelerometer that is included in at least one earbud. The at least one accelerometer detects vibration of the user's vocal cords. A voice processor detects a speech signal based on acoustic signals from at least one microphone. An end-pointer generates the end-pointing markers based on the VADa output, and an ASR engine performs ASR on the speech signal based on the end-pointing markers. Other embodiments are also described.
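The end-pointing step can be sketched as below, assuming the VADa output is available as one boolean per audio frame. The `hangover` parameter (the number of consecutive inactive frames that closes a segment) is a hypothetical detail added for illustration; the abstract does not describe how markers are derived from VADa.

```python
def end_point_markers(vad_frames, hangover=2):
    """Return (start, end) frame-index pairs for speech segments, end exclusive.

    A segment opens at the first active frame and closes once `hangover`
    consecutive inactive frames have been seen.
    """
    markers = []
    start = None
    silent = 0
    for i, active in enumerate(vad_frames):
        if active:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= hangover:
                # End just after the last active frame.
                markers.append((start, i - silent + 1))
                start = None
                silent = 0
    if start is not None:
        # Input ended inside a segment; drop any trailing inactive frames.
        markers.append((start, len(vad_frames) - silent))
    return markers


vada = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
markers = end_point_markers(vada, hangover=2)
print(markers)
```

An ASR engine could then be run only on the frames inside each marker pair. The single inactive frame at index 3 does not split the first segment because it is shorter than the hangover.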
Abstract:
A method for updating an adaptive speech recognition model is provided. In some implementations, the method is performed at a communications device including one or more processors and memory storing instructions for execution by the one or more processors. The method includes determining that a first user of a first mobile communication device is engaged in a call over a communications network and providing an adaptive speech recognition model. The method also includes analyzing an outbound audio channel of the first mobile communication device to obtain a call audio signal corresponding to audio input from one or more microphones of the first mobile communication device and updating the adaptive speech recognition model with training data derived from the call audio signal.
Abstract:
Techniques for providing reminders based on social interactions between users of electronic devices are described. Social reminders can be set to trigger based on social interactions of users. For example, a user may request to be reminded to discuss a certain discussion topic with a particular phonebook contact, when the user next encounters the contact.
Abstract:
A method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received. A first phonetic representation of the at least one word is determined, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet. The first set of phonemes is mapped to a second set of phonemes to generate a second phonetic representation, where the second set of phonemes is selected from a speech synthesis phonetic alphabet. The second phonetic representation is stored in association with a text string corresponding to the at least one word.
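The alphabet-to-alphabet mapping and the storage step can be illustrated with a minimal sketch. The symbol table, the store, and the function names are hypothetical; the abstract does not name the phonetic alphabets, and a real mapping need not be one-to-one as it is here.

```python
# Hypothetical mapping from recognition-alphabet symbols to synthesis-alphabet symbols.
RECOGNITION_TO_SYNTHESIS = {
    "K": "k",
    "AE1": "@",
    "T": "t",
}

# Text string -> second (synthesis) phonetic representation.
pronunciation_store = {}


def map_phonemes(recognition_phonemes, table=RECOGNITION_TO_SYNTHESIS):
    """Translate each recognition phoneme; raises KeyError for an unmapped symbol."""
    return [table[p] for p in recognition_phonemes]


def store_pronunciation(text, recognition_phonemes):
    """Map the first representation to the second and store it under the text string."""
    synthesis_phonemes = map_phonemes(recognition_phonemes)
    pronunciation_store[text] = synthesis_phonemes
    return synthesis_phonemes


stored = store_pronunciation("cat", ["K", "AE1", "T"])
print(pronunciation_store)
```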
Abstract:
Systems and processes for generating a shared pronunciation lexicon and using the shared pronunciation lexicon to interpret spoken user inputs received by a virtual assistant are provided. In one example, the process can include receiving pronunciations for words or named entities from multiple users. The pronunciations can be tagged with context tags and stored in the shared pronunciation lexicon. The shared pronunciation lexicon can then be used to interpret a spoken user input received by a user device by determining a relevant subset of the shared pronunciation lexicon based on contextual information associated with the user device and performing speech-to-text conversion on the spoken user input using the determined subset of the shared pronunciation lexicon.
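Selecting the relevant subset of a tagged lexicon can be sketched as a tag-overlap filter. The entry type, the tag format (for example "city:houston"), and the overlap rule are assumptions made for illustration; the abstract only states that pronunciations carry context tags and that a subset is chosen from the device's contextual information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    word: str
    pronunciation: str
    tags: frozenset  # context tags attached when the pronunciation was contributed


def relevant_subset(lexicon, device_context):
    """Keep entries that share at least one tag with the device's context."""
    context = frozenset(device_context)
    return [entry for entry in lexicon if entry.tags & context]


shared_lexicon = [
    LexiconEntry("Houston", "HYOO-stun", frozenset({"city:houston"})),
    LexiconEntry("Houston", "HOW-stun", frozenset({"city:new_york"})),
]

# Hypothetical device context: the device is currently located in New York.
subset = relevant_subset(shared_lexicon, {"city:new_york", "lang:en"})
print([entry.pronunciation for entry in subset])
```

Speech-to-text conversion would then consult only `subset`, so a pronunciation that is common in one context does not compete with candidates from unrelated contexts.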
Abstract:
While an electronic device with a display and a touch-sensitive surface is in a screen reader accessibility mode, the device displays an application launcher screen including a plurality of application icons. A respective application icon corresponds to a respective application stored in the device. The device detects a sequence of one or more gestures on the touch-sensitive surface that correspond to one or more characters. A respective gesture that corresponds to a respective character is a single finger gesture that moves across the touch-sensitive surface along a respective path that corresponds to the respective character. The device determines whether the detected sequence of one or more gestures corresponds to a respective application icon of the plurality of application icons, and, in response to determining that the detected sequence of one or more gestures corresponds to the respective application icon, performs a predefined operation associated with the respective application icon.
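The matching decision can be sketched under one simple assumption: the recognized characters are treated as a case-insensitive prefix of the application name, and a match is declared only when exactly one application remains. This prefix rule and the function name are hypothetical, since the abstract does not specify how a gesture sequence is matched to an icon, and the recognition of individual gestures as characters is outside the sketch.

```python
def match_icon(characters, app_names):
    """Return the single app whose name starts with the entered characters, else None."""
    prefix = "".join(characters).lower()
    matches = [name for name in app_names if name.lower().startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


apps = ["Mail", "Maps", "Music"]

ambiguous = match_icon(["m", "a"], apps)      # both Mail and Maps still match
resolved = match_icon(["m", "a", "p"], apps)  # only Maps matches
print(ambiguous, resolved)
```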