-
Publication number: US12190873B2
Publication date: 2025-01-07
Application number: US17952005
Application date: 2022-09-23
Applicant: Apple Inc.
Inventor: Ahmed S. Hussen Abdelaziz , Saurabh Adya , Alexander W. Churchill , Pranay Dighe , Sachin S. Kajarekar , Chaitanya Mannemala , Erik Marchi , Seyedmahdad Mirsamadi , Ognjen Rudovic , Ahmed H. Tewfik , Barry-John Theobald , Srikanth Vishnubhotla
Abstract: An example process includes: receiving a speech input representing a user utterance; determining, based on a textual representation of the speech input, a first score corresponding to a type of the user utterance; determining, based on the textual representation of the speech input, a second score representing a correspondence between the user utterance and a domain recognized by a digital assistant; determining, based on the first score and the second score, whether the speech input is intended for the digital assistant; in accordance with a determination that the speech input is intended for the digital assistant: initiating, by the digital assistant, a task based on the speech input; and providing an output indicative of the initiated task.
-
Publication number: US10789959B2
Publication date: 2020-09-29
Application number: US15997174
Application date: 2018-06-04
Applicant: Apple Inc.
Inventor: Sachin S. Kajarekar
Abstract: Techniques for training a speaker recognition model used for interacting with a digital assistant are provided. In some examples, user authentication information is obtained at a first time. At a second time, a user utterance representing a user request is received. A voice print is generated from the user utterance. A determination is made as to whether a plurality of conditions are satisfied. The plurality of conditions includes a first condition that the user authentication information corresponds to one or more authentication credentials assigned to a registered user of an electronic device. The plurality of conditions further includes a second condition that the first time and the second time are not separated by more than a predefined time period. In accordance with a determination that the plurality of conditions are satisfied, a speaker profile assigned to the registered user is updated based on the voice print.
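Illustrative sketch (not taken from the patent): the two conditions in the abstract above can be read as a simple gate on the profile update. The names `SpeakerProfile`, `maybe_update_profile`, and the default `max_gap_seconds` value are hypothetical, and a voice print is represented as a plain list of floats for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerProfile:
    """Hypothetical container for the voice prints assigned to a registered user."""
    voice_prints: list = field(default_factory=list)


def maybe_update_profile(profile, voice_print, auth_info, registered_credentials,
                         auth_time, utterance_time, max_gap_seconds=300.0):
    """Update the profile only when both conditions from the abstract hold.

    Condition 1: the authentication information matches a credential
                 assigned to the registered user.
    Condition 2: the authentication time and the utterance time are not
                 separated by more than a predefined period (assumed value).
    """
    credentials_match = auth_info in registered_credentials
    within_window = abs(utterance_time - auth_time) <= max_gap_seconds
    if credentials_match and within_window:
        profile.voice_prints.append(voice_print)
        return True
    return False
```

For example, a passcode entered at time 0 followed by an utterance 60 seconds later would update the profile under the assumed 300-second window, while an utterance 600 seconds later would not.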
-
Publication number: US20170365249A1
Publication date: 2017-12-21
Application number: US15188861
Application date: 2016-06-21
Applicant: Apple Inc.
Inventor: Sorin V. Dusan , Devang K. Naik , Sachin S. Kajarekar
IPC: G10L15/05 , G10L21/0208 , G10L15/30 , G10L25/21 , H04R1/10
CPC classification number: G10L15/05 , G10L15/30 , G10L25/21 , G10L25/78 , H04R1/1016 , H04R3/005 , H04R2201/403 , H04R2410/01 , H04R2420/07 , H04R2430/20
Abstract: A method of performing automatic speech recognition (ASR) using end-pointing markers generated using an accelerometer-based voice activity detector starts with a voice activity detector (VAD) generating an accelerometer VAD output (VADa) based on data output by at least one accelerometer that is included in at least one earbud. The at least one accelerometer detects vibration of the user's vocal cords. A voice processor detects a speech signal based on acoustic signals from at least one microphone. An end-pointer generates the end-pointing markers based on the VADa output and an ASR engine performs ASR on the speech signal based on the end-pointing markers. Other embodiments are also described.
-
Publication number: US10127911B2
Publication date: 2018-11-13
Application number: US14835169
Application date: 2015-08-25
Applicant: Apple Inc.
Inventor: Yoon Kim , Sachin S. Kajarekar
Abstract: Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
-
Publication number: US11423898B2
Publication date: 2022-08-23
Application number: US16815984
Application date: 2020-03-11
Applicant: Apple Inc.
Inventor: Stephen H. Shum , Corey J. Peterson , Sachin S. Kajarekar , Benjamin S. Phipps , Erik Marchi , Jessica Peck , Anumita Biswas , Chaitanya Mannemala
Abstract: Systems and processes for operating an intelligent automated assistant are provided. An example method includes receiving, from one or more external electronic devices, a plurality of speaker profiles for a plurality of users; receiving a natural language speech input; determining, based on comparing the natural language speech input to the plurality of speaker profiles: a first likelihood that the natural language speech input corresponds to a first user of the plurality of users; and a second likelihood that the natural language speech input corresponds to a second user of the plurality of users; determining whether the first likelihood and the second likelihood are within a first threshold; and in accordance with determining that the first likelihood and the second likelihood are not within the first threshold: providing a response to the natural language speech input, the response being personalized for the first user.
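Illustrative sketch (not taken from the patent): one possible reading of "within a first threshold" is that the absolute difference between the two likelihoods does not exceed the threshold. The function name and that interpretation are assumptions; the abstract does not say how the likelihoods are computed or compared.

```python
def should_personalize_for_first_user(first_likelihood, second_likelihood, threshold):
    """Return True when the two likelihoods are NOT within the threshold.

    Assumption: "within the threshold" means the absolute difference is at
    most `threshold`. When the likelihoods are clearly separated, the
    abstract says the response is personalized for the first user.
    """
    return abs(first_likelihood - second_likelihood) > threshold
```

With an assumed threshold of 0.2, likelihoods of 0.9 and 0.3 are clearly separated, so the response would be personalized; likelihoods of 0.55 and 0.5 are too close to call.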
-
Publication number: US20200312315A1
Publication date: 2020-10-01
Application number: US16368403
Application date: 2019-03-28
Applicant: Apple Inc.
Inventor: Feipeng Li , Mehrez Souden , Joshua D. Atkins , John Bridle , Charles P. Clark , Stephen H. Shum , Sachin S. Kajarekar , Haiying Xia , Erik Marchi
IPC: G10L15/20
Abstract: An acoustic environment aware method for selecting a high quality audio stream during multi-stream speech recognition. A number of input audio streams are processed to determine if a voice trigger is detected, and if so a voice trigger score is calculated for each stream. An acoustic environment measurement is also calculated for each audio stream. The trigger score and acoustic environment measurement are combined for each audio stream, to select as a preferred audio stream the audio stream with the highest combined score. The preferred audio stream is output to an automatic speech recognizer. Other aspects are also described and claimed.
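Illustrative sketch (not taken from the patent): the selection step above amounts to taking the stream with the highest combined score. The abstract does not state how the two measurements are combined, so the weighted sum below, the `env_weight` parameter, and the `StreamScores` type are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StreamScores:
    """Hypothetical per-stream measurements for streams where a trigger was detected."""
    stream_id: str
    trigger_score: float
    environment_score: float


def select_preferred_stream(streams: Sequence[StreamScores],
                            env_weight: float = 1.0) -> Optional[StreamScores]:
    """Pick the stream with the highest combined score.

    Assumption: the combined score is a weighted sum of the trigger score
    and the acoustic environment measurement.
    """
    if not streams:
        return None
    return max(streams, key=lambda s: s.trigger_score + env_weight * s.environment_score)
```

The selected stream would then be the one forwarded to the automatic speech recognizer.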
-
Publication number: US10438595B2
Publication date: 2019-10-08
Application number: US16155662
Application date: 2018-10-09
Applicant: Apple Inc.
Inventor: Yoon Kim , Sachin S. Kajarekar
Abstract: Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
-
Publication number: US10255907B2
Publication date: 2019-04-09
Application number: US14846650
Application date: 2015-09-04
Applicant: Apple Inc.
Inventor: Udhyakumar Nallasamy , Sachin S. Kajarekar , Matthias Paulik , Matthew Seigel
Abstract: Systems and processes for automatic accent detection are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a user input, determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, and determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models. The method further includes determining whether the first similarity is greater than the second similarity. In accordance with a determination that the first similarity is greater than the second similarity, the first acoustic model may be selected; and in accordance with a determination that the first similarity is not greater than the second similarity, the second acoustic model may be selected.
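Illustrative sketch (not taken from the patent): the comparison described above selects the first acoustic model only when its similarity is strictly greater, so a tie falls to the second model. The function name and the string return values are hypothetical; how the similarities are computed is not specified in the abstract.

```python
def select_acoustic_model(first_similarity, second_similarity):
    """Return which acoustic model to use, following the abstract's rule.

    The first model is chosen only if its similarity is greater than the
    second; otherwise (including a tie) the second model is chosen.
    """
    if first_similarity > second_similarity:
        return "first"
    return "second"
```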
-
Publication number: US10186254B2
Publication date: 2019-01-22
Application number: US14846667
Application date: 2015-09-04
Applicant: Apple Inc.
Inventor: Shaun E. Williams , Henry G. Mason , Mahesh Krishnamoorthy , Matthias Paulik , Neha Agrawal , Sachin S. Kajarekar , Selen Uguroglu , Ali S. Mohamed
Abstract: The present disclosure generally relates to context-based endpoint detection in user speech input. A method for identifying an endpoint of a spoken request by a user may include receiving user input of natural language speech including one or more words; identifying at least one context associated with the user input; generating a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determining whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identifying the location in the user input as the endpoint.
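Illustrative sketch (not taken from the patent): the threshold test above can be applied to a sequence of per-location probabilities. The model that turns context into probabilities is out of scope here, and scanning for the first location that exceeds the threshold is an assumption; `find_endpoint` is a hypothetical name.

```python
def find_endpoint(location_probabilities, threshold):
    """Return the index of the first location identified as an endpoint.

    `location_probabilities[i]` is the (context-dependent) probability that
    location i in the user input is an endpoint. A location qualifies only
    if its probability is strictly greater than the threshold. Returns None
    when no location qualifies.
    """
    for location, probability in enumerate(location_probabilities):
        if probability > threshold:
            return location
    return None
```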