-
Publication No.: US20200312315A1
Publication date: 2020-10-01
Application No.: US16368403
Filing date: 2019-03-28
Applicant: Apple Inc.
Inventor: Feipeng Li , Mehrez Souden , Joshua D. Atkins , John Bridle , Charles P. Clark , Stephen H. Shum , Sachin S. Kajarekar , Haiying Xia , Erik Marchi
IPC: G10L15/20
Abstract: An acoustic environment aware method for selecting a high quality audio stream during multi-stream speech recognition. A number of input audio streams are processed to determine if a voice trigger is detected, and if so a voice trigger score is calculated for each stream. An acoustic environment measurement is also calculated for each audio stream. The trigger score and acoustic environment measurement are combined for each audio stream, to select as a preferred audio stream the audio stream with the highest combined score. The preferred audio stream is output to an automatic speech recognizer. Other aspects are also described and claimed.
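The selection step in this abstract can be pictured with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the trigger score and the acoustic environment measurement are computed or combined, so the `StreamStats` fields, the detection threshold, and the weighted-sum combination below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StreamStats:
    """Per-stream measurements (hypothetical structure, not from the patent)."""
    name: str
    trigger_score: float  # assumed: higher means a more confident voice trigger
    env_measure: float    # assumed: higher means a cleaner acoustic environment


def select_preferred_stream(
    streams: Sequence[StreamStats],
    trigger_threshold: float = 0.5,  # assumed detection threshold
    env_weight: float = 0.5,         # assumed weight of the environment term
) -> Optional[StreamStats]:
    """Return the stream with the highest combined score, or None if no trigger."""
    # Assumption: a trigger counts as detected when any stream reaches the threshold.
    if not any(s.trigger_score >= trigger_threshold for s in streams):
        return None

    def combined(s: StreamStats) -> float:
        # Assumption: a simple weighted sum stands in for the unspecified combination.
        return (1.0 - env_weight) * s.trigger_score + env_weight * s.env_measure

    # The preferred stream is the one that would be passed on to the speech recognizer.
    return max(streams, key=combined)
```

With equal weights, a stream with a slightly lower trigger score but a much better environment measurement can win over the stream with the highest trigger score alone, which is the behavior the abstract describes.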
-
Publication No.: US10764684B1
Publication date: 2020-09-01
Application No.: US16147140
Filing date: 2018-09-28
Applicant: Apple Inc.
Inventor: Jonathan D. Sheaffer , Ashrith Deshpande , Joshua D. Atkins
Abstract: Systems, methods, and computer readable media to improve the operation of an electronic device having multiple microphones organized in an arbitrary, but known, arrangement in the device are described (i.e., having a specific form-factor). In general, techniques are disclosed for using a priori knowledge of an electronic device's spatial acoustic transfer functions to recreate or reconstitute a prior recorded three-dimensional (3D) audio field or environment. More particularly, techniques disclosed herein enable the efficient recording of a 3D audio field. That audio field may later be reconstituted using an acoustic characterization based on the device's form-factor. In addition, sensor data may be used to rotate the audio field so as to enable generating an output audio field that takes into account the listener's head position.
-
Publication No.: US10652687B2
Publication date: 2020-05-12
Application No.: US16126974
Filing date: 2018-09-10
Applicant: Apple Inc.
Inventor: Darius A. Satongar , Joshua D. Atkins , Justin D. Crosby , Lance F. Reichert , Martin E. Johnson , Sawyer Cohen
Abstract: A presence of a person within a camera field of view of an electronic device is determined by digitally processing images captured by a camera. A position of a body member of the person with respect to the electronic device is also computed by digitally processing the camera captured images. A crosstalk cancellation (XTC) signal is adjusted based on the computed position of the body member. Adjusting the XTC signal includes adjusting a first predetermined model location, which includes a location at which a user should be in order to achieve a desired virtual acoustics effect. Processing program audio based on the adjusted XTC signal, to generate audio signals that drive speakers. Other aspects are also described and claimed.
-
Publication No.: US20200084560A1
Publication date: 2020-03-12
Application No.: US16126974
Filing date: 2018-09-10
Applicant: Apple Inc.
Inventor: Darius A. Satongar , Joshua D. Atkins , Justin D. Crosby , Lance F. Reichert , Martin E. Johnson , Sawyer Cohen
Abstract: A presence of a person within a camera field of view of an electronic device is determined by digitally processing images captured by a camera. A position of a body member of the person with respect to the electronic device is also computed by digitally processing the camera captured images. A crosstalk cancellation (XTC) signal is adjusted based on the computed position of the body member. Adjusting the XTC signal includes adjusting a first predetermined model location, which includes a location at which a user should be in order to achieve a desired virtual acoustics effect. Processing program audio based on the adjusted XTC signal, to generate audio signals that drive speakers. Other aspects are also described and claimed.
-
Publication No.: US10178490B1
Publication date: 2019-01-08
Application No.: US15639191
Filing date: 2017-06-30
Applicant: Apple Inc.
Inventor: Jonathan D. Sheaffer , Joshua D. Atkins , Martin E. Johnson , Stuart J. Wood
IPC: H04R5/00 , H04S7/00 , G10L19/008 , G06T7/20
Abstract: Image analysis of a video signal is performed to produce first metadata, and audio analysis of a multi-channel sound track associated with the video signal is performed to produce second metadata. A number of time segments of the sound track are processed, wherein each time segment is processed by either (i) spatial filtering of the audio signals or (ii) spatial rendering of the audio signals, not both, wherein for each time segment a decision was made to select between the spatial filtering or the spatial rendering, in accordance with the first and second metadata. A mix of the processed sound track and the video signal is generated. Other embodiments are also described and claimed.
-
Publication No.: US20180350379A1
Publication date: 2018-12-06
Application No.: US15613127
Filing date: 2017-06-02
Applicant: Apple Inc.
Inventor: Jason Wung , Joshua D. Atkins , Ramin Pishehvar , Mehrez Souden
IPC: G10L21/02 , G10L21/0232 , G10L21/0272 , G10L21/038
CPC classification number: G10L21/0205 , G10L21/0208 , G10L21/0232 , G10L21/0272 , G10L21/038 , G10L2021/02082 , G10L2021/02166 , H04M9/082
Abstract: A digital speech enhancement system that performs a specific chain of digital signal processing operations upon multi-channel sound pick up, to result in a single, enhanced speech signal. The operations are designed to be computationally less complex yet as a whole yield an enhanced speech signal that produces accurate voice trigger detection and low word error rates by an automatic speech recognizer. The constituent operations or components of the system have been chosen so that the overall system is robust to changing acoustic conditions, and can deliver the enhanced speech signal with low enough latency so that the system can be used online (enabling real-time, voice trigger detection and streaming ASR.) Other embodiments are also described and claimed.
-
Publication No.: US20160360314A1
Publication date: 2016-12-08
Application No.: US14732770
Filing date: 2015-06-07
Applicant: Apple Inc.
Inventor: Vasu Iyengar , Joshua D. Atkins , Aram M. Lindahl , Tarun Pruthi , Ashrith Deshpande
IPC: H04R1/40 , G10L21/02 , G10L21/0216 , H04M1/03
CPC classification number: H04R1/406 , G10L21/0208 , G10L2021/02166 , H04R3/005 , H04R2430/20 , H04R2499/11
Abstract: An orientation detector can have a first microphone, a second microphone, and a reference microphone spaced from the first microphone and the second microphone. An orientation processor can be configured to determine an orientation of the first microphone, the second microphone, or both, relative to a user's mouth based on a comparison of a relative strength of a first signal associated with the first microphone to a relative strength of a second signal associated with the second microphone. A channel selector in a speech enhancer can select one signal from among several signals based at least in part on the orientation determined by the orientation processor. A mobile communication handset can include a microphone-based orientation detector of the type disclosed herein.