-
Publication No.: US10546593B2
Publication Date: 2020-01-28
Application No.: US15830955
Application Date: 2017-12-04
Applicant: Apple Inc.
Inventor: Jason Wung , Mehrez Souden , Ramin Pishehvar , Joshua D. Atkins
IPC: G10L21/00 , G10L19/00 , G10L21/02 , G10L15/02 , G10L21/0232 , G10L25/30 , H04R1/40 , G10L25/03 , G10L21/0208
Abstract: A number of features are extracted from a current frame of a multi-channel speech pickup and from side information that is a linear echo estimate, a diffuse signal component, or a noise estimate of the multi-channel speech pickup. A DNN-based speech presence probability (SPP) value is produced for the current frame in response to the extracted features being input to the DNN. The DNN-based SPP value is applied to configure a multi-channel filter whose input is the multi-channel speech pickup and whose output is a single audio signal. In one aspect, the system is designed to run online, at low enough latency for real-time applications such as voice trigger detection. Other aspects are also described and claimed.
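Illustrative sketch (not taken from the patent): the abstract says only that the SPP value "configures" a multi-channel filter. One common way to do that, assumed here, is to let the SPP gate a recursive noise-covariance update and then derive MVDR weights for a single frequency bin. The DNN is out of scope, so `spp` is passed in as a plain number; the function names, the smoothing constant `alpha`, and the steering vector `d` are all hypothetical.

```python
# Minimal single-bin sketch, assuming an SPP-gated noise covariance and an
# MVDR filter. The abstract does not name the filter type; MVDR is an assumption.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is square and non-singular; entries may be complex."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def update_noise_cov(Rn, x, spp, alpha=0.9):
    """Recursive noise covariance update for one frame x (one value per mic).
    spp near 1 freezes the estimate; spp near 0 lets the frame update it."""
    a = alpha + (1.0 - alpha) * spp  # effective smoothing factor
    n = len(x)
    return [[a * Rn[i][j] + (1.0 - a) * x[i] * x[j].conjugate()
             for j in range(n)] for i in range(n)]

def mvdr_weights(Rn, d):
    """MVDR weights w = Rn^-1 d / (d^H Rn^-1 d) for steering vector d."""
    a = solve(Rn, d)
    denom = sum(di.conjugate() * ai for di, ai in zip(d, a))
    return [ai / denom for ai in a]

def apply_filter(w, x):
    """Single-channel output y = w^H x."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))
```

In a frame loop under these assumptions, each bin would call `update_noise_cov` with the current SPP, recompute `mvdr_weights`, and emit `apply_filter(w, x)` as the single output signal.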
-
12.
Publication No.: US10403299B2
Publication Date: 2019-09-03
Application No.: US15613127
Application Date: 2017-06-02
Applicant: Apple Inc.
Inventor: Jason Wung , Joshua D. Atkins , Ramin Pishehvar , Mehrez Souden
IPC: H04M9/08 , G10L21/02 , G10L21/0208 , G10L21/0216 , G10L21/0232 , G10L21/0272 , G10L21/038
Abstract: A digital speech enhancement system performs a specific chain of digital signal processing operations upon a multi-channel sound pickup to produce a single, enhanced speech signal. The operations are designed to be computationally less complex yet, as a whole, yield an enhanced speech signal that produces accurate voice trigger detection and low word error rates by an automatic speech recognizer (ASR). The constituent operations or components of the system have been chosen so that the overall system is robust to changing acoustic conditions and can deliver the enhanced speech signal with low enough latency that the system can be used online (enabling real-time voice trigger detection and streaming ASR). Other embodiments are also described and claimed.
-
13.
Publication No.: US20180040333A1
Publication Date: 2018-02-08
Application No.: US15227885
Application Date: 2016-08-03
Applicant: Apple Inc.
Inventor: Jason Wung , Ramin Pishehvar , Daniele Giacobello , Joshua D. Atkins
IPC: G10L21/0232 , G10L25/87 , G10L25/30
CPC classification number: G10L21/0232 , G10L25/30 , G10L25/87 , G10L2021/02082
Abstract: A method for performing speech enhancement using a Deep Neural Network (DNN)-based signal starts with training the DNN offline by exciting a microphone using a target training signal that includes a signal approximation of clean speech. A loudspeaker is driven with a reference signal and outputs a loudspeaker signal. The microphone then generates a microphone signal based on at least one of: a near-end speaker signal, an ambient noise signal, or the loudspeaker signal. An acoustic echo canceller (AEC) generates an AEC echo-cancelled signal based on the reference signal and the microphone signal. A loudspeaker signal estimator generates an estimated loudspeaker signal based on the microphone signal and the AEC echo-cancelled signal. The DNN receives the microphone signal, the reference signal, the AEC echo-cancelled signal, and the estimated loudspeaker signal, and generates a speech reference signal that includes signal statistics for residual echo or for noise. A noise suppressor generates a clean speech signal by suppressing noise or residual echo in the microphone signal based on the speech reference signal. Other embodiments are described.
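Illustrative sketch (not taken from the patent): the front end of the signal flow described above can be approximated with a time-domain NLMS echo canceller, with the loudspeaker estimate taken as the microphone signal minus the AEC output. Both choices are assumptions; the abstract does not say which adaptive algorithm the AEC uses or how the estimator combines its two inputs. The DNN and the noise suppressor are not modelled, and all function names and parameters (`taps`, `mu`, `eps`) are hypothetical.

```python
# Minimal sketch of the AEC and loudspeaker-estimator stages, assuming NLMS
# adaptation and a simple subtraction for the loudspeaker estimate.

def nlms_aec(reference, mic, taps=4, mu=0.5, eps=1e-8):
    """Return the echo-cancelled signal e[n] = mic[n] - w^T ref_buffer[n]."""
    w = [0.0] * taps      # adaptive estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for r, m in zip(reference, mic):
        buf = [r] + buf[:-1]
        echo_hat = sum(wi * bi for wi, bi in zip(w, buf))
        e = m - echo_hat
        norm = sum(b * b for b in buf) + eps  # eps avoids division by zero
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out

def estimate_loudspeaker(mic, aec_out):
    """Assumed estimator: what the AEC removed is attributed to the loudspeaker."""
    return [m - e for m, e in zip(mic, aec_out)]

def dnn_inputs(mic, reference, aec_out, loudspeaker_est):
    """Bundle the four signals the abstract lists as DNN inputs."""
    return {
        "microphone": mic,
        "reference": reference,
        "aec_out": aec_out,
        "loudspeaker_est": loudspeaker_est,
    }
```

A real system would more likely run these stages per frequency bin on short frames; the sample-by-sample form is used here only to keep the data flow visible.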
-
-