Abstract:
A speech intelligibility enhancement (SIE) system and method are described that improve the intelligibility of a speech signal to be played back by an audio device when the audio device is located in an environment with loud acoustic background noise. In an embodiment, the audio device comprises a near-end telephony terminal and the speech signal comprises a speech signal received over a communication network from a far-end telephony terminal for playback at the near-end telephony terminal.
Abstract:
Unlike sound-based pressure waves, which propagate in all directions, air turbulence caused by wind is usually a fairly local event. Therefore, in a system that utilizes two or more spatially separated microphones to pick up sound signals (e.g., speech), wind noise picked up by one of the microphones often will not be picked up (or at least not to the same extent) by the other microphone(s). Embodiments of methods and apparatuses are described that utilize this fact and others to effectively detect and suppress wind noise using multiple spatially separated microphones.
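The locality argument above can be illustrated with a minimal Python sketch. It is not the method of the described embodiments: it assumes two time-aligned microphone frames and flags wind when their zero-lag normalized correlation falls below a threshold. The function name `detect_wind`, the threshold of 0.5, and the synthetic test signals are all hypothetical choices made for illustration.

```python
import math

def detect_wind(frame_a, frame_b, corr_threshold=0.5):
    """Return (wind_detected, quieter_mic_index) for two time-aligned frames.

    Assumption: speech reaches both microphones nearly identically, so a low
    normalized correlation suggests local turbulence at one microphone.
    """
    energy_a = sum(x * x for x in frame_a)
    energy_b = sum(x * x for x in frame_b)
    quieter = 0 if energy_a <= energy_b else 1
    if energy_a == 0.0 or energy_b == 0.0:
        # A silent frame gives nothing to correlate against.
        return False, quieter
    cross = sum(a * b for a, b in zip(frame_a, frame_b))
    corr = cross / math.sqrt(energy_a * energy_b)
    return corr < corr_threshold, quieter

# Synthetic example: a tone stands in for speech, a strong low-frequency
# component stands in for a wind gust that hits microphone B only.
n = 160
speech = [math.sin(2 * math.pi * 0.05 * i) for i in range(n)]
gust = [5.0 * math.sin(2 * math.pi * 0.0125 * i + 1.0) for i in range(n)]
mic_a = speech
mic_b = [s + g for s, g in zip(speech, gust)]

clean_result = detect_wind(mic_a, mic_a)
windy_result = detect_wind(mic_a, mic_b)
```

In this sketch the quieter microphone is returned alongside the decision, on the assumption that a suppression stage would favor the less affected channel; a practical detector would likely also look at the low-frequency spectrum and smooth decisions over time.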
Abstract:
Systems and methods are described for enhancing the audio quality of an FM receiver. In embodiments described herein, quadrature L−R demodulation is applied to a composite baseband signal output by an FM demodulator to obtain an L−R noise signal. A channel quality measure is calculated based on the L−R noise signal and is used to control whether a pop suppression technique is applied to an L+R signal obtained from the composite baseband signal to detect and remove noise pulses therefrom. The channel quality measure and the L−R noise signal are also leveraged to perform single-channel noise suppression in the frequency domain on an L−R signal obtained from the composite baseband signal and on the L+R signal. The channel quality measure is also used to control the application of a fast fading compensation process that replaces noisy segments of the L−R and L+R signals with replacement waveforms generated via waveform extrapolation.
Abstract:
Systems and methods are described that utilize dynamic time scale modification (TSM) to achieve reduced bit rate audio coding. In accordance with embodiments, different levels of TSM compression are selectively applied to segments of an input speech signal prior to encoding thereof by an encoder. Encoded TSM-compressed segments are received at a decoder which decodes such segments and then applies an appropriate level of TSM decompression to each based on information received from the encoder. By selectively applying different levels of TSM compression to segments of an input speech signal prior to encoding, a coding bit rate associated with the encoder/decoder is reduced. Furthermore, by selecting a level of TSM compression for each segment of the input speech signal that takes into account certain local characteristics of that signal, such bit rate reduction is provided without introducing unacceptable levels of distortion into an output speech signal produced by the decoder.
Abstract:
A method of processing a decoded speech (DS) signal including successive DS frames, each DS frame including DS samples. The method comprises: adaptively filtering the DS signal to produce a filtered signal; gain-scaling the filtered signal with an adaptive gain updated once a DS frame, thereby producing a gain-scaled signal; and performing a smoothing operation to smooth possible waveform discontinuities in the gain-scaled signal.
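The gain-scaling and smoothing steps can be sketched as follows. The abstract does not say how the adaptive gain is computed or how the smoothing is done, so this Python sketch assumes an energy-matching gain (restoring the energy of the unfiltered DS frame) and a short linear ramp from the previous frame's gain to the current one. The adaptive filter itself is left out; `filtered_frame` is taken as its output. The function name and `ramp_len` are hypothetical.

```python
import math

def gain_scale_and_smooth(ds_frame, filtered_frame, prev_gain, ramp_len=4):
    """Scale one filtered frame and smooth the gain change at the frame start.

    Assumptions (not from the abstract): the per-frame gain matches the
    filtered frame's energy to the DS frame's energy, and discontinuities are
    smoothed by ramping linearly from prev_gain to the new gain.
    """
    energy_ds = sum(x * x for x in ds_frame)
    energy_filtered = sum(x * x for x in filtered_frame)
    # Gain is updated once per frame; fall back to unity for a silent frame.
    gain = math.sqrt(energy_ds / energy_filtered) if energy_filtered > 0.0 else 1.0

    out = []
    for i, x in enumerate(filtered_frame):
        if i < ramp_len:
            w = (i + 1) / (ramp_len + 1)          # ramp weight in (0, 1)
            g = (1.0 - w) * prev_gain + w * gain  # blend old and new gain
        else:
            g = gain
        out.append(g * x)
    return out, gain

# Example: the filter halved the amplitude, and the previous gain was 1.0.
scaled, new_gain = gain_scale_and_smooth([2.0] * 8, [1.0] * 8, prev_gain=1.0)
```

Without the ramp, the output would jump from a gain of 1.0 to 2.0 at the frame boundary; with it, the first four samples step up gradually before the new gain applies in full.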
Abstract:
A technique for performing frame erasure concealment (FEC) in a speech decoder. One or more non-erased frames of a speech signal are decoded in a block-independent manner. When an erased frame is detected, a short-term predictive filter and a long-term predictive filter are derived based on previously-decoded portions of the speech signal. A periodic waveform component is generated using the short-term predictive filter and the long-term predictive filter. A random waveform component is generated using the short-term predictive filter. A replacement frame is generated for the erased frame. The replacement frame may be generated based on the periodic waveform component, the random waveform component, or a mixture of both.
Abstract:
A method of determining a pitch period of an audio signal using a correlation-based signal derived from the audio signal. The correlation-based signal includes known peaks, each corresponding to a respective one of known time lags. The known peaks include a global maximum peak. The method comprises: (a) determining if a candidate peak among the known peaks exceeds a peak threshold; (b) determining if a candidate time lag corresponding to the candidate peak is within a predetermined range of at least one integer sub-multiple of the time lag corresponding to the global maximum peak; and (c) setting the pitch period equal to the candidate time lag when the determinations of both steps (a) and (b) are true.
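Steps (a) through (c) can be sketched in Python as below. Several details are assumptions, since the abstract leaves them open: the peak threshold is taken as a fraction of the global maximum value, the "predetermined range" is a fixed tolerance in samples, sub-multiples are checked for divisors 2 through `max_divisor`, candidates are scanned from the shortest lag upward, and the lag of the global maximum is used when no candidate qualifies. The function and parameter names are hypothetical.

```python
def pick_pitch_period(peaks, threshold_ratio=0.8, tolerance=2, max_divisor=4):
    """Choose a pitch period from correlation peaks given as (lag, value) pairs.

    Assumptions: threshold = threshold_ratio * global maximum value; a lag is
    "within range" when it lies within `tolerance` samples of max_lag / k for
    some integer k in 2..max_divisor.
    """
    max_lag, max_val = max(peaks, key=lambda p: p[1])
    peak_threshold = threshold_ratio * max_val

    for lag, val in sorted(peaks):      # shortest lags first
        if lag >= max_lag:
            break                       # only shorter lags can be sub-multiples
        # Step (a): candidate peak exceeds the peak threshold.
        exceeds = val > peak_threshold
        # Step (b): candidate lag is near an integer sub-multiple of max_lag.
        near_submultiple = any(
            abs(lag - max_lag / k) <= tolerance
            for k in range(2, max_divisor + 1)
        )
        # Step (c): both determinations true, so accept the candidate lag.
        if exceeds and near_submultiple:
            return lag
    return max_lag                      # fallback (assumption): global maximum lag

# The global maximum sits at lag 80, but a strong peak at lag 40 = 80 / 2
# suggests that 80 is a pitch multiple, so 40 is selected.
period = pick_pitch_period([(40, 0.85), (57, 0.5), (80, 0.9), (120, 0.7)])
```

The point of the sub-multiple check is to avoid pitch doubling or tripling: a correlation-based signal often peaks most strongly at a multiple of the true period, and a sufficiently strong peak near an integer fraction of that lag is a better estimate.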
Abstract:
A method and system are provided for synthesizing a corrupted frame output from a decoder including one or more predictive filters. The corrupted frame is representative of one segment of a decoded signal output from the decoder. The method comprises extrapolating a replacement frame based upon another segment of the decoded signal and substituting the replacement frame for the corrupted frame. Finally, the internal states of the filters are updated based upon the substituting.
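The extrapolate, substitute, and update sequence can be illustrated with a toy Python sketch. It assumes the replacement frame is produced by periodically repeating the most recent pitch cycle of the decoded signal, and that the predictive filter memory is simply the last few output samples. Neither detail is stated in the abstract, and `ToyDecoder`, `conceal_frame`, and their parameters are hypothetical.

```python
def conceal_frame(history, frame_len, pitch_lag):
    """Extrapolate a replacement frame by repeating the last pitch cycle."""
    if not 0 < pitch_lag <= len(history):
        raise ValueError("pitch_lag must be between 1 and len(history)")
    buf = list(history)
    out = []
    for _ in range(frame_len):
        sample = buf[-pitch_lag]   # sample from one pitch period earlier
        buf.append(sample)
        out.append(sample)
    return out

class ToyDecoder:
    """Keeps decoded output history and a predictive filter memory in sync."""

    def __init__(self, order=2):
        self.order = order
        self.history = []                  # decoded output so far
        self.filter_state = [0.0] * order  # last `order` output samples

    def accept_good_frame(self, decoded_frame):
        self.history.extend(decoded_frame)
        self.filter_state = self.history[-self.order:]

    def substitute_corrupted_frame(self, frame_len, pitch_lag):
        replacement = conceal_frame(self.history, frame_len, pitch_lag)
        self.history.extend(replacement)
        # Update the filter memory from the substituted samples so the next
        # good frame is predicted from what was actually played out.
        self.filter_state = self.history[-self.order:]
        return replacement

decoder = ToyDecoder(order=2)
decoder.accept_good_frame([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
replacement = decoder.substitute_corrupted_frame(frame_len=6, pitch_lag=4)
```

The state update is the part worth noting: if the filter memory were left as it was before the loss, the first good frame after the erasure would be decoded against stale state and could produce an audible glitch.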