Method and system for detecting anomalous sound

    Publication No.: US11978476B2

    Publication date: 2024-05-07

    Application No.: US17478916

    Filing date: 2021-09-19

    Abstract: A system and method for detecting anomalous sound are disclosed. The method includes receiving a spectrogram of an audio signal with elements defined by values in a time-frequency domain of the spectrogram. Each of the values corresponds to an element of the spectrogram that is identified by a coordinate in the time-frequency domain. The time-frequency domain of the spectrogram is partitioned into a context region and a target region. The context region and the target region are processed by a neural network using an attentive neural process to recover values of the spectrogram for elements with coordinates in the target region. The recovered values of the elements of the target region are compared with values of elements of the partitioned target region. An anomaly score is determined based on the comparison. The anomaly score is used for performing a control action.
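The partition, recover, compare, and score steps in this abstract can be illustrated with a minimal sketch. It is not the patented method: the attentive neural process is replaced by a deliberately simple stand-in (the per-frequency mean of the context values), and every name here (`split_context_target`, `recover_target`, `anomaly_score`) is hypothetical. The sketch also assumes each frequency row keeps at least one context element.

```python
from statistics import mean

def split_context_target(spec, target_cols):
    """Partition a spectrogram (list of frequency rows) into context and
    target elements. Each element is ((freq_idx, time_idx), value)."""
    context, target = [], []
    for f, row in enumerate(spec):
        for t, value in enumerate(row):
            bucket = target if t in target_cols else context
            bucket.append(((f, t), value))
    return context, target

def recover_target(context, target_coords):
    """Stand-in for the neural network (an assumption, not an attentive
    neural process): predict each target value as the mean of the
    context values in the same frequency row."""
    by_freq = {}
    for (f, _t), value in context:
        by_freq.setdefault(f, []).append(value)
    return {(f, t): mean(by_freq[f]) for (f, t) in target_coords}

def anomaly_score(spec, target_cols):
    """Mean squared error between recovered and observed target values."""
    context, target = split_context_target(spec, target_cols)
    recovered = recover_target(context, [coord for coord, _ in target])
    return mean((recovered[coord] - value) ** 2 for coord, value in target)

# Toy spectrograms: 2 frequency bins x 4 time frames.
normal = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
odd = [[1.0, 1.0, 5.0, 1.0], [2.0, 2.0, 2.0, 2.0]]

print(anomaly_score(normal, {2}))  # 0.0
print(anomaly_score(odd, {2}))     # 8.0
```

A control action would then typically be triggered when the score exceeds a threshold chosen for the application; the abstract does not state how that threshold is set.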

    Processing Audio Information
    Patent application

    Publication No.: US20210249032A1

    Publication date: 2021-08-12

    Application No.: US17050938

    Filing date: 2019-04-26

    IPC classification: G10L21/14 G10L21/12 G06F3/16

    Abstract: A method for capturing, recording, playing back, visually representing, storing and processing of audio signals, comprises converting the audio signal into a video that pairs the audio with a visual representation of the audio data where such visual representation may contain the waveform, relevant text, spectrogram, wavelet decomposition, or other transformation of the audio data in such a way that the viewer can identify which part of the visual representation is associated with the currently playing audio signal.
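One way a viewer could identify the currently playing part is a cursor that moves across a static waveform or spectrogram image as the video plays. The sketch below only computes the cursor's pixel column for each video frame; it is an assumed illustration, not the method claimed in the application, and `cursor_columns` and its parameters are hypothetical names.

```python
def cursor_columns(n_frames, image_width):
    """Map each video frame index to the pixel column of the playback
    cursor, spreading the frames evenly across the image width.
    Integer arithmetic avoids floating-point rounding at column edges."""
    return [k * image_width // n_frames for k in range(n_frames)]

# A 2-second clip rendered at 5 frames per second gives 10 video frames.
cols = cursor_columns(n_frames=10, image_width=100)
print(cols)  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```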

    Multi-speaker speech recognition correction system

    Publication No.: US10276164B2

    Publication date: 2019-04-30

    Application No.: US15823937

    Filing date: 2017-11-28

    Inventor: Munhak An

    Abstract: The present invention relates to a multi-speaker speech recognition correction system for determining a speaker of an utterance with a simple method and easily correcting speech-recognized text during speech recognition for a plurality of speakers. According to the present invention, when speech signals are input to a multi-speaker speech recognition system from a plurality of microphones which are each provided to a corresponding one of a plurality of speakers, the multi-speaker speech recognition correction system may detect a speech session from a time point at which input of each of the speech signals is started to a time point at which the input of the speech signal is stopped, and a speech recognizer may convert only the detected speech sessions into text so that a speaker of an utterance can be identified by a simple method and speech recognition can be carried out at a low cost.
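The session detection described here, from the point where a microphone's input starts to the point where it stops, can be sketched as follows. The abstract does not say how start and stop are detected, so a fixed level threshold on per-frame input levels is assumed; the function names, the threshold, and the frame length are all hypothetical.

```python
def detect_sessions(frame_levels, threshold, frame_sec):
    """Return (start_sec, end_sec) spans where the input level stays at
    or above the threshold (assumed start/stop criterion)."""
    sessions, start = [], None
    for i, level in enumerate(frame_levels):
        active = level >= threshold
        if active and start is None:
            start = i                      # input started
        elif not active and start is not None:
            sessions.append((start * frame_sec, i * frame_sec))
            start = None                   # input stopped
    if start is not None:                  # still active at end of data
        sessions.append((start * frame_sec, len(frame_levels) * frame_sec))
    return sessions

def sessions_by_speaker(mic_levels, threshold=0.5, frame_sec=0.5):
    """One microphone per speaker, so the microphone that produced a
    session identifies the speaker. Returns sessions sorted by start."""
    merged = []
    for speaker, levels in mic_levels.items():
        for start, end in detect_sessions(levels, threshold, frame_sec):
            merged.append((start, end, speaker))
    return sorted(merged)

mics = {
    "speaker_A": [0.0, 0.9, 0.8, 0.0, 0.0, 0.0],
    "speaker_B": [0.0, 0.0, 0.0, 0.0, 0.7, 0.9],
}
print(sessions_by_speaker(mics))
# [(0.5, 1.5, 'speaker_A'), (2.0, 3.0, 'speaker_B')]
```

Only the returned spans would then be passed to the speech recognizer, which is what keeps recognition cost low in the scheme the abstract describes.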

    Audio Loudness Adjustment
    Patent application
    Status: under examination (published)

    Publication No.: US20160260445A1

    Publication date: 2016-09-08

    Application No.: US14639919

    Filing date: 2015-03-05

    Inventor: Sven Duwenhorst

    Abstract: Audio loudness adjustment techniques are described. In one or more implementations, primary and secondary sound data originating as part of an audio signal is adjusted. For example, a loudness of the sound data is adjusted. To do so, the loudness, which indicates a sound intensity of the primary and secondary sound data, is determined. Adjustments are then computed for at least a portion of the audio signal based on a target dynamic range parameter, which defines a desired difference between the loudness of the primary and secondary sound data respectively. Based on the computed adjustments, a variety of actions may be performed, such as applying the adjustments to the audio signal to generate an adjusted audio signal in which the primary and secondary sound data substantially have the desired loudness difference. Further, a preview of the adjusted audio signal may be updated in real-time for display in a user interface.
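The adjustment computed from a target dynamic range parameter can be sketched as a gain applied to the secondary sound data so that it sits a desired number of decibels below the primary sound data. This is an assumed illustration only: RMS level in dB stands in for whatever loudness measure the application actually uses, and `rms_db`, `secondary_gain_db`, and `apply_gain` are hypothetical names.

```python
import math

def rms_db(samples):
    """RMS level in dB, used here as a simple stand-in for loudness."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)

def secondary_gain_db(primary, secondary, target_range_db):
    """Gain (dB) that places the secondary level target_range_db below
    the primary level."""
    return (rms_db(primary) - target_range_db) - rms_db(secondary)

def apply_gain(samples, gain_db):
    """Scale samples by a gain given in decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]

primary = [0.5, -0.5, 0.5, -0.5]
secondary = [0.5, -0.5, 0.5, -0.5]   # starts as loud as the primary

gain = secondary_gain_db(primary, secondary, target_range_db=6.0)
adjusted = apply_gain(secondary, gain)

print(round(gain, 2))                                # -6.0
print(round(rms_db(primary) - rms_db(adjusted), 2))  # 6.0
```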
