-
Publication No.: US11978476B2
Publication Date: 2024-05-07
Application No.: US17478916
Filing Date: 2021-09-19
Abstract: A system and method for detecting anomalous sound are disclosed. The method includes receiving a spectrogram of an audio signal with elements defined by values in a time-frequency domain of the spectrogram. Each of the values corresponds to an element of the spectrogram that is identified by a coordinate in the time-frequency domain. The time-frequency domain of the spectrogram is partitioned into a context region and a target region. The context region and the target region are processed by a neural network using an attentive neural process to recover values of the spectrogram for elements with coordinates in the target region. The recovered values of the elements of the target region are compared with values of elements of the partitioned target region. An anomaly score is determined based on the comparison. The anomaly score is used for performing a control action.
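The comparison step in this abstract can be illustrated with a minimal sketch. It is not the patented method: the attentive neural process is replaced by a hypothetical stand-in, `recover_by_frequency_mean`, which averages the context values in the same frequency bin, and the anomaly score is assumed to be the mean squared error over the target region. All function names are illustrative.

```python
from statistics import fmean


def anomaly_score(spectrogram, target_coords, recover):
    """Mean squared error between recovered and observed target values.

    spectrogram is a list of time frames, each a list of frequency-bin values.
    target_coords lists the (time, frequency) coordinates of the target region.
    """
    target = set(target_coords)
    # Context region: every time-frequency coordinate outside the target region.
    context = {
        (t, f): value
        for t, row in enumerate(spectrogram)
        for f, value in enumerate(row)
        if (t, f) not in target
    }
    errors = [
        (recover(context, (t, f)) - spectrogram[t][f]) ** 2
        for (t, f) in target_coords
    ]
    return fmean(errors)


def recover_by_frequency_mean(context, coord):
    """Assumed stand-in for the neural network: mean of context values in the same bin."""
    _, f = coord
    return fmean(v for (_, cf), v in context.items() if cf == f)


normal = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
anomalous = [[1.0, 2.0], [1.0, 9.0], [1.0, 2.0]]
target = [(1, 0), (1, 1)]  # the middle time frame is the target region

print(anomaly_score(normal, target, recover_by_frequency_mean))     # 0.0
print(anomaly_score(anomalous, target, recover_by_frequency_mean))  # 24.5
```

A larger score means the observed target values are harder to predict from their context, which is the signal a control action would be based on.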
-
Publication No.: US11947864B2
Publication Date: 2024-04-02
Application No.: US17174052
Filing Date: 2021-02-11
Applicant: AiMi Inc.
IPC Classification: G06F3/16 , G05B13/02 , G05B15/02 , G06F21/10 , G06F21/16 , G10H1/00 , G10H1/06 , G10L21/12 , G10L25/06 , H04L9/00 , H04L9/06
CPC Classification: G06F3/165 , G05B13/027 , G05B15/02 , G06F21/105 , G06F21/16 , G10H1/0025 , G10H1/0066 , G10H1/06 , G10L21/12 , G10L25/06 , H04L9/0637 , H04L9/0643 , G10H2210/076 , G10H2220/126 , H04L9/50
Abstract: Techniques are disclosed relating to automatically generating new music content based on image representations of audio files. A computer system generates image representations of audio files. The image representations may be generated, for example, based on data in the audio files and MIDI representations of the audio files. Audio files for combination may then be selected based on analysis of the image representations. For example, image-based machine learning algorithms may be implemented to assess the image representations and select music for combining.
-
Publication No.: US20230259327A1
Publication Date: 2023-08-17
Application No.: US18306169
Filing Date: 2023-04-24
Applicant: AiMi Inc.
IPC Classification: G06F3/16 , G05B15/02 , G10L21/12 , G06F21/10 , G06F21/16 , H04L9/06 , G05B13/02 , G10L25/06 , G10H1/00 , G10H1/06
CPC Classification: G06F3/165 , G05B15/02 , G10L21/12 , G06F21/105 , G06F21/16 , H04L9/0637 , H04L9/0643 , G05B13/027 , G10L25/06 , G10H1/0025 , G10H1/0066 , G10H1/06 , G10H2210/076 , G10H2220/126 , H04L9/50
Abstract: Techniques are disclosed relating to implementing audio techniques for real-time audio generation. For example, a music generator system may generate new music content from playback music content based on different parameter representations of an audio signal. In some cases, an audio signal can be represented by both a graph of the signal (e.g., an audio signal graph) relative to time and a graph of the signal relative to beats (e.g., a signal graph). The signal graph is invariant to tempo, which allows for tempo invariant modification of audio parameters of the music content in addition to tempo variant modifications based on the audio signal graph.
-
Publication No.: US11670322B2
Publication Date: 2023-06-06
Application No.: US16942410
Filing Date: 2020-07-29
IPC Classification: G10L25/54 , G06F16/65 , G06F3/16 , G06N3/08 , G10L21/12 , G10L21/14 , G10L25/30 , G06F18/214
CPC Classification: G10L25/54 , G06F3/165 , G06F16/65 , G06F18/214 , G06N3/08 , G10L21/12 , G10L21/14 , G10L25/30
Abstract: A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which take care of selecting salient attributes of the signals that represent psychoacoustic differences between the signals.
-
Publication No.: US20220157332A1
Publication Date: 2022-05-19
Application No.: US17455110
Filing Date: 2021-11-16
Applicant: EMOCOG Co., Ltd.
Inventors: Yoo Hun NOH , Eui Chul LEE , Na Hye KIM , So Eui KIM , Ji Won MOK , Su Gyeong YU , Na Yeon HAN
Abstract: This application relates to a device and a method for voice-based trauma screening using deep learning. The device and method for voice-based trauma screening using deep learning screen for trauma through voices that may be obtained in a non-contact manner without limitations of space or situation. In one aspect, the device includes a memory configured to store at least one program and a processor configured to perform an operation by executing the at least one program. The processor can obtain voice data, pre-process the voice data, convert pre-processed voice data into image data, and input the image data to a deep learning model and obtain a trauma result value as an output value of the deep learning model.
-
Publication No.: US20210249032A1
Publication Date: 2021-08-12
Application No.: US17050938
Filing Date: 2019-04-26
Abstract: A method for capturing, recording, playing back, visually representing, storing and processing audio signals comprises converting the audio signal into a video that pairs the audio with a visual representation of the audio data, where such visual representation may contain the waveform, relevant text, spectrogram, wavelet decomposition, or other transformation of the audio data in such a way that the viewer can identify which part of the visual representation is associated with the currently playing audio signal.
-
Publication No.: US20190287550A1
Publication Date: 2019-09-19
Application No.: US16196356
Filing Date: 2018-11-20
Inventor: Woo-taek LIM
Abstract: Disclosed is a sound event detecting method including receiving an audio signal, transforming the audio signal into a two-dimensional (2D) signal, extracting a feature map by training a convolutional neural network (CNN) using the 2D signal, pooling the feature map based on a frequency, and determining whether a sound event occurs with respect to each of at least one time interval based on a result of the pooling.
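The last two steps of this abstract (pooling over frequency, then deciding per time interval) can be sketched as follows. The abstract does not say which pooling operator or decision rule is used, so max pooling and a fixed threshold of 0.5 are assumptions made for illustration, and the feature map is a hand-written stand-in for CNN output.

```python
def pool_over_frequency(feature_map):
    """Collapse a [time][frequency] feature map to one value per time frame.

    Max pooling is an assumption; the source only says the pooling is
    "based on a frequency".
    """
    return [max(frame) for frame in feature_map]


def detect_events(feature_map, threshold=0.5):
    """Flag each time frame whose pooled activation reaches the threshold."""
    return [score >= threshold for score in pool_over_frequency(feature_map)]


# Three time frames with three frequency bins each (illustrative values only).
feature_map = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.3],
    [0.0, 0.4, 0.2],
]
print(detect_events(feature_map))  # [False, True, False]
```

Pooling away the frequency axis leaves one score per time interval, which is why the decision can be made interval by interval.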
-
Publication No.: US10276164B2
Publication Date: 2019-04-30
Application No.: US15823937
Filing Date: 2017-11-28
Applicant: SORIZAVA CO., LTD.
Inventor: Munhak An
IPC Classification: G10L17/00 , G10L15/26 , G06F17/28 , G10L21/12 , G10L15/32 , G10L21/0272 , G10L15/08 , G06F17/30 , G06F17/27
Abstract: The present invention relates to a multi-speaker speech recognition correction system for determining a speaker of an utterance with a simple method and easily correcting speech-recognized text during speech recognition for a plurality of speakers. According to the present invention, when speech signals are input to a multi-speaker speech recognition system from a plurality of microphones which are each provided to a corresponding one of a plurality of speakers, the multi-speaker speech recognition correction system may detect a speech session from a time point at which input of each of the speech signals is started to a time point at which the input of the speech signal is stopped, and a speech recognizer may convert only the detected speech sessions into text so that a speaker of an utterance can be identified by a simple method and speech recognition can be carried out at a low cost.
-
Publication No.: US20180307461A1
Publication Date: 2018-10-25
Application No.: US16022328
Filing Date: 2018-06-28
Applicant: Patient Prism LLC
IPC Classification: G06F3/16 , G06Q30/02 , G06F17/22 , H04M3/51 , G10L21/12 , H04L29/08 , G06F17/24 , G10L15/08 , G06F3/0481 , G06F3/0482 , H04M3/42
CPC Classification: G06F3/165 , G06F3/04812 , G06F3/0482 , G06F17/2235 , G06F17/241 , G06Q30/0201 , G10L15/08 , G10L21/12 , G10L2015/088 , H04L67/02 , H04L67/146 , H04L67/22 , H04M3/42221 , H04M3/5175 , H04M2203/303 , H04M2203/305 , H04M2203/403
Abstract: Merchant/consumer calls may be recorded and evaluated according to a variety of criteria. The call recordings and analyses thereof, as well as consumer tracking information, may be displayed in a user interface of a web-based online portal for convenience in evaluating the use and efficacy of marketing channels as well as the quality of merchant/consumer interactions. In an aspect, the user interface provides a representation of a variety of telephone calls as an interactive keyword cloud that presents business-value-specific keywords targeted for detection during such telephone calls. The keyword cloud may depict keywords in a range of colors, sizes, and relative positioning to connote varied degrees of significance, such as a relative rate of occurrence of keywords in the represented telephone calls. Each keyword in the keyword cloud may contain a hyperlink to related content such as a listing of telephone calls containing the keyword.
-
Publication No.: US20160260445A1
Publication Date: 2016-09-08
Application No.: US14639919
Filing Date: 2015-03-05
Inventor: Sven Duwenhorst
IPC Classification: G10L25/84 , G06F3/0484 , H03G3/32 , G10L21/12 , G06F3/16 , G10L21/0364
CPC Classification: G06F3/04847 , G06F3/165 , G10L21/0224 , G10L21/0316 , G10L21/12 , H03G3/32 , H03G7/002 , H03G7/007
Abstract: Audio loudness adjustment techniques are described. In one or more implementations, primary and secondary sound data originating as part of an audio signal is adjusted. For example, a loudness of the sound data is adjusted. To do so, the loudness, which indicates a sound intensity of the primary and secondary sound data, is determined. Adjustments are then computed for at least a portion of the audio signal based on a target dynamic range parameter, which defines a desired difference between the loudness of the primary and secondary sound data respectively. Based on the computed adjustments, a variety of actions may be performed, such as applying the adjustments to the audio signal to generate an adjusted audio signal in which the primary and secondary sound data substantially have the desired loudness difference. Further, a preview of the adjusted audio signal may be updated in real-time for display in a user interface.
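The adjustment computation in this abstract reduces to simple decibel arithmetic, sketched below. The sketch assumes loudness is measured in decibels and that only the secondary sound is adjusted while the primary is left untouched; the abstract fixes neither choice, and the function names are illustrative.

```python
def secondary_gain_db(primary_loudness_db, secondary_loudness_db, target_range_db):
    """Gain in dB that places the secondary sound target_range_db below the primary.

    Assumes the primary sound is left unchanged (an illustrative choice).
    """
    desired_secondary_db = primary_loudness_db - target_range_db
    return desired_secondary_db - secondary_loudness_db


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


# Primary at -20 dB, secondary at -26 dB, desired difference of 12 dB.
gain = secondary_gain_db(-20.0, -26.0, 12.0)
print(gain)                # -6.0
print(db_to_linear(gain))  # roughly 0.501
```

Multiplying the secondary samples by the linear factor would lower them by 6 dB, leaving the two parts 12 dB apart.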