Patent search ipc:"G10L21/12" Page 1

1.

Granted invention patent
Method and system for detecting anomalous sound (in force)

Publication (announcement) No.: US11978476B2

Publication (announcement) date: 2024-05-07

Application No.: US17478916

Filing date: 2021-09-19

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Gordon Wichern, Ankush Chakrabarty, Zhong-Qiu Wang, Jonathan Le Roux

IPC classification: G10L25/78, G06N3/08, G10L21/12, G10L21/14, G10L25/30

CPC classification: G10L25/78, G06N3/08, G10L21/12, G10L21/14, G10L25/30

Abstract: A system and method for detecting anomalous sound are disclosed. The method includes receiving a spectrogram of an audio signal with elements defined by values in a time-frequency domain of the spectrogram. Each of the values corresponds to an element of the spectrogram that is identified by a coordinate in the time-frequency domain. The time-frequency domain of the spectrogram is partitioned into a context region and a target region. The context region and the target region are processed by a neural network using an attentive neural process to recover values of the spectrogram for elements with coordinates in the target region. The recovered values of the elements of the target region are compared with values of elements of the partitioned target region. An anomaly score is determined based on the comparison. The anomaly score is used for performing a control action.
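The partition, recover, and compare steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the attentive neural process is replaced by a trivial stand-in predictor (the mean of the context values in the same frequency bin), and the function names, the column-based partition, and the mean-squared-error score are all assumptions made for illustration.

```python
from statistics import fmean


def split_context_target(spec, target_cols):
    """Partition spectrogram elements into context and target regions.

    spec[f][t] is the value at frequency bin f and time frame t.
    Elements whose time index is in target_cols form the target region
    (an assumed partition; the abstract does not fix one).
    """
    context, target = {}, {}
    for f, row in enumerate(spec):
        for t, value in enumerate(row):
            region = target if t in target_cols else context
            region[(f, t)] = value
    return context, target


def recover_target(context, target_coords):
    """Stand-in for the neural network: predict each target element as the
    mean of the context values in the same frequency bin."""
    recovered = {}
    for f, t in target_coords:
        same_bin = [v for (cf, _), v in context.items() if cf == f]
        recovered[(f, t)] = fmean(same_bin)
    return recovered


def anomaly_score(spec, target_cols):
    """Mean squared error between recovered and observed target values."""
    context, target = split_context_target(spec, target_cols)
    recovered = recover_target(context, target.keys())
    return fmean([(recovered[c] - target[c]) ** 2 for c in target])


normal = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
anomalous = [[1.0, 1.0, 5.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
print(anomaly_score(normal, {2}), anomaly_score(anomalous, {2}))
```

A control action could then be triggered when the score exceeds a threshold chosen for the application; that threshold is not given in the abstract.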

2.

Granted invention patent
Music content generation using image representations of audio files (in force)

Publication (announcement) No.: US11947864B2

Publication (announcement) date: 2024-04-02

Application No.: US17174052

Filing date: 2021-02-11

Applicant: AiMi Inc.

Inventors: Edward Balassanian, Patrick E. Hutchings, Toby Gifford

IPC classification: G06F3/16, G05B13/02, G05B15/02, G06F21/10, G06F21/16, G10H1/00, G10H1/06, G10L21/12, G10L25/06, H04L9/00, H04L9/06

CPC classification: G06F3/165, G05B13/027, G05B15/02, G06F21/105, G06F21/16, G10H1/0025, G10H1/0066, G10H1/06, G10L21/12, G10L25/06, H04L9/0637, H04L9/0643, G10H2210/076, G10H2220/126, H04L9/50

Abstract: Techniques are disclosed relating to automatically generating new music content based on image representations of audio files. A computer system generates image representations of audio files. The image representations may be generated, for example, based on data in the audio files and MIDI representations of the audio files. Audio files for combination may then be selected based on analysis of the image representations. For example, image-based machine learning algorithms may be implemented to assess the image representations and select music for combining.

3.

Published invention application
Audio Techniques for Music Content Generation (under examination, published)

Publication (announcement) No.: US20230259327A1

Publication (announcement) date: 2023-08-17

Application No.: US18306169

Filing date: 2023-04-24

Applicant: AiMi Inc.

Inventors: Edward Balassanian, Patrick E. Hutchings, Toby Gifford

IPC classification: G06F3/16, G05B15/02, G10L21/12, G06F21/10, G06F21/16, H04L9/06, G05B13/02, G10L25/06, G10H1/00, G10H1/06

CPC classification: G06F3/165, G05B15/02, G10L21/12, G06F21/105, G06F21/16, H04L9/0637, H04L9/0643, G05B13/027, G10L25/06, G10H1/0025, G10H1/0066, G10H1/06, G10H2210/076, G10H2220/126, H04L9/50

Abstract: Techniques are disclosed relating to implementing audio techniques for real-time audio generation. For example, a music generator system may generate new music content from playback music content based on different parameter representations of an audio signal. In some cases, an audio signal can be represented by both a graph of the signal (e.g., an audio signal graph) relative to time and a graph of the signal relative to beats (e.g., a signal graph). The signal graph is invariant to tempo, which allows for tempo invariant modification of audio parameters of the music content in addition to tempo variant modifications based on the audio signal graph.
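To make the tempo-invariance claim concrete, here is a minimal Python sketch that re-indexes a time-domain parameter curve by beats. It assumes a constant tempo in beats per minute; the function names and the `(time, value)` pair layout are hypothetical and are not taken from the application.

```python
def seconds_to_beats(t_seconds, bpm):
    """Convert a time position to a beat position at a constant tempo."""
    return t_seconds * bpm / 60.0


def to_beat_graph(time_graph, bpm):
    """Re-index (time_seconds, value) pairs as (beat, value) pairs."""
    return [(seconds_to_beats(t, bpm), v) for t, v in time_graph]


# The same volume envelope performed at two tempos: the time-domain graphs
# differ, but the beat-domain graphs coincide.
fast = [(0.0, 0.2), (0.5, 0.8), (1.0, 0.4)]  # played at 120 BPM
slow = [(0.0, 0.2), (1.0, 0.8), (2.0, 0.4)]  # played at 60 BPM
print(to_beat_graph(fast, 120))
print(to_beat_graph(slow, 60))
```

A modification defined on the beat axis (for example, a fade over beats 1 to 2) therefore applies identically at either tempo, while a modification defined on the time axis does not.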

4.

Granted invention patent
Method and system for learning and using latent-space representations of audio signals for audio content-based retrieval (in force)

Publication (announcement) No.: US11670322B2

Publication (announcement) date: 2023-06-06

Application No.: US16942410

Filing date: 2020-07-29

Applicant: Distributed Creation Inc.

Inventors: Alejandro Koretzky, Naveen Sasalu Rajashekharappa

IPC classification: G10L25/54, G06F16/65, G06F3/16, G06N3/08, G10L21/12, G10L21/14, G10L25/30, G06F18/214

CPC classification: G10L25/54, G06F3/165, G06F16/65, G06F18/214, G06N3/08, G10L21/12, G10L21/14, G10L25/30

Abstract: A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which take care of selecting salient attributes of the signals that represent psychoacoustic differences between the signals.
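Once latent-space representations are comparable, content-based retrieval reduces to ranking stored vectors by their similarity to a query vector. The Python sketch below illustrates that idea only. Cosine similarity is an assumed metric (the abstract does not name one), and the two-dimensional vectors and sample names are made up rather than produced by any trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two latent vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query, library):
    """Return the names in library ordered from most to least similar to query.

    library maps a sample name to its latent vector.
    """
    return sorted(
        library,
        key=lambda name: cosine_similarity(query, library[name]),
        reverse=True,
    )


# Hypothetical latent vectors for three audio samples.
library = {"kick": [1.0, 0.0], "snare": [0.0, 1.0], "tom": [0.9, 0.1]}
print(rank_by_similarity([1.0, 0.05], library))
```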

5.

Invention application
DEVICE AND METHOD FOR VOICE-BASED TRAUMA SCREENING USING DEEP-LEARNING (in force)

Publication (announcement) No.: US20220157332A1

Publication (announcement) date: 2022-05-19

Application No.: US17455110

Filing date: 2021-11-16

Applicant: EMOCOG Co., Ltd.

Inventors: Yoo Hun NOH, Eui Chul LEE, Na Hye KIM, So Eui KIM, Ji Won MOK, Su Gyeong YU, Na Yeon HAN

IPC classification: G10L25/63, G10L25/18, G10L21/12, G10L15/22, G06N20/00, A61B5/00, A61B5/16

Abstract: This application relates to a device and a method for voice-based trauma screening using deep learning. The device and method for voice-based trauma screening using deep learning screen for trauma through voices that may be obtained in a non-contact manner without limitations of space or situation. In one aspect, the device includes a memory configured to store at least one program and a processor configured to perform an operation by executing the at least one program. The processor can obtain voice data, pre-process the voice data, convert pre-processed voice data into image data, and input the image data to a deep learning model and obtain a trauma result value as an output value of the deep learning model.

6.

Invention application
Processing Audio Information (in force)

Publication (announcement) No.: US20210249032A1

Publication (announcement) date: 2021-08-12

Application No.: US17050938

Filing date: 2019-04-26

Applicant: Thinklabs Medical LLC

Inventors: Clive Leonard Smith, Jeremy Schiff, John Andrew Kreisher

IPC classification: G10L21/14, G10L21/12, G06F3/16

Abstract: A method for capturing, recording, playing back, visually representing, storing and processing of audio signals comprises converting the audio signal into a video that pairs the audio with a visual representation of the audio data, where such visual representation may contain the waveform, relevant text, spectrogram, wavelet decomposition, or other transformation of the audio data in such a way that the viewer can identify which part of the visual representation is associated with the currently playing audio signal.

7.

Invention application
METHOD AND APPARATUS FOR SOUND EVENT DETECTION ROBUST TO FREQUENCY CHANGE (under examination, published)

Publication (announcement) No.: US20190287550A1

Publication (announcement) date: 2019-09-19

Application No.: US16196356

Filing date: 2018-11-20

Applicant: Electronics and Telecommunications Research Institute

Inventor: Woo-taek LIM

IPC classification: G10L21/14, G10L21/12, G10L25/30

Abstract: Disclosed is a sound event detecting method including receiving an audio signal, transforming the audio signal into a two-dimensional (2D) signal, extracting a feature map by training a convolutional neural network (CNN) using the 2D signal, pooling the feature map based on a frequency, and determining whether a sound event occurs with respect to each of at least one time interval based on a result of the pooling.
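The pooling and per-interval decision steps can be sketched in a few lines of Python. The sketch assumes the feature map is already available as a frequency-by-time grid of activations (the CNN itself is omitted), uses max pooling over the frequency axis, and applies a fixed threshold; all three choices, and the function names, are assumptions rather than details stated in the abstract.

```python
def pool_over_frequency(feature_map):
    """Max-pool a feature map across frequency.

    feature_map[f][t] is the activation at frequency index f and time
    frame t; the result holds one pooled value per time frame.
    """
    return [max(column) for column in zip(*feature_map)]


def detect_events(feature_map, threshold):
    """Decide, for each time frame, whether a sound event is present."""
    return [score >= threshold for score in pool_over_frequency(feature_map)]


# The same event appearing in different frequency rows gives the same
# decision after pooling, which is the sense in which pooling over
# frequency makes detection robust to frequency change.
low = [[0.1, 0.9, 0.2], [0.2, 0.1, 0.1]]
high = [[0.2, 0.1, 0.1], [0.1, 0.9, 0.2]]
print(detect_events(low, 0.5), detect_events(high, 0.5))
```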

8.

Granted invention patent
Multi-speaker speech recognition correction system (in force)

Publication (announcement) No.: US10276164B2

Publication (announcement) date: 2019-04-30

Application No.: US15823937

Filing date: 2017-11-28

Applicant: SORIZAVA CO., LTD.

Inventor: Munhak An

IPC classification: G10L17/00, G10L15/26, G06F17/28, G10L21/12, G10L15/32, G10L21/0272, G10L15/08, G06F17/30, G06F17/27

Abstract: The present invention relates to a multi-speaker speech recognition correction system for determining a speaker of an utterance with a simple method and easily correcting speech-recognized text during speech recognition for a plurality of speakers. According to the present invention, when speech signals are input to a multi-speaker speech recognition system from a plurality of microphones which are each provided to a corresponding one of a plurality of speakers, the multi-speaker speech recognition correction system may detect a speech session from a time point at which input of each of the speech signals is started to a time point at which the input of the speech signal is stopped, and a speech recognizer may convert only the detected speech sessions into text so that a speaker of an utterance can be identified by a simple method and speech recognition can be carried out at a low cost.
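The session detection described here (start of input to stop of input, one microphone per speaker) can be illustrated with a short Python sketch. It assumes each microphone's signal has already been reduced to a per-frame active/inactive flag, for example by an energy threshold; that pre-processing, the frame length, and the function names are assumptions and do not come from the patent.

```python
def detect_sessions(active_frames, frame_seconds):
    """Return (start, end) times in seconds of each run of active frames."""
    sessions, start = [], None
    for i, active in enumerate(active_frames):
        if active and start is None:
            start = i  # input started
        elif not active and start is not None:
            sessions.append((start * frame_seconds, i * frame_seconds))
            start = None  # input stopped
    if start is not None:  # still active at the end of the recording
        sessions.append((start * frame_seconds, len(active_frames) * frame_seconds))
    return sessions


def sessions_by_speaker(mic_activity, frame_seconds):
    """Merge sessions from all microphones into one timeline.

    mic_activity maps a speaker label to that speaker's microphone
    activity flags, so the speaker of a session is known from the
    microphone it was detected on.
    """
    merged = [
        (start, end, speaker)
        for speaker, frames in mic_activity.items()
        for start, end in detect_sessions(frames, frame_seconds)
    ]
    return sorted(merged)


mic_activity = {
    "A": [True, True, False, False, False],
    "B": [False, False, False, True, True],
}
print(sessions_by_speaker(mic_activity, 0.5))
```

Only the audio inside the returned intervals would then be passed to the speech recognizer.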

9.

Invention application
INTERACTIVE KEYWORD CLOUD (under examination, published)

Publication (announcement) No.: US20180307461A1

Publication (announcement) date: 2018-10-25

Application No.: US16022328

Filing date: 2018-06-28

Applicant: Patient Prism LLC

Inventors: Michael G. Spiessbach, Amol Nirgudkar

IPC classification: G06F3/16, G06Q30/02, G06F17/22, H04M3/51, G10L21/12, H04L29/08, G06F17/24, G10L15/08, G06F3/0481, G06F3/0482, H04M3/42

CPC classification: G06F3/165, G06F3/04812, G06F3/0482, G06F17/2235, G06F17/241, G06Q30/0201, G10L15/08, G10L21/12, G10L2015/088, H04L67/02, H04L67/146, H04L67/22, H04M3/42221, H04M3/5175, H04M2203/303, H04M2203/305, H04M2203/403

Abstract: Merchant/consumer calls may be recorded and evaluated according to a variety of criteria. The call recordings and analyses thereof, as well as consumer tracking information, may be displayed in a user interface of a web-based online portal for convenience in evaluating the use and efficacy of marketing channels as well as the quality of merchant/consumer interactions. In an aspect, the user interface provides a representation of a variety of telephone calls as an interactive keyword cloud that presents business-value-specific keywords targeted for detection during such telephone calls. The keyword cloud may depict keywords in a range of colors, sizes, and relative positioning to connote varied degrees of significance, such as a relative rate of occurrence of keywords in the represented telephone calls. Each keyword in the keyword cloud may contain a hyperlink to related content such as a listing of telephone calls containing the keyword.
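The size-by-occurrence-rate idea can be sketched as follows in Python. The sketch assumes call transcripts are available as plain strings, uses case-insensitive substring matching, and maps the rate linearly onto a font-size range; the matching rule, the size range, and the sample calls and keywords are all hypothetical.

```python
def keyword_rates(call_transcripts, keywords):
    """Fraction of calls in which each targeted keyword occurs."""
    total = len(call_transcripts)
    if total == 0:
        return {kw: 0.0 for kw in keywords}
    return {
        kw: sum(kw.lower() in call.lower() for call in call_transcripts) / total
        for kw in keywords
    }


def font_sizes(rates, min_pt=10, max_pt=50):
    """Map each occurrence rate (0..1) linearly onto a font size in points."""
    return {kw: round(min_pt + rate * (max_pt - min_pt)) for kw, rate in rates.items()}


calls = [
    "I need an appointment for a crown",
    "What is the price of a crown?",
    "Do you take my insurance?",
    "Crown and insurance questions",
]
rates = keyword_rates(calls, ["crown", "insurance", "appointment"])
print(font_sizes(rates))
```

In a full interface each keyword would also link to the list of calls that contain it, as the abstract describes.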

10.

Invention application
Audio Loudness Adjustment (under examination, published)

Publication (announcement) No.: US20160260445A1

Publication (announcement) date: 2016-09-08

Application No.: US14639919

Filing date: 2015-03-05

Applicant: Adobe Systems Incorporated

Inventor: Sven Duwenhorst

IPC classification: G10L25/84, G06F3/0484, H03G3/32, G10L21/12, G06F3/16, G10L21/0364

CPC classification: G06F3/04847, G06F3/165, G10L21/0224, G10L21/0316, G10L21/12, H03G3/32, H03G7/002, H03G7/007

Abstract: Audio loudness adjustment techniques are described. In one or more implementations, primary and secondary sound data originating as part of an audio signal is adjusted. For example, a loudness of the sound data is adjusted. To do so, the loudness, which indicates a sound intensity of the primary and secondary sound data, is determined. Adjustments are then computed for at least a portion of the audio signal based on a target dynamic range parameter, which defines a desired difference between the loudness of the primary and secondary sound data respectively. Based on the computed adjustments, a variety of actions may be performed, such as applying the adjustments to the audio signal to generate an adjusted audio signal in which the primary and secondary sound data substantially have the desired loudness difference. Further, a preview of the adjusted audio signal may be updated in real-time for display in a user interface.
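As a worked illustration of the target dynamic range parameter, the Python sketch below computes the gain that would place the secondary sound data a desired number of decibels below the primary. It assumes loudness values are already measured in decibels and that only the secondary sound data is adjusted; the application may compute and distribute the adjustment differently, and the function names are hypothetical.

```python
def secondary_gain_db(primary_db, secondary_db, target_range_db):
    """Gain (dB) to apply to the secondary sound data so that it sits
    target_range_db below the primary sound data."""
    desired_secondary_db = primary_db - target_range_db
    return desired_secondary_db - secondary_db


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db):
    """Scale audio samples by a gain given in decibels."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


# Primary at -20 dB, secondary at -26 dB, desired difference 12 dB:
# the secondary must drop by 6 dB to reach -32 dB.
gain = secondary_gain_db(-20.0, -26.0, 12.0)
print(gain, db_to_linear(gain))
```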

