Patent search: ipc:G10L21/007, page 14

131.

Granted invention patent
Deep learning based method and system for processing sound quality characteristics (in force)

Publication No.: US11790934B2

Publication date: 2023-10-17

Application No.: US17896752

Filing date: 2022-08-26

Applicant: Anker Innovations Technology Co., Ltd.

Inventors: Qingshan Yao, Yu Qin, Haowen Yu, Feng Lu

IPC classification: G10L21/0308, G10L25/60, G06N3/04, G06N3/08, G10L21/007, G10L21/0232, G10L25/30

CPC classification: G10L25/60, G06N3/04, G06N3/08, G10L21/007, G10L21/0232, G10L25/30

Abstract: The present invention provides a deep learning based method and system for processing sound quality characteristics. The method comprises: obtaining data characteristics of audio data to be processed by extracting features from user preference data including the audio data to be processed; based on the data characteristics, generating a sound quality processing result of the audio to be processed by using a trained baseline model; wherein the baseline model is a neural network model trained by using audio data, behavioral data, and other relevant data from multiple users or a single user.

132.

Published invention application
Personalized Accent and/or Pace of Speaking Modulation for Audio/Video Streams (pending, published)

Publication No.: US20230267941A1

Publication date: 2023-08-24

Application No.: US17679629

Filing date: 2022-02-24

Applicant: Bank of America Corporation

Inventors: Abhishek Nagpal, Nanthakumar Veerasamy

IPC classification: G10L21/007, G10L15/06, G10L25/57, G10L15/22, G06N5/02

CPC classification: G10L21/007, G10L15/063, G10L25/57, G10L15/22, G06N5/022, G10L2015/0635

Abstract: Aspects of the disclosure relate to generating personalized accent and/or pace of speaking modulation for audio/video streams. In some embodiments, a computing platform may train an artificial intelligence model on audio or video samples associated with different geographic regions. The computing platform may receive, via a communication interface, an audio or video stream associated with a first geographic region. The computing platform may identify a second geographic region different from the first geographic region. The computing platform may transform the audio or video stream to correspond to the second geographic region different from the first geographic region. The computing platform may send, via the communication interface, the transformed audio or video stream to a user device associated with the second geographic region.

133.

Granted invention patent
Audio encoder for encoding an audio signal, method for encoding an audio signal and computer program under consideration of a detected peak spectral region in an upper frequency band (in force)

Publication No.: US11682409B2

Publication date: 2023-06-20

Application No.: US17023941

Filing date: 2020-09-17

Applicant: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.

Inventors: Markus Multrus, Christian Neukam, Markus Schnell, Benjamin Schubert

IPC classification: G10L19/26, G10L21/007, G10L21/0208, G10L21/0324, G10L25/15, G10L25/18, G10L19/16, G10L21/02, G10L19/02, G10L19/03, G10L19/032, G10L19/12, G10L19/028, G10L21/038, G10L19/04

CPC classification: G10L19/265, G10L19/0204, G10L19/03, G10L19/032, G10L19/12, G10L19/16, G10L19/26, G10L21/007, G10L21/02, G10L21/0208, G10L21/0324, G10L25/15, G10L25/18, G10L19/02, G10L19/028, G10L19/04, G10L21/038

Abstract: An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.

134.

Invention application
Improving Speech Recognition with Speech Synthesis-based Model Adaptation (in force)

Publication No.: US20230058447A1

Publication date: 2023-02-23

Application No.: US17445537

Filing date: 2021-08-20

Applicant: Google LLC

Inventors: Andrew Rosenberg, Bhuvana Ramabhadran

IPC classification: G10L21/007, G10L15/26, G10L25/30, G06N3/08

Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.
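
Editorial note: the two-stage schedule in this abstract (pre-train on synthesized data, then warm-start on real data) can be illustrated with a toy example. The sketch below is not the patented method and involves no speech model: it assumes a one-parameter linear model fitted by gradient descent, and the names `train`, `synthetic`, `real`, and all data values are hypothetical.

```python
def train(weight, data, lr=0.1, epochs=50):
    """Fit y ~ weight * x by gradient descent, starting from `weight`."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

# Hypothetical data: "synthetic" samples are close to, but not identical
# to, the "real" target relationship.
synthetic = [(1.0, 1.8), (2.0, 3.6)]  # roughly y = 1.8 * x
real = [(1.0, 2.0), (2.0, 4.0)]       # y = 2 * x

# Stage 1: pre-train on synthetic data to obtain an initial state.
pretrained = train(0.0, synthetic)

# Stage 2: warm-start from that state with a short run on real data.
final = train(pretrained, real, epochs=5)

# Baseline: the same short run on real data from a cold start.
cold = train(0.0, real, epochs=5)

print(pretrained, final, cold)
```

With the same small budget of updates on the real data, the warm-started run ends closer to the target weight than the cold-started run, which is the effect the pre-training stage is meant to provide.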

135.

Granted invention patent
Dynamic creation and insertion of content (in force)

Publication No.: US11514924B2

Publication date: 2022-11-29

Application No.: US16797190

Filing date: 2020-02-21

Applicant: International Business Machines Corporation

Inventors: Samuel Osebe, Charles Muchiri Wachira, Komminist Weldemariam, Celia Cintas

IPC classification: G10L21/0364, G10L25/90, G10L25/63, G10L15/24, G10L15/22, G10L13/00, G06V20/40, G06V40/20, G10L21/007, G10L21/003, H04H60/33

Abstract: In an aspect, during a presentation of a presentation material, viewers of the presentation material can be monitored. Based on the monitoring, new content can be determined for insertion into the presentation material. The new content can be automatically inserted into the presentation material in real time. In another aspect, during the presentation, a presenter of the presentation material can be monitored. The presenter's speech can be intercepted and analyzed to detect a level of confidence. Based on the detected level of confidence, the presenter's speech can be adjusted and the adjusted speech can be played back automatically, for example, in lieu of the presenter's original speech that is intercepted.

136.

Granted invention patent
Deep learning based method and system for processing sound quality characteristics (in force)

Publication No.: US11462237B2

Publication date: 2022-10-04

Application No.: US17114349

Filing date: 2019-06-03

Applicant: Anker Innovations Technology Co. Ltd.

Inventors: Qingshan Yao, Yu Qin, Haowen Yu, Feng Lu

IPC classification: H04R29/00, G10L25/60, G06N3/04, G06N3/08, G10L21/007, G10L21/0232, G10L25/30

Abstract: The present invention provides a deep learning based method and system for processing sound quality characteristics. The method comprises: obtaining data characteristics of audio data to be processed by extracting features from user preference data including the audio data to be processed; based on the data characteristics, generating a sound quality processing result of the audio to be processed by using a trained baseline model; wherein the baseline model is a neural network model trained by using audio data, behavioral data, and other relevant data from multiple users or a single user.

137.

Granted invention patent
Audio conversion learning device, audio conversion device, method, and program (in force)

Publication No.: US11450332B2

Publication date: 2022-09-20

Application No.: US16970935

Filing date: 2019-02-20

Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventors: Hirokazu Kameoka, Takuhiro Kaneko, Ko Tanaka, Nobukatsu Hojo

IPC classification: G10L21/007, G10L25/03, G10L21/013

Abstract: The aim is to enable conversion to a voice with a desired attribute. The device learns, on the basis of parallel data of a sound feature vector series in a conversion-source voice signal and a latent vector series in the conversion-source voice signal, together with an attribution label indicating the attribute of the conversion-source voice signal, an encoder that estimates a latent vector series from an input sound feature vector series and attribution label, and a decoder that reconstructs the sound feature vector series from an input latent vector series and attribution label.

138.

Invention application
POST FILTER FOR AUDIO SIGNALS (in force)

Publication No.: US20220157327A1

Publication date: 2022-05-19

Application No.: US17532775

Filing date: 2021-11-22

Applicant: DOLBY INTERNATIONAL AB

Inventors: Barbara RESCH, Kristofer KJÖRLING, Lars VILLEMOES

IPC classification: G10L19/26, G10L19/20, G10L19/12, G10L19/125, G10L21/003, G10L19/09, G10L21/013, G10L19/22, G10L21/007, G10L19/032, G10L19/02

Abstract: In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from one of either: (i) an active mode where the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, and (ii) an inactive mode where the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter is capable of being selectively operated in either the active mode or the inactive mode while operating in the coding mode based on control information.

139.

Invention application
DATA ANONYMIZATION FOR DATA LABELING AND DEVELOPMENT PURPOSES (in force)

Publication No.: US20220129582A1

Publication date: 2022-04-28

Application No.: US17076896

Filing date: 2020-10-22

Applicant: Robert Bosch GmbH

Inventors: Sascha Lange

IPC classification: G06F21/62, G06F16/48, G06T5/00, G06T7/70, G10L21/007, G10L21/0232, G06T11/00, G10L25/57

Abstract: A method and system are disclosed for anonymizing data for labeling and development purposes. A data storage backend has a database of non-anonymous data that is received from a data source. An anonymization engine of the data storage backend generates anonymized data by removing personally identifiable information from the non-anonymous data. These anonymized data are made available to human labelers who manually provide labels based on the anonymized data using a data labeling tool. These labels are then stored in association with the corresponding non-anonymous data, which can then be used for training one or more machine learning models. In this way, non-anonymous data having personally identifiable information can be manually labelled for development purposes without exposing the personally identifiable information to any human labelers.
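
Editorial note: the data flow in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the record layout, the `PII_FIELDS` set, and the functions `anonymize`, `store_labels`, and `toy_labeler` are hypothetical, and a plain function stands in for the human labeler.

```python
from typing import Callable

# Hypothetical set of field names treated as personally identifiable.
PII_FIELDS = {"name", "email"}

def anonymize(record: dict) -> dict:
    """Return a copy of the record with the PII fields removed."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}

def store_labels(database: dict, labeler: Callable[[dict], str]) -> dict:
    """Label anonymized copies and key each label by the original record id."""
    labels = {}
    for record_id, record in database.items():
        anonymized = anonymize(record)  # the labeler only sees this copy
        labels[record_id] = labeler(anonymized)
    return labels

# Hypothetical non-anonymous records held by the storage backend.
database = {
    1: {"name": "Alice", "email": "a@example.com", "text": "turn on the lights"},
    2: {"name": "Bob", "email": "b@example.com", "text": "what is the weather"},
}

def toy_labeler(anon: dict) -> str:
    """Stand-in for a human labeler; it must never receive PII fields."""
    assert not PII_FIELDS & anon.keys()
    return "command" if anon["text"].startswith("turn") else "question"

labels = store_labels(database, toy_labeler)

# Labels are associated with the original, non-anonymous records for training.
training_pairs = [(database[i], labels[i]) for i in database]
print(labels)
```

Keying the labels by record id is one simple way to store them "in association with" the non-anonymous data: the labeler works only on the stripped copy, while the training step can still join labels back to the full records.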

140.

Granted invention patent
Post filter for audio signals (in force)

Publication No.: US11183200B2

Publication date: 2021-11-23

Application No.: US17073228

Filing date: 2020-10-16

Applicant: DOLBY INTERNATIONAL AB

Inventors: Barbara Resch, Kristofer Kjörling, Lars Villemoes

IPC classification: G10L19/00, G10L19/26, G10L19/20, G10L19/12, G10L19/125, G10L21/003, G10L19/09, G10L21/013, G10L19/22, G10L21/007, G10L19/032, G10L19/02, G10L19/107

Abstract: In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from one of either: (i) an active mode where the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, and (ii) an inactive mode where the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter is capable of being selectively operated in either the active mode or the inactive mode while operating in the coding mode based on control information.
