-
Publication No.: US11790934B2
Publication Date: 2023-10-17
Application No.: US17896752
Filing Date: 2022-08-26
Inventors: Qingshan Yao, Yu Qin, Haowen Yu, Feng Lu
IPC Classification: G10L21/0308, G10L25/60, G06N3/04, G06N3/08, G10L21/007, G10L21/0232, G10L25/30
CPC Classification: G10L25/60, G06N3/04, G06N3/08, G10L21/007, G10L21/0232, G10L25/30
Abstract: The present invention provides a deep learning based method and system for processing sound quality characteristics. The method comprises: obtaining data characteristics of an audio data to be processed by extracting features from user preference data including the audio data to be processed; based on the data characteristics, generating a sound quality processing result of the audio to be processed by using a trained baseline model; wherein the baseline model is a neural network model trained by using audio data, behavioral data, and other relevant data from multiple users or a single user.
-
Publication No.: US20230267941A1
Publication Date: 2023-08-24
Application No.: US17679629
Filing Date: 2022-02-24
IPC Classification: G10L21/007, G10L15/06, G10L25/57, G10L15/22, G06N5/02
CPC Classification: G10L21/007, G10L15/063, G10L25/57, G10L15/22, G06N5/022, G10L2015/0635
Abstract: Aspects of the disclosure relate to generating personalized accent and/or pace of speaking modulation for audio/video streams. In some embodiments, a computing platform may train an artificial intelligence model on audio or video samples associated with different geographic regions. The computing platform may receive, via a communication interface, an audio or video stream associated with a first geographic region. The computing platform may identify a second geographic region different from the first geographic region. The computing platform may transform the audio or video stream to correspond to the second geographic region different from the first geographic region. The computing platform may send, via the communication interface, the transformed audio or video stream to a user device associated with the second geographic region.
-
Publication No.: US11682409B2
Publication Date: 2023-06-20
Application No.: US17023941
Filing Date: 2020-09-17
IPC Classification: G10L19/26, G10L21/007, G10L21/0208, G10L21/0324, G10L25/15, G10L25/18, G10L19/16, G10L21/02, G10L19/02, G10L19/03, G10L19/032, G10L19/12, G10L19/028, G10L21/038, G10L19/04
CPC Classification: G10L19/265, G10L19/0204, G10L19/03, G10L19/032, G10L19/12, G10L19/16, G10L19/26, G10L21/007, G10L21/02, G10L21/0208, G10L21/0324, G10L25/15, G10L25/18, G10L19/02, G10L19/028, G10L19/04, G10L21/038
Abstract: An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.
-
Publication No.: US20230058447A1
Publication Date: 2023-02-23
Application No.: US17445537
Filing Date: 2021-08-20
Applicant: Google LLC
IPC Classification: G10L21/007, G10L15/26, G10L25/30, G06N3/08
Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.
-
Publication No.: US11514924B2
Publication Date: 2022-11-29
Application No.: US16797190
Filing Date: 2020-02-21
IPC Classification: G10L21/0364, G10L25/90, G10L25/63, G10L15/24, G10L15/22, G10L13/00, G06V20/40, G06V40/20, G10L21/007, G10L21/003, H04H60/33
Abstract: In an aspect, during a presentation of a presentation material, viewers of the presentation material can be monitored. Based on the monitoring, new content can be determined for insertion into the presentation material. The new content can be automatically inserted into the presentation material in real time. In another aspect, during the presentation, a presenter of the presentation material can be monitored. The presenter's speech can be intercepted and analyzed to detect a level of confidence. Based on the detected level of confidence, the presenter's speech can be adjusted and the adjusted speech can be played back automatically, for example, in lieu of the presenter's original speech that is intercepted.
-
Publication No.: US11462237B2
Publication Date: 2022-10-04
Application No.: US17114349
Filing Date: 2019-06-03
Inventors: Qingshan Yao, Yu Qin, Haowen Yu, Feng Lu
IPC Classification: H04R29/00, G10L25/60, G06N3/04, G06N3/08, G10L21/007, G10L21/0232, G10L25/30
Abstract: The present invention provides a deep learning based method and system for processing sound quality characteristics. The method comprises: obtaining data characteristics of an audio data to be processed by extracting features from user preference data including the audio data to be processed; based on the data characteristics, generating a sound quality processing result of the audio to be processed by using a trained baseline model; wherein the baseline model is a neural network model trained by using audio data, behavioral data, and other relevant data from multiple users or a single user.
-
Publication No.: US11450332B2
Publication Date: 2022-09-20
Application No.: US16970935
Filing Date: 2019-02-20
Inventors: Hirokazu Kameoka, Takuhiro Kaneko, Ko Tanaka, Nobukatsu Hojo
IPC Classification: G10L21/007, G10L25/03, G10L21/013
Abstract: To be able to convert to a voice of the desired attribution. Learning an encoder for, on the basis of parallel data of a sound feature vector series in a conversion-source voice signal and a latent vector series in the conversion-source voice signal, and an attribution label indicating attribution of the conversion-source voice signal, estimating a latent vector series from input of a sound feature vector series and an attribution label, and a decoder for reconfiguring the sound feature vector series from input of the latent vector series and the attribution label.
-
Publication No.: US20220157327A1
Publication Date: 2022-05-19
Application No.: US17532775
Filing Date: 2021-11-22
IPC Classification: G10L19/26, G10L19/20, G10L19/12, G10L19/125, G10L21/003, G10L19/09, G10L21/013, G10L19/22, G10L21/007, G10L19/032, G10L19/02
Abstract: In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from one of either: (i) an active mode where the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, and (ii) an inactive mode where the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter is capable of being selectively operated in either the active mode or the inactive mode while operating in the coding mode based on control information.
-
Publication No.: US20220129582A1
Publication Date: 2022-04-28
Application No.: US17076896
Filing Date: 2020-10-22
Applicant: Robert Bosch GmbH
Inventor: Sascha Lange
IPC Classification: G06F21/62, G06F16/48, G06T5/00, G06T7/70, G10L21/007, G10L21/0232, G06T11/00, G10L25/57
Abstract: A method and system are disclosed for anonymizing data for labeling and development purposes. A data storage backend has a database of non-anonymous data that is received from a data source. An anonymization engine of the data storage backend generates anonymized data by removing personally identifiable information from the non-anonymous data. These anonymized data are made available to human labelers who manually provide labels based on the anonymized data using a data labeling tool. These labels are then stored in association with the corresponding non-anonymous data, which can then be used for training one or more machine learning models. In this way, non-anonymous data having personally identifiable information can be manually labelled for development purposes without exposing the personally identifiable information to any human labelers.
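Illustrative note (not part of the patent record): the workflow in this abstract can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the patented implementation. The `DataStore` class, the `PII_FIELDS` set, and the record field names are all hypothetical, and "removing personally identifiable information" is simplified here to dropping named fields from a dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical field names treated as personally identifiable information.
PII_FIELDS = {"speaker_name", "license_plate"}


@dataclass
class DataStore:
    """Toy stand-in for the data storage backend described in the abstract."""
    raw: dict = field(default_factory=dict)     # record_id -> non-anonymous record
    labels: dict = field(default_factory=dict)  # record_id -> label

    def anonymized(self, record_id):
        """Return a copy of the record with the PII fields removed."""
        record = self.raw[record_id]
        return {k: v for k, v in record.items() if k not in PII_FIELDS}

    def store_label(self, record_id, label):
        """Associate a label with the original record through its id."""
        self.labels[record_id] = label

    def training_pairs(self):
        """Yield (non-anonymous record, label) pairs for model training."""
        for record_id, label in self.labels.items():
            yield self.raw[record_id], label


store = DataStore(raw={1: {"speaker_name": "Alice", "transcript": "turn left"}})
view = store.anonymized(1)          # what a human labeler would see
store.store_label(1, "navigation")  # label produced from the anonymized view
pairs = list(store.training_pairs())
```

The point of the sketch is the keying: the labeler only ever receives `view`, while the label is stored against the record id, so training can still draw on the original non-anonymous record.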
-
Publication No.: US11183200B2
Publication Date: 2021-11-23
Application No.: US17073228
Filing Date: 2020-10-16
IPC Classification: G10L19/00, G10L19/26, G10L19/20, G10L19/12, G10L19/125, G10L21/003, G10L19/09, G10L21/013, G10L19/22, G10L21/007, G10L19/032, G10L19/02, G10L19/107
Abstract: In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from one of either: (i) an active mode where the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, and (ii) an inactive mode where the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter is capable of being selectively operated in either the active mode or the inactive mode while operating in the coding mode based on control information.
-