-
Publication No.: US20240274145A1
Publication date: 2024-08-15
Application No.: US18643296
Filing date: 2024-04-23
IPC classification: G10L19/26, G10L19/02, G10L19/032, G10L19/09, G10L19/107, G10L19/12, G10L19/125, G10L19/20, G10L19/22, G10L21/003, G10L21/007, G10L21/013
CPC classification: G10L19/26, G10L19/02, G10L19/032, G10L19/09, G10L19/12, G10L19/125, G10L19/20, G10L19/22, G10L19/265, G10L21/003, G10L21/007, G10L21/013, G10L19/0212, G10L19/107
Abstract: In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from either (i) an active mode, in which the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, or (ii) an inactive mode, in which the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter is capable of being selectively operated in either the active mode or the inactive mode while operating in the coding mode based on control information.
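The mode switch described in this abstract can be illustrated with a minimal sketch. It assumes a one-tap long-term filter, y[n] = x[n] + gain * y[n - lag], as a stand-in for the pitch filter; the abstract does not specify the filter structure, and the names `PitchFilterControl`, `pitch_filter`, and `decode_frame` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PitchFilterControl:
    """Hypothetical control information carried alongside the audio frame."""
    active: bool   # True selects the active mode, False the inactive mode
    lag: int       # assumed filtering information: pitch lag in samples (>= 1)
    gain: float    # assumed filtering information: feedback gain


def pitch_filter(preliminary: List[float], ctrl: PitchFilterControl) -> List[float]:
    """Filter the preliminary signal, or bypass it when the filter is inactive."""
    if not ctrl.active:
        # Inactive mode: the filter is disabled and the signal passes unchanged.
        return list(preliminary)
    out: List[float] = []
    for n, x in enumerate(preliminary):
        # Feedback from the already filtered output, one pitch lag earlier.
        past = out[n - ctrl.lag] if n >= ctrl.lag else 0.0
        out.append(x + ctrl.gain * past)
    return out


def decode_frame(preliminary: List[float], coding_mode: str,
                 ctrl: PitchFilterControl) -> List[float]:
    """The coding mode says how `preliminary` was produced; it does not fix
    the filter mode, which is taken from the control information alone."""
    return pitch_filter(preliminary, ctrl)
```

In this sketch the same `coding_mode` value can be paired with either filter mode, which mirrors the selective operation the abstract describes.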
-
Publication No.: US12014747B2
Publication date: 2024-06-18
Application No.: US18308293
Filing date: 2023-04-27
IPC classification: G10L19/26, G10L19/02, G10L19/028, G10L19/03, G10L19/032, G10L19/04, G10L19/12, G10L19/16, G10L21/007, G10L21/02, G10L21/0208, G10L21/0324, G10L21/038, G10L25/15, G10L25/18
CPC classification: G10L19/265, G10L19/0204, G10L19/03, G10L19/032, G10L19/12, G10L19/16, G10L19/26, G10L21/007, G10L21/02, G10L21/0208, G10L21/0324, G10L25/15, G10L25/18, G10L19/02, G10L19/028, G10L19/04, G10L21/038
Abstract: An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.
-
Publication No.: US11900957B2
Publication date: 2024-02-13
Application No.: US17251711
Filing date: 2019-06-13
Inventor(s): Hirokazu Kameoka
IPC classification: G10L21/007, G06N3/08, G10L19/008
CPC classification: G10L21/007, G06N3/08, G10L19/008
Abstract: To enable conversion to a voice of a desired attribution. The present invention includes learning, on the basis of parallel data of a sound feature value series in a conversion-source voice signal and a latent variable series in the conversion-source voice signal, and an attribution code indicating attribution of the conversion-source voice signal, an encoder for estimating a latent variable series from input of a sound feature value series and an attribution code, and a decoder for reconfiguring the sound feature value series from input of the latent variable series and the attribution code, to maximize a value of an objective function, the objective function being represented using attribution code similarity of a sound feature value series reconfigured by the decoder from input of an error between the sound feature value series reconfigured by the decoder and the sound feature value series in the conversion-source voice signal in the parallel data, a distance between the latent variable series estimated by the encoder and the latent variable series in the conversion-source voice signal in the parallel data, and any attribution code, the attribution code similarity being similarity to the any attribution code identified by an attribution identifier.
-
Publication No.: US11894009B2
Publication date: 2024-02-06
Application No.: US17587243
Filing date: 2022-01-28
Inventor(s): Liujun Zhang, Yuqing Hua, Zhen Yang, Zuojing Li
IPC classification: G10L21/003, G10L21/007, G10L21/057, G10L21/013
CPC classification: G10L21/007, G10L21/057, G10L2021/0135
Abstract: An audio processing method applied to a first terminal is described, and includes: in response to receiving audio data input by a user at the first terminal and determining that a voice change function is turned on, determining change parameters; and, based on the change parameters, performing change processing on the audio data.
-
Publication No.: US11894008B2
Publication date: 2024-02-06
Application No.: US16769122
Filing date: 2018-11-28
Applicant: SONY CORPORATION
Inventor(s): Naoya Takahashi
IPC classification: G10L21/00, G10L25/00, G10L21/007, G10L21/028, G10L21/013
CPC classification: G10L21/007, G10L21/013, G10L21/028
Abstract: Provided is a signal processing apparatus that includes a voice quality conversion unit that converts acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source on the basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more sound sources as training data, the acoustic data being different from parallel data or clean data.
-
Publication No.: US20230138232A1
Publication date: 2023-05-04
Application No.: US17794227
Filing date: 2020-01-30
Inventor(s): Hirokazu KAMEOKA, Ko TANAKA, Takuhiro KANEKO, Nobukatsu HOJO
IPC classification: G10L21/007, G10L25/30, G06N20/20, G06F17/16
Abstract: A conversion learning device includes: a source encoding unit that converts, by using a first machine learning model, a feature amount sequence of a source domain that is a characteristic of conversion-source content data, into a first internal representation vector sequence that is a matrix in which internal representation vectors at individual locations of the feature amount sequence of the source domain are arranged; a target encoding unit that converts, by using a second machine learning model, a feature amount sequence of a target domain that is a characteristic of conversion-target content data, into a second internal representation vector sequence that is a matrix in which internal representation vectors at individual locations of the feature amount sequence of the target domain are arranged; an attention matrix calculation unit that calculates, by using the first internal representation vector sequence and the second internal representation vector sequence, an attention matrix that is a matrix mapping the individual locations of the feature amount sequence of the source domain to the individual locations of the feature amount sequence of the target domain, and calculates a third internal representation vector sequence that is a product of an internal representation vector sequence calculated by linear conversion of the first internal representation vector sequence and the attention matrix; a target decoding unit that calculates, by using the third internal representation vector sequence, a feature amount sequence of a conversion domain that is used to convert the source domain into the conversion domain, by using a third machine learning model; and a learning execution unit that causes at least one of the target encoding unit and the target decoding unit to learn such that a distance between a submatrix of the feature amount sequence of the target domain and a submatrix of the feature amount sequence of the conversion domain becomes shorter.
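The attention step in this abstract can be sketched in a few lines. The sketch assumes scaled dot-product scores with a softmax over the source locations, which is a common choice but is not stated in the abstract; the matrix layout (one column per location) and the names `attend`, `z_src`, `z_tgt`, and `w` are likewise hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_columns(scores: Matrix) -> Matrix:
    """Normalize each column so its weights over source locations sum to 1."""
    normalized = []
    for col in transpose(scores):
        peak = max(col)
        exps = [math.exp(v - peak) for v in col]
        total = sum(exps)
        normalized.append([e / total for e in exps])
    return transpose(normalized)


def attend(z_src: Matrix, z_tgt: Matrix, w: Matrix) -> Tuple[Matrix, Matrix]:
    """z_src is D x N (source locations as columns), z_tgt is D x M,
    w is a D x D linear conversion applied to z_src.
    Returns the N x M attention matrix and the D x M third sequence."""
    dim = len(z_src)
    scores = matmul(transpose(z_src), z_tgt)                  # N x M
    scores = [[s / math.sqrt(dim) for s in row] for row in scores]
    attention = softmax_columns(scores)                       # N x M
    values = matmul(w, z_src)                                 # D x N
    third = matmul(values, attention)                         # D x M
    return attention, third
```

Each column of the returned attention matrix weights the source locations for one target location, and the third sequence is the linearly converted source sequence multiplied by that matrix.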
-
Publication No.: US20230056955A1
Publication date: 2023-02-23
Application No.: US17896752
Filing date: 2022-08-26
Inventor(s): Qingshan Yao, Yu Qin, Haowen Yu, Feng Lu
IPC classification: G10L25/60, G06N3/04, G06N3/08, G10L21/007, G10L21/0232, G10L25/30
Abstract: The present invention provides a deep learning based method and system for processing sound quality characteristics. The method comprises: obtaining data characteristics of audio data to be processed by extracting features from user preference data including the audio data to be processed; and, based on the data characteristics, generating a sound quality processing result of the audio to be processed by using a trained baseline model; wherein the baseline model is a neural network model trained by using audio data, behavioral data, and other relevant data from multiple users or a single user.
-
Publication No.: US20220270627A1
Publication date: 2022-08-25
Application No.: US17353636
Filing date: 2021-06-21
Inventor(s): Na XU, Yongtao JIA, Linzhang WANG
IPC classification: G10L21/007, G10L21/0272, G10L25/30, G10L25/51
Abstract: The present disclosure relates to a method and an apparatus for audio processing and a storage medium. The method includes: obtaining an audio mixing feature of a target object, in which the audio mixing feature at least includes a voiceprint feature and a pitch feature of the target object; and determining a target audio matching the target object in mixed audio according to the audio mixing feature.
-
Publication No.: US20220084533A1
Publication date: 2022-03-17
Application No.: US17023435
Filing date: 2020-09-17
Applicant: PixArt Imaging Inc.
Inventor(s): Kuan-Li CHAO, Wei-Ren LAN, Hung LIN, Kuo-Ping YANG
IPC classification: G10L21/007, G10L25/78, G10L15/22
Abstract: An adjustment method of sound output is disclosed. The adjustment method includes the following steps: receiving an audio message having a vowel message; determining whether the audio message is a whispered voice message; and, if the audio message is a whispered voice message, outputting a normal voice message, wherein the spoken content of the normal voice message is the same as that of the audio message, and the normal voice message has a normal voice vowel message, wherein the sound energy of the low-frequency part of the normal voice vowel message is 1.5 to 1,000,000 times that of the vowel message.
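The energy condition at the end of this abstract can be checked with a short sketch. It assumes that "low-frequency part" means all DFT bins at or below a cutoff frequency; the 1000 Hz default is an illustrative assumption, not a value from the abstract, and the function names are hypothetical.

```python
import cmath
import math
from typing import Sequence


def low_band_energy(samples: Sequence[float], sample_rate: float,
                    cutoff_hz: float) -> float:
    """Sum of squared DFT magnitudes for bins at or below cutoff_hz."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2 + 1):
        if k * sample_rate / n > cutoff_hz:
            break
        # Naive DFT coefficient for bin k; fine for short illustrative frames.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        energy += abs(coeff) ** 2
    return energy


def ratio_in_stated_range(whisper_vowel: Sequence[float],
                          normal_vowel: Sequence[float],
                          sample_rate: float,
                          cutoff_hz: float = 1000.0) -> bool:
    """True when the normal-voice low-band energy is 1.5 to 1,000,000 times
    that of the whispered vowel. The cutoff is an assumed parameter."""
    whisper_energy = low_band_energy(whisper_vowel, sample_rate, cutoff_hz)
    normal_energy = low_band_energy(normal_vowel, sample_rate, cutoff_hz)
    if whisper_energy == 0.0:
        return False  # ratio undefined; treated as out of range in this sketch
    ratio = normal_energy / whisper_energy
    return 1.5 <= ratio <= 1_000_000
```

Doubling the amplitude of a low-frequency tone multiplies its energy by about four, so such a pair of signals falls inside the stated range, while two identical signals (ratio 1) do not.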
-
Publication No.: US20210225383A1
Publication date: 2021-07-22
Application No.: US16769122
Filing date: 2018-11-28
Applicant: SONY CORPORATION
Inventor(s): NAOYA TAKAHASHI
IPC classification: G10L21/007, G10L21/028
Abstract: The present technology relates to a signal processing apparatus and method, a training apparatus and method, and a program that enable easier voice quality conversion. A signal processing apparatus includes: a voice quality conversion unit configured to convert acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source on the basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more sound sources as training data, the acoustic data being different from parallel data or clean data. The present technology can be applied to a voice quality conversion apparatus.