Voice conversion learning device, voice conversion device, method, and program

    Publication number: US11900957B2

    Publication date: 2024-02-13

    Application number: US17251711

    Filing date: 2019-06-13

    Inventor: Hirokazu Kameoka

    Abstract: The invention makes it possible to convert a voice into one with a desired attribute. It includes learning an encoder, which estimates a latent variable series from an input sound feature value series and an attribute code, and a decoder, which reconstructs the sound feature value series from an input latent variable series and an attribute code. Learning is based on parallel data of the sound feature value series and the latent variable series of a conversion-source voice signal, together with an attribute code indicating the attribute of that signal. It maximizes an objective function expressed using (1) the error between the sound feature value series reconstructed by the decoder and the sound feature value series of the conversion-source voice signal in the parallel data; (2) the distance between the latent variable series estimated by the encoder and the latent variable series of the conversion-source voice signal in the parallel data; and (3) the attribute code similarity of a sound feature value series that the decoder reconstructs from an arbitrary attribute code, that is, its similarity to that attribute code as identified by an attribute identifier.
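The three terms of the objective above can be illustrated with a minimal Python sketch. It is not the patented implementation. The encoder, decoder and attribute identifier are passed in as plain callables. The reconstruction error and the latent distance are taken as sums of squared differences, entering with a negative sign so that maximizing the objective reduces them. The attribute similarity is assumed to be the identifier's log-probability for the chosen code. The weights `lambda_z` and `lambda_cls`, like all function names here, are hypothetical. The abstract does not say which latent series the decoder receives, so the sketch feeds it the encoder's estimate.

```python
import math
from typing import Callable, List, Sequence

Frames = Sequence[Sequence[float]]


def squared_error(a: Frames, b: Frames) -> float:
    """Sum of squared element-wise differences between two frame sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of frames")
    return sum((x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def objective(
    x_src: Frames,
    z_src: Frames,
    c_src: int,
    c_any: int,
    encoder: Callable[[Frames, int], List[List[float]]],
    decoder: Callable[[Frames, int], List[List[float]]],
    classifier: Callable[[Frames, int], float],
    lambda_z: float = 1.0,    # hypothetical weight of the latent term
    lambda_cls: float = 1.0,  # hypothetical weight of the attribute term
) -> float:
    """Value to maximize (assumed form; higher is better)."""
    z_hat = encoder(x_src, c_src)               # estimated latent series
    x_rec = decoder(z_hat, c_src)               # reconstruction with the source code
    rec_term = -squared_error(x_rec, x_src)     # term (1): reconstruction error
    latent_term = -squared_error(z_hat, z_src)  # term (2): latent distance
    x_conv = decoder(z_hat, c_any)              # decode with an arbitrary code
    # Term (3): similarity to c_any, assumed to be the classifier's log-probability.
    cls_term = math.log(classifier(x_conv, c_any))
    return rec_term + lambda_z * latent_term + lambda_cls * cls_term


# Toy stand-ins for a quick check: identity "networks" and a certain classifier.
def identity_net(seq: Frames, code: int) -> List[List[float]]:
    return [list(frame) for frame in seq]


def certain_classifier(seq: Frames, code: int) -> float:
    return 1.0


x = [[1.0, 2.0], [3.0, 4.0]]
print(objective(x, x, c_src=0, c_any=1, encoder=identity_net,
                decoder=identity_net, classifier=certain_classifier))
```

With identity networks and matching latent data, both error terms vanish. Any remaining score then comes from how confidently the identifier assigns the converted features to the requested attribute.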

    CONVERSION LEARNING APPARATUS, CONVERSION LEARNING METHOD, CONVERSION LEARNING PROGRAM AND CONVERSION APPARATUS

    Publication number: US20230138232A1

    Publication date: 2023-05-04

    Application number: US17794227

    Filing date: 2020-01-30

    Abstract: A conversion learning device includes: a source encoding unit that uses a first machine learning model to convert a feature amount sequence of a source domain, which characterizes conversion-source content data, into a first internal representation vector sequence, that is, a matrix in which the internal representation vectors at the individual locations of the source-domain feature amount sequence are arranged; a target encoding unit that uses a second machine learning model to convert a feature amount sequence of a target domain, which characterizes conversion-target content data, into a second internal representation vector sequence, that is, a matrix in which the internal representation vectors at the individual locations of the target-domain feature amount sequence are arranged; an attention matrix calculation unit that uses the first and second internal representation vector sequences to calculate an attention matrix, which maps the individual locations of the source-domain feature amount sequence to the individual locations of the target-domain feature amount sequence, and that calculates a third internal representation vector sequence as the product of an internal representation vector sequence, obtained by linearly transforming the first internal representation vector sequence, and the attention matrix; a target decoding unit that uses a third machine learning model to calculate, from the third internal representation vector sequence, a feature amount sequence of a conversion domain, which is used to convert the source domain into the conversion domain; and a learning execution unit that trains at least one of the target encoding unit and the target decoding unit so as to shorten the distance between a submatrix of the target-domain feature amount sequence and a submatrix of the conversion-domain feature amount sequence.
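A minimal Python sketch of the attention step described above, treating each internal representation vector sequence as a list of per-location vectors. The abstract does not say how the attention matrix is computed. This sketch assumes scaled dot-product scores normalized with a softmax over source locations, so `A[n][m]` weights source location n for target location m. The linear conversion matrix `w` and all function names are hypothetical. The result holds one vector per target location, i.e. the columns of the product of the linearly converted first sequence and the attention matrix.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Apply the (hypothetical) linear conversion w to one vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def attention_matrix(k: Matrix, q: Matrix) -> Matrix:
    """A[n][m]: weight of source location n for target location m.

    Assumed form: scaled dot products, softmax over source locations,
    so every column (fixed m) sums to 1.
    """
    d = len(k[0])
    scores = [[sum(a * b for a, b in zip(kn, qm)) / math.sqrt(d) for qm in q]
              for kn in k]
    a = [[0.0] * len(q) for _ in k]
    for m in range(len(q)):
        col = [scores[n][m] for n in range(len(k))]
        peak = max(col)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in col]
        total = sum(exps)
        for n in range(len(k)):
            a[n][m] = exps[n] / total
    return a


def attend(k: Matrix, q: Matrix, w: Matrix) -> Matrix:
    """Third sequence: r_m = sum_n (w @ k_n) * A[n][m], one vector per target location."""
    a = attention_matrix(k, q)
    v = [matvec(w, kn) for kn in k]  # linearly converted source vectors
    return [[sum(v[n][i] * a[n][m] for n in range(len(k))) for i in range(len(w))]
            for m in range(len(q))]


source = [[10.0, 0.0], [0.0, 10.0]]  # first sequence: two source locations
target = [[10.0, 0.0]]               # second sequence: one target location
identity = [[1.0, 0.0], [0.0, 1.0]]
print(attend(source, target, identity))
```

Because the target vector aligns with the first source location, nearly all attention weight lands there and the output is close to that source vector. This is the location-to-location mapping the attention matrix is meant to capture.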

    ADJUSTMENT METHOD OF SOUND OUTPUT AND ELECTRONIC DEVICE PERFORMING THE SAME

    Publication number: US20220084533A1

    Publication date: 2022-03-17

    Application number: US17023435

    Filing date: 2020-09-17

    Abstract: A method of adjusting sound output is disclosed. The method includes the following steps: receiving an audio message that contains a vowel message; determining whether the audio message is a whispered voice message; and, if it is, outputting a normal voice message whose spoken content is the same as that of the audio message. The normal voice message contains a normal voice vowel message, and the sound energy of the low-frequency part of the normal voice vowel message is 1.5 to 1,000,000 times that of the vowel message.
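The 1.5 to 1,000,000 energy range can be checked with a short Python sketch that compares the low-frequency energy of two vowel segments of equal length. The abstract does not define where the "low-frequency part" ends. The cutoff `cutoff_hz` (1,000 Hz by default here) is therefore a hypothetical parameter, as are the function names. The naive DFT is for illustration only, and whisper detection itself is not modeled.

```python
import cmath
import math
from typing import Sequence, Tuple


def low_band_energy(signal: Sequence[float], sample_rate: float,
                    cutoff_hz: float) -> float:
    """Energy in the DFT bins from 0 Hz up to cutoff_hz (naive O(N^2) DFT).

    Only the one-sided spectrum (bins 0..N/2) is summed; since both signals
    are treated the same way, their energy ratio is unaffected.
    """
    n = len(signal)
    energy = 0.0
    for k in range(n // 2 + 1):
        if k * sample_rate / n > cutoff_hz:
            break
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        energy += abs(coeff) ** 2
    return energy


def ratio_in_claimed_range(normal_vowel: Sequence[float],
                           whispered_vowel: Sequence[float],
                           sample_rate: float,
                           cutoff_hz: float = 1000.0) -> Tuple[bool, float]:
    """Return (ratio within [1.5, 1e6], low-band energy ratio normal/whispered)."""
    if len(normal_vowel) != len(whispered_vowel):
        raise ValueError("segments must have the same length")
    low_normal = low_band_energy(normal_vowel, sample_rate, cutoff_hz)
    low_whisper = low_band_energy(whispered_vowel, sample_rate, cutoff_hz)
    if low_whisper == 0.0:
        raise ValueError("whispered vowel has no low-frequency energy")
    ratio = low_normal / low_whisper
    return 1.5 <= ratio <= 1_000_000, ratio


# Toy example: a 200 Hz tone stands in for the whispered vowel, and the same
# tone at twice the amplitude stands in for the normal vowel.
sr, length = 8000.0, 80
whisper = [math.sin(2 * math.pi * 200 * t / sr) for t in range(length)]
normal = [2.0 * s for s in whisper]
print(ratio_in_claimed_range(normal, whisper, sr))
```

Doubling the amplitude quadruples the energy, so the toy pair falls inside the claimed range. Identical segments (ratio 1) would not.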

    SIGNAL PROCESSING APPARATUS AND METHOD, TRAINING APPARATUS AND METHOD, AND PROGRAM

    Publication number: US20210225383A1

    Publication date: 2021-07-22

    Application number: US16769122

    Filing date: 2018-11-28

    Applicant: SONY CORPORATION

    Inventor: NAOYA TAKAHASHI

    IPC classification: G10L21/007 G10L21/028

    Abstract: The present technology relates to a signal processing apparatus and method, a training apparatus and method, and a program that make voice quality conversion easier. A signal processing apparatus includes a voice quality conversion unit configured to convert acoustic data of an arbitrary sound from an input sound source into acoustic data having the voice quality of a target sound source different from the input sound source. The conversion is based on a voice quality converter parameter obtained by training that uses, as training data, acoustic data for each of one or more sound sources, where this acoustic data is neither parallel data nor clean data. The present technology can be applied to a voice quality conversion apparatus.