Sound processing method
    3.
    Granted invention patent

    Publication number: US11996115B2

    Publication date: 2024-05-28

    Application number: US17435761

    Filing date: 2019-12-18

    Inventor: Mitsuru Sendoda

    CPC classification number: G10L25/24 G10L25/51

    Abstract: A sound processing apparatus includes a feature value extractor configured to perform a Fourier transform and then a cepstral analysis of a sound signal and to extract, as feature values of the sound signal, values including frequency components obtained by the Fourier transform of the sound signal and a value based on a result obtained by the cepstral analysis of the sound signal.
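The extraction order described in the abstract (Fourier transform first, then cepstral analysis) can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the function name `extract_features`, the frame length, the Hann window, and the choice of the cepstral peak position as the "value based on a result obtained by the cepstral analysis" are all assumptions made for illustration.

```python
import numpy as np

def extract_features(signal, n_fft=512, min_quefrency=20):
    """Hypothetical helper: FFT magnitudes plus one cepstrum-derived value."""
    # Take one analysis frame and taper it (window choice is an assumption).
    frame = np.asarray(signal[:n_fft], dtype=float)
    windowed = frame * np.hanning(len(frame))

    # Step 1: Fourier transform -> frequency components (magnitude spectrum).
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))

    # Step 2: cepstral analysis -> inverse transform of the log spectrum.
    log_spectrum = np.log(spectrum + 1e-12)  # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum, n=n_fft)

    # Assumed cepstrum-based value: index of the largest cepstral peak,
    # skipping the low-quefrency region dominated by the spectral envelope.
    search = cepstrum[min_quefrency:n_fft // 2]
    peak_quefrency = min_quefrency + int(np.argmax(search))

    # Feature vector = frequency components + the cepstrum-based value.
    return np.concatenate([spectrum, [float(peak_quefrency)]])

# Example input: a 440 Hz tone sampled at 16 kHz (illustrative values only).
sample_rate = 16000
t = np.arange(1024) / sample_rate
features = extract_features(np.sin(2 * np.pi * 440.0 * t))
```

With `n_fft=512`, the vector holds 257 spectral magnitudes followed by a single cepstral value. A real system would typically compute this per frame over the whole signal.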

    Voice conversion system and training method therefor

    Publication number: US11875775B2

    Publication date: 2024-01-16

    Application number: US17430793

    Filing date: 2021-04-20

    CPC classification number: G10L15/063 G10L15/16 G10L25/24

    Abstract: The present disclosure proposes a voice conversion scheme trained on a non-parallel corpus, which removes the dependence on parallel text and addresses the difficulty of achieving voice conversion when resources and equipment are limited. It includes a voice conversion system and a training method therefor. Compared with the prior art, the embodiments offer two advantages. First, a trained speaker-independent automatic speech recognition (ASR) model can be used for any source speaker. Second, bottleneck features of the audio are more abstract than phonetic posteriorgram (PPG) features: they reflect the decoupling of spoken content from speaker timbre, yet are not tightly bound to phoneme classes in a clear one-to-one correspondence. This mitigates, to some extent, the inaccurate pronunciation caused by ASR recognition errors. Audio converted using bottleneck features has markedly higher pronunciation accuracy than audio from a PPG-based method, with no significant difference in timbre. Transfer learning further reduces the dependence on training corpus substantially.
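The distinction the abstract draws between bottleneck features and phonetic posteriorgram features can be made concrete with a toy forward pass. The sketch below is an assumption-laden illustration, not the patented model: the layer sizes, the `tanh` activations, the function name `asr_forward`, and the random weights (standing in for a trained speaker-independent ASR network) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: 80 input features per frame, a narrow 32-unit
# bottleneck layer, and 40 phoneme classes at the output.
N_INPUT, N_HIDDEN, N_BOTTLENECK, N_PHONEMES = 80, 256, 32, 40

# Random weights stand in for a trained ASR acoustic model (assumption).
W1 = rng.standard_normal((N_INPUT, N_HIDDEN)) * 0.1
W2 = rng.standard_normal((N_HIDDEN, N_BOTTLENECK)) * 0.1
W3 = rng.standard_normal((N_BOTTLENECK, N_PHONEMES)) * 0.1

def asr_forward(frames):
    """Return (bottleneck features, posteriorgram) for a batch of frames."""
    hidden = np.tanh(frames @ W1)
    # Bottleneck features: activations of the narrow layer, taken before
    # the network commits to any phoneme class.
    bottleneck = np.tanh(hidden @ W2)
    # Posteriorgram: per-frame softmax over phoneme classes.
    logits = bottleneck @ W3
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    ppg = np.exp(logits)
    ppg = ppg / ppg.sum(axis=1, keepdims=True)
    return bottleneck, ppg

frames = rng.standard_normal((100, N_INPUT))  # 100 synthetic frames
bottleneck, ppg = asr_forward(frames)
```

Each posteriorgram row is a probability distribution over phoneme classes, so a misrecognized frame directly points the converter at the wrong phoneme. The bottleneck rows are continuous activations with no per-class meaning, which is the property the abstract credits with softening the effect of recognition errors.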
