-
Publication No.: US20210280202A1
Publication Date: 2021-09-09
Application No.: US17330126
Application Date: 2021-05-25
Inventor: Xilei WANG , Wenfu WANG , Tao SUN
IPC: G10L21/013 , G10L13/033 , G10L15/02 , G10L25/30 , G06N3/08
Abstract: The disclosure provides a voice conversion method, a voice conversion apparatus, an electronic device, and a storage medium, relating to the fields of voice conversion, speech interaction, natural language processing, and deep learning. The method includes: acquiring a source speech of a first user and a reference speech of a second user; extracting first speech content information and a first acoustic feature from the source speech; extracting a second acoustic feature from the reference speech; acquiring a reconstructed third acoustic feature by inputting the first speech content information, the first acoustic feature, and the second acoustic feature into a pre-trained voice conversion model, in which the pre-trained voice conversion model is obtained by training on speech of a third user; and synthesizing a target speech based on the third acoustic feature.
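Note (illustrative sketch, not part of the patent text): the data flow described in the abstract above can be pictured roughly as follows. Every name here (convert_voice, extract_content, extract_acoustic, model, vocoder) is a hypothetical placeholder introduced for illustration; the abstract does not specify any feature extractor, model architecture, or synthesizer.

```python
from typing import Any, Callable


def convert_voice(
    source_speech: Any,
    reference_speech: Any,
    extract_content: Callable[[Any], Any],   # hypothetical content extractor
    extract_acoustic: Callable[[Any], Any],  # hypothetical acoustic feature extractor
    model: Callable[[Any, Any, Any], Any],   # hypothetical pre-trained conversion model
    vocoder: Callable[[Any], Any],           # hypothetical synthesizer
) -> Any:
    """Wire together the steps listed in the abstract, in order."""
    # Steps on the source speech of the first user.
    content = extract_content(source_speech)
    first_acoustic = extract_acoustic(source_speech)
    # Step on the reference speech of the second user.
    second_acoustic = extract_acoustic(reference_speech)
    # The model reconstructs a third acoustic feature from the three inputs.
    third_acoustic = model(content, first_acoustic, second_acoustic)
    # The target speech is synthesized from the third acoustic feature.
    return vocoder(third_acoustic)
```

The callables are passed in as parameters so the sketch shows only the ordering of the steps and makes no claim about how each step is implemented.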
-
Publication No.: US20210201887A1
Publication Date: 2021-07-01
Application No.: US17205121
Application Date: 2021-03-18
Inventor: Zhijie CHEN , Tao SUN , Lei JIA
IPC: G10L13/047 , G10L25/18 , G10L25/30
Abstract: The present application discloses a method and an apparatus for training a speech spectrum generation model, as well as an electronic device, and relates to the technical fields of speech synthesis and deep learning. A specific implementation is as follows: inputting a first text sequence into the speech spectrum generation model to generate an analog spectrum sequence corresponding to the first text sequence, and obtaining a first loss value of the analog spectrum sequence according to a preset loss function; inputting the analog spectrum sequence corresponding to the first text sequence into an adversarial loss function model, which is a generative adversarial network model, to obtain a second loss value of the analog spectrum sequence; and training the speech spectrum generation model based on the first loss value and the second loss value.
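Note (illustrative sketch, not part of the patent text): one common way to train on two loss values is to add them with a weight. The abstract does not say how the first and second loss values are combined, nor which preset loss function is used, so the mean squared error, the negative-log discriminator loss, the weighted sum, and the adv_weight parameter below are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def mse_loss(generated: Sequence[float], target: Sequence[float]) -> float:
    """First loss value: mean squared error (an assumed 'preset loss function')."""
    if len(generated) != len(target) or not generated:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((g - t) ** 2 for g, t in zip(generated, target)) / len(generated)


def adversarial_loss(
    generated: Sequence[float],
    discriminator: Callable[[Sequence[float]], float],
) -> float:
    """Second loss value: -log D(x), where D returns the probability that x is real."""
    p_real = discriminator(generated)
    # Clamp to avoid log(0) when the discriminator is fully confident.
    return -math.log(max(p_real, 1e-12))


def combined_loss(
    generated: Sequence[float],
    target: Sequence[float],
    discriminator: Callable[[Sequence[float]], float],
    adv_weight: float = 1.0,  # hypothetical weighting, not from the abstract
) -> float:
    """Weighted sum of the two loss values (an assumed combination rule)."""
    first = mse_loss(generated, target)
    second = adversarial_loss(generated, discriminator)
    return first + adv_weight * second
```

In a real training loop the generated values would be spectrum frames produced by the generation model and the discriminator would itself be a trained network; plain floats are used here only to keep the arithmetic visible.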
-
Publication No.: US20210390943A1
Publication Date: 2021-12-16
Application No.: US17111238
Application Date: 2020-12-03
Inventor: Zhengkun GAO , Junteng ZHANG , Wenfu WANG , Tao SUN
IPC: G10L13/047 , G10L13/10 , G10L13/06
Abstract: The present disclosure provides a method and apparatus for training a model, a method and apparatus for synthesizing a speech, a device and a storage medium, and relates to the fields of natural language processing and deep learning. The method for training a model may include: determining a phoneme feature and a prosodic word boundary feature of sample text data; inserting a pause character into the phoneme feature according to the prosodic word boundary feature to obtain a combined feature of the sample text data; and training an initial speech synthesis model according to the combined feature of the sample text data, to obtain a target speech synthesis model.
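Note (illustrative sketch, not part of the patent text): the pause insertion step in the abstract above might look like the following, assuming the phoneme feature is a list of phoneme symbols and the prosodic word boundary feature is a parallel list of flags marking the last phoneme of each prosodic word. The function name insert_pauses, the "<sp>" pause token, and the choice to skip a pause after the final phoneme are hypothetical choices for this sketch.

```python
from typing import List

PAUSE = "<sp>"  # hypothetical pause character; the abstract does not name one


def insert_pauses(
    phonemes: List[str],
    boundary_after: List[bool],
    pause: str = PAUSE,
) -> List[str]:
    """Return the combined feature: phonemes with a pause after each flagged boundary."""
    if len(phonemes) != len(boundary_after):
        raise ValueError("phonemes and boundary_after must have the same length")
    combined: List[str] = []
    last = len(phonemes) - 1
    for i, (phoneme, is_boundary) in enumerate(zip(phonemes, boundary_after)):
        combined.append(phoneme)
        # Assumption of this sketch: no pause is added after the final phoneme.
        if is_boundary and i != last:
            combined.append(pause)
    return combined
```

For example, insert_pauses(["n", "i", "h", "ao"], [False, True, False, True]) places a single pause token between "i" and "h", and the resulting list is what would be fed to the initial speech synthesis model during training.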
-
-