ELECTRONIC DEVICE AND METHOD OF GENERATING TEXT-TO-SPEECH MODEL FOR PROSODY CONTROL OF THE ELECTRONIC DEVICE

    Publication number: US20230335112A1

    Publication date: 2023-10-19

    Application number: US18213929

    Filing date: 2023-06-26

    CPC classification number: G10L13/10

    Abstract: According to certain embodiments, an electronic device comprises: a memory storing instructions; and a processor electrically connected to the memory and configured to execute the instructions, wherein, when the instructions are executed by the processor, the processor receives training data comprising a plurality of phonemes; determines a prosody value for each one of the plurality of phonemes in the training data; clusters the plurality of phonemes based on the prosody value for each one of the plurality of phonemes in the training data, thereby resulting in a plurality of prosody clusters; extracts a phoneme sequence corresponding to a text in the training data; extracts a prosody cluster index sequence corresponding to an utterance of the text by selecting one of the plurality of prosody clusters based on prosody values of the utterance of the text; and generates a text-to-speech (TTS) model based on the phoneme sequence and the prosody cluster index sequence.
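The clustering and index-extraction steps in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the prosody value is taken to be a single scalar per phoneme (for example a mean pitch in Hz), the clustering is plain 1-D k-means, and an utterance phoneme is assigned to the cluster with the nearest centroid. The function names, phoneme labels, and numbers are hypothetical and serve only as illustration; this is not the claimed implementation.

```python
from typing import List, Sequence


def nearest(value: float, centroids: Sequence[float]) -> int:
    """Return the index of the centroid closest to a scalar prosody value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def kmeans_1d(values: Sequence[float], k: int, iters: int = 50) -> List[float]:
    """Cluster scalar prosody values with 1-D k-means (an assumed algorithm).

    Returns the centroids in ascending order, so cluster index 0 is the
    lowest prosody value and index k-1 the highest.
    """
    ordered = sorted(values)
    # Deterministic initialisation: pick k values spread over the sorted data.
    centroids = [float(ordered[(2 * i + 1) * len(ordered) // (2 * k)]) for i in range(k)]
    for _ in range(iters):
        buckets: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            buckets[nearest(v, centroids)].append(v)
        # An empty bucket keeps its previous centroid.
        updated = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def cluster_index_sequence(prosody_values: Sequence[float],
                           centroids: Sequence[float]) -> List[int]:
    """Map each phoneme's prosody value in an utterance to a prosody cluster index."""
    return [nearest(v, centroids) for v in prosody_values]


# Hypothetical per-phoneme prosody values gathered from the training data.
training_values = [100.0, 105.0, 110.0, 200.0, 210.0, 205.0, 300.0, 310.0, 305.0]
centroids = kmeans_1d(training_values, k=3)

# Hypothetical utterance: a phoneme sequence and its measured prosody values.
phonemes = ["HH", "AH", "L", "OW"]
utterance_values = [102.0, 208.0, 298.0, 111.0]
indices = cluster_index_sequence(utterance_values, centroids)

# The (phoneme, prosody cluster index) pairs would form the TTS training input.
training_pairs = list(zip(phonemes, indices))
```

Replacing continuous prosody values with a small set of cluster indices gives the model a discrete label per phoneme, which is what makes prosody controllable at synthesis time: a different index sequence can be supplied for the same phoneme sequence.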
