Text-to-speech synthesis method and system, a method of training a text-to-speech synthesis system, and a method of calculating an expressivity score

Granted invention patent

US12046226B2 Text-to-speech synthesis method and system, a method of training a text-to-speech synthesis system, and a method of calculating an expressivity score (Status: in force)

Patent title: Text-to-speech synthesis method and system, a method of training a text-to-speech synthesis system, and a method of calculating an expressivity score
Application number: US17785810

Filing date: 2020-12-17
Publication number: US12046226B2

Publication date: 2024-07-23
Inventors: John Flynn, Zeenat Qureshi
Applicant: Spotify AB
Applicant address: SE Stockholm
Patentee: Spotify AB
Current patentee: Spotify AB
Current patentee address: SE Stockholm
Agent: Morgan, Lewis & Bockius LLP
Priority: GB 19101 2019.12.20
International application: PCT/GB2020/053266 2020.12.17
International publication: WO2021/123792A 2021.06.24
National phase entry date: 2022-06-15
Main classification: G10L15/00
IPC classifications: G10L15/00; G10L13/047; G10L25/30; G10L25/63

Abstract:

A text-to-speech synthesis method comprising: receiving text; inputting the received text in a prediction network; and generating speech data, wherein the prediction network comprises a neural network, and wherein the neural network is trained by: receiving a first training dataset comprising audio data and corresponding text data; acquiring an expressivity score for each audio sample of the audio data, wherein the expressivity score is a quantitative representation of how well an audio sample conveys emotional information and sounds natural, realistic and human-like; training the neural network using a first sub-dataset, and further training the neural network using a second sub-dataset, wherein the first sub-dataset and the second sub-dataset comprise audio samples and corresponding text from the first training dataset and wherein the average expressivity score of the audio data in the second sub-dataset is higher than the average expressivity score of the audio data in the first sub-dataset.
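The two-stage training order described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Sample` fields, the `threshold` rule for building the second sub-dataset, and the `train_step` callback are all hypothetical, and the expressivity scores are assumed to have been computed beforehand. The abstract only requires that both sub-datasets come from the same training dataset and that the second has the higher average expressivity score, which the sketch checks explicitly.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    text: str            # text corresponding to the audio sample
    audio_path: str      # hypothetical field: where the audio is stored
    expressivity: float  # assumed to be scored before training starts


def split_by_expressivity(
    samples: Sequence[Sample], threshold: float
) -> Tuple[List[Sample], List[Sample]]:
    """Build two sub-datasets from one training dataset.

    Assumption for this sketch: the first sub-dataset is the full set and
    the second keeps only samples scoring at or above `threshold`.
    """
    first = list(samples)
    second = [s for s in samples if s.expressivity >= threshold]
    if not second:
        raise ValueError("threshold excludes every sample")
    return first, second


def train_in_stages(
    samples: Sequence[Sample],
    threshold: float,
    train_step: Callable[[List[Sample]], None],
) -> Tuple[List[Sample], List[Sample]]:
    """Train on the first sub-dataset, then further train on the second."""
    first, second = split_by_expressivity(samples, threshold)
    avg_first = mean(s.expressivity for s in first)
    avg_second = mean(s.expressivity for s in second)
    # Condition stated in the abstract: the second sub-dataset must have
    # the higher average expressivity score.
    if avg_second <= avg_first:
        raise ValueError("second sub-dataset is not more expressive on average")
    train_step(first)   # initial training
    train_step(second)  # further training on more expressive samples
    return first, second
```

In practice `train_step` would wrap whatever training loop the prediction network uses; here it is left as a callback so the ordering logic stays independent of any particular framework.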

Published/granted documents

US20230036020A1 Text-to-Speech Synthesis Method and System, a Method of Training a Text-to-Speech Synthesis System, and a Method of Calculating an Expressivity Score. Publication/grant date: 2023-02-02

IPC classification:

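G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)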