- Patent title: Text-to-speech synthesis method and system, a method of training a text-to-speech synthesis system, and a method of calculating an expressivity score
- Application number: US17785810
- Filing date: 2020-12-17
- Publication number: US12046226B2
- Publication date: 2024-07-23
- Inventors: John Flynn, Zeenat Qureshi
- Applicant: Spotify AB
- Applicant address: SE Stockholm
- Patentee: Spotify AB
- Current patentee: Spotify AB
- Current patentee address: SE Stockholm
- Agent: Morgan, Lewis & Bockius LLP
- Priority: GB 19101 2019.12.20
- International application: PCT/GB2020/053266 2020.12.17
- International publication: WO2021/123792A 2021.06.24
- National phase entry date: 2022-06-15
- Main classification: G10L15/00
- IPC classification: G10L15/00; G10L13/047; G10L25/30; G10L25/63
Abstract:
A text-to-speech synthesis method comprising: receiving text; inputting the received text in a prediction network; and generating speech data, wherein the prediction network comprises a neural network, and wherein the neural network is trained by: receiving a first training dataset comprising audio data and corresponding text data; acquiring an expressivity score for each audio sample of the audio data, wherein the expressivity score is a quantitative representation of how well an audio sample conveys emotional information and sounds natural, realistic and human-like; training the neural network using a first sub-dataset, and further training the neural network using a second sub-dataset, wherein the first sub-dataset and the second sub-dataset comprise audio samples and corresponding text from the first training dataset and wherein the average expressivity score of the audio data in the second sub-dataset is higher than the average expressivity score of the audio data in the first sub-dataset.
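The two-stage training described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Sample`, `split_by_expressivity`, `train_in_stages`, and `model_step` are hypothetical, as is the choice to build the second sub-dataset by thresholding the score. The abstract only requires that both sub-datasets are drawn from the first training dataset and that the second has the higher average expressivity score.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence, Tuple


@dataclass
class Sample:
    """One training pair plus its expressivity score (hypothetical layout)."""
    text: str
    audio_path: str
    expressivity: float


def split_by_expressivity(
    samples: Sequence[Sample], threshold: float
) -> Tuple[List[Sample], List[Sample]]:
    # Assumption: the first sub-dataset is the whole training set and the
    # second keeps only samples scoring at or above `threshold`.
    first = list(samples)
    second = [s for s in samples if s.expressivity >= threshold]
    return first, second


def mean_score(samples: Sequence[Sample]) -> float:
    return mean(s.expressivity for s in samples)


def train_in_stages(
    model_step: Callable[[Sample], None],
    samples: Sequence[Sample],
    threshold: float,
) -> Tuple[List[Sample], List[Sample]]:
    first, second = split_by_expressivity(samples, threshold)
    # Enforce the condition stated in the abstract: the second sub-dataset
    # must have a strictly higher average expressivity score.
    if not second or mean_score(second) <= mean_score(first):
        raise ValueError("second sub-dataset must have a higher average score")
    # Stage 1: train on the first sub-dataset.
    # Stage 2: continue training the same model on the second sub-dataset.
    for stage in (first, second):
        for sample in stage:
            model_step(sample)  # placeholder for one optimiser update
    return first, second
```

Here `model_step` stands in for whatever update the neural network performs on one text and audio pair. A real system would iterate in batches over several epochs per stage.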