SPEECH CODING USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

Invention Publication

US20230368804A1 SPEECH CODING USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS (under examination, published)

Patent Title: SPEECH CODING USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS
Application No.: US18144413

Application Date: 2023-05-08
Publication No.: US20230368804A1

Publication Date: 2023-11-16
Inventor: Willem Bastiaan Kleijn, Jan K. Skoglund, Alejandro Luebs, Sze Chie Lim
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Main IPC: G10L19/02
IPC: G10L19/02; G10L25/30

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for coding speech using neural networks. One of the methods includes obtaining a bitstream of parametric coder parameters characterizing spoken speech; generating, from the parametric coder parameters, a conditioning sequence; generating a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step: processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and sampling a speech sample from the possible speech sample values.
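The decoding loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `decode`, `toy_network`, and `upsample_conditioning` are hypothetical, the 256-level sample alphabet and the frame-repetition upsampling are assumptions, and a simple hand-written scoring function stands in for the trained auto-regressive generative neural network.

```python
import math
import random

NUM_LEVELS = 256        # assumption: 8-bit quantized speech sample values
SAMPLES_PER_FRAME = 4   # assumption: toy upsampling factor, far below real rates


def upsample_conditioning(frame_params, samples_per_frame=SAMPLES_PER_FRAME):
    """Build a conditioning sequence with one vector per decoder time step.

    Assumption: each frame's parameter vector is simply repeated; the source
    only says the sequence is generated from the parametric coder parameters.
    """
    return [params for params in frame_params for _ in range(samples_per_frame)]


def toy_network(history, cond):
    """Hypothetical stand-in for the generative network.

    Returns one unnormalized score per possible sample value, based on the
    previous sample and the current conditioning vector.
    """
    prev = history[-1] if history else NUM_LEVELS // 2
    target = 0.5 * prev + 0.5 * cond[0]
    return [-((v - target) ** 2) / (2.0 * 8.0 ** 2) for v in range(NUM_LEVELS)]


def softmax(scores):
    """Turn scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(frame_params, network=toy_network, rng=None):
    """Generate one speech sample per decoder time step, auto-regressively."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    conditioning = upsample_conditioning(frame_params)
    reconstruction = []
    for cond in conditioning:
        # Score distribution over possible sample values, conditioned on the
        # current reconstruction and on this step's conditioning vector.
        probs = softmax(network(reconstruction, cond))
        # Sample the next value from that distribution and append it.
        sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        reconstruction.append(sample)
    return reconstruction
```

Calling `decode([[100.0], [180.0]])` on two toy one-dimensional parameter frames yields eight quantized sample values. A real decoder would first parse the bitstream into parametric coder parameters and would use a trained network in place of `toy_network`.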

Public/Granted literature

US12062380B2 Speech coding using auto-regressive generative neural networks Publication/Grant date: 2024-08-13

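IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L19/00	Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source-filter models or psychoacoustic analysis (in musical instruments, see G10H)
G10L19/02	. using spectral analysis, e.g. transform vocoders or subband vocoders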