Patent search ap:("TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED") AND inv:"Dan Su" Page 1

1.

发明授权
Voice synthesis method, model training method, device and computer device 有权

公开(公告)号：US12014720B2

公开(公告)日：2024-06-18

申请号：US16999989

申请日：2020-08-21

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor： Xixin Wu , Mu Wang , Shiyin Kang , Dan Su , Dong Yu

IPC: G10L13/00 , G10L19/02

CPC classification number: G10L13/00 , G10L19/02

Abstract: This application relates to a speech synthesis method and apparatus, a model training method and apparatus, and a computer device. The method includes: obtaining to-be-processed linguistic data; encoding the linguistic data, to obtain encoded linguistic data; obtaining an embedded vector for speech feature conversion, the embedded vector being generated according to a residual between synthesized reference speech data and reference speech data that correspond to the same reference linguistic data; and decoding the encoded linguistic data according to the embedded vector, to obtain target synthesized speech data on which the speech feature conversion is performed. The solution provided in this application can prevent quality of a synthesized speech from being affected by a semantic feature in a mel-frequency cepstrum.

2.

发明授权
Speech separation model training method and apparatus, storage medium and computer device 有权

公开(公告)号：US11908455B2

公开(公告)日：2024-02-20

申请号：US17672565

申请日：2022-02-15

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor： Jun Wang , Wingyip Lam , Dan Su , Dong Yu

IPC: G10L15/06 , G10L15/05 , G10L15/16

CPC classification number: G10L15/063 , G10L15/05 , G10L15/16

Abstract: A speech separation model training method and apparatus, a computer-readable storage medium, and a computer device are provided, the method including: obtaining first audio and second audio, the first audio including target audio and having corresponding labeled audio, and the second audio including noise audio. obtaining an encoding model, an extraction model, and an initial estimation model; performing unsupervised training on the encoding model, the extraction model, and the estimation model according to the second audio, and adjusting model parameters of the extraction model and the estimation model; performing supervised training on the encoding model and the extraction model according to the first audio and the labeled audio corresponding to the first audio, and adjusting a model parameter of the encoding model; continuously performing the unsupervised training and the supervised training, so that the unsupervised training and the supervised training overlap, and the training is not finished until a training stop condition is met.

3.

发明申请
ARTIFICIAL INTELLIGENCE-BASED WAKEUP WORD DETECTION METHOD AND APPARATUS, DEVICE, AND MEDIUM 有权

公开(公告)号：US20220013111A1

公开(公告)日：2022-01-13

申请号：US17483617

申请日：2021-09-23

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor： Jie Chen , Dan Su , Mingjie Jin , Zhenling Zhu

IPC: G10L15/16 , G10L15/22 , G10L15/02

Abstract: This application discloses an artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device. The method includes: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers; determine a target probability vector from the posterior probability vectors according to the syllable combination sequence; and calculate a confidence according to the target probability vector, and determine that the speech frames include the wakeup word text when the confidence is greater than or equal to a threshold.

4.

发明申请
SPEECH SIGNAL PROCESSING MODEL TRAINING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM 审中-公开

公开(公告)号：US20200051549A1

公开(公告)日：2020-02-13

申请号：US16655548

申请日：2019-10-17

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor： Lianwu Chen , Meng Yu , Min Luo , Dan Su

IPC: G10L15/06 , G10L15/16 , G10L15/22 , G10L15/183 , G06N3/08 , G06N3/04

Abstract: Embodiments of the present invention provide a speech signal processing model training method, an electronic device and a storage medium. The embodiments of the present invention determines a target training loss function based on a training loss function of each of one or more speech signal processing tasks; inputs a task input feature of each speech signal processing task into a starting multi-task neural network, and updates model parameters of a shared layer and each of one or more task layers of the starting multi-task neural network corresponding to the one or more speech signal processing tasks by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model.

5.

发明授权
Training method and device for audio separation network, audio separation method and device, and medium 有权

公开(公告)号：US12223969B2

公开(公告)日：2025-02-11

申请号：US17682399

申请日：2022-02-28

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventor： Jun Wang , Wing Yip Lam , Dan Su , Dong Yu

IPC: G10L19/06 , G06N3/045 , G06N3/088 , G10L25/30

Abstract: A method of training an audio separation network is provided. The method includes obtaining a first separation sample set, the first separation sample set including at least two types of audio with dummy labels, obtaining a first sample set by performing interpolation on the first separation sample set based on perturbation data, obtaining a second separation sample set by separating the first sample set using an unsupervised network, determining losses of second separation samples in the second separation sample set, and adjusting network parameters of the unsupervised network based on the losses of the second separation samples, such that a first loss of a first separation result outputted by an adjusted unsupervised network meets a convergence condition.

6.

发明授权
Speech recognition method and apparatus, and method and apparatus for training speech recognition model 有权

公开(公告)号：US11798531B2

公开(公告)日：2023-10-24

申请号：US17077141

申请日：2020-10-22

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventor： Jun Wang , Dan Su , Dong Yu

IPC: G10L15/02 , G10L15/06

CPC classification number: G10L15/02 , G10L15/063

Abstract: A speech recognition method, a speech recognition apparatus, and a method and an apparatus for training a speech recognition model are provided. The speech recognition method includes: recognizing a target word speech from a hybrid speech, and obtaining, as an anchor extraction feature of a target speech, an anchor extraction feature of the target word speech based on the target word speech; obtaining a mask of the target speech according to the anchor extraction feature of the target speech; and recognizing the target speech according to the mask of the target speech.

7.

发明授权
Multi-person speech separation method and apparatus using a generative adversarial network model 有权

公开(公告)号：US11450337B2

公开(公告)日：2022-09-20

申请号：US17023829

申请日：2020-09-17

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventor： Lianwu Chen , Meng Yu , Yanmin Qian , Dan Su , Dong Yu

IPC: G10L21/0272 , G06N3/04 , G06N3/08 , G10L25/30 , G10L25/51

Abstract: A multi-person speech separation method is provided for a terminal. The method includes extracting a hybrid speech feature from a hybrid speech signal requiring separation, N human voices being mixed in the hybrid speech signal, N being a positive integer greater than or equal to 2; extracting a masking coefficient of the hybrid speech feature by using a generative adversarial network (GAN) model, to obtain a masking matrix corresponding to the N human voices, wherein the GAN model comprises a generative network model and an adversarial network model; and performing a speech separation on the masking matrix corresponding to the N human voices and the hybrid speech signal by using the GAN model, and outputting N separated speech signals corresponding to the N human voices.

8.

发明申请
INTER-CHANNEL FEATURE EXTRACTION METHOD, AUDIO SEPARATION METHOD AND APPARATUS, AND COMPUTING DEVICE 有权

公开(公告)号：US20210375294A1

公开(公告)日：2021-12-02

申请号：US17401125

申请日：2021-08-12

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor： Rongzhi Gu , Shixiong Zhang , Lianwu Chen , Yong Xu , Meng Yu , Dan Su , Dong Yu

IPC: G10L19/008 , G10L25/30 , G10L25/03

Abstract: This application relates to a method of extracting an inter channel feature from a multi-channel multi-sound source mixed audio signal performed at a computing device. The method includes: transforming one channel component of a multi-channel multi-sound source mixed audio signal into a single-channel multi-sound source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound source mixed audio representation and the inter-channel features; estimating respective weights of sound sources in the single-channel multi-sound source mixed audio representation based on a fused multi-channel multi-sound source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.

9.

发明授权
Data processing method based on simultaneous interpretation, computer device, and storage medium 有权

公开(公告)号：US12087290B2

公开(公告)日：2024-09-10

申请号：US16941503

申请日：2020-07-28

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventor： Jingliang Bai , Caisheng Ouyang , Haikang Liu , Lianwu Chen , Qi Chen , Yulu Zhang , Min Luo , Dan Su

IPC: G10L15/183 , G10L15/06 , G10L15/22 , G10L15/30 , G10L21/0232 , G10L25/21 , G10L25/84 , G10L25/78

CPC classification number: G10L15/183 , G10L15/063 , G10L15/22 , G10L15/30 , G10L21/0232 , G10L25/21 , G10L25/84 , G10L2015/0636 , G10L2025/783

Abstract: A data processing method based on simultaneous interpretation, applied to a server in a simultaneous interpretation system, including: obtaining audio transmitted by a simultaneous interpretation device; processing the audio by using a simultaneous interpretation model to obtain an initial text; transmitting the initial text to a user terminal; receiving a modified text fed back by the user terminal, the modified text being obtained after the user terminal modifies the initial text; and updating the simultaneous interpretation model according to the initial text and the modified text.

10.

发明公开
AUDIO PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT 审中-公开

公开(公告)号：US20240265929A1

公开(公告)日：2024-08-08

申请号：US18640724

申请日：2024-04-19

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor： Meng WANG , Shan Yang , Qingbo Huang , Yuyong Kang , Yupeng Shi , Wei Xiao , Shidong Shang , Dan Su

IPC: G10L19/032 , G10L19/02

CPC classification number: G10L19/032 , G10L19/0204

Abstract: An audio processing method and apparatus, including decomposing an audio signal into a low-frequency subband signal and a high-frequency subband signal, obtaining a low-frequency feature of the low-frequency subband signal, obtaining a high-frequency feature of the high-frequency subband signal, feature dimensionality of the high-frequency feature being lower than feature dimensionality of the low-frequency feature, performing quantization encoding on the low-frequency feature to obtain a low-frequency bitstream of the audio signal, and performing quantization encoding on the high-frequency feature to obtain a high-frequency bitstream of the audio signal.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification