Method and apparatus for training voiceprint recognition system

    Publication number: US10854207B2

    Publication date: 2020-12-01

    Application number: US16231913

    Filing date: 2018-12-24

    Abstract: A method and an apparatus for training a voiceprint recognition system are provided. The method includes obtaining a voice training data set comprising voice segments of users; determining identity vectors of all the voice segments; identifying, among the determined identity vectors, the identity vectors of voice segments of a same user; placing the identified identity vectors of the same user into one of a plurality of user categories; and determining an identity vector in the user category as a first identity vector. The method further includes normalizing the first identity vector by using a normalization matrix, a first value being a sum of similarity degrees between the first identity vector in the corresponding category and other identity vectors in the corresponding category; training the normalization matrix; and outputting a training value of the normalization matrix when the normalization matrix maximizes a sum of first values of all the user categories.
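The training objective described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure (cosine similarity is assumed here), does not say how the first identity vector is chosen (the first vector in each category is used), and does not say whether the other vectors are also normalized (they are here). All function names are hypothetical.

```python
import math

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def cosine(a, b):
    """Cosine similarity; an assumed stand-in for the 'similarity degree'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def first_value(matrix, first_vec, other_vecs):
    """Sum of similarities between the normalized first vector and the others."""
    ref = mat_vec(matrix, first_vec)
    return sum(cosine(ref, mat_vec(matrix, v)) for v in other_vecs)

def objective(matrix, categories):
    """Sum of first values over all user categories (the quantity to maximize)."""
    total = 0.0
    for vectors in categories.values():
        # Assumption: the first vector of each category plays the role
        # of the "first identity vector".
        total += first_value(matrix, vectors[0], vectors[1:])
    return total

identity = [[1.0, 0.0], [0.0, 1.0]]
categories = {
    "user_a": [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    "user_b": [[0.0, 2.0], [0.0, 1.0]],
}
score = objective(identity, categories)
```

A trainer would search over `matrix` for the value that maximizes `objective`; the search procedure itself is not described in the abstract and is left out here.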

    Audio data processing method and apparatus

    Publication number: US10770050B2

    Publication date: 2020-09-08

    Application number: US15775460

    Filing date: 2017-06-02

    Abstract: An audio data processing method and apparatus are provided. The method includes obtaining audio data. An overall spectrum of the audio data is obtained and separated into a singing voice spectrum and an accompaniment spectrum. An accompaniment binary mask of the audio data is calculated according to the audio data. The singing voice spectrum and the accompaniment spectrum are processed using the accompaniment binary mask, to obtain accompaniment data and singing voice data.
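As a rough illustration of how a binary mask might be used to refine two separated spectra, consider the sketch below. The abstract does not say how the mask is computed or applied, so everything here is an assumption: the mask is derived by comparing two hypothetical per-bin magnitude estimates, and bins flagged as accompaniment are moved from the singing voice estimate into the accompaniment estimate. Spectra are plain lists of per-bin magnitudes.

```python
def binary_mask(accomp_mag, voice_mag):
    """1 where the accompaniment estimate dominates a bin, else 0 (assumed rule)."""
    return [1 if a > v else 0 for a, v in zip(accomp_mag, voice_mag)]

def apply_mask(voice_spec, accomp_spec, mask):
    """Move masked bins from the voice spectrum into the accompaniment spectrum."""
    # Accompaniment residue that leaked into the singing voice estimate.
    residue = [m * v for m, v in zip(mask, voice_spec)]
    clean_voice = [v - r for v, r in zip(voice_spec, residue)]
    full_accomp = [a + r for a, r in zip(accomp_spec, residue)]
    return clean_voice, full_accomp

# Hypothetical per-bin magnitudes for three frequency bins.
mask = binary_mask([0.5, 2.0, 0.1], [1.0, 1.0, 1.0])
voice, accomp = apply_mask([4.0, 1.0, 3.0], [1.0, 5.0, 2.0], mask)
```

Under this reading, the total energy per bin is preserved; only its assignment between the two outputs changes.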

    Identity vector processing method and computer device

    Publication number: US10650830B2

    Publication date: 2020-05-12

    Application number: US15954416

    Filing date: 2018-04-16

    Abstract: Processing circuitry of an information processing apparatus obtains a set of identity vectors that are calculated according to voice samples from speakers. The identity vectors are classified into speaker classes respectively corresponding to the speakers. The processing circuitry selects, from the identity vectors, first subsets of interclass neighboring identity vectors respectively corresponding to the identity vectors and second subsets of intraclass neighboring identity vectors respectively corresponding to the identity vectors. The processing circuitry determines an interclass difference based on the first subsets of interclass neighboring identity vectors and the corresponding identity vectors; and determines an intraclass difference based on the second subsets of intraclass neighboring identity vectors and the corresponding identity vectors. Further, the processing circuitry determines a set of basis vectors to maximize a projection of the interclass difference on the basis vectors and to minimize a projection of the intraclass difference on the basis vectors.
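The neighbor-based differences in this abstract can be illustrated with a small sketch. It is not the patented procedure: it assumes Euclidean nearest neighbors, takes each difference as the vector minus the mean of its `k` nearest neighbors, and only evaluates a candidate basis vector rather than solving for the optimal set. All names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, candidates, k):
    """The k candidates closest to vec."""
    return sorted(candidates, key=lambda c: sq_dist(vec, c))[:k]

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def neighbor_differences(vectors, labels, k=1):
    """Per-vector interclass and intraclass differences from nearest neighbors."""
    inter, intra = [], []
    for i, (vec, label) in enumerate(zip(vectors, labels)):
        other = [u for u, l in zip(vectors, labels) if l != label]
        same = [u for j, (u, l) in enumerate(zip(vectors, labels))
                if l == label and j != i]
        if other:
            inter.append([x - m for x, m in zip(vec, mean(nearest(vec, other, k)))])
        if same:
            intra.append([x - m for x, m in zip(vec, mean(nearest(vec, same, k)))])
    return inter, intra

def projection_energy(diffs, basis):
    """Sum of squared projections of the differences onto one basis vector."""
    return sum(sum(d * b for d, b in zip(diff, basis)) ** 2 for diff in diffs)

vectors = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
labels = ["a", "a", "b", "b"]
inter, intra = neighbor_differences(vectors, labels, k=1)
```

In this toy data the two speakers are separated along the first axis, so the basis vector `[1.0, 0.0]` gives a large interclass projection and a zero intraclass projection, which is the kind of direction the abstract's criterion favors.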

    IDENTITY VECTOR PROCESSING METHOD AND COMPUTER DEVICE

    Publication number: US20180233151A1

    Publication date: 2018-08-16

    Application number: US15954416

    Filing date: 2018-04-16

    CPC classification number: G10L17/20 G10L17/02 G10L17/08

    Abstract: Processing circuitry of an information processing apparatus obtains a set of identity vectors that are calculated according to voice samples from speakers. The identity vectors are classified into speaker classes respectively corresponding to the speakers. The processing circuitry selects, from the identity vectors, first subsets of interclass neighboring identity vectors respectively corresponding to the identity vectors and second subsets of intraclass neighboring identity vectors respectively corresponding to the identity vectors. The processing circuitry determines an interclass difference based on the first subsets of interclass neighboring identity vectors and the corresponding identity vectors; and determines an intraclass difference based on the second subsets of intraclass neighboring identity vectors and the corresponding identity vectors. Further, the processing circuitry determines a set of basis vectors to maximize a projection of the interclass difference on the basis vectors and to minimize a projection of the intraclass difference on the basis vectors.

    Statistical parameter model establishing method, speech synthesis method, server and storage medium

    Publication number: US11289069B2

    Publication date: 2022-03-29

    Application number: US16365458

    Filing date: 2019-03-26

    Abstract: A statistical parameter modeling method is performed by a server. After obtaining model training data, the model training data including a text feature sequence and a corresponding original speech sample sequence, the server inputs an original vector matrix formed by matching a text feature sample point in the text feature sample sequence with a speech sample point in the original speech sample sequence into a statistical parameter model for training and then performs non-linear mapping calculation on the original vector matrix in a hidden layer, to output a corresponding prediction speech sample point. The server then obtains a model parameter of the statistical parameter model according to the prediction speech sample point and a corresponding original speech sample point by using a smallest difference principle, to obtain a corresponding target statistical parameter model.
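A heavily simplified sketch of the idea in this abstract follows: an input vector passes through a non-linear hidden layer to predict one speech sample point, and parameters are adjusted to reduce the difference from the original sample. The network shape, the `tanh` non-linearity, the squared-error loss, and the choice to update only the output layer are all assumptions made for brevity; the abstract specifies none of them. The input is a hypothetical flattened vector of text features and speech samples.

```python
import math

def hidden_layer(inputs, weights, biases):
    """Non-linear mapping of the inputs (tanh is an assumed activation)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def predict(inputs, w_hidden, b_hidden, w_out, b_out):
    """Predict a single speech sample point from the input vector."""
    hidden = hidden_layer(inputs, w_hidden, b_hidden)
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def squared_error(predicted, target):
    """Assumed measure of the 'difference' to be made as small as possible."""
    return (predicted - target) ** 2

def output_layer_step(inputs, target, w_hidden, b_hidden, w_out, b_out, lr=0.1):
    """One gradient-descent step on the output layer only."""
    hidden = hidden_layer(inputs, w_hidden, b_hidden)
    err = sum(w * h for w, h in zip(w_out, hidden)) + b_out - target
    new_w = [w - lr * 2 * err * h for w, h in zip(w_out, hidden)]
    new_b = b_out - lr * 2 * err
    return new_w, new_b

# Hypothetical toy values: three input features, two hidden units.
inputs, target = [0.5, -1.0, 0.2], 0.3
w_hidden, b_hidden = [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.2]], [0.0, 0.0]
w_out, b_out = [0.5, -0.5], 0.0

loss_before = squared_error(predict(inputs, w_hidden, b_hidden, w_out, b_out), target)
w_out, b_out = output_layer_step(inputs, target, w_hidden, b_hidden, w_out, b_out)
loss_after = squared_error(predict(inputs, w_hidden, b_hidden, w_out, b_out), target)
```

A full trainer would also update the hidden-layer parameters and iterate over every sample point in the training sequences.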
