SPEECH RECOGNITION APPARATUS, METHOD AND PROGRAM

    Publication No.: US20230050795A1

    Publication date: 2023-02-16

    Application No.: US17793000

    Filing date: 2020-01-16

    Abstract: A score integration unit 7 obtains a new score Score (l1:nb, c) that integrates a score Score (l1:nb, c) and a score Score (w1:ob, c). This new score Score (l1:nb, c) becomes a score Score (l1:nb) in a hypothesis selection unit 8. Thus, the score Score (l1:nb) can be said to take into account the score Score (w1:ob, c). In a speech recognition apparatus, first information is extracted on the basis of the score Score (l1:nb) taking into account the score Score (w1:ob, c). Thus, speech recognition with higher performance than that in the related art can be achieved.
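    Note: the score integration and hypothesis selection described above can be sketched as follows. This is a minimal illustration only. The abstract does not state the integration formula, so the weighted sum, the parameter name `weight`, and the function names are assumptions.

```python
def integrate_scores(label_score, word_score, weight=0.5):
    """Return a new score combining the two input scores.

    Assumption: scores are log-domain and are combined by a weighted sum;
    the source only says that the two scores are "integrated".
    """
    return label_score + weight * word_score


def select_hypotheses(hypotheses, beam_width):
    """Keep the beam_width best (text, score) pairs, highest score first."""
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)[:beam_width]
```

    In this sketch, each hypothesis would be rescored with `integrate_scores` before `select_hypotheses` prunes the beam, so the selection reflects both scores.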

ACOUSTIC MODEL LEARNING APPARATUS, ACOUSTIC MODEL LEARNING METHOD, AND PROGRAM

    Publication No.: US20220122626A1

    Publication date: 2022-04-21

    Application No.: US17428274

    Filing date: 2020-01-23

    IPC classification: G10L25/30 G10L25/78 G06N3/08

    Abstract: Provided is a technique for learning an acoustic model that achieves a certain degree of sound recognition accuracy within a short calculation period. An acoustic model learning device includes: a loss calculation unit configured to calculate, by using an acoustic model, a loss of sound data which is an element of a corpus Cj for learning; a curriculum corpus generation unit configured to generate a curriculum corpus being a union of subsets of the corpora Cj for learning, the corpora Cj including, as elements, sound data for which the loss falls within a predetermined range indicating a small value; an acoustic model update unit configured to update the acoustic model by using the curriculum corpus; and a first end condition determination unit configured to output the acoustic model when a predetermined end condition is satisfied, or transfer execution control to the loss calculation unit when the predetermined end condition is not satisfied. The acoustic model update unit is configured to update the acoustic model by weighting the gradient for each item of sound data in the curriculum corpus, using a weight that becomes smaller as the number of times the sound data has been selected as an element of the curriculum corpus becomes larger.
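    Note: a minimal sketch of the curriculum selection and gradient weighting described above. The threshold test `loss <= max_loss` and the weight `1 / (1 + count)` are assumptions chosen for illustration; the abstract only requires a "predetermined range indicating a small value" and a weight that decreases with the selection count.

```python
def build_curriculum(corpora, losses, max_loss):
    """Union of per-corpus subsets whose loss is small.

    corpora: dict mapping corpus name -> list of utterance ids
    losses:  dict mapping utterance id -> loss under the current model
    """
    selected = set()
    for utterances in corpora.values():
        selected.update(u for u in utterances if losses[u] <= max_loss)
    return selected


def gradient_weight(times_selected):
    """Assumed weight: strictly decreasing in the selection count."""
    return 1.0 / (1.0 + times_selected)
```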

    MODEL LEARNING APPARATUS, VOICE RECOGNITION APPARATUS, METHOD AND PROGRAM THEREOF

    Publication No.: US20230009370A1

    Publication date: 2023-01-12

    Application No.: US17783230

    Filing date: 2019-12-09

    IPC classification: G10L15/16 G10L15/02 G10L15/06

    Abstract: A probability matrix P is obtained on the basis of an acoustic feature amount sequence, the probability matrix P being the sum for all symbols cn of the product of an output probability distribution vector zn having an element corresponding to the appearance probability of each entry k of the n-th symbol cn for the acoustic feature amount sequence and an attention weight vector αn having an element corresponding to an attention weight representing the degree of relevance of each frame t of the acoustic feature amount sequence with respect to a timing at which the symbol cn appears; a label sequence corresponding to the acoustic feature amount sequence in a case where a model parameter is provided is obtained; a CTC loss of the label sequence for a symbol sequence corresponding to the acoustic feature amount sequence is obtained using the symbol sequence and the label sequence; a KLD loss of the label sequence for a matrix corresponding to the probability matrix P is obtained using the matrix corresponding to the probability matrix P and the label sequence; and the model parameter is updated on the basis of an integrated loss obtained by integrating the CTC loss and the KLD loss, and the processing is repeated until an end condition is satisfied.
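    Note: the construction of the probability matrix P and the integrated loss can be sketched as follows. Two points are assumptions, not statements of the patent: the "product" of zn and αn is read as an outer product giving a (entries x frames) matrix, and the two losses are integrated by linear interpolation with a hypothetical parameter `kld_weight`.

```python
def probability_matrix(z, alpha):
    """P[k][t] = sum over n of z[n][k] * alpha[n][t].

    z[n]     : output probability distribution of the n-th symbol over entries k
    alpha[n] : attention weights of the n-th symbol over frames t
    """
    num_entries = len(z[0])
    num_frames = len(alpha[0])
    return [
        [sum(zn[k] * an[t] for zn, an in zip(z, alpha)) for t in range(num_frames)]
        for k in range(num_entries)
    ]


def integrated_loss(ctc_loss, kld_loss, kld_weight=0.5):
    """Assumed integration: linear interpolation of the two losses."""
    return (1.0 - kld_weight) * ctc_loss + kld_weight * kld_loss
```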

    SPEECH RECOGNITION CONTROL APPARATUS, SPEECH RECOGNITION CONTROL METHOD, AND PROGRAM

    Publication No.: US20220328047A1

    Publication date: 2022-10-13

    Application No.: US17615812

    Filing date: 2019-06-04

    IPC classification: G10L15/32 G10L15/30 G10L15/22

    Abstract: Recognition results are acquired with high responsiveness without being affected by a network communication state. A speech recognition control device (1) acquires recognition results from a speech recognition device (2) with which it communicates through a network (3) and a speech recognition unit (13). A communication state measuring unit (11) measures a communication state of the network (3). A speech recognition requesting unit (12) transmits a request for a speech recognition process to each of the speech recognition device (2) and the speech recognition unit (13) with a timeout time set in accordance with an immediately prior communication state of the network (3). A recognition result output unit (14) outputs a recognition result based on a recognition result received from one or recognition results received from both of the speech recognition device (2) and the speech recognition unit (13).
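    Note: a minimal sketch of setting the timeout from the most recent communication state and choosing the output. The abstract does not specify how the timeout is derived or how results are combined, so the round-trip-time measure, the formula and its constants, and the rule of preferring the networked result are all assumptions.

```python
def timeout_for(round_trip_seconds, base=0.5, factor=3.0, cap=5.0):
    """Assumed rule: the timeout grows with the last measured round-trip time,
    up to a fixed cap."""
    return min(cap, base + factor * round_trip_seconds)


def choose_result(remote_result, local_result):
    """Assumed rule: prefer the networked recognizer's result if it arrived
    before the timeout (not None); otherwise fall back to the local result."""
    return remote_result if remote_result is not None else local_result
```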

    MODEL LEARNING APPARATUS, METHOD AND PROGRAM

    Publication No.: US20220230630A1

    Publication date: 2022-07-21

    Application No.: US17617556

    Filing date: 2019-06-10

    IPC classification: G10L15/16 G10L15/02 G10L15/06

    Abstract: A model training device includes: a feature amount extraction unit 2 configured to extract a feature amount that corresponds to each of segments into which a first information sequence is divided by a predetermined unit; a second model calculation unit 3 configured to calculate an output probability distribution of second information when the extracted feature amounts are input to a second model; and a model update unit 4 configured to perform at least one of update of the first model based on the output probability distribution of first information calculated by the first model calculation unit and a correct unit number that corresponds to the acoustic feature amounts, and update of the second model based on the output probability distribution of second information calculated by the second model calculation unit and a correct unit number that corresponds to the first information sequence.

    PRE-TRAINING METHOD, PRE-TRAINING DEVICE, AND PRE-TRAINING PROGRAM

    Publication No.: US20240071369A1

    Publication date: 2024-02-29

    Application No.: US18275205

    Filing date: 2021-02-02

    IPC classification: G10L15/06 G10L15/16

    CPC classification: G10L15/063 G10L15/16

    Abstract: A pre-training method executed by a training apparatus includes converting an input acoustic feature amount sequence into a corresponding intermediate acoustic feature amount sequence having a first length using a first conversion model to which a conversion model parameter is provided, converting a correct answer symbol sequence to generate a first frame unit symbol sequence having the first length and generating a second frame unit symbol sequence having the first length by delaying the first frame unit symbol sequence by one frame, converting the second frame unit symbol sequence into an intermediate character feature amount sequence having the first length using a second conversion model to which a character feature amount estimation model parameter is provided, and performing label estimation using an estimation model to which an estimation model parameter is provided based on the intermediate acoustic feature amount sequence and the intermediate character feature amount sequence.
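    Note: the generation of the two frame unit symbol sequences can be sketched as follows. The abstract does not say how the correct answer symbols are expanded to the first length, so padding with a blank symbol is an assumption, as are the `<blank>` and `<sos>` placeholder symbols. Only the one-frame delay is taken from the abstract.

```python
def to_frame_units(symbols, length, blank="<blank>"):
    """Assumed expansion: keep the symbols in order and pad with blanks
    until the sequence has the requested length."""
    if len(symbols) > length:
        raise ValueError("symbol sequence is longer than the target length")
    return list(symbols) + [blank] * (length - len(symbols))


def delay_one_frame(frames, start="<sos>"):
    """Shift the sequence right by one frame, keeping the same length."""
    return [start] + list(frames[:-1])
```

    With this delay, the character-side input at frame t carries the symbol of frame t-1, so the label estimated for frame t is conditioned only on earlier frames.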

    LEARNING APPARATUS, ESTIMATION APPARATUS, METHODS AND PROGRAMS FOR THE SAME

    Publication No.: US20240127796A1

    Publication date: 2024-04-18

    Application No.: US18277552

    Filing date: 2021-02-18

    IPC classification: G10L15/06 G10L15/16

    Abstract: The present invention estimates the intention of an utterance more accurately than the related art. A learning device learns an estimation model on the basis of learning data including an acoustic signal for learning and a label indicating whether or not the acoustic signal has been uttered to a predetermined target. The learning device includes: a feature synchronization unit configured to obtain a post-synchronization feature by synchronizing an acoustic feature obtained from the acoustic signal for learning with a text feature corresponding to the acoustic signal; an utterance intention estimation unit configured to estimate whether or not the acoustic signal has been uttered to the predetermined target by using the post-synchronization feature; and a parameter update unit configured to update a parameter of the estimation model on the basis of the label included in the learning data and an estimation result by the utterance intention estimation unit.

    IDENTIFICATION MODEL LEARNING DEVICE, IDENTIFICATION DEVICE, IDENTIFICATION MODEL LEARNING METHOD, IDENTIFICATION METHOD, AND PROGRAM

    Publication No.: US20220246137A1

    Publication date: 2022-08-04

    Application No.: US17617264

    Filing date: 2019-06-10

    Abstract: An identification model learning device capable of improving an identification model for a particular speech vocal sound is provided. An identification model learning device includes: an identification model learning unit configured to learn, based on learning data including a feature sequence in a frame unit of a speech and a binary label indicating whether the speech is a particular speech, an identification model including an input layer that accepts the feature sequence in the frame unit as an input and outputs an output result to an intermediate layer, one or more intermediate layers that accept an output result of the input layer or an immediately previous intermediate layer as an input and output a processing result, an integration layer that accepts an output result of a final intermediate layer as an input and outputs a processing result in a speech unit, and an output layer that outputs the label from the output of the integration layer.
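    Note: the role of the integration layer, which turns frame-unit outputs into a single speech-unit vector, can be sketched as follows. Mean pooling over frames is an assumption made for illustration; the abstract does not state which integration operation is used.

```python
def integrate_frames(frame_outputs):
    """Collapse a list of per-frame vectors into one utterance-level vector.

    Assumption: element-wise mean over frames (the source only says the
    layer outputs a processing result "in a speech unit").
    """
    num_frames = len(frame_outputs)
    if num_frames == 0:
        raise ValueError("at least one frame is required")
    return [sum(column) / num_frames for column in zip(*frame_outputs)]
```

    The resulting fixed-size vector is what an output layer would map to the binary label, regardless of how many frames the speech contains.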