-
Publication No.: US20230050795A1
Publication Date: 2023-02-16
Application No.: US17793000
Filing Date: 2020-01-16
Inventors: Takafumi MORIYA, Yusuke SHINOHARA
IPC Classification: G10L15/197, G10L15/16, G10L15/22, G10L15/02, G10L15/00
Abstract: A score integration unit 7 obtains a new score Score (l1:nb, c) by integrating a score Score (l1:nb, c) and a score Score (w1:ob, c). This new score Score (l1:nb, c) becomes the score Score (l1:nb) used in a hypothesis selection unit 8. Thus, the score Score (l1:nb) can be said to take into account the score Score (w1:ob, c). In a speech recognition apparatus, first information is extracted on the basis of the score Score (l1:nb) that takes into account the score Score (w1:ob, c). Thus, speech recognition with higher performance than that in the related art can be achieved.
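The abstract does not say how the two scores are integrated. As a minimal sketch, assuming both scores are log-domain values and that integration is a weighted sum (a common shallow-fusion form, not confirmed by the source), the integration and hypothesis selection steps could look like this. The names `integrate_scores`, `select_hypotheses`, and `weight` are hypothetical.

```python
def integrate_scores(label_score, word_score, weight=0.5):
    """Combine a label-sequence score with a word-sequence score.

    Assumption: both are log-domain scores and the integration is a
    weighted sum; the abstract does not specify the actual formula.
    """
    return label_score + weight * word_score


def select_hypotheses(hypotheses, beam_size):
    """Keep the beam_size hypotheses with the highest integrated score."""
    return sorted(hypotheses, key=lambda h: h["score"], reverse=True)[:beam_size]


# Two illustrative hypotheses with made-up scores.
hyps = [
    {"labels": "a b", "label_score": -1.0, "word_score": -4.0},
    {"labels": "a c", "label_score": -1.5, "word_score": -1.0},
]
for h in hyps:
    h["score"] = integrate_scores(h["label_score"], h["word_score"], weight=0.5)

best = select_hypotheses(hyps, beam_size=1)
```

With these made-up numbers the second hypothesis wins after integration even though its label score alone is lower, which is the effect the abstract describes.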
-
Publication No.: US20220122626A1
Publication Date: 2022-04-21
Application No.: US17428274
Filing Date: 2020-01-23
Inventors: Kiyoaki MATSUI, Takafumi MORIYA, Takaaki FUKUTOMI, Yusuke SHINOHARA, Yoshikazu YAMAGUCHI, Manabu OKAMOTO
Abstract: Provided is a technology of learning an acoustic model with a certain degree of accuracy of sound recognition within a short calculation period. An acoustic model learning device includes: a loss calculation unit configured to calculate, by using an acoustic model, a loss of sound data which is an element of a corpus Cj for learning; a curriculum corpus generation unit configured to generate a curriculum corpus being a union of subsets of the corpora Cj for learning, the subsets including, as elements, sound data for which the loss falls within a predetermined range indicating a small value; an acoustic model update unit configured to update the acoustic model by using the curriculum corpus; and a first end condition determination unit configured to output the acoustic model when a predetermined end condition is satisfied, or transfer execution control to the loss calculation unit when the predetermined end condition is not satisfied. The acoustic model update unit is configured to update the acoustic model by weighting the gradient for each piece of sound data in the curriculum corpus, the weight taking a smaller value as the number of times the sound data has been selected as an element of the curriculum corpus becomes larger.
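The selection and weighting steps described above can be sketched as follows. This is only an illustration: the loss range, the decay formula `1 / (1 + count)`, and the names `build_curriculum` and `gradient_weight` are assumptions, since the abstract states only that the weight decreases as the selection count grows.

```python
def build_curriculum(losses, low, high):
    """Return ids of utterances whose loss falls inside [low, high]."""
    return [utt for utt, loss in losses.items() if low <= loss <= high]


def gradient_weight(times_selected):
    """Hypothetical decay: the weight shrinks as an utterance is reused."""
    return 1.0 / (1.0 + times_selected)


# Made-up per-utterance losses and past selection counts.
losses = {"utt1": 0.2, "utt2": 1.7, "utt3": 0.4}
selection_counts = {"utt1": 3, "utt2": 0, "utt3": 0}

curriculum = build_curriculum(losses, low=0.0, high=0.5)
weights = {utt: gradient_weight(selection_counts[utt]) for utt in curriculum}

# Record that these utterances were selected once more.
for utt in curriculum:
    selection_counts[utt] += 1
```

In a real training loop each weight would scale the gradient contribution of its utterance before the model update, and the loop would repeat until the end condition holds.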
-
Publication No.: US20230009370A1
Publication Date: 2023-01-12
Application No.: US17783230
Filing Date: 2019-12-09
Inventors: Takafumi MORIYA, Yusuke SHINOHARA
Abstract: A probability matrix P is obtained on the basis of an acoustic feature amount sequence, the probability matrix P being the sum, over all symbols cn, of the product of an output probability distribution vector zn, whose elements correspond to the appearance probability of each entry k of the n-th symbol cn for the acoustic feature amount sequence, and an attention weight vector αn, whose elements correspond to attention weights representing the degree of relevance of each frame t of the acoustic feature amount sequence to the timing at which the symbol cn appears. A label sequence corresponding to the acoustic feature amount sequence in a case where a model parameter is provided is obtained. A CTC loss of the label sequence for a symbol sequence corresponding to the acoustic feature amount sequence is obtained using the symbol sequence and the label sequence. A KLD loss of the label sequence for a matrix corresponding to the probability matrix P is obtained using the matrix corresponding to the probability matrix P and the label sequence. The model parameter is updated on the basis of an integrated loss obtained by integrating the CTC loss and the KLD loss, and the processing is repeated until an end condition is satisfied.
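Read literally, the probability matrix is a sum of outer products, P[k][t] = Σn zn[k] · αn[t]. The sketch below computes it in plain Python under that reading. The interpolation used in `integrated_loss` is an assumption, because the abstract does not state how the CTC and KLD losses are combined; all function names are hypothetical.

```python
def probability_matrix(z, alpha):
    """Compute P[k][t] = sum over n of z[n][k] * alpha[n][t].

    z[n] is the output distribution over entries k for the n-th symbol;
    alpha[n] is the attention weight over frames t for the n-th symbol.
    """
    num_entries, num_frames = len(z[0]), len(alpha[0])
    P = [[0.0] * num_frames for _ in range(num_entries)]
    for z_n, alpha_n in zip(z, alpha):
        for k, z_nk in enumerate(z_n):
            for t, a_nt in enumerate(alpha_n):
                P[k][t] += z_nk * a_nt
    return P


def integrated_loss(ctc_loss, kld_loss, weight=0.5):
    """Assumed integration: linear interpolation of the two losses."""
    return (1.0 - weight) * ctc_loss + weight * kld_loss


# Two symbols, two entries, three frames (made-up values).
z = [[1.0, 0.0], [0.0, 1.0]]
alpha = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
P = probability_matrix(z, alpha)
```

Here the first symbol attends only to frame 0 and the second symbol spreads over frames 1 and 2, so each row of `P` shows where its entry is expected along the time axis.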
-
Publication No.: US20220328047A1
Publication Date: 2022-10-13
Application No.: US17615812
Filing Date: 2019-06-04
Abstract: Recognition results are acquired with high responsiveness without being affected by a network communication state. A speech recognition control device (1) acquires recognition results from a speech recognition device (2), with which it communicates through a network (3), and from a speech recognition unit (13). A communication state measuring unit (11) measures a communication state of the network (3). A speech recognition requesting unit (12) transmits a request for a speech recognition process to each of the speech recognition device (2) and the speech recognition unit (13), with a timeout time set in accordance with the immediately prior communication state of the network (3). A recognition result output unit (14) outputs a recognition result based on the recognition result received from one of, or the recognition results received from both of, the speech recognition device (2) and the speech recognition unit (13).
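One way to picture the timeout logic is sketched below. Everything specific here is an assumption for illustration: the abstract says only that the timeout follows the most recent communication state, so the scaling rule, the bounds, and the preference for the remote result are hypothetical, as are the names `timeout_from_latency` and `choose_result`.

```python
def timeout_from_latency(recent_latency_s, margin=2.0, floor_s=0.2, ceiling_s=3.0):
    """Derive a timeout from the most recently measured network latency.

    Assumption: the timeout is a multiple of the latency, clamped to
    [floor_s, ceiling_s]; the source does not give the actual rule.
    """
    return min(ceiling_s, max(floor_s, margin * recent_latency_s))


def choose_result(remote_result, local_result):
    """Pick the output from the results that arrived before the timeout.

    None stands for a request that timed out. Preferring the remote
    result when both arrive is an assumed policy.
    """
    return remote_result if remote_result is not None else local_result


timeout_fast = timeout_from_latency(0.5)    # healthy network
timeout_slow = timeout_from_latency(10.0)   # congested network, clamped
fallback = choose_result(None, "local transcript")
```

The clamp keeps a congested network from stalling the response: once the remote request exceeds the ceiling, the local result is used.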
-
Publication No.: US20220230630A1
Publication Date: 2022-07-21
Application No.: US17617556
Filing Date: 2019-06-10
Abstract: A model training device includes: a feature amount extraction unit 2 configured to extract a feature amount that corresponds to each of segments into which a first information sequence is divided by a predetermined unit; a second model calculation unit 3 configured to calculate an output probability distribution of second information when the extracted feature amounts are input to a second model; and a model update unit 4 configured to perform at least one of update of the first model based on the output probability distribution of first information calculated by the first model calculation unit and a correct unit number that corresponds to the acoustic feature amounts, and update of the second model based on the output probability distribution of second information calculated by the second model calculation unit and a correct unit number that corresponds to the first information sequence.
-
Publication No.: US20240071369A1
Publication Date: 2024-02-29
Application No.: US18275205
Filing Date: 2021-02-02
CPC Classification: G10L15/063, G10L15/16
Abstract: A pre-training method executed by a training apparatus includes: converting an input acoustic feature amount sequence into a corresponding intermediate acoustic feature amount sequence having a first length using a first conversion model to which a conversion model parameter is provided; converting a correct answer symbol sequence to generate a first frame unit symbol sequence having the first length, and generating a second frame unit symbol sequence having the first length by delaying the first frame unit symbol sequence by one frame; converting the second frame unit symbol sequence into an intermediate character feature amount sequence having the first length using a second conversion model to which a character feature amount estimation model parameter is provided; and performing label estimation using an estimation model to which an estimation model parameter is provided, based on the intermediate acoustic feature amount sequence and the intermediate character feature amount sequence.
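The two symbol-sequence conversions can be illustrated with a small sketch. The abstract does not say how the correct answer symbols are stretched to the first length, so the even repetition in `to_frame_units` is purely an assumption, as is the `<sos>` padding symbol used for the one-frame delay; both function names are hypothetical.

```python
def to_frame_units(symbols, num_frames):
    """Stretch a symbol sequence to num_frames entries.

    Assumption: symbols are repeated evenly across frames; the actual
    alignment method is not given in the source.
    """
    return [symbols[t * len(symbols) // num_frames] for t in range(num_frames)]


def delay_one_frame(frame_symbols, start="<sos>"):
    """Shift the sequence right by one frame, keeping the same length."""
    return [start] + frame_symbols[:-1]


first_seq = to_frame_units(["a", "b"], num_frames=4)
second_seq = delay_one_frame(first_seq)
```

After the delay, the symbol fed at frame t is the one aligned to frame t-1, so the character-side model only sees past symbols when a label is estimated for the current frame.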
-
Publication No.: US20220335927A1
Publication Date: 2022-10-20
Application No.: US17640423
Filing Date: 2019-09-06
Abstract: A learning device includes a learning unit that learns a model for estimating which of a first feature and a second feature an input feature value sequence has. The learning unit uses, as teacher data, a first feature value having the first feature and given a first value label, a second feature value having the second feature and given a second value label, and a third feature value having a feature between the first feature and the second feature and given a value label having a value between the first value label and the second value label.
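A third training example "between" the other two can be pictured as an interpolation. The sketch below assumes linear interpolation of both the feature vector and the label with the same ratio; the abstract does not state how the intermediate feature or label is produced, and the name `interpolate` is hypothetical.

```python
def interpolate(first_feature, second_feature, first_label, second_label, ratio):
    """Build an intermediate feature and label by linear interpolation.

    Assumption: the same ratio in [0, 1] is applied to features and labels.
    """
    third_feature = [
        (1.0 - ratio) * a + ratio * b
        for a, b in zip(first_feature, second_feature)
    ]
    third_label = (1.0 - ratio) * first_label + ratio * second_label
    return third_feature, third_label


# Made-up two-dimensional features with labels 0.0 and 1.0.
third_feature, third_label = interpolate(
    [0.0, 2.0], [4.0, 6.0], first_label=0.0, second_label=1.0, ratio=0.5
)
```

The model would then be trained on all three examples, with the soft label 0.5 telling it that the intermediate feature belongs equally to both classes.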
-
Publication No.: US20240127796A1
Publication Date: 2024-04-18
Application No.: US18277552
Filing Date: 2021-02-18
Inventors: Hiroshi SATO, Takaaki FUKUTOMI, Yusuke SHINOHARA
CPC Classification: G10L15/063, G10L15/16, G10L2015/0635
Abstract: The present invention estimates the intention of an utterance more accurately than the related art. A learning device learns an estimation model on the basis of learning data including an acoustic signal for learning and a label indicating whether or not the acoustic signal has been uttered to a predetermined target. The learning device includes: a feature synchronization unit configured to obtain a post-synchronization feature by synchronizing an acoustic feature obtained from the acoustic signal for learning with a text feature corresponding to the acoustic signal; an utterance intention estimation unit configured to estimate whether or not the acoustic signal has been uttered to the predetermined target by using the post-synchronization feature; and a parameter update unit configured to update a parameter of the estimation model on the basis of the label included in the learning data and an estimation result by the utterance intention estimation unit.
-
Publication No.: US20220246137A1
Publication Date: 2022-08-04
Application No.: US17617264
Filing Date: 2019-06-10
Abstract: An identification model learning device capable of improving an identification model for a particular speech vocal sound is provided. The identification model learning device includes an identification model learning unit configured to learn an identification model based on learning data including a feature sequence in a frame unit of a speech and a binary label indicating whether the speech is a particular speech. The identification model includes: an input layer that accepts the feature sequence in the frame unit as an input and outputs an output result to an intermediate layer; one or more intermediate layers that accept an output result of the input layer or an immediately previous intermediate layer as an input and output a processing result; an integration layer that accepts an output result of a final intermediate layer as an input and outputs a processing result in a speech unit; and an output layer that outputs the label from the output of the integration layer.
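The integration layer turns frame-level outputs into one utterance-level vector. A minimal sketch follows, assuming mean pooling over frames and a single sigmoid output unit; the abstract specifies neither, and the names `integrate_frames` and `output_layer` are hypothetical.

```python
import math


def integrate_frames(frame_outputs):
    """Collapse frame-level vectors into one utterance-level vector.

    Assumption: mean pooling over the time axis.
    """
    num_frames = len(frame_outputs)
    return [sum(column) / num_frames for column in zip(*frame_outputs)]


def output_layer(utterance_vector, weights, bias=0.0):
    """Assumed output layer: a sigmoid giving P(particular speech)."""
    logit = sum(w * x for w, x in zip(weights, utterance_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Two frames of two-dimensional outputs from the final intermediate layer.
frames = [[1.0, 0.0], [3.0, 2.0]]
utterance_vector = integrate_frames(frames)
probability = output_layer(utterance_vector, weights=[1.0, -2.0])
```

Because the label is produced once per utterance, the training loss is also computed per utterance, which matches the binary speech-level labels in the learning data.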
-
Publication No.: US20220004868A1
Publication Date: 2022-01-06
Application No.: US17288848
Filing Date: 2019-10-25
Abstract: An acoustic model learning apparatus includes a parameter updating part configured to update a parameter of a second acoustic model, which is a neural network acoustic model to be trained, on the basis of a first loss and a second loss for a feature amount for training. The first loss is based on the output probability distribution of the second acoustic model. The second loss is based on an intermediate feature amount of a first acoustic model, which is a trained neural network acoustic model, and an intermediate feature amount of the second acoustic model.
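The two-loss update can be sketched as a combined objective. The concrete choices below are assumptions: cross-entropy against a correct class for the first loss, mean squared error between intermediate features for the second, and a weighted sum to combine them. The abstract names only what each loss is based on; all function names are hypothetical.

```python
import math


def cross_entropy(probs, target_index):
    """Assumed first loss: negative log-probability of the correct class."""
    return -math.log(probs[target_index])


def intermediate_mse(teacher_hidden, student_hidden):
    """Assumed second loss: mean squared error between intermediate features."""
    squared = [(t - s) ** 2 for t, s in zip(teacher_hidden, student_hidden)]
    return sum(squared) / len(squared)


def total_loss(student_probs, target_index, teacher_hidden, student_hidden, weight=1.0):
    """Assumed combination: first loss plus a weighted second loss."""
    first = cross_entropy(student_probs, target_index)
    second = intermediate_mse(teacher_hidden, student_hidden)
    return first + weight * second


# Made-up values for one training frame.
feature_gap = intermediate_mse([1.0, 2.0], [1.0, 4.0])
loss = total_loss([0.5, 0.5], 0, [1.0, 2.0], [1.0, 4.0], weight=0.5)
```

Only the second (student) model's parameters would be updated from this loss; the first (teacher) model is already trained and stays fixed.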
-