-
Publication No.: US12118898B2
Publication Date: 2024-10-15
Application No.: US18260606
Filing Date: 2022-01-27
Applicant: Gi Hun Lee
Inventor: Gi Hun Lee
IPC Classification: G09B5/02 , G10L15/04 , G10L15/187 , G10L21/14 , G10L25/21
CPC Classification: G09B5/02 , G10L15/04 , G10L15/187 , G10L21/14 , G10L25/21
Abstract: A speech visualization system according to the present invention includes: a speech signal input unit for receiving speech signals of sentences with English pronunciations; a speech information analysis unit for analyzing speech information (the frequencies, energy, and time of the speech signals, together with the text corresponding to the speech signals) to divide the speech information into one or more segments; a speech information classification unit for classifying the segments of the speech information into flow units, and each flow unit into one or more sub flow units, each having one or more words; a visualization property assignment unit for assigning visualization properties for speech visualization to the analyzed and classified speech information; and a visualization processing unit for performing visualization processing based on the assigned visualization properties to generate speech visualization data.
-
Publication No.: US12112743B2
Publication Date: 2024-10-08
Application No.: US17709011
Filing Date: 2022-03-30
Inventors: Xilin Zhang , Bo Liu
IPC Classification: G10L15/187 , G10L15/02 , G10L15/04 , G10L15/22 , G10L25/18
CPC Classification: G10L15/187 , G10L15/02 , G10L15/04 , G10L15/22 , G10L25/18 , G10L2015/025
Abstract: A speech recognition method includes: obtaining speech data; performing feature extraction on the speech data to obtain speech features of at least two speech segments; inputting the speech features of the at least two speech segments into a speech recognition model, and processing the speech features of the speech segments by using cascaded hidden layers in the speech recognition model to obtain hidden layer features of the speech segments, the hidden layer feature of the i-th speech segment being determined based on the speech feature of the i-th speech segment and the speech features of the n speech segments that follow it in the time sequence; and obtaining text information corresponding to the speech data based on the hidden layer features of the speech segments.
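The distinctive step here is the look-ahead: the hidden-layer feature of the i-th segment depends on that segment's own feature and on the features of the n segments after it. A minimal sketch of that windowing, assuming each segment feature is a list of floats and using an element-wise mean as the combination (the abstract does not say how the cascaded hidden layers combine features; `lookahead_features` is a hypothetical name):

```python
from typing import List

Vector = List[float]


def lookahead_features(features: List[Vector], n: int) -> List[Vector]:
    """Combine each segment's feature with the features of the next n segments.

    Hypothetical combination: an element-wise mean over the window. Windows
    are truncated at the end, so the last segments see fewer future segments.
    """
    hidden: List[Vector] = []
    for i in range(len(features)):
        window = features[i : i + n + 1]  # segment i plus up to n later segments
        dim = len(window[0])
        hidden.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return hidden


segments = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
print(lookahead_features(segments, n=1))
```

Truncating the window at the end of the utterance is one design choice; padding with zero vectors would be another, and a streaming system would trade a larger n (more context) against added latency.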
-
Publication No.: US12106751B2
Publication Date: 2024-10-01
Application No.: US16555845
Filing Date: 2019-08-29
Inventors: Michael Tholfsen , Paul Ronald Ray , Daniel Edward McAllister , Hernán David Maestre Piedrahita
IPC Classification: G10L15/187 , G10L15/22 , G10L15/30
CPC Classification: G10L15/187 , G10L15/22 , G10L15/30
Abstract: An automatic speech sensitivity adjustment feature is provided. The described sensitivity feature can enable an automatic system adjustment of a sensitivity level based on the number and type of determined speech errors. The sensitivity level determines how sensitive the sensitivity feature will be when indicating speech errors. The sensitivity feature can receive audio input comprising one or more spoken words and determine speech errors for the audio input using at least a sensitivity level. The sensitivity feature can determine whether the amount and type of the speech errors require an adjustment to the sensitivity level. The sensitivity feature can adjust the sensitivity level to a second sensitivity level based on the amount and type of the speech errors, where the second sensitivity level differs from the first. The sensitivity feature can then re-determine the speech errors for the audio input using at least the second sensitivity level.
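The detect, evaluate, adjust, re-detect loop can be sketched as follows. Everything concrete here is an assumption for illustration: the `Candidate` record, the mapping from sensitivity level to a severity threshold, the per-type weights in `TYPE_WEIGHTS`, and the rule of stepping the level down when too many weighted errors are flagged. The abstract only says the adjustment depends on the amount and type of errors.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    word: str
    error_type: str  # hypothetical labels, e.g. "mispronunciation", "omission"
    severity: float  # 0.0 = almost certainly fine, 1.0 = almost certainly an error


# Hypothetical per-type weights: how strongly each error type counts toward
# deciding that the current sensitivity level flags too much.
TYPE_WEIGHTS: Dict[str, float] = {"mispronunciation": 1.0, "omission": 0.5}


def determine_errors(candidates: List[Candidate], level: float) -> List[Candidate]:
    """Flag candidates at a sensitivity level in [0, 1].

    A higher level is more sensitive: it lowers the severity threshold.
    """
    threshold = 1.0 - level
    return [c for c in candidates if c.severity >= threshold]


def adjust_and_redetermine(
    candidates: List[Candidate],
    level: float,
    max_weighted_errors: float = 2.0,
    step: float = 0.2,
) -> Tuple[float, List[Candidate]]:
    """Detect errors; if their weighted amount is too high, switch to a
    second, less sensitive level and re-detect with that level."""
    errors = determine_errors(candidates, level)
    weighted = sum(TYPE_WEIGHTS.get(e.error_type, 1.0) for e in errors)
    if weighted > max_weighted_errors:
        second_level = max(0.0, level - step)
        return second_level, determine_errors(candidates, second_level)
    return level, errors


words = [
    Candidate("a", "mispronunciation", 0.9),
    Candidate("b", "mispronunciation", 0.5),
    Candidate("c", "mispronunciation", 0.3),
    Candidate("d", "omission", 0.25),
]
level, errors = adjust_and_redetermine(words, level=0.8)
print(round(level, 2), [e.word for e in errors])
```

Weighting by type lets a burst of minor flags (here, omissions count half) trigger the adjustment more slowly than a burst of serious ones.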
-
Publication No.: US12094453B2
Publication Date: 2024-09-17
Application No.: US17447285
Filing Date: 2021-09-09
Applicant: Google LLC
Inventors: Jiahui Yu , Chung-cheng Chiu , Bo Li , Shuo-yiin Chang , Tara Sainath , Wei Han , Anmol Gulati , Yanzhang He , Arun Narayanan , Yonghui Wu , Ruoming Pang
IPC Classification: G10L15/06 , G10L15/16 , G10L15/187 , G10L15/22 , G10L15/30
CPC Classification: G10L15/063 , G10L15/16 , G10L15/22 , G10L15/30 , G10L15/187
Abstract: A computer-implemented method of training a streaming speech recognition model includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method further includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
-
Publication No.: US12067978B2
Publication Date: 2024-08-20
Application No.: US17335663
Filing Date: 2021-06-01
Inventors: Fuliang Weng , Alexei Ivanov , Stephen Cradock
IPC Classification: G10L15/065 , G06F40/237 , G10L15/187 , G10L15/22
CPC Classification: G10L15/187 , G06F40/237 , G10L15/22
Abstract: Methods and systems are disclosed herein for improvements relating to compressed automatic speech recognition (ASR) systems. The ASR system may comprise a compressed acoustic engine and an adaptive decoder. The adaptive decoder may be dynamically compiled based on characteristics of the compressed acoustic engine and a current state of the application device. In some embodiments, a dynamic command list is used to manage context-specific commands. Two or more commands recognized by the adaptive decoder may be confusable due to compression of the ASR system. Alternate commands may be determined that are semantically equivalent to, but phonetically different from, the confusable commands, to reduce classification error of the adaptive decoder. An alternate command may replace one or more of the confusable commands in the adaptive decoder. In some embodiments, a user interface is displayed to a user of the ASR system to select the alternate command for replacement in the decoder.
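Replacing a confusable command with a semantically equivalent but phonetically distinct alternate can be sketched with a phoneme-level edit distance. Assumptions, all hypothetical: commands are represented by hand-written ARPAbet-style phoneme lists, two commands count as confusable when their Levenshtein distance is at most `max_dist`, and a curated `alternates` table supplies synonyms. The patent's actual confusability measure, which stems from the compressed acoustic engine, is not described in the abstract.

```python
from itertools import combinations
from typing import Dict, List, Sequence

Phonemes = List[str]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def is_confusable(a: Phonemes, b: Phonemes, max_dist: int = 1) -> bool:
    return edit_distance(a, b) <= max_dist


def replace_confusables(
    commands: Dict[str, Phonemes],
    alternates: Dict[str, Dict[str, Phonemes]],
    max_dist: int = 1,
) -> Dict[str, Phonemes]:
    """Swap a confusable command for an alternate whose pronunciation is not
    confusable with any other active command."""
    active = dict(commands)
    for a, b in combinations(list(commands), 2):
        if a not in active or b not in active:
            continue
        if not is_confusable(active[a], active[b], max_dist):
            continue
        for alt, phones in alternates.get(b, {}).items():
            others = [p for name, p in active.items() if name != b]
            if not any(is_confusable(phones, p, max_dist) for p in others):
                del active[b]
                active[alt] = phones
                break
    return active


commands = {
    "call mom": ["K", "AO", "L", "M", "AA", "M"],
    "call tom": ["K", "AO", "L", "T", "AA", "M"],
    "play music": ["P", "L", "EY", "M", "Y", "UW", "Z", "IH", "K"],
}
alternates = {"call tom": {"phone tom": ["F", "OW", "N", "T", "AA", "M"]}}
print(sorted(replace_confusables(commands, alternates)))
```

An alternate is accepted only if it is clear of every remaining command, so a swap never creates a new confusable pair.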
-
Publication No.: US12020683B2
Publication Date: 2024-06-25
Application No.: US17513335
Filing Date: 2021-10-28
IPC Classification: G10L13/08 , G10L13/04 , G10L15/08 , G10L15/187 , G10L15/28
CPC Classification: G10L13/08 , G10L13/04 , G10L15/083 , G10L15/187 , G10L15/285
Abstract: A real-time name mispronunciation detection feature can enable a user to receive instant feedback any time they mispronounce another person's name in an online meeting. The feature can receive audio input of a speaker and obtain a transcript of the audio input; identify a name in the text of the transcript based on the names of meeting participants; and extract the portion of the audio input corresponding to the identified name. The feature can obtain a reference pronunciation for the name using a user identifier associated with the name, and can obtain a pronunciation score for the name based on a comparison between the reference pronunciation and the portion of the audio input corresponding to the name. The feature can then determine whether the pronunciation score is below a threshold and, if so, notify the speaker of a pronunciation error.
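The scoring and threshold steps at the end of that pipeline can be sketched as follows, assuming the extracted audio has already been converted to a phoneme sequence and that the reference pronunciation is looked up by user identifier. The store `REFERENCE_PRONUNCIATIONS`, the phoneme lists, and the 0.8 threshold are hypothetical. As a stand-in for the patent's unspecified comparison, the score is Python's `difflib.SequenceMatcher.ratio()`, a similarity in [0, 1].

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical store of reference pronunciations, keyed by user identifier.
REFERENCE_PRONUNCIATIONS: Dict[str, List[str]] = {
    "user-1": ["S", "IH", "R", "SH", "AH"],
}


def pronunciation_score(reference: List[str], spoken: List[str]) -> float:
    """Similarity in [0, 1] between two phoneme sequences (difflib ratio)."""
    return SequenceMatcher(None, reference, spoken).ratio()


def check_name(user_id: str, spoken: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return a notification message if the spoken name scores below the threshold."""
    reference = REFERENCE_PRONUNCIATIONS.get(user_id)
    if reference is None:
        return None  # no reference pronunciation to compare against
    score = pronunciation_score(reference, spoken)
    if score < threshold:
        return f"The name may have been mispronounced (score {score:.2f})"
    return None


print(check_name("user-1", ["S", "EY", "OW", "R", "S", "IY"]))
```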
-
Publication No.: US12008802B2
Publication Date: 2024-06-11
Application No.: US17362676
Filing Date: 2021-06-29
Applicant: Meta Platforms, Inc.
Inventors: Vivek Natarajan , Baiyang Liu , Shubham Gupta , Krishna Mittal , Scott Martin
IPC Classification: G10L17/22 , G06F3/01 , G06F3/16 , G06F7/14 , G06F9/451 , G06F16/176 , G06F16/22 , G06F16/23 , G06F16/242 , G06F16/2455 , G06F16/2457 , G06F16/248 , G06F16/33 , G06F16/332 , G06F16/338 , G06F16/903 , G06F16/9032 , G06F16/9038 , G06F16/904 , G06F16/951 , G06F16/9535 , G06F18/2411 , G06F40/205 , G06F40/295 , G06F40/30 , G06F40/40 , G06N3/006 , G06N3/08 , G06N7/01 , G06N20/00 , G06Q50/00 , G06V10/764 , G06V10/82 , G06V20/10 , G06V40/20 , G10L15/02 , G10L15/06 , G10L15/07 , G10L15/16 , G10L15/18 , G10L15/183 , G10L15/187 , G10L15/22 , G10L15/26 , G10L17/06 , H04L12/28 , H04L41/00 , H04L41/22 , H04L43/0882 , H04L43/0894 , H04L51/02 , H04L51/18 , H04L51/216 , H04L51/52 , H04L67/306 , H04L67/50 , H04L67/5651 , H04L67/75 , H04W12/08 , G10L13/00 , G10L13/04 , H04L51/046 , H04L67/10 , H04L67/53
CPC Classification: G06V10/82 , G06F3/011 , G06F3/013 , G06F3/017 , G06F3/167 , G06F7/14 , G06F9/453 , G06F16/176 , G06F16/2255 , G06F16/2365 , G06F16/243 , G06F16/24552 , G06F16/24575 , G06F16/24578 , G06F16/248 , G06F16/3323 , G06F16/3329 , G06F16/3344 , G06F16/338 , G06F16/90332 , G06F16/90335 , G06F16/9038 , G06F16/904 , G06F16/951 , G06F16/9535 , G06F18/2411 , G06F40/205 , G06F40/295 , G06F40/30 , G06F40/40 , G06N3/006 , G06N3/08 , G06N7/01 , G06N20/00 , G06Q50/01 , G06V10/764 , G06V20/10 , G06V40/28 , G10L15/02 , G10L15/063 , G10L15/07 , G10L15/16 , G10L15/1815 , G10L15/1822 , G10L15/183 , G10L15/187 , G10L15/22 , G10L15/26 , G10L17/06 , G10L17/22 , H04L12/2816 , H04L41/20 , H04L41/22 , H04L43/0882 , H04L43/0894 , H04L51/02 , H04L51/18 , H04L51/216 , H04L51/52 , H04L67/306 , H04L67/535 , H04L67/5651 , H04L67/75 , H04W12/08 , G06F2216/13 , G10L13/00 , G10L13/04 , G10L2015/223 , G10L2015/225 , H04L51/046 , H04L67/10 , H04L67/53
Abstract: In one embodiment, a method includes receiving, from a client system of a user, a user input comprising a plurality of n-grams; parsing the user input to identify one or more overall intents, hidden intents, and slots associated with the one or more n-grams, wherein at least one of the hidden intents is non-resolvable because it is associated with partial slot information corresponding to an n-gram that has not been resolved to a particular entity identifier, and wherein the partial slot information is associated with two or more entity identifiers of two or more entities, respectively; sending, to the client system, instructions for prompting the user to select one of the entities to be associated with the non-resolvable hidden intent; resolving the non-resolvable hidden intent based on the entity identifier of the entity selected by the user; and generating a response to the user input based on the resolved hidden intent.
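The resolution step for a non-resolvable hidden intent can be sketched as below, assuming a hidden intent carries the unresolved n-gram and the list of candidate entity identifiers it matched, and that prompting the user is modeled as a callback. `HiddenIntent`, `resolve`, `respond`, and the identifiers are hypothetical; the abstract does not define these data structures.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class HiddenIntent:
    name: str        # hypothetical intent label, e.g. "call_contact"
    slot_text: str   # the unresolved n-gram, e.g. "Alex"
    candidate_ids: List[str] = field(default_factory=list)
    entity_id: Optional[str] = None


# Models "send instructions to the client to prompt the user": takes the
# n-gram and the candidate identifiers, returns the identifier the user chose.
AskUser = Callable[[str, List[str]], str]


def resolve(intent: HiddenIntent, ask_user: AskUser) -> HiddenIntent:
    """Resolve partial slot information to a single entity identifier."""
    if intent.entity_id is None:
        if len(intent.candidate_ids) == 1:
            intent.entity_id = intent.candidate_ids[0]
        elif len(intent.candidate_ids) >= 2:
            # Non-resolvable on its own: let the user pick one of the entities.
            choice = ask_user(intent.slot_text, intent.candidate_ids)
            if choice not in intent.candidate_ids:
                raise ValueError(f"unknown selection: {choice}")
            intent.entity_id = choice
    return intent


def respond(intent: HiddenIntent) -> str:
    """Generate a (toy) response from the resolved hidden intent."""
    return f"{intent.name}({intent.slot_text} -> {intent.entity_id})"


intent = HiddenIntent("call_contact", "Alex", ["contact:101", "contact:202"])
print(respond(resolve(intent, lambda text, ids: ids[1])))
```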
-
Publication No.: US11984122B2
Publication Date: 2024-05-14
Application No.: US17425560
Filing Date: 2021-06-18
Inventors: Youngho Han , Sangyoon Kim , Aahwan Kudumula , Kyungmin Lee , Donguk Jung , Changwoo Han
IPC Classification: G10L15/08 , G06F21/31 , G06V30/19 , G06V30/262 , G10L15/18 , G10L15/22 , G10L15/26 , G10L15/187 , G10L15/19
CPC Classification: G10L15/22 , G06F21/31 , G06V30/19093 , G06V30/262 , G10L15/1815 , G10L15/26 , G10L15/187 , G10L15/19 , G10L2015/221 , G10L2015/223
Abstract: Disclosed is a method of controlling an electronic apparatus. The method includes: displaying a screen including an input area configured to receive text; receiving speech and obtaining text corresponding to the speech; performing a service operation corresponding to the input area by inputting the obtained text into the input area; and, based on a result of performing the service operation, obtaining a plurality of similar texts with pronunciations similar to the obtained text, and repeatedly performing the service operation by sequentially inputting the obtained similar texts into the input area.
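The retry behavior can be sketched as a small control loop: run the service with the recognized text, and only if that fails, feed in similar-sounding texts one by one. The service is modeled as a callable returning `True` on success, and the candidate generator is a hard-coded list; `fill_and_retry`, `search`, and `homophones` are hypothetical names, and a real system would derive candidates from the recognizer's pronunciation model.

```python
from typing import Callable, Iterable, List, Optional


def fill_and_retry(
    recognized: str,
    service: Callable[[str], bool],                 # True if the operation succeeds
    similar_texts: Callable[[str], Iterable[str]],  # hypothetical candidate generator
) -> Optional[str]:
    """Run the service with the recognized text; on failure, retry with
    similar-sounding texts in order and return the first that succeeds."""
    if service(recognized):
        return recognized
    for candidate in similar_texts(recognized):
        if service(candidate):
            return candidate
    return None


# Hypothetical example: a contact search box that only finds exact names.
contacts = {"jon smyth", "ann lee"}


def search(text: str) -> bool:
    return text.lower() in contacts


def homophones(text: str) -> List[str]:
    return ["john smyth", "jon smith", "jon smyth"]


print(fill_and_retry("John Smith", search, homophones))
```

Generating candidates lazily (any iterable works) means the similar-text lookup is only paid for when the first attempt fails.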
-
Publication No.: US20240127790A1
Publication Date: 2024-04-18
Application No.: US18045893
Filing Date: 2022-10-12
Inventors: Saurabh TAHILIANI , Subham BISWAS
IPC Classification: G10L13/08 , G06F40/30 , G10L13/047 , G10L13/07 , G10L15/16 , G10L15/187 , G10L15/22 , G10L19/005 , G10L25/18
CPC Classification: G10L13/08 , G06F40/30 , G10L13/047 , G10L13/07 , G10L15/16 , G10L15/187 , G10L15/22 , G10L19/005 , G10L25/18
Abstract: A device may receive audio data and convert it to text data in real time, and may detect a network fluctuation that causes missing voice packets. The device may process partial text and the context of the text data, with a model, to generate a new phrase, and may generate a response phoneme for the new phrase. The device may utilize a text embedding model to generate a text embedding for the response phoneme, and may process the audio data, with the model, to generate a target voice sequence. The device may utilize an audio embedding model to generate an audio embedding for the target voice sequence, and may combine the text embedding and the audio embedding to generate an embedding input vector. The device may process the embedding input vector, with an audio synthesis model, to generate a final voice response, and may provide the audio data and the final voice response.
-
Publication No.: US11922926B2
Publication Date: 2024-03-05
Application No.: US17474080
Filing Date: 2021-09-14
Inventors: Aysu Ezen Can , Feng Qiu , Guadalupe Bonilla , Meredith Leigh Critzer , Michael Mossoba , Alexander Lin , Tyler Maiman , Mia Rodriguez , Vahid Khanagha , Joshua Edwards
CPC Classification: G10L15/01 , G06F16/35 , G06N5/022 , G10L15/02 , G10L15/10 , G10L15/187 , G10L15/22 , G10L2015/025
Abstract: A system may include one or more processors, and memory in communication with the processor(s) storing instructions configured to cause the system to correct ASR errors. The system may receive a transcription comprising one or more transcribed words and may determine whether each transcribed word exceeds an associated predefined confidence level. Responsive to determining that a transcribed word does not exceed its predefined confidence level, the system may generate a predicted word. The system may calculate a distance between numerical representations of the transcribed word and the predicted word and may determine whether the distance exceeds a predefined threshold. Responsive to determining that the distance exceeds the predefined threshold, the system may determine whether at least one red flag word from a list of red flag words corresponds to a context of the transcription and, responsive to making that determination, may classify the transcription as associated with a first category.
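The decision chain (low confidence, then large distance between the transcribed and predicted words, then a red-flag context check) can be sketched as follows. Assumptions, all hypothetical: the word predictor and embedding function are passed in as callables, the "distance between numerical representations" is cosine distance, and "corresponds to a context" is taken to mean that a red flag word appears in the transcript or is the predicted replacement. The toy two-dimensional embeddings exist only to make the example run.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def classify_transcription(
    words: List[Tuple[str, float]],            # (transcribed word, ASR confidence)
    confidence_threshold: float,
    predict: Callable[[List[str], int], str],  # hypothetical word predictor
    embed: Callable[[str], Vector],            # hypothetical word embedding
    distance_threshold: float,
    red_flags: Set[str],
) -> Optional[str]:
    tokens = [w for w, _ in words]
    for index, (word, confidence) in enumerate(words):
        if confidence > confidence_threshold:
            continue  # confident enough: keep the transcribed word
        predicted = predict(tokens, index)
        if cosine_distance(embed(word), embed(predicted)) <= distance_threshold:
            continue  # prediction is close to the transcription: not a meaningful change
        # Hypothetical context test: a red flag word appears in the transcript
        # or is the predicted replacement.
        if red_flags & (set(tokens) | {predicted}):
            return "first_category"
    return None


EMBEDDINGS: Dict[str, Vector] = {"clothes": [1.0, 0.0], "close": [0.0, 1.0]}
words = [("i", 0.99), ("want", 0.98), ("to", 0.99), ("clothes", 0.40), ("my", 0.97), ("account", 0.95)]
result = classify_transcription(
    words,
    0.9,
    predict=lambda tokens, i: "close",
    embed=lambda w: EMBEDDINGS.get(w, [1.0, 1.0]),
    distance_threshold=0.5,
    red_flags={"close", "fraud"},
)
print(result)
```

Checking confidence first keeps the (more expensive) prediction and embedding steps limited to the few words the recognizer was unsure about.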
-