Patent search: cpc:"G10L15/187" (results page 1)

1.

Granted invention patent

Voice visualization system for english learning, and method therefor (Status: in force)

Publication/announcement number: US12118898B2

Publication/announcement date: 2024-10-15

Application number: US18260606

Filing date: 2022-01-27

Applicant: Gi Hun Lee

Inventor: Gi Hun Lee

IPC classification: G09B5/02, G10L15/04, G10L15/187, G10L21/14, G10L25/21

CPC classification: G09B5/02, G10L15/04, G10L15/187, G10L21/14, G10L25/21

Abstract: A speech visualization system according to the present invention includes: a speech signal input unit for receiving speech signals of sentences with English pronunciations; a speech information analysis unit for analyzing speech information with frequencies, energy, and time of the speech signals and the text corresponding to the speech signals to divide the speech information into at least one or more segments; a speech information classification unit for classifying the segments of the speech information into flow units and each flow unit into at least one or more sub flow units each having at least one or more words; a visualization property assignment unit for assigning visualization properties for speech visualization to the analyzed and classified speech information; and a visualization processing unit for performing visualization processing based on the assigned visualization properties to generate speech visualization data.

2.

Granted invention patent

Speech recognition method and apparatus with cascaded hidden layers and speech segments, computer device, and computer-readable storage medium (Status: in force)

Publication/announcement number: US12112743B2

Publication/announcement date: 2024-10-08

Application number: US17709011

Filing date: 2022-03-30

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventors: Xilin Zhang, Bo Liu

IPC classification: G10L15/187, G10L15/02, G10L15/04, G10L15/22, G10L25/18

CPC classification: G10L15/187, G10L15/02, G10L15/04, G10L15/22, G10L25/18, G10L2015/025

Abstract: A speech recognition method includes: obtaining speech data; performing feature extraction on speech data, to obtain speech features of at least two speech segments; inputting the speech features of the at least two speech segments into the speech recognition model, and processing the speech features of the speech segments by using cascaded hidden layers in the speech recognition model, to obtain hidden layer features of the speech segments, a hidden layer feature of an i-th speech segment being determined based on speech features of n speech segments located after the i-th speech segment in a time sequence and a speech feature of the i-th speech segment; and obtaining text information corresponding to the speech data based on the hidden layer features of the speech segments.
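The central step in this abstract is that the hidden feature of the i-th segment depends on that segment plus the n segments that follow it. A minimal sketch of that dependency pattern, assuming each segment's speech feature is a plain list of floats; the hypothetical helper `lookahead_hidden_features` simply averages the current and future vectors as a stand-in for the patent's cascaded hidden layers, which the abstract does not describe in enough detail to reproduce.

```python
from typing import List

Vector = List[float]


def lookahead_hidden_features(features: List[Vector], n: int) -> List[Vector]:
    """Toy stand-in for cascaded hidden layers with n segments of lookahead.

    The hidden feature of segment i is the element-wise mean of segment i and
    the (up to) n segments that follow it; segments near the end of the
    utterance see fewer future segments.
    """
    hidden: List[Vector] = []
    for i in range(len(features)):
        window = features[i : i + n + 1]  # segment i plus up to n future segments
        dim = len(window[0])
        hidden.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return hidden


if __name__ == "__main__":
    segments = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
    print(lookahead_hidden_features(segments, n=1))
    # [[2.0, 1.0], [4.0, 3.0], [5.0, 4.0]]
```

Bounding the lookahead at n segments is what keeps this kind of model usable for streaming: each hidden feature can be produced once n further segments have arrived, rather than only after the whole utterance.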

3.

Granted invention patent

Automatic speech sensitivity adjustment feature (Status: in force)

Publication/announcement number: US12106751B2

Publication/announcement date: 2024-10-01

Application number: US16555845

Filing date: 2019-08-29

Applicant: Microsoft Technology Licensing, LLC

Inventors: Michael Tholfsen, Paul Ronald Ray, Daniel Edward McAllister, Hernán David Maestre Piedrahita

IPC classification: G10L15/187, G10L15/22, G10L15/30

CPC classification: G10L15/187, G10L15/22, G10L15/30

Abstract: An automatic speech sensitivity adjustment feature is provided. The described sensitivity feature can enable an automatic system adjustment of a sensitivity level based on the number and type of determined speech errors. The sensitivity level determines how sensitive the sensitivity feature will be when indicating speech errors. The sensitivity feature can receive audio input comprising one or more spoken words and determine speech errors for the audio input using at least a sensitivity level. The sensitivity feature can determine whether an amount and type of the speech errors requires an adjustment to the sensitivity level. The sensitivity feature can adjust the sensitivity level to a second sensitivity level based on the amount and type of the speech errors, where the second sensitivity level is a different level than the sensitivity level. The sensitivity feature can re-determine the speech errors for the audio input using at least the second sensitivity level.
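The determine, check, adjust, and re-determine loop in this abstract can be sketched as follows. Everything concrete here is an assumption for illustration: per-word error scores in [0, 1], the `THRESHOLDS` table mapping sensitivity levels to score cutoffs, and the rule that too many errors of one type means the current level is too strict. This sketch only ever lowers the sensitivity, whereas the abstract allows a change to any different level.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical table: a higher sensitivity level flags words at a lower error score.
THRESHOLDS: Dict[int, float] = {1: 0.9, 2: 0.7, 3: 0.5}


@dataclass
class WordScore:
    word: str
    error_type: str  # illustrative labels such as "mispronunciation" or "omission"
    score: float     # 0..1, estimated likelihood that the word contains an error


def determine_errors(scores: List[WordScore], level: int) -> List[WordScore]:
    """Flag words whose error score meets the cutoff for the given level."""
    return [s for s in scores if s.score >= THRESHOLDS[level]]


def needs_adjustment(errors: List[WordScore], max_per_type: int) -> bool:
    """Assumed rule: more than max_per_type errors of any one type means too strict."""
    counts: Dict[str, int] = {}
    for e in errors:
        counts[e.error_type] = counts.get(e.error_type, 0) + 1
    return any(c > max_per_type for c in counts.values())


def evaluate(scores: List[WordScore], level: int,
             max_per_type: int = 2) -> Tuple[int, List[WordScore]]:
    errors = determine_errors(scores, level)
    while needs_adjustment(errors, max_per_type) and level > min(THRESHOLDS):
        level -= 1                                # switch to a less sensitive level
        errors = determine_errors(scores, level)  # re-determine errors at that level
    return level, errors


if __name__ == "__main__":
    scores = [
        WordScore("cat", "mispronunciation", 0.60),
        WordScore("dog", "mispronunciation", 0.75),
        WordScore("sun", "mispronunciation", 0.55),
        WordScore("moon", "omission", 0.95),
    ]
    level, errors = evaluate(scores, level=3)
    print(level, [e.word for e in errors])
    # 2 ['dog', 'moon']
```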

4.

Granted invention patent

Fast emit low-latency streaming ASR with sequence-level emission regularization utilizing forward and backward probabilities between nodes of an alignment lattice (Status: in force)

Publication/announcement number: US12094453B2

Publication/announcement date: 2024-09-17

Application number: US17447285

Filing date: 2021-09-09

Applicant: Google LLC

Inventors: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang

IPC classification: G10L15/06, G10L15/16, G10L15/187, G10L15/22, G10L15/30

CPC classification: G10L15/063, G10L15/16, G10L15/22, G10L15/30, G10L15/187

Abstract: A computer-implemented method of training a streaming speech recognition model that includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
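The title refers to forward and backward probabilities between nodes of an alignment lattice. As an illustration only, the sketch below computes standard transducer-style forward (alpha) and backward (beta) variables on a tiny lattice, assuming the per-node label and blank probabilities are given as plain nested lists. It then sums the probability mass flowing through label edges and blank edges; a tuning parameter like the one in the abstract would act on the label-emission term, but the patent's exact regularized objective is not reproduced here.

```python
from typing import List, Tuple

Grid = List[List[float]]


def forward_backward(label: Grid, blank: Grid) -> Tuple[Grid, Grid]:
    """Forward (alpha) and backward (beta) variables on a T x (U + 1) lattice.

    label[t][u] (u < U): probability of emitting the next label at node (t, u)
    blank[t][u]:         probability of emitting blank at node (t, u)
    """
    T, U = len(blank), len(blank[0]) - 1
    alpha = [[0.0] * (U + 1) for _ in range(T)]
    beta = [[0.0] * (U + 1) for _ in range(T)]

    alpha[0][0] = 1.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            from_blank = alpha[t - 1][u] * blank[t - 1][u] if t > 0 else 0.0
            from_label = alpha[t][u - 1] * label[t][u - 1] if u > 0 else 0.0
            alpha[t][u] = from_blank + from_label

    for t in reversed(range(T)):
        for u in reversed(range(U + 1)):
            if t == T - 1 and u == U:
                beta[t][u] = blank[t][u]  # the final blank ends the sequence
                continue
            # A blank on the last frame cannot advance further, so it is excluded.
            via_blank = blank[t][u] * beta[t + 1][u] if t < T - 1 else 0.0
            via_label = label[t][u] * beta[t][u + 1] if u < U else 0.0
            beta[t][u] = via_blank + via_label
    return alpha, beta


def emission_masses(label: Grid, blank: Grid) -> Tuple[float, float, float]:
    """Return P(y|x), total label-edge mass, and total blank-edge mass."""
    alpha, beta = forward_backward(label, blank)
    T, U = len(blank), len(blank[0]) - 1
    total = beta[0][0]
    label_mass = sum(
        alpha[t][u] * label[t][u] * beta[t][u + 1] for t in range(T) for u in range(U)
    )
    blank_mass = sum(
        alpha[t][u] * blank[t][u] * beta[t + 1][u]
        for t in range(T - 1) for u in range(U + 1)
    ) + alpha[T - 1][U] * blank[T - 1][U]
    return total, label_mass, blank_mass


if __name__ == "__main__":
    label = [[0.6], [0.5]]            # T = 2 frames, U = 1 label
    blank = [[0.4, 0.3], [0.5, 0.7]]
    print(emission_masses(label, blank))
```

Because every complete alignment path emits exactly U labels and T blanks, the summed label mass equals U·P(y|x) and the summed blank mass equals T·P(y|x), which makes a convenient sanity check for the forward and backward passes.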

5.

Granted invention patent

Methods and systems for confusion reduction for compressed acoustic models (Status: in force)

Publication/announcement number: US12067978B2

Publication/announcement date: 2024-08-20

Application number: US17335663

Filing date: 2021-06-01

Applicant: Samsung Electronics Co., Ltd.

Inventors: Fuliang Weng, Alexei Ivanov, Stephen Cradock

IPC classification: G10L15/065, G06F40/237, G10L15/187, G10L15/22

CPC classification: G10L15/187, G06F40/237, G10L15/22

Abstract: Methods and systems are disclosed herein for improvements relating to compressed automatic speech recognition (ASR) systems. The ASR system may comprise a compressed acoustic engine and an adaptive decoder. The adaptive decoder may be dynamically compiled based on characteristics of the compressed acoustic engine and a current state of the application device. In some embodiments, a dynamic command list is used to manage context-specific commands. Two or more commands recognized by the adaptive decoder may be confusable due to compression of the ASR system. Alternate commands may be determined that are semantically equivalent but phonetically different than the confusable commands to reduce classification error of the adaptive decoder. An alternate command may replace one or more of the confusable commands in the adaptive decoder. In some embodiments, a user interface is displayed to a user of the ASR system to select the alternate command for replacement in the decoder.
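One way to picture confusable-command detection and replacement: compare commands by the edit distance between their phoneme sequences, flag pairs that are too close, and choose the semantically equivalent alternate that is phonetically farthest from the remaining commands. The phoneme spellings, the distance threshold, and the helper names below are hypothetical; the abstract does not say how confusability is measured.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Lexicon = Dict[str, List[str]]  # command -> phoneme sequence (hypothetical spellings)


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (pa != pb)))  # substitution or match
        prev = cur
    return prev[-1]


def confusable_pairs(commands: Lexicon, max_dist: int) -> List[Tuple[str, str]]:
    """Command pairs whose phoneme sequences are within max_dist edits."""
    return [(a, b) for a, b in combinations(sorted(commands), 2)
            if edit_distance(commands[a], commands[b]) <= max_dist]


def pick_alternate(target: str, commands: Lexicon, alternates: Lexicon) -> str:
    """Pick the alternate whose closest other command is as far away as possible."""
    others = [p for name, p in commands.items() if name != target]
    return max(alternates,
               key=lambda alt: min(edit_distance(alternates[alt], o) for o in others))


if __name__ == "__main__":
    commands = {
        "call": ["K", "AO", "L"],
        "stall": ["S", "T", "AO", "L"],
        "next": ["N", "EH", "K", "S", "T"],
    }
    print(confusable_pairs(commands, max_dist=2))
    # [('call', 'stall')]
    print(pick_alternate("call", commands, {"dial": ["D", "AY", "AH", "L"],
                                            "phone": ["F", "OW", "N"]}))
    # phone
```

Choosing by the minimum distance to all other commands (a max-min rule) is one simple way to keep the replacement from just moving the confusion onto a different pair of commands.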

6.

Granted invention patent

Real-time name mispronunciation detection (Status: in force)

Publication/announcement number: US12020683B2

Publication/announcement date: 2024-06-25

Application number: US17513335

Filing date: 2021-10-28

Applicant: Microsoft Technology Licensing, LLC

Inventors: Tapan Bohra, Akshay Mallipeddi, Amit Srivastava, Ana Karen Parra

IPC classification: G10L13/08, G10L13/04, G10L15/08, G10L15/187, G10L15/28

CPC classification: G10L13/08, G10L13/04, G10L15/083, G10L15/187, G10L15/285

Abstract: A real-time name mispronunciation detection feature can enable a user to receive instant feedback anytime they have mispronounced another person's name in an online meeting. The feature can receive audio input of a speaker and obtain a transcript of the audio input; identify a name from text of the transcript based on names of meeting participants; and extract a portion of the audio input corresponding to the name identified from the text of the transcript. The feature can obtain a reference pronunciation for the name using a user identifier associated with the name; and can obtain a pronunciation score for the name based on a comparison between the reference pronunciation for the name and the portion of the audio input corresponding to the name. The feature can then determine whether the pronunciation score is below a threshold; and in response, notify the speaker of a pronunciation error.
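The scoring step (compare the reference pronunciation with the audio segment for the name, then test the score against a threshold) might look like this, assuming both pronunciations are already available as phoneme sequences. `difflib.SequenceMatcher.ratio()` from the Python standard library is swapped in as the similarity measure purely for illustration; the abstract does not specify how the pronunciation score is computed, and the 0.8 threshold and phoneme spellings are made up.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def pronunciation_score(reference: List[str], spoken: List[str]) -> float:
    """Similarity in [0, 1] between reference and spoken phoneme sequences."""
    return SequenceMatcher(None, reference, spoken).ratio()


def check_name(name: str, reference: List[str], spoken: List[str],
               threshold: float = 0.8) -> Optional[str]:
    """Return a notification for the speaker if the score falls below threshold."""
    score = pronunciation_score(reference, spoken)
    if score < threshold:
        return f"Possible mispronunciation of {name!r} (score {score:.2f})"
    return None


if __name__ == "__main__":
    reference = ["SH", "IH", "V", "AO", "N"]   # illustrative phonemes only
    spoken = ["S", "IY", "OW", "B", "AE", "N"]
    print(check_name("Siobhan", reference, spoken))
    # Possible mispronunciation of 'Siobhan' (score 0.18)
    print(check_name("Siobhan", reference, reference))
    # None
```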

7.

Granted invention patent

Execution engine for compositional entity resolution for assistant systems (Status: in force)

Publication/announcement number: US12008802B2

Publication/announcement date: 2024-06-11

Application number: US17362676

Filing date: 2021-06-29

Applicant: Meta Platforms, Inc.

Inventors: Vivek Natarajan, Baiyang Liu, Shubham Gupta, Krishna Mittal, Scott Martin

IPC classification: G10L17/22, G06F3/01, G06F3/16, G06F7/14, G06F9/451, G06F16/176, G06F16/22, G06F16/23, G06F16/242, G06F16/2455, G06F16/2457, G06F16/248, G06F16/33, G06F16/332, G06F16/338, G06F16/903, G06F16/9032, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N7/01, G06N20/00, G06Q50/00, G06V10/764, G06V10/82, G06V20/10, G06V40/20, G10L15/02, G10L15/06, G10L15/07, G10L15/16, G10L15/18, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, H04L12/28, H04L41/00, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/18, H04L51/216, H04L51/52, H04L67/306, H04L67/50, H04L67/5651, H04L67/75, H04W12/08, G10L13/00, G10L13/04, H04L51/046, H04L67/10, H04L67/53

CPC classification: G06V10/82, G06F3/011, G06F3/013, G06F3/017, G06F3/167, G06F7/14, G06F9/453, G06F16/176, G06F16/2255, G06F16/2365, G06F16/243, G06F16/24552, G06F16/24575, G06F16/24578, G06F16/248, G06F16/3323, G06F16/3329, G06F16/3344, G06F16/338, G06F16/90332, G06F16/90335, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N7/01, G06N20/00, G06Q50/01, G06V10/764, G06V20/10, G06V40/28, G10L15/02, G10L15/063, G10L15/07, G10L15/16, G10L15/1815, G10L15/1822, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, G10L17/22, H04L12/2816, H04L41/20, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/18, H04L51/216, H04L51/52, H04L67/306, H04L67/535, H04L67/5651, H04L67/75, H04W12/08, G06F2216/13, G10L13/00, G10L13/04, G10L2015/223, G10L2015/225, H04L51/046, H04L67/10, H04L67/53

Abstract: In one embodiment, a method includes receiving, from a client system of a user, a user input comprising a plurality of n-grams, parsing the user input to identify one or more overall intents, hidden intents, and slots associated with the one or more n-grams, wherein at least one of the hidden intents is non-resolvable for being associated with partial slot information corresponding to an n-gram that has not been resolved to a particular entity identifier, wherein the partial slot information is associated with two or more entity identifiers of two or more entities, respectively, sending, to the client system, instructions for prompting the user to select one of the entities to be associated with the non-resolvable hidden intent, resolving the non-resolvable hidden intent based on the entity identifier of the entity selected by the user, and generating a response to the user input based on the resolved hidden intent.
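The disambiguation flow in this abstract (a partial slot maps to several entity identifiers, so the user is prompted to pick one before the hidden intent is resolved and a response is generated) can be sketched with plain data structures. `HiddenIntent`, the case-insensitive substring matching rule, and the `ask_user` callback are hypothetical stand-ins, not the assistant system's actual components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class HiddenIntent:
    name: str
    slot_text: str                   # partial slot text, e.g. a first name only
    entity_id: Optional[str] = None  # filled in once resolved


def candidate_entities(slot_text: str, directory: Dict[str, str]) -> List[str]:
    """Entity ids whose display name contains the slot text (case-insensitive)."""
    needle = slot_text.lower()
    return [eid for eid, display in directory.items() if needle in display.lower()]


def resolve(intent: HiddenIntent, directory: Dict[str, str],
            ask_user: Callable[[List[str]], str]) -> HiddenIntent:
    candidates = candidate_entities(intent.slot_text, directory)
    if len(candidates) == 1:
        intent.entity_id = candidates[0]         # unambiguous: no prompt needed
    elif len(candidates) > 1:
        intent.entity_id = ask_user(candidates)  # prompt the user to pick one
    return intent


def respond(intent: HiddenIntent, directory: Dict[str, str]) -> str:
    if intent.entity_id is None:
        return f"Sorry, I couldn't find {intent.slot_text!r}."
    return f"OK: {intent.name} -> {directory[intent.entity_id]}"


if __name__ == "__main__":
    contacts = {"u1": "Alex Kim", "u2": "Alex Park", "u3": "Sam Lee"}
    intent = resolve(HiddenIntent("call_contact", "alex"), contacts,
                     ask_user=lambda ids: ids[1])  # simulate the user choosing the 2nd
    print(respond(intent, contacts))
    # OK: call_contact -> Alex Park
```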

8.

Granted invention patent

Electronic apparatus and controlling method thereof (Status: in force)

Publication/announcement number: US11984122B2

Publication/announcement date: 2024-05-14

Application number: US17425560

Filing date: 2021-06-18

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Youngho Han, Sangyoon Kim, Aahwan Kudumula, Kyungmin Lee, Donguk Jung, Changwoo Han

IPC classification: G10L15/08, G06F21/31, G06V30/19, G06V30/262, G10L15/18, G10L15/22, G10L15/26, G10L15/187, G10L15/19

CPC classification: G10L15/22, G06F21/31, G06V30/19093, G06V30/262, G10L15/1815, G10L15/26, G10L15/187, G10L15/19, G10L2015/221, G10L2015/223

Abstract: Disclosed is a method of controlling an electronic apparatus. The method of controlling an electronic apparatus includes: displaying a screen including an input area configured to receive a text, receiving a speech and obtaining a text corresponding to the speech, performing a service operation corresponding to the input area by inputting the obtained text to the input area, and based on a result of performing the service operation, obtaining a plurality of similar texts including a similar pronunciation with the obtained text, and repeatedly performing the service operation by sequentially inputting the plurality of obtained similar texts to the input area.
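The retry loop in this abstract (run the service with the recognized text and, if that fails, feed similar texts into the input area one by one) is sketched below. Two things are assumptions: the service operation is modeled as a function that returns a non-empty list on success, and `difflib.get_close_matches`, which compares spellings rather than pronunciations, is swapped in as a rough proxy for "similar pronunciation".

```python
from difflib import get_close_matches
from typing import Callable, List, Optional, Tuple

Service = Callable[[str], List[str]]


def run_with_similar_texts(text: str, service: Service, vocabulary: List[str],
                           max_candidates: int = 3) -> Tuple[str, Optional[List[str]]]:
    """Try the recognized text first, then similar texts in order of similarity."""
    result = service(text)
    if result:
        return text, result
    for candidate in get_close_matches(text, vocabulary, n=max_candidates, cutoff=0.6):
        if candidate == text:
            continue  # already tried above
        result = service(candidate)
        if result:
            return candidate, result
    return text, None  # no candidate produced a result


if __name__ == "__main__":
    directory = {"joan smyth": ["+1-555-0100"]}  # hypothetical contact search backend

    def search(query: str) -> List[str]:
        return directory.get(query, [])

    names = ["john smith", "joan smyth", "jane doe"]
    print(run_with_similar_texts("jon smith", search, names))
    # ('joan smyth', ['+1-555-0100'])
```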

9.

Published invention application

SYSTEMS AND METHODS FOR RECONSTRUCTING VOICE PACKETS USING NATURAL LANGUAGE GENERATION DURING SIGNAL LOSS (Status: pending, published)

Publication/announcement number: US20240127790A1

Publication/announcement date: 2024-04-18

Application number: US18045893

Filing date: 2022-10-12

Applicant: Verizon Patent and Licensing Inc.

Inventors: Saurabh TAHILIANI, Subham BISWAS

IPC classification: G10L13/08, G06F40/30, G10L13/047, G10L13/07, G10L15/16, G10L15/187, G10L15/22, G10L19/005, G10L25/18

CPC classification: G10L13/08, G06F40/30, G10L13/047, G10L13/07, G10L15/16, G10L15/187, G10L15/22, G10L19/005, G10L25/18

Abstract: A device may receive and convert audio data to text data in real-time, and may detect a network fluctuation that causes missing voice packets. The device may process partial text and context of the text data, with a model, to generate a new phrase, and may generate a response phoneme for the new phrase. The device may utilize a text embedding model to generate a text embedding for the response phoneme, and may process the audio data, with the model, to generate a target voice sequence. The device may utilize an audio embedding model to generate an audio embedding for the target voice sequence, and may combine the text embedding and the audio embedding to generate an embedding input vector. The device may process the embedding input vector, with an audio synthesis model, to generate a final voice response, and may provide the audio data and the final voice response.

10.

Granted invention patent

Systems and methods for correcting automatic speech recognition errors (Status: in force)

Publication/announcement number: US11922926B2

Publication/announcement date: 2024-03-05

Application number: US17474080

Filing date: 2021-09-14

Applicant: Capital One Services, LLC

Inventors: Aysu Ezen Can, Feng Qiu, Guadalupe Bonilla, Meredith Leigh Critzer, Michael Mossoba, Alexander Lin, Tyler Maiman, Mia Rodriguez, Vahid Khanagha, Joshua Edwards

IPC classification: G10L15/01, G06F16/35, G06N5/022, G10L15/02, G10L15/10, G10L15/187, G10L15/22

CPC classification: G10L15/01, G06F16/35, G06N5/022, G10L15/02, G10L15/10, G10L15/187, G10L15/22, G10L2015/025

Abstract: A system may include processor(s), and memory in communication with the processor(s) and storing instructions configured to cause the system to correct ASR errors. The system may receive a transcription comprising transcribed word(s) and may determine whether the transcribed word(s) exceed associated predefined confidence level(s). Responsive to determining a transcribed word does not exceed a predefined confidence level, the system may generate a predicted word. The system may calculate a distance between numerical representations of the transcribed word and the predicted word and may determine whether the distance exceeds a predefined threshold. Responsive to determining the distance exceeds the predefined threshold, the system may determine whether at least one red flag word of a list of red flag words corresponds to a context of the transcription, and, responsive to making that determination, may classify the transcription as associated with a first category.
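The decision chain in this abstract (confidence check, predicted replacement word, distance between numerical representations, red-flag lookup, classification) can be sketched as below. The toy two-dimensional embeddings, the `predict` callback, cosine distance as the distance measure, both thresholds, and the reading of "corresponds to a context of the transcription" as "appears in the transcription or is the predicted word" are all assumptions for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Predictor = Callable[[List[str], int], str]  # (words, index) -> predicted word


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def classify(transcription: List[Tuple[str, float]], predict: Predictor,
             embed: Dict[str, Sequence[float]], red_flags: Set[str],
             conf_threshold: float = 0.8, dist_threshold: float = 0.5) -> str:
    words = [w for w, _ in transcription]
    for i, (word, confidence) in enumerate(transcription):
        if confidence > conf_threshold:
            continue  # exceeds the confidence level: trust the transcription
        predicted = predict(words, i)
        if cosine_distance(embed[word], embed[predicted]) <= dist_threshold:
            continue  # prediction is close to what was transcribed
        if predicted in red_flags or red_flags & set(words):
            return "first_category"
    return "default"


if __name__ == "__main__":
    embed = {"council": [1.0, 0.0], "cancel": [0.0, 1.0]}
    heard = [("please", 0.95), ("council", 0.40), ("my", 0.90), ("card", 0.92)]
    print(classify(heard, lambda words, i: "cancel", embed, {"cancel", "fraud"}))
    # first_category
```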
