-
Publication No.: US12014728B2
Publication date: 2024-06-18
Application No.: US16363705
Filing date: 2019-03-25
Inventors: Kshitiz Kumar, Yifan Gong
IPC classification: G10L15/00, G06N3/045, G06N3/08, G06N20/20, G10L15/16, G10L15/02, G10L15/22, G10L25/30
Abstract: A computer-implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models, based on the state-dependent predicted weights, for classification of the set of features.
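The combination step in this abstract can be illustrated with a minimal sketch. It assumes that each model emits a posterior over the same states and that a separate, hypothetical weight predictor emits one raw score per model for each state; the softmax normalization and all names here are illustrative assumptions, not details taken from the patent.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_state_predictions(model_posteriors, weight_scores):
    """Mix per-state posteriors from several models.

    model_posteriors[m][s]: posterior of state s under model m.
    weight_scores[s][m]: raw weight score for model m at state s
    (assumed output of a hypothetical weight predictor).
    """
    num_states = len(model_posteriors[0])
    combined = []
    for s in range(num_states):
        weights = softmax(weight_scores[s])  # state-dependent weights
        combined.append(sum(w * p[s] for w, p in zip(weights, model_posteriors)))
    norm = sum(combined)  # renormalize so the result is a distribution
    return [c / norm for c in combined]

# Two models, three states; the scores favor model 0 at state 0
# and model 1 at state 2 (made-up numbers for illustration).
posteriors = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
scores = [[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
combined = combine_state_predictions(posteriors, scores)
predicted_state = max(range(len(combined)), key=combined.__getitem__)
```

Because the weights vary by state, a model that is trusted for one state can be down-weighted for another, which a single global interpolation weight cannot express.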
-
Publication No.: US11563784B2
Publication date: 2023-01-24
Application No.: US17345703
Filing date: 2021-06-11
Inventors: Akash Alok Mahajan, Yifan Gong
IPC classification: H04L65/403, G10L15/26, H04L43/0811, H04L67/01, H04L67/61
Abstract: Systems are provided for managing and coordinating STT/TTS systems and the communications between these systems when they are connected in online meetings and for mitigating connectivity issues that may arise during the online meetings to provide a seamless and reliable meeting experience with live captions and/or rendered audio. Initially, online meeting communications are transmitted over a lossy connectionless-type protocol/channel. Then, in response to detected connectivity problems with one or more systems involved in the online meeting, which can cause jitter or packet loss, for example, an instruction is dynamically generated and processed for causing one or more of the connected systems to transmit and/or process the online meeting content with a more reliable connection/protocol, such as a connection-oriented protocol. Codecs at the systems are used, when needed, to convert speech to text with related speech attribute information and to convert text to speech.
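The fallback decision described in this abstract might look roughly like the sketch below. UDP and TCP stand in for the "connectionless" and "connection-oriented" protocols, and the thresholds, the `LinkStats` type, and the function name are all hypothetical; the abstract names neither specific protocols nor numeric limits.

```python
from dataclasses import dataclass

@dataclass
class LinkStats:
    packet_loss: float  # fraction of packets lost, 0.0 to 1.0
    jitter_ms: float    # variation in packet arrival time, in milliseconds

# Hypothetical thresholds; the abstract does not give numeric values.
MAX_PACKET_LOSS = 0.05
MAX_JITTER_MS = 30.0

def choose_transport(stats: LinkStats, current: str = "udp") -> str:
    """Return "tcp" once the connectionless channel looks unreliable."""
    if current == "tcp":
        return "tcp"  # this sketch never switches back to UDP
    degraded = (stats.packet_loss > MAX_PACKET_LOSS
                or stats.jitter_ms > MAX_JITTER_MS)
    return "tcp" if degraded else "udp"

healthy = choose_transport(LinkStats(packet_loss=0.01, jitter_ms=10.0))
lossy = choose_transport(LinkStats(packet_loss=0.20, jitter_ms=10.0))
```

Here `healthy` stays on the connectionless channel, while `lossy` triggers the switch to the more reliable one.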
-
Publication No.: US20220254334A1
Publication date: 2022-08-11
Application No.: US17539622
Filing date: 2021-12-01
Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong
IPC classification: G10L15/07, G10L15/02, G10L15/187, G10L13/033, G10L15/22, G10L13/10, G10L13/00
Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.
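The accept-or-reject flow in this abstract can be sketched as follows. The characteristics (letter count and vowel-group count), the toy rate estimate, and the threshold are all invented for illustration; the abstract does not say which characteristics are used or how the false-detection rate is predicted.

```python
import re

FALSE_DETECTION_THRESHOLD = 0.10  # hypothetical threshold rate

def characteristics(wake_word: str) -> dict:
    """Extract simple, illustrative characteristics of a candidate."""
    letters = re.sub(r"[^a-z]", "", wake_word.lower())
    vowel_groups = re.findall(r"[aeiouy]+", letters)
    return {"length": len(letters), "vowel_groups": len(vowel_groups)}

def estimated_false_detection_rate(traits: dict) -> float:
    """Toy estimate: short words with few vowel groups score as riskier."""
    return 1.0 / (1 + traits["length"] * traits["vowel_groups"])

def select_wake_word(candidate: str, current: str) -> str:
    """Set the candidate as the wake word only if its estimated rate is acceptable."""
    rate = estimated_false_detection_rate(characteristics(candidate))
    if rate > FALSE_DETECTION_THRESHOLD:
        return current   # reject: keep the existing wake word
    return candidate     # accept: set the custom wake word

rejected = select_wake_word("hi", current="assistant")
accepted = select_wake_word("hey computer", current="assistant")
```

In this toy model a two-letter candidate is rejected, while a longer multi-syllable phrase is accepted.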
-
Publication No.: US11222622B2
Publication date: 2022-01-11
Application No.: US16522427
Filing date: 2019-07-25
Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong
IPC classification: G10L15/07, G10L15/02, G10L15/187, G10L13/033, G10L15/22, G10L13/10, G10L13/00, G10L15/08
Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.
-
Publication No.: US11107460B2
Publication date: 2021-08-31
Application No.: US16460027
Filing date: 2019-07-02
Inventors: Zhong Meng, Jinyu Li, Yifan Gong
Abstract: Embodiments are associated with a speaker-independent acoustic model capable of classifying senones based on input speech frames and on first parameters of the speaker-independent acoustic model, a speaker-dependent acoustic model capable of classifying senones based on input speech frames and on second parameters of the speaker-dependent acoustic model, and a discriminator capable of receiving data from the speaker-dependent acoustic model and data from the speaker-independent acoustic model and outputting a prediction of whether received data was generated by the speaker-dependent acoustic model based on third parameters. The second parameters are initialized based on the first parameters, the second parameters are trained based on input frames of a target speaker to minimize a senone classification loss associated with the second parameters, a portion of the second parameters are trained based on the input frames of the target speaker to maximize a discrimination loss associated with the discriminator, and the third parameters are trained based on the input frames of the target speaker to minimize the discrimination loss.
-
Publication No.: US20180374486A1
Publication date: 2018-12-27
Application No.: US15631995
Filing date: 2017-06-23
Inventors: Yong Zhao, Jinyu Li, Yifan Gong, Shixiong Zhang, Zhuo Chen
CPC classification: G10L17/18, G10L15/16, G10L17/005, G10L17/02, G10L17/04, G10L17/22, G10L2015/025
Abstract: Improvements in speaker identification and verification are provided via an attention model for speaker recognition and the end-to-end training thereof. A speaker discriminative convolutional neural network (CNN) is used to directly extract frame-level speaker features that are weighted and combined to form an utterance-level speaker recognition vector via the attention model. The CNN and attention model are jointly optimized via an end-to-end training algorithm that imitates the speaker recognition process and uses the most-similar utterances from imposters for each speaker.
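The weighting-and-combining step in this abstract resembles attention pooling, sketched below under simplifying assumptions: the frame-level features are given as plain lists rather than CNN outputs, and the attention scores come from a dot product with a single hypothetical attention vector. The patent's actual attention model may be parameterized differently.

```python
import math

def attention_pool(frame_features, attention_vector):
    """Combine frame-level features into one utterance-level vector.

    Each frame is scored by a dot product with attention_vector (assumed
    scoring function), the scores are softmax-normalized into weights,
    and the frames are averaged using those weights.
    """
    scores = [sum(f * a for f, a in zip(frame, attention_vector))
              for frame in frame_features]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frame_features))
              for d in range(dim)]
    return pooled, weights

# Three made-up two-dimensional frame features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
utterance_vector, frame_weights = attention_pool(frames, [2.0, 0.0])
```

Frames that align with the attention vector receive larger weights, so they contribute more to the utterance-level vector than a plain average would allow.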
-
Publication No.: US20170140759A1
Publication date: 2017-05-18
Application No.: US14941058
Filing date: 2015-11-13
Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu
CPC classification: G10L15/32, G10L15/183, G10L15/30
Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
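A minimal sketch of such an arbitrator follows, assuming each engine exposes a short list of numeric confidence features and that the arbitrator scores them with fixed linear weights. The `Recognition` type, the feature meanings, and the weights are hypothetical; the abstract does not specify how the selection is computed.

```python
from dataclasses import dataclass

@dataclass
class Recognition:
    text: str
    confidence_features: list  # e.g. acoustic and language scores (assumed)

# Hypothetical weights; a real arbitrator would likely learn these from data.
ARBITER_WEIGHTS = [0.6, 0.4]

def arbitrate(first: Recognition, second: Recognition) -> Recognition:
    """Select the result whose weighted confidence score is higher."""
    def score(result: Recognition) -> float:
        return sum(w * f for w, f in
                   zip(ARBITER_WEIGHTS, result.confidence_features))
    return first if score(first) >= score(second) else second

# Two made-up transcriptions of the same utterance.
engine_a = Recognition("call mom", [0.9, 0.5])
engine_b = Recognition("call tom", [0.6, 0.8])
chosen = arbitrate(engine_a, engine_b)
```

Reusing the features the engines already computed means the arbitrator needs no second pass over the audio.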
-
Publication No.: US20160253989A1
Publication date: 2016-09-01
Application No.: US14634714
Filing date: 2015-02-27
Inventors: Shiun-Zu Kuo, Thomas Reutter, Yifan Gong, Mark T. Hanson, Ye Tian, Shuangyu Chang, Jon Hamaker, Qi Miao, Yuancheng Tu
CPC classification: G10L15/01, G10L15/183
Abstract: Techniques and technologies for diagnosing speech recognition errors are described. In an example implementation, a system for diagnosing speech recognition errors may include an error detection module configured to determine that a speech recognition result is at least partially erroneous, and a recognition error diagnostics module. The recognition error diagnostics module may be configured to (a) perform a first error analysis of the at least partially erroneous speech recognition result to provide a first error analysis result; (b) perform a second error analysis of the at least partially erroneous speech recognition result to provide a second error analysis result; and (c) determine at least one category of recognition error associated with the at least partially erroneous speech recognition result based on a combination of the first error analysis result and the second error analysis result.
-
Publication No.: US12086704B2
Publication date: 2024-09-10
Application No.: US17518535
Filing date: 2021-11-03
Inventors: Jinyu Li, Liang Lu, Changliang Liu, Yifan Gong
CPC classification: G06N3/048, G06F18/217, G06N3/08, G06N20/00, G10L15/063, G10L15/16
Abstract: Representative embodiments disclose machine learning classifiers used in scenarios such as speech recognition, image captioning, machine translation, or other sequence-to-sequence embodiments. The machine learning classifiers have a plurality of time layers, each layer having a time processing block and a depth processing block. The time processing block is a recurrent neural network such as a Long Short Term Memory (LSTM) network. The depth processing blocks can be an LSTM network, a gated Deep Neural Network (DNN) or a maxout DNN. The depth processing blocks account for the hidden states of each time layer and use summarized layer information for final input signal feature classification. An attention layer can also be used between the top depth processing block and the output layer.
-
Publication No.: US11790891B2
Publication date: 2023-10-17
Application No.: US17539622
Filing date: 2021-12-01
Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong
IPC classification: G10L15/07, G10L15/02, G10L15/187, G10L13/033, G10L15/22, G10L13/10, G10L13/00, G10L15/08
CPC classification: G10L15/07, G10L13/00, G10L13/033, G10L13/10, G10L15/02, G10L15/187, G10L15/22, G10L2015/025, G10L2015/088, G10L2015/223
Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.