Patent search: ap:("Microsoft Technology Licensing, LLC") AND inv:"Yifan Gong" (page 1)

1.

Granted patent
Dynamic combination of acoustic model states (status: active)

Publication number: US12014728B2

Publication date: 2024-06-18

Application number: US16363705

Filing date: 2019-03-25

Applicant: Microsoft Technology Licensing, LLC

Inventors: Kshitiz Kumar, Yifan Gong

IPC classification: G10L15/00, G06N3/045, G06N3/08, G06N20/20, G10L15/16, G10L15/02, G10L15/22, G10L25/30

CPC classification: G10L15/16, G06N3/045, G06N3/08, G06N20/20, G10L15/02, G10L15/22, G10L25/30

Abstract: A computer-implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models, based on the state-dependent predicted weights for classification of the set of features.
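The combination step in this abstract can be illustrated with a minimal sketch. It assumes that each model emits a posterior over the same set of states and that a separate predictor supplies one unnormalized weight per state and model; the function names, the softmax normalization, and the final renormalization are illustrative assumptions, not details taken from the patent.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_state_predictions(model_posteriors, weight_logits):
    """Combine per-model state posteriors with state-dependent weights.

    model_posteriors[m][s]: posterior of state s according to model m.
    weight_logits[s][m]: unnormalized weight of model m for state s
    (hypothetical output of a weight predictor).
    """
    num_models = len(model_posteriors)
    num_states = len(model_posteriors[0])
    combined = []
    for s in range(num_states):
        weights = softmax(weight_logits[s])  # weights over models for this state
        combined.append(
            sum(weights[m] * model_posteriors[m][s] for m in range(num_models))
        )
    total = sum(combined)
    return [c / total for c in combined]  # renormalize to a distribution


def classify(model_posteriors, weight_logits):
    """Return the index of the most probable combined state."""
    combined = combine_state_predictions(model_posteriors, weight_logits)
    return max(range(len(combined)), key=combined.__getitem__)
```

Because the weights vary per state, one model can dominate for some states while another dominates elsewhere, which a single global interpolation weight cannot express.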

2.

Granted patent
Caption assisted calling to maintain connection in challenging network conditions (status: active)

Publication number: US11563784B2

Publication date: 2023-01-24

Application number: US17345703

Filing date: 2021-06-11

Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Akash Alok Mahajan, Yifan Gong

IPC classification: H04L65/403, G10L15/26, H04L43/0811, H04L67/01, H04L67/61

Abstract: Systems are provided for managing and coordinating STT/TTS systems and the communications between these systems when they are connected in online meetings, and for mitigating connectivity issues that may arise during the online meetings, to provide a seamless and reliable meeting experience with live captions and/or rendered audio. Initially, online meeting communications are transmitted over a lossy, connectionless protocol/channel. Then, in response to detected connectivity problems with one or more systems involved in the online meeting, which can cause jitter or packet loss, for example, an instruction is dynamically generated and processed for causing one or more of the connected systems to transmit and/or process the online meeting content with a more reliable connection/protocol, such as a connection-oriented protocol. Codecs at the systems are used, when needed, to convert speech to text with related speech attribute information and to convert text to speech.
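The switching behavior described in this abstract can be sketched as a small decision function. The thresholds, the `LinkStats` type, and the returned instruction fields are hypothetical; the abstract only says that detected connectivity problems (such as jitter or packet loss) trigger a move to a more reliable, connection-oriented protocol, with speech converted to text when needed.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    packet_loss: float  # fraction of packets lost, 0.0 to 1.0
    jitter_ms: float    # observed jitter in milliseconds


# Hypothetical thresholds; real values would be tuned per deployment.
MAX_PACKET_LOSS = 0.05
MAX_JITTER_MS = 60.0


def choose_mode(stats: LinkStats) -> dict:
    """Build an instruction telling an endpoint how to send meeting content."""
    degraded = (
        stats.packet_loss > MAX_PACKET_LOSS or stats.jitter_ms > MAX_JITTER_MS
    )
    if degraded:
        # Poor link: send captions (speech converted to text) over a
        # connection-oriented channel; the receiver may convert text to speech.
        return {"transport": "connection-oriented", "payload": "text"}
    # Healthy link: keep sending audio over the connectionless channel.
    return {"transport": "connectionless", "payload": "audio"}
```

For example, `choose_mode(LinkStats(packet_loss=0.2, jitter_ms=20.0))` selects the connection-oriented transport because the loss exceeds the assumed threshold.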

3.

Patent application
WAKE WORD SELECTION ASSISTANCE ARCHITECTURES AND METHODS (status: active)

Publication number: US20220254334A1

Publication date: 2022-08-11

Application number: US17539622

Filing date: 2021-12-01

Applicant: Microsoft Technology Licensing, LLC

Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong

IPC classification: G10L15/07, G10L15/02, G10L15/187, G10L13/033, G10L15/22, G10L13/10, G10L13/00

Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.
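The accept/reject flow in this abstract can be sketched as follows. The characteristics used here (letter count, whether every word is very common) and the rate heuristic are toy assumptions chosen for illustration; the abstract does not say which characteristics are used or how the false detection rate is estimated.

```python
# Hypothetical list of very common short words that tend to occur in speech.
COMMON_WORDS = {"the", "ok", "hey", "yes", "no", "hello"}


def characteristics(wake_word: str) -> dict:
    """Extract simple, illustrative characteristics of a candidate wake word."""
    text = wake_word.lower().strip()
    words = text.split()
    return {
        "num_letters": sum(ch.isalpha() for ch in text),
        "all_common": bool(words) and all(w in COMMON_WORDS for w in words),
    }


def estimated_false_detections_per_hour(traits: dict) -> float:
    """Toy heuristic: short and common phrases trigger more often."""
    rate = 8.0 / max(traits["num_letters"], 1)
    if traits["all_common"]:
        rate *= 4.0
    return rate


def select_wake_word(candidate: str, current: str, threshold: float = 1.0):
    """Return (wake word to use, whether the candidate was accepted)."""
    rate = estimated_false_detections_per_hour(characteristics(candidate))
    if rate > threshold:
        return current, False  # reject: keep the existing wake word
    return candidate, True     # accept: set the custom wake word
```

Under these assumptions, a short common word such as "ok" is rejected, while a longer, distinctive phrase is accepted.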

4.

Granted patent
Wake word selection assistance architectures and methods (status: active)

Publication number: US11222622B2

Publication date: 2022-01-11

Application number: US16522427

Filing date: 2019-07-25

Applicant: Microsoft Technology Licensing, LLC

Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong

IPC classification: G10L15/07, G10L15/02, G10L15/187, G10L13/033, G10L15/22, G10L13/10, G10L13/00, G10L15/08

Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.

5.

Granted patent
Adversarial speaker adaptation (status: active)

Publication number: US11107460B2

Publication date: 2021-08-31

Application number: US16460027

Filing date: 2019-07-02

Applicant: Microsoft Technology Licensing, LLC

Inventors: Zhong Meng, Jinyu Li, Yifan Gong

IPC classification: G10L15/06, G10L15/02, G10L15/22, G10L15/16

Abstract: Embodiments are associated with a speaker-independent acoustic model capable of classifying senones based on input speech frames and on first parameters of the speaker-independent acoustic model, a speaker-dependent acoustic model capable of classifying senones based on input speech frames and on second parameters of the speaker-dependent acoustic model, and a discriminator capable of receiving data from the speaker-dependent acoustic model and data from the speaker-independent acoustic model and outputting a prediction of whether received data was generated by the speaker-dependent acoustic model based on third parameters. The second parameters are initialized based on the first parameters, the second parameters are trained based on input frames of a target speaker to minimize a senone classification loss associated with the second parameters, a portion of the second parameters are trained based on the input frames of the target speaker to maximize a discrimination loss associated with the discriminator, and the third parameters are trained based on the input frames of the target speaker to minimize the discrimination loss.
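The three training objectives named in this abstract can be written down as a minimal sketch. It assumes cross-entropy for the senone classification loss and binary cross-entropy for the discrimination loss, and it introduces a hypothetical `adv_weight` factor to scale the adversarial term; none of these specific choices is stated in the abstract. No gradient updates are performed here; the code only computes the quantity each parameter group would minimize.

```python
import math


def senone_loss(senone_posteriors, labels):
    """Mean cross-entropy of the speaker-dependent (SD) model's senone outputs.

    senone_posteriors[t][k]: SD posterior of senone k for frame t.
    labels[t]: reference senone index for frame t.
    """
    total = sum(math.log(p[y]) for p, y in zip(senone_posteriors, labels))
    return -total / len(labels)


def discrimination_loss(d_on_sd, d_on_si):
    """Binary cross-entropy of the discriminator.

    d_on_sd[t]: predicted probability that SD-model data came from the SD model.
    d_on_si[t]: predicted probability that SI-model data came from the SD model.
    """
    count = len(d_on_sd) + len(d_on_si)
    total = sum(math.log(p) for p in d_on_sd)
    total += sum(math.log(1.0 - p) for p in d_on_si)
    return -total / count


def objectives(senone_posteriors, labels, d_on_sd, d_on_si, adv_weight=1.0):
    """Return the quantity each parameter group minimizes."""
    l_sen = senone_loss(senone_posteriors, labels)
    l_disc = discrimination_loss(d_on_sd, d_on_si)
    return {
        # Second parameters (SD model): minimize senone classification loss.
        "sd_all_params_minimize": l_sen,
        # Portion of the second parameters: maximize the discrimination loss,
        # written here as minimizing its negative.
        "sd_adversarial_portion_minimizes": -adv_weight * l_disc,
        # Third parameters (discriminator): minimize the discrimination loss.
        "discriminator_minimizes": l_disc,
    }
```

The opposing signs on the discrimination loss are what make the setup adversarial: the discriminator learns to tell the two models apart, while part of the SD model learns to make that harder, which keeps the adapted model close to the speaker-independent one.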

6.

Patent application
SPEAKER RECOGNITION (status: pending, published)

Publication number: US20180374486A1

Publication date: 2018-12-27

Application number: US15631995

Filing date: 2017-06-23

Applicant: Microsoft Technology Licensing, LLC

Inventors: Yong Zhao, Jinyu Li, Yifan Gong, Shixiong Zhang, Zhuo Chen

IPC classification: G10L17/18, G10L17/00, G10L17/22, G10L15/16

CPC classification: G10L17/18, G10L15/16, G10L17/005, G10L17/02, G10L17/04, G10L17/22, G10L2015/025

Abstract: Improvements in speaker identification and verification are provided via an attention model for speaker recognition and the end-to-end training thereof. A speaker discriminative convolutional neural network (CNN) is used to directly extract frame-level speaker features that are weighted and combined to form an utterance-level speaker recognition vector via the attention model. The CNN and attention model are jointly optimized via an end-to-end training algorithm that imitates the speaker recognition process and uses the most-similar utterances from imposters for each speaker.
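The weighting-and-combining step in this abstract can be illustrated with a minimal attention pooling sketch. It assumes dot-product scoring against a single learned attention vector followed by a softmax; the abstract does not specify the scoring function, and the CNN that produces the frame-level features is omitted.

```python
import math


def attention_pool(frame_features, attention_vector):
    """Pool frame-level features into one utterance-level vector.

    frame_features[t][d]: feature d of frame t (e.g. CNN outputs).
    attention_vector[d]: hypothetical learned scoring vector.
    Returns (utterance_vector, attention_weights).
    """
    scores = [
        sum(f * a for f, a in zip(frame, attention_vector))
        for frame in frame_features
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights
```

Frames that score higher against the attention vector contribute more to the utterance-level vector, so uninformative frames (for instance, silence) can be down-weighted rather than averaged in equally.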

7.

Patent application
CONFIDENCE FEATURES FOR AUTOMATED SPEECH RECOGNITION ARBITRATION (status: pending, published)

Publication number: US20170140759A1

Publication date: 2017-05-18

Application number: US14941058

Filing date: 2015-11-13

Applicant: Microsoft Technology Licensing, LLC

Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu

IPC classification: G10L15/32, G10L15/06, G10L15/18, G10L15/30

CPC classification: G10L15/32, G10L15/183, G10L15/30

Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.

8.

Patent application
SPEECH RECOGNITION ERROR DIAGNOSIS (status: active)

Publication number: US20160253989A1

Publication date: 2016-09-01

Application number: US14634714

Filing date: 2015-02-27

Applicant: Microsoft Technology Licensing, LLC

Inventors: Shiun-Zu Kuo, Thomas Reutter, Yifan Gong, Mark T. Hanson, Ye Tian, Shuangyu Chang, Jon Hamaker, Qi Miao, Yuancheng Tu

IPC classification: G10L15/01, G10L15/26, G10L15/19

CPC classification: G10L15/01, G10L15/183

Abstract: Techniques and technologies for diagnosing speech recognition errors are described. In an example implementation, a system for diagnosing speech recognition errors may include an error detection module configured to determine that a speech recognition result is at least partially erroneous, and a recognition error diagnostics module. The recognition error diagnostics module may be configured to (a) perform a first error analysis of the at least partially erroneous speech recognition result to provide a first error analysis result; (b) perform a second error analysis of the at least partially erroneous speech recognition result to provide a second error analysis result; and (c) determine at least one category of recognition error associated with the at least partially erroneous speech recognition result based on a combination of the first error analysis result and the second error analysis result.

9.

Granted patent
Machine learning model with depth processing units (status: active)

Publication number: US12086704B2

Publication date: 2024-09-10

Application number: US17518535

Filing date: 2021-11-03

Applicant: Microsoft Technology Licensing, LLC

Inventors: Jinyu Li, Liang Lu, Changliang Liu, Yifan Gong

IPC classification: G06N3/08, G06F18/21, G06N3/048, G06N20/00, G10L15/06, G10L15/16

CPC classification: G06N3/048, G06F18/217, G06N3/08, G06N20/00, G10L15/063, G10L15/16

Abstract: Representative embodiments disclose machine learning classifiers used in scenarios such as speech recognition, image captioning, machine translation, or other sequence-to-sequence embodiments. The machine learning classifiers have a plurality of time layers, each layer having a time processing block and a depth processing block. The time processing block is a recurrent neural network such as a Long Short Term Memory (LSTM) network. The depth processing blocks can be an LSTM network, a gated Deep Neural Network (DNN) or a maxout DNN. The depth processing blocks account for the hidden states of each time layer and use summarized layer information for final input signal feature classification. An attention layer can also be used between the top depth processing block and the output layer.
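The layering described in this abstract can be illustrated with a deliberately simplified sketch. It uses scalar states and plain tanh recurrent cells as stand-ins for the LSTM blocks, feeds each layer's time hidden state to the layer above, and treats the top depth output as the per-step summary; the weights, the wiring between layers, and the function names are all assumptions made for illustration.

```python
import math


def time_step(x, h_prev, w_in, w_rec):
    """Time processing block: recurrence across time within one layer.

    A tanh cell is used here as a simple stand-in for an LSTM.
    """
    return math.tanh(w_in * x + w_rec * h_prev)


def depth_step(h_layer, g_below, w_h, w_g):
    """Depth processing block: recurrence across layers at one time step."""
    return math.tanh(w_h * h_layer + w_g * g_below)


def run(inputs, num_layers=3, w_in=0.8, w_rec=0.5, w_h=0.7, w_g=0.6):
    """Return one summarized depth output per input time step."""
    h = [0.0] * num_layers  # time hidden state of each layer
    summaries = []
    for x in inputs:
        layer_input = x
        g = 0.0  # depth state restarts at the bottom layer each time step
        for layer in range(num_layers):
            h[layer] = time_step(layer_input, h[layer], w_in, w_rec)
            g = depth_step(h[layer], g, w_h, w_g)  # scan hidden states upward
            layer_input = h[layer]
        summaries.append(g)  # top depth output would feed the classifier
    return summaries
```

The point of the depth scan is that the value passed to the output layer depends on the hidden states of every layer at that time step, not only on the top layer.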

10.

Granted patent
Wake word selection assistance architectures and methods (status: active)

Publication number: US11790891B2

Publication date: 2023-10-17

Application number: US17539622

Filing date: 2021-12-01

Applicant: Microsoft Technology Licensing, LLC

Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong

IPC classification: G10L15/07, G10L15/02, G10L15/187, G10L13/033, G10L15/22, G10L13/10, G10L13/00, G10L15/08

CPC classification: G10L15/07, G10L13/00, G10L13/033, G10L13/10, G10L15/02, G10L15/187, G10L15/22, G10L2015/025, G10L2015/088, G10L2015/223

Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.
