Patent search: ap:("MICROSOFT TECHNOLOGY LICENSING LLC") AND inv:"Shixiong Zhang"

1.

Granted invention patent
Voice identification enrollment (In force)

Publication No.: US11152006B2

Publication date: 2021-10-19

Application No.: US16020911

Filing date: 2018-06-27

Applicant: Microsoft Technology Licensing, LLC

Inventor: Eyal Krupka, Shixiong Zhang, Xiong Xiao

IPC: G10L17/04, G10L17/22, G10L17/02, G10L25/84

Abstract: Examples are disclosed that relate to voice identification enrollment. One example provides a method of voice identification enrollment comprising, during a meeting in which two or more human speakers speak at different times, determining whether one or more conditions of a protocol for sampling meeting audio used to establish human speaker voiceprints are satisfied, and in response to determining that the one or more conditions are satisfied, selecting a sample of meeting audio according to the protocol, the sample representing an utterance made by one of the human speakers. The method further comprises establishing, based at least on the sample, a voiceprint of the human speaker.
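The flow in this abstract (check protocol conditions, select a sample, establish a voiceprint) can be sketched roughly as follows. This is only an illustration: the `Segment` structure, the two conditions (a single active speaker and a minimum duration), the threshold, and the mean-of-features voiceprint are all assumptions, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A stretch of meeting audio (hypothetical structure, not from the patent)."""
    speaker_count: int            # number of people talking in the segment
    duration_s: float             # segment length in seconds
    features: List[List[float]]   # per-frame feature vectors


def conditions_satisfied(seg: Segment, min_duration_s: float = 2.0) -> bool:
    # Assumed protocol conditions: exactly one speaker and a long enough utterance.
    return seg.speaker_count == 1 and seg.duration_s >= min_duration_s


def establish_voiceprint(seg: Segment) -> List[float]:
    # Assumed voiceprint: the element-wise mean of the frame features.
    n = len(seg.features)
    dim = len(seg.features[0])
    return [sum(frame[d] for frame in seg.features) / n for d in range(dim)]


def enroll(segments: List[Segment]) -> Optional[List[float]]:
    # Select the first segment that meets the protocol, if any.
    for seg in segments:
        if conditions_satisfied(seg):
            return establish_voiceprint(seg)
    return None
```

Segments with overlapping speech or very short utterances are skipped here, so enrollment simply waits for a later segment that qualifies.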

2.

Granted invention patent
Speaker recognition (In force)

Publication No.: US10354656B2

Publication date: 2019-07-16

Application No.: US15631995

Filing date: 2017-06-23

Applicant: Microsoft Technology Licensing, LLC

Inventor: Yong Zhao, Jinyu Li, Yifan Gong, Shixiong Zhang, Zhuo Chen

IPC: G10L17/22, G10L17/18, G10L17/04, G10L17/02, G10L15/16, G10L15/02, G10L15/00, G10L17/00

Abstract: Improvements in speaker identification and verification are provided via an attention model for speaker recognition and the end-to-end training thereof. A speaker discriminative convolutional neural network (CNN) is used to directly extract frame-level speaker features that are weighted and combined to form an utterance-level speaker recognition vector via the attention model. The CNN and attention model are join-optimized via an end-to-end training algorithm that imitates the speaker recognition process and uses the most-similar utterances from imposters for each speaker.
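The weighting and combining of frame-level features into one utterance-level vector can be illustrated with a generic attention pooling step. This is a minimal sketch, assuming a dot-product score against a learned `query` vector followed by a softmax; the abstract does not specify the scoring function, and the CNN that would produce the frame features is omitted.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: List[List[float]], query: List[float]) -> List[float]:
    """Combine frame-level features into a single utterance-level vector.

    `query` is a hypothetical learned attention vector; the dot-product
    scoring used here is an assumption, not the patent's stated method.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of the frames, one output value per feature dimension.
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
```

With an all-zero `query` every frame gets the same weight, so the result reduces to plain average pooling; a trained query would instead emphasize the frames that carry more speaker information.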

3.

Granted invention patent
Speaker recognition/location using neural network (In force)

Publication No.: US10580414B2

Publication date: 2020-03-03

Application No.: US16006405

Filing date: 2018-06-12

Applicant: Microsoft Technology Licensing, LLC

Inventor: Shixiong Zhang, Xiong Xiao

IPC: G10L17/18, G01S3/803, G10L25/78

Abstract: Computing devices and methods utilizing a joint speaker location/speaker identification neural network are provided. In one example a computing device receives a multi-channel audio signal of an utterance spoken by a user. Magnitude and phase information features are extracted from the signal and inputted into a joint speaker location/speaker identification neural network that is trained via utterances from a plurality of persons. A user embedding comprising speaker identification characteristics and location characteristics is received from the neural network and compared to a plurality of enrollment embeddings extracted from the plurality of utterances that are each associated with an identity of a corresponding person. Based at least on the comparisons, the user is matched to an identity of one of the persons, and the identity of the person is outputted.
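The comparison of a user embedding against enrollment embeddings can be sketched as a nearest-match lookup. The abstract only says the embeddings are "compared"; cosine similarity, the `identify` function, and the dictionary of enrolled identities below are assumptions chosen for illustration.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(user_embedding: List[float], enrollments: Dict[str, List[float]]) -> str:
    """Return the enrolled identity whose embedding is most similar.

    `enrollments` maps a person's identity to an enrollment embedding
    (hypothetical data layout, not taken from the patent).
    """
    return max(enrollments, key=lambda name: cosine(user_embedding, enrollments[name]))
```

A practical system would likely also apply a similarity threshold so that an unknown speaker is not forced onto the closest enrolled identity.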

4.

Granted invention patent
Computerized intelligent assistant for conferences (In force)

Publication No.: US11688399B2

Publication date: 2023-06-27

Application No.: US17115293

Filing date: 2020-12-08

Applicant: Microsoft Technology Licensing, LLC

Inventor: Adi Diamant, Karen Master Ben-Dor, Eyal Krupka, Raz Halaly, Yoni Smolin, Ilya Gurvich, Aviv Hurvitz, Lijuan Qin, Wei Xiong, Shixiong Zhang, Lingfeng Wu, Xiong Xiao, Ido Leichter, Moshe David, Xuedong Huang, Amit Kumar Agarwal

IPC: H04N7/14, G10L15/26, H04N7/15, G10L17/00, G06V40/16

CPC classification number: G10L15/26, G06V40/172, G10L17/00, H04N7/15

Abstract: A method for facilitating a remote conference includes receiving a digital video and a computer-readable audio signal. A face recognition machine is operated to recognize a face of a first conference participant in the digital video, and a speech recognition machine is operated to translate the computer-readable audio signal into a first text. An attribution machine attributes the text to the first conference participant. A second computer-readable audio signal is processed similarly, to obtain a second text attributed to a second conference participant. A transcription machine automatically creates a transcript including the first text attributed to the first conference participant and the second text attributed to the second conference participant.
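The final transcription step, in which each recognized text is attributed to a participant, might look like the sketch below. The face recognition and speech recognition machines are not modeled; their outputs are assumed to arrive already paired, and the `Attributed` type and output layout are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attributed:
    participant: str   # identity, e.g. from face recognition (stubbed here)
    text: str          # recognized speech (stubbed here)


def build_transcript(entries: List[Attributed]) -> str:
    # One line per utterance, in the order the utterances were received.
    return "\n".join(f"{e.participant}: {e.text}" for e in entries)
```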

5.

Granted invention patent
Speaker recognition/location using neural network (In force)

Publication No.: US11222640B2

Publication date: 2022-01-11

Application No.: US16802993

Filing date: 2020-02-27

Applicant: Microsoft Technology Licensing, LLC

Inventor: Shixiong Zhang, Xiong Xiao

IPC: G10L17/18, G01S3/803, G10L25/78

Abstract: Computing devices and methods utilizing a joint speaker location/speaker identification neural network are provided. In one example a computing device receives an audio signal of utterances spoken by multiple persons. Magnitude and phase information features are extracted from the signal and inputted into a joint speaker location and speaker identification neural network. The neural network utilizes both the magnitude and phase information features to determine a change in the person speaking. Output comprising the determination of the change is received from the neural network. The output is then used to perform a speaker recognition function, speaker location function, or both.
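Magnitude and phase features of the kind mentioned here are commonly obtained from a Fourier transform of short audio frames. The sketch below assumes that approach (the abstract does not name the transform) and uses a naive discrete Fourier transform on a single frame of a single channel; the function name is hypothetical and the neural network itself is not modeled.

```python
import cmath
import math
from typing import List, Tuple


def dft_magnitude_phase(frame: List[float]) -> Tuple[List[float], List[float]]:
    """Naive O(n^2) DFT of one audio frame.

    Returns per-bin magnitudes and phases (in radians) for the
    non-redundant bins 0..n//2 of a real-valued signal.
    """
    n = len(frame)
    mags: List[float] = []
    phases: List[float] = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(x_k))
        phases.append(cmath.phase(x_k))
    return mags, phases
```

In a multi-channel setting the same computation would be repeated per microphone; phase differences between channels are what typically carry the location cue, while magnitudes carry more of the speaker's spectral characteristics.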

6.

Invention patent application
Speaker recognition (Pending, published)

Publication No.: US20180374486A1

Publication date: 2018-12-27

Application No.: US15631995

Filing date: 2017-06-23

Applicant: Microsoft Technology Licensing, LLC

Inventor: Yong Zhao, Jinyu Li, Yifan Gong, Shixiong Zhang, Zhuo Chen

IPC: G10L17/18, G10L17/00, G10L17/22, G10L15/16

CPC classification number: G10L17/18, G10L15/16, G10L17/005, G10L17/02, G10L17/04, G10L17/22, G10L2015/025

Abstract: Improvements in speaker identification and verification are provided via an attention model for speaker recognition and the end-to-end training thereof. A speaker discriminative convolutional neural network (CNN) is used to directly extract frame-level speaker features that are weighted and combined to form an utterance-level speaker recognition vector via the attention model. The CNN and attention model are join-optimized via an end-to-end training algorithm that imitates the speaker recognition process and uses the most-similar utterances from imposters for each speaker.
