Patent search ap:("MICROSOFT TECHNOLOGY LICENSING LLC") AND inv:"Xiong Xiao"

1.

Granted invention patent
Speaker recognition/location using neural network (status: active)

Publication (announcement) No.: US10580414B2

Publication (announcement) date: 2020-03-03

Application No.: US16006405

Application date: 2018-06-12

Applicant: Microsoft Technology Licensing, LLC

Inventor: Shixiong Zhang, Xiong Xiao

IPC: G10L17/18, G01S3/803, G10L25/78

Abstract: Computing devices and methods utilizing a joint speaker location/speaker identification neural network are provided. In one example a computing device receives a multi-channel audio signal of an utterance spoken by a user. Magnitude and phase information features are extracted from the signal and inputted into a joint speaker location/speaker identification neural network that is trained via utterances from a plurality of persons. A user embedding comprising speaker identification characteristics and location characteristics is received from the neural network and compared to a plurality of enrollment embeddings extracted from the plurality of utterances that are each associated with an identity of a corresponding person. Based at least on the comparisons, the user is matched to an identity of one of the persons, and the identity of the person is outputted.
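
The comparison step in this abstract (a user embedding scored against per-person enrollment embeddings) can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used; cosine similarity is assumed here purely for illustration, and the function names, the sample vectors, and the identities "alice" and "bob" are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Assumes both vectors are non-zero and of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_identity(user_embedding, enrollment_embeddings):
    """Return (identity, score) of the enrollment embedding closest to the user.

    enrollment_embeddings maps an identity to its enrolled vector.
    The similarity measure is an assumption, not taken from the patent.
    """
    best_identity, best_score = None, -math.inf
    for identity, enrolled in enrollment_embeddings.items():
        score = cosine_similarity(user_embedding, enrolled)
        if score > best_score:
            best_identity, best_score = identity, score
    return best_identity, best_score

# Hypothetical three-dimensional embeddings; real ones would be much larger.
enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.1, 0.8, 0.3]}
identity, score = match_identity([0.85, 0.2, 0.05], enrolled)
print(identity)
```

A practical system would likely also apply a rejection threshold so that an unknown speaker is not forced onto the nearest enrolled identity; that detail is omitted here.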

2.

Granted invention patent
Voice identification enrollment (status: active)

Publication (announcement) No.: US11152006B2

Publication (announcement) date: 2021-10-19

Application No.: US16020911

Application date: 2018-06-27

Applicant: Microsoft Technology Licensing, LLC

Inventor: Eyal Krupka, Shixiong Zhang, Xiong Xiao

IPC: G10L17/04, G10L17/22, G10L17/02, G10L25/84

Abstract: Examples are disclosed that relate to voice identification enrollment. One example provides a method of voice identification enrollment comprising, during a meeting in which two or more human speakers speak at different times, determining whether one or more conditions of a protocol for sampling meeting audio used to establish human speaker voiceprints are satisfied, and in response to determining that the one or more conditions are satisfied, selecting a sample of meeting audio according to the protocol, the sample representing an utterance made by one of the human speakers. The method further comprises establishing, based at least on the sample, a voiceprint of the human speaker.

3.

Granted invention patent
Low-latency speech separation (status: active)

Publication (announcement) No.: US10856076B2

Publication (announcement) date: 2020-12-01

Application No.: US16376325

Application date: 2019-04-05

Applicant: Microsoft Technology Licensing, LLC

Inventor: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis

IPC: H04R3/00, G10L25/30, H04R1/40

Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.

4.

Invention patent application
MULTI-CHANNEL SPEECH SEPARATION (status: under examination, published)

Publication (announcement) No.: US20190139563A1

Publication (announcement) date: 2019-05-09

Application No.: US15805106

Application date: 2017-11-06

Applicant: Microsoft Technology Licensing, LLC

Inventor: Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong

IPC: G10L21/0216, G10L25/30

Abstract: Representative embodiments disclose mechanisms to separate and recognize multiple audio sources (e.g., picking out individual speakers) in an environment where they overlap and interfere with each other. The architecture uses a microphone array to spatially separate out the audio signals. The spatially filtered signals are then input into a plurality of separators, so that each signal is input into a corresponding separator. The separators use neural networks to separate out audio sources. The separators typically produce multiple output signals for a single input signal. A post selection processor then assesses the separator outputs to pick the signals with the highest quality output. These signals can be used in a variety of systems such as speech recognition, meeting transcription and enhancement, hearing aids, music information retrieval, speech enhancement and so forth.

5.

Granted invention patent
Low-latency speech separation (status: active)

Publication (announcement) No.: US11445295B2

Publication (announcement) date: 2022-09-13

Application No.: US16950163

Application date: 2020-11-17

Applicant: Microsoft Technology Licensing, LLC

Inventor: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis

IPC: H04R3/00, G10L25/30, H04R1/40

Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.

6.

Granted invention patent
Speaker recognition/location using neural network (status: active)

Publication (announcement) No.: US11222640B2

Publication (announcement) date: 2022-01-11

Application No.: US16802993

Application date: 2020-02-27

Applicant: Microsoft Technology Licensing, LLC

Inventor: Shixiong Zhang, Xiong Xiao

IPC: G10L17/18, G01S3/803, G10L25/78

Abstract: Computing devices and methods utilizing a joint speaker location/speaker identification neural network are provided. In one example a computing device receives an audio signal of utterances spoken by multiple persons. Magnitude and phase information features are extracted from the signal and inputted into a joint speaker location and speaker identification neural network. The neural network utilizes both the magnitude and phase information features to determine a change in the person speaking. Output comprising the determination of the change is received from the neural network. The output is then used to perform a speaker recognition function, speaker location function, or both.

7.

Granted invention patent
Training and using a transcript generation model on a multi-speaker audio stream (status: active)

Publication (announcement) No.: US11984127B2

Publication (announcement) date: 2024-05-14

Application No.: US17566861

Application date: 2021-12-31

Applicant: Microsoft Technology Licensing, LLC

Inventor: Naoyuki Kanda, Takuya Yoshioka, Zhuo Chen, Jinyu Li, Yashesh Gaur, Zhong Meng, Xiaofei Wang, Xiong Xiao

IPC: G10L15/06, G10L15/26, G10L17/04

CPC classification number: G10L17/04, G10L15/06, G10L15/26

Abstract: The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained and a set of frame embeddings are generated from audio data frames of the obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols are generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols are transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on the CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription.
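
The sorting of words into transcript lines based on CC symbols can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the token spelling or the number of channels, so the `<cc>` token, the two-channel default, and the rule that each CC symbol switches to the next channel are all hypothetical choices made here.

```python
CC = "<cc>"  # hypothetical spelling of the channel change symbol

def split_by_channel_change(tokens, num_channels=2):
    """Sort a flat stream of words and CC symbols into one line per channel.

    Assumption: every CC symbol moves output to the next channel,
    wrapping around after the last one.
    """
    lines = [[] for _ in range(num_channels)]
    channel = 0
    for token in tokens:
        if token == CC:
            channel = (channel + 1) % num_channels
        else:
            lines[channel].append(token)
    return [" ".join(words) for words in lines]

# Two people talking over each other, serialized into a single token stream.
tokens = ["hello", "how", CC, "hi", CC, "are", "you", CC, "fine", "thanks"]
transcript_lines = split_by_channel_change(tokens)
for line in transcript_lines:
    print(line)
```

With this input, the words "hello how are you" land on the first line and "hi fine thanks" on the second, even though they arrive interleaved.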

8.

Granted invention patent
Computerized intelligent assistant for conferences (status: active)

Publication (announcement) No.: US11688399B2

Publication (announcement) date: 2023-06-27

Application No.: US17115293

Application date: 2020-12-08

Applicant: Microsoft Technology Licensing, LLC

Inventor: Adi Diamant, Karen Master Ben-Dor, Eyal Krupka, Raz Halaly, Yoni Smolin, Ilya Gurvich, Aviv Hurvitz, Lijuan Qin, Wei Xiong, Shixiong Zhang, Lingfeng Wu, Xiong Xiao, Ido Leichter, Moshe David, Xuedong Huang, Amit Kumar Agarwal

IPC: H04N7/14, G10L15/26, H04N7/15, G10L17/00, G06V40/16

CPC classification number: G10L15/26, G06V40/172, G10L17/00, H04N7/15

Abstract: A method for facilitating a remote conference includes receiving a digital video and a computer-readable audio signal. A face recognition machine is operated to recognize a face of a first conference participant in the digital video, and a speech recognition machine is operated to translate the computer-readable audio signal into a first text. An attribution machine attributes the text to the first conference participant. A second computer-readable audio signal is processed similarly, to obtain a second text attributed to a second conference participant. A transcription machine automatically creates a transcript including the first text attributed to the first conference participant and the second text attributed to the second conference participant.

9.

Granted invention patent
Multi-microphone speech separation (status: active)

Publication (announcement) No.: US10957337B2

Publication (announcement) date: 2021-03-23

Application No.: US15991988

Application date: 2018-05-29

Applicant: Microsoft Technology Licensing, LLC

Inventor: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao

IPC: G10L21/00, G10L21/0272, G06N3/08, G10L17/04, G10L17/18, G10L19/022, G10L21/0208, H04R3/00

Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
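
The masking step in this abstract can be sketched as an element-wise product between a per-speaker mask and a time-frequency representation of the mixture. The sketch below assumes real-valued magnitude grids stored as nested lists (frames by frequency bins) and hand-written masks. In the patent the masks come from a neural network, which is not modelled here, and all names and values are hypothetical.

```python
def apply_mask(mixture, mask):
    """Multiply a time-frequency grid by a mask of the same shape."""
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mixture)
    ]

def separate(mixture, masks):
    """Return one masked copy of the mixture per speaker mask."""
    return [apply_mask(mixture, mask) for mask in masks]

# Toy 2-frame by 2-bin magnitude grid for a single microphone channel.
mixture = [[1.0, 2.0], [3.0, 4.0]]

# Hypothetical masks for two speakers; in each cell they sum to 1.
masks = [
    [[1.0, 0.0], [0.5, 0.0]],
    [[0.0, 1.0], [0.5, 1.0]],
]

separated = separate(mixture, masks)
print(separated[0])
print(separated[1])
```

Because the two toy masks sum to 1 in every cell, adding the separated grids back together recovers the original mixture, which is a convenient sanity check when experimenting with mask-based separation.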
