Patent search: ipc:"G10L17/04", page 1

1.

Invention publication
METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM FOR SPEAKER CHANGE POINT DETECTION (status: under examination, published)

Publication number: US20240331706A1

Publication date: 2024-10-03

Application number: US18741427

Filing date: 2024-06-12

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventors: Linhao DONG, Zhiyun FAN, Zejun MA

IPC classification: G10L17/04

CPC classification: G10L17/04

Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors of the target voice data; integrating and firing the speaker characterization vectors of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations in the target voice data; and determining the speaker change points, according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data. This method can effectively improve the accuracy of the detection result of a speaker change point in target voice data with a type of interaction.
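The integrate-and-fire step named in this abstract can be illustrated with a minimal sketch. It assumes each frame already has a characterization vector and a non-negative weight, and that a unit is fired whenever the accumulated weight reaches a threshold of 1.0. The function name, the threshold default, and the idea of reporting the firing frame indices are assumptions for illustration; the abstract does not give these details, and this is not the patent's implementation.

```python
from typing import List, Sequence, Tuple


def continuous_integrate_and_fire(
    vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 1.0,  # assumed firing threshold
) -> Tuple[List[List[float]], List[int]]:
    """Accumulate weighted frame vectors and fire one integrated vector
    each time the accumulated weight reaches `threshold`.

    Returns the fired vectors and the frame index at which each one fired.
    """
    if len(vectors) != len(weights):
        raise ValueError("vectors and weights must have the same length")
    if threshold <= 0:
        raise ValueError("threshold must be positive")

    dim = len(vectors[0]) if vectors else 0
    fired_vectors: List[List[float]] = []
    fire_frames: List[int] = []
    acc_weight = 0.0
    acc_vector = [0.0] * dim

    for frame, (vec, weight) in enumerate(zip(vectors, weights)):
        remaining = weight
        # A frame may complete the current unit; only the part of its weight
        # needed to reach the threshold is used, the rest starts the next unit.
        while acc_weight + remaining >= threshold:
            used = threshold - acc_weight
            fired_vectors.append([a + used * x for a, x in zip(acc_vector, vec)])
            fire_frames.append(frame)
            remaining -= used
            acc_weight = 0.0
            acc_vector = [0.0] * dim
        acc_weight += remaining
        acc_vector = [a + remaining * x for a, x in zip(acc_vector, vec)]

    return fired_vectors, fire_frames
```

In this sketch the firing frames play the role of candidate boundaries between consecutive speaker characterizations; how the weights are predicted is left out entirely.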

2.

Granted invention patent
General speech enhancement method and apparatus using multi-source auxiliary information (status: in force)

Publication number: US12094484B2

Publication date: 2024-09-17

Application number: US18360838

Filing date: 2023-07-28

Applicant: ZHEJIANG LAB

Inventors: Jingsong Li, Zhenchuan Zhang, Tianshu Zhou, Yu Tian

IPC classification: G10L21/0232, G10L17/02, G10L17/04, G10L25/30

CPC classification: G10L21/0232, G10L17/02, G10L17/04, G10L25/30

Abstract: The present disclosure discloses a general speech enhancement method and apparatus using multi-source auxiliary information. The method includes the following steps: S1: building a training data set; S2: using the training data set to learn network parameters of a model, and building a speech enhancement model; S3: building a sound source information database in a pre-collection or on-site collection mode; S4: acquiring an input of the speech enhancement model; and S5: taking a noisy original signal as a main input of the speech enhancement model, taking auxiliary sound signals of a target source group and auxiliary sound signals of an interference source group as side inputs of the speech enhancement model for speech enhancement, and obtaining an enhanced speech signal.

3.

Invention publication
ELECTRONIC DEVICE, INTELLIGENT SERVER, AND SPEAKER-ADAPTIVE SPEECH RECOGNITION METHOD (status: under examination, published)

Publication number: US20240304190A1

Publication date: 2024-09-12

Application number: US18663779

Filing date: 2024-05-14

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Taewoo LEE, Minseok KWON, Kyungtae KIM, Gajin SONG, Hoseon SHIN, Jungin LEE, Seokyeong JUNG

IPC classification: G10L17/04, G10L15/02

CPC classification: G10L17/04, G10L15/02

Abstract: An electronic device is configured to: perform speaker verification on a voice input to determine whether the voice input matches a voice of an enrolled speaker; based on determining that the voice input does not match the voice of the enrolled speaker, perform first speech recognition on the voice input based on a first automatic speech recognition (ASR) model; and, based on determining that the voice input matches the voice of the enrolled speaker, perform second speech recognition on the voice input based on a sequence summarizing neural network (SSN) and a second ASR model.
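The verification-gated routing described in this abstract can be sketched as follows. The sketch assumes that speaker verification is a cosine-similarity comparison between a voice-input embedding and an enrolled-speaker embedding against a fixed threshold; the abstract does not say how verification is performed, so the similarity measure, the threshold value, and every name below are hypothetical. The two recognizers are passed in as plain callables standing in for the first ASR model and the SSN-based second ASR path.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_recognition(
    audio: bytes,
    voice_embedding: Sequence[float],
    enrolled_embedding: Sequence[float],
    first_asr: Callable[[bytes], str],   # used when the speaker does NOT match
    second_asr: Callable[[bytes], str],  # speaker-adapted path, used on a match
    threshold: float = 0.7,              # assumed verification threshold
) -> str:
    """Pick the recognizer according to the speaker verification outcome."""
    if cosine_similarity(voice_embedding, enrolled_embedding) >= threshold:
        return second_asr(audio)
    return first_asr(audio)
```

Passing the recognizers as callables keeps the routing decision separate from the models themselves, which is a design choice of this sketch rather than something the abstract specifies.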

4.

Granted invention patent
Audio signal processing method and apparatus, electronic device, and storage medium (status: in force)

Publication number: US12039995B2

Publication date: 2024-07-16

Application number: US17667370

Filing date: 2022-02-08

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Jun Wang, Wingyip Lam

IPC classification: G10L21/0308, G10L13/02, G10L17/02, G10L17/04, G10L17/06, G10L17/22, G10L21/0208, G10L21/0232

CPC classification: G10L21/0308, G10L13/02, G10L17/02, G10L17/04, G10L17/06, G10L17/22, G10L2021/02087, G10L21/0232

Abstract: This application discloses an audio signal processing method performed by an electronic device. According to this application, embedding processing is performed on a mixed audio signal by mapping the mixed audio signal to an embedding space, to obtain an embedding feature of the mixed audio signal in the embedding space; and generalized feature extraction is performed on the embedding feature, so that a generalized feature of a target component in the mixed audio signal can be obtained through extraction. The generalized feature of the target component has good generalization capability and expression capability, and can be used for different scenarios. Audio signal processing is performed on the mixed audio signal based on the generalized feature of the target component to obtain information of the audio signal of the target object, thereby improving the robustness and generalization of an audio signal processing process, and improving the accuracy of audio signal processing.

5.

Invention publication
VIRTUAL AGENT TRANSPARENT USER AUTHENTICATION (status: under examination, published)

Publication number: US20240232308A1

Publication date: 2024-07-11

Application number: US18152671

Filing date: 2023-01-10

Applicant: Verint Americas Inc.

Inventors: Ian BEAVER, Vladislav LUZIN

IPC classification: G06F21/32, G10L17/04, G10L17/06, G10L17/22

CPC classification: G06F21/32, G10L17/04, G10L17/06, G10L17/22

Abstract: Certain aspects of the present disclosure provide techniques for receiving audio data comprising a user voice command; determining a task to be completed by a remote service based on the user voice command; determining that a reference voice print associated with the user is stored in a user account; authenticating the user by determining that a sample voice print based on the user voice command matches the reference voice print associated with the user; storing authentication evidence associated with the task; and providing proof of user authentication to the remote service in order to initiate the task with the remote service.

6.

Granted invention patent
System and method for detecting fraudsters (status: in force)

Publication number: US12020711B2

Publication date: 2024-06-25

Application number: US17166525

Filing date: 2021-02-03

Applicant: Nice Ltd.

Inventors: Roman Frenkel, Yarden Hazut, Rotem Shuster Radashkevich

IPC classification: G10L17/18, G06Q50/26, G10L15/22, G10L17/04, G10L17/08

CPC classification: G10L17/08, G06Q50/26, G10L15/22, G10L17/04

Abstract: A system and method may classify a plurality of interactions, by: obtaining a plurality of voiceprints of the plurality of interactions, wherein each voiceprint of the plurality of voiceprints represents a speaker participating in an interaction of the plurality of interactions; calculating, for each interaction, a plurality of scores, wherein each score of the plurality of scores is indicative of a similarity between the voiceprint of the interaction and one voiceprint of a set of benchmark voiceprints; calculating, for each interaction, statistics of the scores; and determining that a plurality of interactions pertain to a single cluster of interactions based on statistics of the scores of the interactions in the cluster.
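The score-and-statistics pipeline in this abstract can be sketched under stated assumptions. Here the similarity score is assumed to be cosine similarity, the statistics are assumed to be the mean and population standard deviation of the scores, and two interactions are treated as belonging together when those statistics differ by no more than a tolerance. None of these choices comes from the abstract, and all function names and the tolerance value are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_statistics(
    voiceprint: Sequence[float],
    benchmarks: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Score one interaction's voiceprint against every benchmark voiceprint
    and summarize the scores as (mean, population standard deviation)."""
    scores = [cosine_similarity(voiceprint, b) for b in benchmarks]
    return statistics.mean(scores), statistics.pstdev(scores)


def same_cluster(
    stats_a: Tuple[float, float],
    stats_b: Tuple[float, float],
    tolerance: float = 0.1,  # assumed closeness tolerance
) -> bool:
    """Treat two interactions as one cluster if all statistics are close."""
    return all(abs(x - y) <= tolerance for x, y in zip(stats_a, stats_b))
```

A production system would more likely feed the per-interaction statistics into a proper clustering algorithm; the pairwise tolerance check is only the simplest stand-in.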

7.

Granted invention patent
Voice command system and voice command method (status: in force)

Publication number: US12002467B2

Publication date: 2024-06-04

Application number: US18074513

Filing date: 2022-12-05

Applicant: KYOCERA Corporation

Inventor: Yumiko Yamamoto

IPC classification: G10L17/04, G06F16/432, G06F21/32, G06V40/50, G10L15/07, G10L15/22

CPC classification: G10L15/22, G06F16/433, G06F21/32, G06V40/50, G10L15/07, G10L2015/223, G10L2015/227

Abstract: A voice command system according to a first disclosure comprises a gateway apparatus having an interface configured to receive a voice command, and a controller configured to perform a registration process of registering a speaker permitted to receive the voice command. The controller is configured to perform an authentication process of rejecting a reception of the voice command when a speaker of the voice command is not registered, and permitting a reception of the voice command when a speaker of the voice command is registered. The controller is configured to perform the authentication process for each voice command.
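The registration and per-command authentication flow in this abstract can be sketched as a small class. The sketch assumes the speaker of each command has already been identified and is represented by a string identifier; how the speaker is recognized is not covered by the abstract and is outside this example. The class and method names are hypothetical.

```python
from typing import List, Set


class VoiceCommandGateway:
    """Toy gateway that accepts voice commands only from registered speakers."""

    def __init__(self) -> None:
        self._registered: Set[str] = set()
        self.accepted: List[str] = []

    def register(self, speaker_id: str) -> None:
        """Registration process: permit this speaker to issue commands."""
        self._registered.add(speaker_id)

    def receive(self, speaker_id: str, command: str) -> bool:
        """Authentication process, run for every single command.

        Returns True and records the command if the speaker is registered,
        otherwise rejects the command and returns False.
        """
        if speaker_id not in self._registered:
            return False
        self.accepted.append(command)
        return True
```

Because `receive` checks registration on every call, a speaker who was never registered is rejected on each command rather than once per session, which mirrors the per-command authentication the abstract describes.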

8.

Invention publication
Mobile Terminal And Hub Apparatus For Use In A Video Communication System (status: under examination, published)

Publication number: US20240163397A1

Publication date: 2024-05-16

Application number: US18424706

Filing date: 2024-01-26

Applicant: HUDDLE ROOM TECHNOLOGY S.R.L.

Inventor: Mario Ferrari

IPC classification: H04N7/15, G10L17/04, G10L17/06, H04W76/10

CPC classification: H04N7/15, G10L17/04, G10L17/06, H04W76/10, H04W88/02

Abstract: A hub apparatus (20) is designated to be used in a video communication system comprising the hub apparatus (20) and a plurality of mobile terminals (10a-10d) configured to be wirelessly connectable to the hub apparatus (20). The hub apparatus (20) comprises: a receiving unit (24) configured to receive from each mobile terminal (10) of the plurality of mobile terminals (10a-10d) a video stream, a current speaker indicator to indicate whether the user of the mobile terminal is speaking, and association information which associates the current speaker indicator transmitted by the mobile terminal with the video stream transmitted from such mobile terminal (10); and a generation unit (40) operatively connected to said receiving unit (24) and configured to generate an output video communication stream (6) based on the plurality of video streams received from each mobile terminal (10) of the plurality of mobile terminals (10a-10d), on the plurality of current speaker indicators received from each mobile terminal (10) of the plurality of mobile terminals (10a-10d), and on the plurality of association information received from each mobile terminal (10) of the plurality of mobile terminals (10a-10d).

9.

Invention publication
SPEAKER SEPARATION BASED ON REAL-TIME LATENT SPEAKER STATE CHARACTERIZATION (status: under examination, published)

Publication number: US20240153509A1

Publication date: 2024-05-09

Application number: US18368459

Filing date: 2023-09-14

Applicant: OTO Systems Inc.

Inventors: Valentin Alain Jean Perret, Nándor Kedves, Nicolas Lucien Perony

IPC classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272

CPC classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272

Abstract: Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided.
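The chunk-to-segment step in this abstract can be sketched in a simplified form. The sketch assumes each audio chunk has already been mapped from its identity embedding to a speaker label, and it merges consecutive chunks with the same label into one segment. The `Chunk` type, its fields, and the merging rule are assumptions for illustration; the abstract does not describe how embeddings are turned into speaker assignments.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    """A span of audio (in seconds) attributed to one speaker label."""
    start: float
    end: float
    speaker: str


def merge_chunks(chunks: List[Chunk]) -> List[Chunk]:
    """Merge time-ordered chunks into segments, one per uninterrupted
    run of the same speaker label."""
    segments: List[Chunk] = []
    for chunk in sorted(chunks, key=lambda c: c.start):
        if segments and segments[-1].speaker == chunk.speaker:
            # Same speaker continues: extend the current segment.
            last = segments[-1]
            segments[-1] = last._replace(end=max(last.end, chunk.end))
        else:
            # Speaker changed (or first chunk): start a new segment.
            segments.append(chunk)
    return segments
```

Each returned segment carries a start time, an end time, and a speaker label, which is one plausible shape for the "information describing the plurality of segments" mentioned above.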

10.

Granted invention patent
Systems and methods for processing and presenting conversations (status: in force)

Publication number: US11978472B2

Publication date: 2024-05-07

Application number: US17210108

Filing date: 2021-03-23

Applicant: Otter.ai, Inc.

Inventors: Yun Fu, Simon Lau, Kaisuke Nakajima, Julius Cheng, Gelei Chen, Sam Song Liang, James Mason Altreuter, Kean Kheong Chin, Zhenhao Ge, Hitesh Anand Gupta, Xiaoke Huang, James Francis McAteer, Brian Francis Williams, Tao Xing

IPC classification: G10L21/00, G06F16/438, G10L17/02, G10L17/04, G10L17/22, G10L21/10, H04L9/40

CPC classification: G10L21/10, G06F16/438, G10L17/02, G10L17/04, G10L17/22, H04L63/104

Abstract: A system for processing and presenting a conversation includes a sensor, a processor, and a presenter. The sensor is configured to capture an audio-form conversation. The processor is configured to automatically transform the audio-form conversation into a transformed conversation. The transformed conversation includes a synchronized text, wherein the synchronized text is synchronized with the audio-form conversation. The presenter is configured to present the transformed conversation including the synchronized text and the audio-form conversation. The presenter is further configured to present the transformed conversation to be navigable, searchable, assignable, editable, and shareable.
