Patent search ipc:"G10L15/04" Page 1

1.

Invention patent application
CROSS-MODAL TRAINING OF A MACHINE-LEARNING MODEL THAT IDENTIFIES ABUSE IN AUDIO STREAMS [In force]

Publication number: US20250069596A1

Publication date: 2025-02-27

Application number: US18811550

Application date: 2024-08-21

Applicant: Roblox Corporation

Inventor: Mahesh Kumar NANDWANA, Joseph LIU, Morgan Samuel MCGUIRE, Kiran BHAT

IPC: G10L15/183, G10L13/02, G10L15/04, G10L15/06, G10L15/30, G10L21/0216, G10L21/028, G10L25/84

Abstract: A metaverse application receives a user-provided audio stream associated with a user. The metaverse application obtains portions of one or more audio streams. The metaverse application divides the user-provided audio stream into a plurality of portions, wherein each portion corresponds to a particular time window of the audio stream. The metaverse application provides the plurality of portions of the user-provided audio stream as input to an audio machine-learning model. The audio machine-learning model outputs, based on the portions of the user-provided audio stream, a determination of abuse in a particular portion of the plurality of portions. The metaverse application performs a remedial action responsive to the determination of abuse in the particular portion.
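
Illustrative sketch (not taken from the patent): the windowing step in this abstract could look roughly like the Python below. The names `split_into_windows`, `flag_abusive_windows`, and `score_abuse`, the fixed window length, and the 0.5 threshold are all assumptions made for illustration; the abstract does not specify the model interface or how windows are sized.

```python
from typing import Callable, List, Sequence, Tuple

# (start time in seconds, samples in that window)
Window = Tuple[float, Sequence[float]]


def split_into_windows(
    samples: Sequence[float], sample_rate: int, window_seconds: float
) -> List[Window]:
    """Divide an audio stream into consecutive, non-overlapping time windows."""
    size = int(sample_rate * window_seconds)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    return [
        (start / sample_rate, samples[start:start + size])
        for start in range(0, len(samples), size)
    ]


def flag_abusive_windows(
    windows: List[Window],
    score_abuse: Callable[[Sequence[float]], float],  # hypothetical model
    threshold: float = 0.5,  # assumed decision threshold
) -> List[float]:
    """Return the start times of windows whose score reaches the threshold."""
    return [start for start, chunk in windows if score_abuse(chunk) >= threshold]
```

A caller would then apply whatever remedial action it chooses to the flagged start times; that step is left out because the abstract does not describe it.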

2.

Granted invention patent
Computer systems exhibiting improved computer speed and transcription accuracy of automatic speech transcription (AST) based on a multiple speech-to-text engines and methods of use thereof [In force]

Publication number: US12219154B2

Publication date: 2025-02-04

Application number: US18060351

Application date: 2022-11-30

Applicant: VOXSMART LIMITED

Inventor: Tejas Shastry, Matthew Goldey, Svyat Vergun

IPC: G10L15/04, G10L15/00, G10L15/01, G10L15/06, G10L15/14, G10L15/16, G10L15/22, G10L15/26, G10L15/32, G10L17/00, G10L25/78, H04N19/159, H04N19/172, H04N19/184, H04N19/187, H04N19/30, H04N19/70, G10L25/51

Abstract: In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least components of: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines; where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and the accuracy of automatic speech transcription and generating a transcript of the audio recording from respective accepted hypotheses for the plurality of audio segments.

3.

Invention patent application
MODEL TRAINING METHOD AND APPARATUS, ELECTRONIC DEVICE AND COMPUTER READABLE MEDIUM [In force]

Publication number: US20250022456A1

Publication date: 2025-01-16

Application number: US18897849

Application date: 2024-09-26

Applicant: MaShang Consumer Finance Co., Ltd.

Inventor: Qinglin MENG

IPC: G10L15/06, G10L15/02, G10L15/04

Abstract: The present disclosure provides a model training method, including: performing feature extraction from a speech sample to obtain a speech feature; inputting the speech feature into an encoding network of a to-be-trained model for encoding processing; decoding an intermediate encoding feature to obtain an additional loss; obtaining an encoding loss based on an encoding feature and an encoding label; obtaining a total encoding loss based on the additional loss, the encoding loss, and a preset first loss weight; inputting the encoding feature into a decoding network for decoding processing to obtain a total decoding loss; obtaining a total model loss based on the total encoding loss, the total decoding loss, and a preset second loss weight; updating parameters in the model based on the total model loss, and continuing to train the to-be-trained model according to the updated parameters until the total model loss converges, obtaining a trained model.
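
Illustrative sketch (not taken from the patent): the abstract says the losses are combined using two preset weights but does not give the formula. The Python below assumes a simple convex combination at each stage; the function names and that weighting scheme are assumptions, and the actual claims may combine the terms differently.

```python
def total_encoding_loss(
    encoding_loss: float, additional_loss: float, first_weight: float
) -> float:
    """Blend the encoder loss with the intermediate-layer (additional) loss.

    Assumed form: w1 * encoding_loss + (1 - w1) * additional_loss.
    """
    return first_weight * encoding_loss + (1.0 - first_weight) * additional_loss


def total_model_loss(
    encoding_loss: float,
    additional_loss: float,
    decoding_loss: float,
    first_weight: float,
    second_weight: float,
) -> float:
    """Blend the total encoding loss with the total decoding loss.

    Assumed form: w2 * total_encoding + (1 - w2) * decoding_loss.
    """
    enc_total = total_encoding_loss(encoding_loss, additional_loss, first_weight)
    return second_weight * enc_total + (1.0 - second_weight) * decoding_loss
```

In a training loop, the value returned by `total_model_loss` would be the quantity that is backpropagated and monitored for convergence.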

4.

Granted invention patent
Differentiable wavetable synthesizer using plurality of machine learning models to reduce computational complexity of audio synthesis [In force]

Publication number: US12198673B2

Publication date: 2025-01-14

Application number: US17525814

Application date: 2021-11-12

Applicant: LEMON INC.

Inventor: Lamtharn Hantrakul, Siyuan Shan, Jitong Chen, Matthew David Avent, David Trevelyan

IPC: G10L13/04, G06N20/00, G10H7/10, G10L13/047, G10L13/08, G10L15/04, G10L19/26

Abstract: The present disclosure describes techniques for a differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least a timbre embedding; inputting the features to a first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, N represents a number of wavetables, and L represents a wavetable length; and outputting a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable to perform audio-related tasks. Finally, the wavetables are used to initialize another machine learning model so as to help reduce the computational complexity of the audio synthesis obtained as output of that other machine learning model.

5.

Granted invention patent
Keyword detection for audio content [In force]

Publication number: US12190867B2

Publication date: 2025-01-07

Application number: US17804603

Application date: 2022-05-31

Applicant: Microsoft Technology Licensing, LLC

Inventor: Zvi Figov

IPC: G10L15/22, G06F40/279, G06F40/40, G10L15/04, G10L15/08, G10L25/57

Abstract: Examples of the present disclosure describe improved systems and methods for detecting keywords in audio content. In one example implementation, audio content is segmented into one or more audio segments. One or more text segments is generated, each text segment corresponding to each of the audio segments. For each text segment, one or more phrase candidate values is generated using a textual analysis, and one or more sentence embedding values is generated using a sentence embedding analysis. Next, an average sentence embedding value is calculated using the one or more sentence embedding values. Each of the one or more phrase candidate values is compared to the average sentence embedding value. Each phrase candidate value having a comparison value above a threshold value is labeled as representing a keyword.
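
Illustrative sketch (not taken from the patent): the comparison step in this abstract can be pictured as scoring each phrase candidate against the mean sentence embedding and keeping those above a threshold. The Python below assumes cosine similarity as the comparison and assumes embeddings arrive as plain lists of floats; the abstract names neither the similarity measure nor the embedding model, and `select_keywords` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_keywords(
    phrase_vectors: Dict[str, Sequence[float]],
    sentence_vectors: Sequence[Sequence[float]],
    threshold: float,
) -> List[str]:
    """Label as keywords the phrases whose similarity exceeds the threshold."""
    average = mean_vector(sentence_vectors)
    return [
        phrase
        for phrase, vector in phrase_vectors.items()
        if cosine(vector, average) > threshold
    ]
```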

6.

Granted invention patent
Information processing apparatus, information processing system, and non-transitory computer readable medium for controlling output of voice segments in accordance with security level [In force]

Publication number: US12189794B2

Publication date: 2025-01-07

Application number: US17543689

Application date: 2021-12-06

Applicant: FUJIFILM Business Innovation Corp.

Inventor: Jun Isozaki, Satoshi Tatsuura

IPC: G06F21/60, G06F21/62, G10L15/02, G10L15/04, G10L15/22, G10L15/26

Abstract: An information processing apparatus includes a processor configured to: segment, into multiple voice segments, voice data and text data converted from the voice data; impart a security level to each of the voice segments in accordance with contents of the text data and the voice data in each of the voice segments; and perform control on an output of each of the voice segments in accordance with the security level.

7.

Granted invention patent
Computing device and method for populating digital forms from un-parsed data [In force]

Publication number: US12183345B2

Publication date: 2024-12-31

Application number: US18328044

Application date: 2023-06-02

Applicant: Smart Solutions IP, LLC

Inventor: Fawzi Shaya

IPC: G10L15/26, G06F16/93, G06F40/284, G06N20/00, G06V30/416, G10L15/02, G10L15/04, G10L15/08

Abstract: A computing device is disclosed which includes a processor and non-transient memory operably connected to the processor. The non-transient memory includes instructions that, when executed by the processor cause the processor to extract a plurality of sub-strings from a character string, analyze each sub-string for compliance with each of several field definitions, where each of the field definitions corresponds to a field in a digital form, and populate some of the fields in the digital form based on the analysis of each sub-string for compliance with the field definitions.
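
Illustrative sketch (not taken from the patent): one simple way to picture "analyzing each sub-string for compliance with field definitions" is to treat each field definition as a regular expression and test every sub-string against every definition. The field names, the patterns, and the whitespace/comma splitting rule in the Python below are assumptions chosen for the example; the abstract does not say how sub-strings are extracted or how definitions are expressed.

```python
import re
from typing import Dict, Pattern

# Hypothetical field definitions: one pattern per form field.
FIELD_DEFINITIONS: Dict[str, Pattern[str]] = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "zip_code": re.compile(r"\d{5}"),
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
}


def populate_form(
    text: str, definitions: Dict[str, Pattern[str]] = FIELD_DEFINITIONS
) -> Dict[str, str]:
    """Fill each field with the first sub-string that fully matches it."""
    form: Dict[str, str] = {}
    # Assumed extraction rule: split on commas and whitespace.
    for sub_string in re.split(r"[,\s]+", text.strip()):
        for field, pattern in definitions.items():
            if field not in form and pattern.fullmatch(sub_string):
                form[field] = sub_string
    return form
```

Fields with no compliant sub-string are simply left out of the result, which matches the abstract's wording that only "some of the fields" are populated.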

8.

Invention patent application
HEADSET CONTROL METHOD, HEADSET, APPARATUS, AND STORAGE MEDIUM [In force]

Publication number: US20240422468A1

Publication date: 2024-12-19

Application number: US18815959

Application date: 2024-08-27

Applicant: HUAWEI TECHNOLOGIES CO., LTD.

Inventor: Libin ZHANG, Chang LIU

IPC: H04R1/10, G10L15/04, G10L25/51

Abstract: Embodiments of the present disclosure provide a headset control method, a headset, a headset control apparatus, and a related storage medium. The method includes: collecting environment information, and determining key sound detection sensitivity based on the environment information; performing key sound detection in the environment information based on the key sound detection sensitivity; and if a key sound exists in the environment information, adjusting the headset to a hear through mode, or playing the key sound. The headset collects the environment information, determines the key sound detection sensitivity based on the environment information, and performs key sound detection in the environment information based on the key sound detection sensitivity. If the key sound exists, the headset is adjusted to the hear through mode, or the key sound is played. In this solution, the key sound detection sensitivity corresponding to the environment information is determined based on the environment information, to perform key sound detection.

9.

Granted invention patent
Automatic detection of neurocognitive impairment based on a speech sample [In force]

Publication number: US12161481B2

Publication date: 2024-12-10

Application number: US17415418

Application date: 2019-12-16

Applicant: Szegedi Tudományegyetem

Inventor: Gábor Gosztolya, Ildikó Hoffmann, János Kálmán, Magdolna Pákáski, László Tóth, Veronika Vincze

IPC: A61B5/00, G10L15/02, G10L15/04, G10L25/66

Abstract: The invention is a method for automatic detection of neurocognitive impairment, comprising: generating, in a segmentation and labelling step (11), a labelled segment series (26) from a speech sample (22) using a speech recognition unit (24); and generating from the labelled segment series (26), in an acoustic parameter calculation step (12), acoustic parameters (30) characterizing the speech sample (22). The method is characterised by determining, in a probability analysis step (14), in a particular temporal division of the speech sample (22), respective probability values (38) corresponding to silent pauses, filled pauses and any types of pauses for respective temporal intervals thereof; calculating, in an additional parameter calculating step (15), a histogram by generating an additional histogram data set (42) from the determined probability values (38) by dividing a probability domain into subdomains and aggregating durations of the temporal intervals corresponding to the probability values falling into the respective subdomains; and generating, in an evaluation step (13), decision information (34) by feeding the acoustic parameters (30) and the additional histogram data set (42) into an evaluation unit (32), the evaluation unit (32) using a machine learning algorithm. The invention is furthermore a data processing system, a computer program product and a computer-readable storage medium for carrying out the method.
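
Illustrative sketch (not taken from the patent): the additional parameter calculating step (15) describes a histogram in which each bin accumulates the durations of the temporal intervals whose pause probability falls into that bin's subdomain. The Python below assumes equal-width subdomains over [0, 1] and a list of `(duration, probability)` pairs as input; the function name, the bin count, and the input layout are assumptions.

```python
from typing import Iterable, List, Tuple


def duration_histogram(
    intervals: Iterable[Tuple[float, float]], num_bins: int = 10
) -> List[float]:
    """Sum interval durations per probability subdomain.

    intervals: (duration_seconds, pause_probability) pairs.
    Bins are equal-width over [0, 1]; a probability of exactly 1.0
    is placed in the last bin.
    """
    histogram = [0.0] * num_bins
    for duration, probability in intervals:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        index = min(int(probability * num_bins), num_bins - 1)
        histogram[index] += duration
    return histogram
```

One such histogram could be computed for each pause type (silent, filled, any) and the results concatenated with the acoustic parameters before being passed to the evaluation unit.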

10.

Invention patent application
Stable Output Streaming Speech Translation System [In force]

Publication number: US20240395240A1

Publication date: 2024-11-28

Application number: US18200889

Application date: 2023-05-23

Applicant: Microsoft Technology Licensing, LLC

Inventor: Junkun Chen, Jinyu Li, Peidong Wang, Jian Xue

IPC: G10L15/00, G10L15/04, G10L15/08

Abstract: A computer-implemented method includes receiving speech data representative of speech in a first language. The speech data is divided into chunks of speech data, each chunk comprising multiple temporally consecutive frames of acoustic information. Each temporally consecutive chunk of data is processed using beam search on each frame to identify candidate language tokens representing a second language different from the first language. A best candidate language token or tokens is selected for each chunk as processed. The selected best candidate language token or tokens for each chunk of data is committed as a prefix for the next temporally consecutive chunk of data.
