-
Publication No.: US20250069596A1
Publication Date: 2025-02-27
Application No.: US18811550
Application Date: 2024-08-21
Applicant: Roblox Corporation
Inventor: Mahesh Kumar NANDWANA , Joseph LIU , Morgan Samuel MCGUIRE , Kiran BHAT
IPC: G10L15/183 , G10L13/02 , G10L15/04 , G10L15/06 , G10L15/30 , G10L21/0216 , G10L21/028 , G10L25/84
Abstract: A metaverse application receives a user-provided audio stream associated with a user. The metaverse application obtains portions of one or more audio streams. The metaverse application divides the user-provided audio stream into a plurality of portions, wherein each portion corresponds to a particular time window of the audio stream. The metaverse application provides the plurality of portions of the user-provided audio stream as input to an audio machine-learning model. The audio machine-learning model outputs, based on the portions of the user-provided audio stream, a determination of abuse in a particular portion of the plurality of portions. The metaverse application performs a remedial action responsive to the determination of abuse in the particular portion.
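The windowing and per-portion classification described in this abstract can be sketched roughly as follows. This is only an illustration: `split_into_windows`, `find_abusive_windows`, `toy_model`, the window size, and the threshold are all hypothetical names and values, and the toy scorer merely stands in for the audio machine-learning model, which the abstract does not specify.

```python
from typing import Callable, List, Sequence


def split_into_windows(samples: Sequence[float], window_size: int) -> List[Sequence[float]]:
    """Divide the stream into consecutive, non-overlapping time windows."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [samples[i:i + window_size] for i in range(0, len(samples), window_size)]


def find_abusive_windows(
    samples: Sequence[float],
    window_size: int,
    model: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> List[int]:
    """Return the indices of windows whose model score reaches the threshold."""
    windows = split_into_windows(samples, window_size)
    return [idx for idx, window in enumerate(windows) if model(window) >= threshold]


def toy_model(window: Sequence[float]) -> float:
    """Stand-in scorer (mean absolute amplitude); NOT a real abuse detector."""
    return sum(abs(x) for x in window) / len(window)


stream = [0.0, 0.1, 0.9, 0.8, 0.1, 0.0]
flagged = find_abusive_windows(stream, window_size=2, model=toy_model, threshold=0.5)
print(flagged)  # indices of windows that would trigger a remedial action
```

A caller would map each flagged index back to its time window and apply whatever remedial action the application defines.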
-
Publication No.: US12219154B2
Publication Date: 2025-02-04
Application No.: US18060351
Application Date: 2022-11-30
Applicant: VOXSMART LIMITED
Inventor: Tejas Shastry , Matthew Goldey , Svyat Vergun
IPC: G10L15/04 , G10L15/00 , G10L15/01 , G10L15/06 , G10L15/14 , G10L15/16 , G10L15/22 , G10L15/26 , G10L15/32 , G10L17/00 , G10L25/78 , H04N19/159 , H04N19/172 , H04N19/184 , H04N19/187 , H04N19/30 , H04N19/70 , G10L25/51
Abstract: In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least components of: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines, where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and the accuracy of automatic speech transcription; and generating a transcript of the audio recording from respective accepted hypotheses for the plurality of audio segments.
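One way to picture the per-segment engine selection and hypothesis acceptance is the sketch below. It rests on assumptions not stated in the abstract: segments are represented as strings, each engine returns a (hypothesis, confidence) pair, and acceptance is a simple confidence threshold. All function and engine names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical engine interface: segment -> (hypothesis text, confidence in [0, 1]).
Engine = Callable[[str], Tuple[str, float]]


def transcribe(
    segments: List[str],
    choose_engine: Callable[[str], str],
    engines: Dict[str, Engine],
    fallback_order: List[str],
    min_confidence: float = 0.8,
) -> str:
    """Send each segment to its chosen engine; try others only if not accepted."""
    transcript = []
    for segment in segments:
        first = choose_engine(segment)
        order = [first] + [name for name in fallback_order if name != first]
        best_text, best_conf = "", -1.0
        for name in order:
            text, conf = engines[name](segment)
            if conf > best_conf:
                best_text, best_conf = text, conf
            if conf >= min_confidence:
                break  # hypothesis accepted: no further engine is called
        transcript.append(best_text)
    return " ".join(transcript)


# Toy engines that operate on strings standing in for audio segments.
def general_engine(segment: str) -> Tuple[str, float]:
    return segment.lower(), (0.4 if "fx" in segment else 0.9)


def finance_engine(segment: str) -> Tuple[str, float]:
    return segment.upper(), 0.95


engines = {"general": general_engine, "finance": finance_engine}
result = transcribe(
    ["hello", "buy fx"],
    choose_engine=lambda s: "finance" if "fx" in s else "general",
    engines=engines,
    fallback_order=["general", "finance"],
)
print(result)
```

The early `break` is what saves work in this sketch: once a hypothesis is accepted, the segment is never submitted to another engine.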
-
Publication No.: US20250022456A1
Publication Date: 2025-01-16
Application No.: US18897849
Application Date: 2024-09-26
Applicant: MaShang Consumer Finance Co., Ltd.
Inventor: Qinglin MENG
Abstract: The present disclosure provides a model training method, including: performing feature extraction from a speech sample to obtain a speech feature; inputting the speech feature into an encoding network of a to-be-trained model for encoding processing; decoding an intermediate encoding feature to obtain an additional loss; obtaining an encoding loss based on an encoding feature and an encoding label; obtaining a total encoding loss based on the additional loss, the encoding loss, and a preset first loss weight; inputting the encoding feature into a decoding network for decoding processing to obtain a total decoding loss; obtaining a total model loss based on the total encoding loss, the total decoding loss, and a preset second loss weight; updating parameters in the model based on the total model loss, and continuing to train the to-be-trained model according to the updated parameters until the total model loss converges, obtaining a trained model.
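The abstract does not say how the losses and preset weights are combined. As a minimal sketch, assuming each weight forms a convex combination of two losses, the arithmetic could look like this; `combine` and all numeric values are illustrative assumptions.

```python
def combine(first_loss: float, second_loss: float, weight: float) -> float:
    """Assumed combination rule: weight * first + (1 - weight) * second."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * first_loss + (1.0 - weight) * second_loss


# Illustrative values only; none of these numbers come from the patent.
additional_loss = 2.0      # from decoding an intermediate encoding feature
encoding_loss = 1.0        # from the encoding feature and the encoding label
total_decoding_loss = 3.0  # from the decoding network
first_loss_weight = 0.25
second_loss_weight = 0.5

total_encoding_loss = combine(additional_loss, encoding_loss, first_loss_weight)
total_model_loss = combine(total_encoding_loss, total_decoding_loss, second_loss_weight)
print(total_encoding_loss, total_model_loss)
```

Training would then update the model parameters against `total_model_loss` until it converges.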
-
Publication No.: US12198673B2
Publication Date: 2025-01-14
Application No.: US17525814
Application Date: 2021-11-12
Applicant: LEMON INC.
Inventor: Lamtharn Hantrakul , Siyuan Shan , Jitong Chen , Matthew David Avent , David Trevelyan
Abstract: The present disclosure describes techniques for a differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least a timbre embedding; inputting the features to a first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, N represents a number of wavetables, and L represents a wavetable length; and outputting a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable to perform audio-related tasks. Finally, the wavetables are used to initialize another machine learning model so as to help reduce the computational complexity of audio synthesis obtained as output of that model.
-
Publication No.: US12190867B2
Publication Date: 2025-01-07
Application No.: US17804603
Application Date: 2022-05-31
Applicant: Microsoft Technology Licensing, LLC
Inventor: Zvi Figov
Abstract: Examples of the present disclosure describe improved systems and methods for detecting keywords in audio content. In one example implementation, audio content is segmented into one or more audio segments. One or more text segments are generated, each text segment corresponding to one of the audio segments. For each text segment, one or more phrase candidate values are generated using a textual analysis, and one or more sentence embedding values are generated using a sentence embedding analysis. Next, an average sentence embedding value is calculated using the one or more sentence embedding values. Each of the one or more phrase candidate values is compared to the average sentence embedding value. Each phrase candidate value having a comparison value above a threshold value is labeled as representing a keyword.
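The comparison step can be illustrated with a small sketch. It assumes that phrase candidates and sentences are embedded in the same vector space and that the comparison value is cosine similarity; the abstract names neither. The vectors, threshold, and function names are made up for illustration.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def label_keywords(
    phrase_embeddings: Dict[str, List[float]],
    sentence_embeddings: List[List[float]],
    threshold: float,
) -> List[str]:
    """Keep phrase candidates whose similarity to the average exceeds the threshold."""
    average = mean_vector(sentence_embeddings)
    return [
        phrase
        for phrase, vector in phrase_embeddings.items()
        if cosine(vector, average) > threshold
    ]


# Toy two-dimensional embeddings, invented for this example.
sentences = [[1.0, 0.0], [0.8, 0.2]]
phrases = {"speech model": [1.0, 0.1], "weather": [0.0, 1.0]}
keywords = label_keywords(phrases, sentences, threshold=0.5)
print(keywords)
```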
-
Publication No.: US12189794B2
Publication Date: 2025-01-07
Application No.: US17543689
Application Date: 2021-12-06
Applicant: FUJIFILM Business Innovation Corp.
Inventor: Jun Isozaki , Satoshi Tatsuura
Abstract: An information processing apparatus includes a processor configured to: segment, into multiple voice segments, voice data and text data converted from the voice data; impart a security level to each of the voice segments in accordance with contents of the text data and the voice data in each of the voice segments; and perform control on an output of each of the voice segments in accordance with the security level.
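A simplified sketch of imparting a security level per segment and controlling output accordingly follows. It looks only at the text of each segment, whereas the abstract also uses the voice data. The two-level scheme, the term list, and the "[withheld]" placeholder are assumptions for illustration.

```python
from typing import Iterable, List


def impart_security_level(text: str, sensitive_terms: Iterable[str]) -> int:
    """Assumed rule: level 1 if any sensitive term appears, otherwise level 0."""
    lowered = text.lower()
    return 1 if any(term in lowered for term in sensitive_terms) else 0


def control_output(
    segments: List[str], sensitive_terms: Iterable[str], listener_clearance: int
) -> List[str]:
    """Output a segment only if its level does not exceed the listener's clearance."""
    terms = list(sensitive_terms)
    output = []
    for text in segments:
        level = impart_security_level(text, terms)
        output.append(text if level <= listener_clearance else "[withheld]")
    return output


segments = ["Good morning everyone", "The password is tiger42", "See you tomorrow"]
public_view = control_output(segments, {"password"}, listener_clearance=0)
print(public_view)
```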
-
Publication No.: US12183345B2
Publication Date: 2024-12-31
Application No.: US18328044
Application Date: 2023-06-02
Applicant: Smart Solutions IP, LLC
Inventor: Fawzi Shaya
IPC: G10L15/26 , G06F16/93 , G06F40/284 , G06N20/00 , G06V30/416 , G10L15/02 , G10L15/04 , G10L15/08
Abstract: A computing device is disclosed which includes a processor and non-transient memory operably connected to the processor. The non-transient memory includes instructions that, when executed by the processor, cause the processor to extract a plurality of sub-strings from a character string, analyze each sub-string for compliance with each of several field definitions, where each of the field definitions corresponds to a field in a digital form, and populate some of the fields in the digital form based on the analysis of each sub-string for compliance with the field definitions.
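The sub-string analysis can be sketched as below, assuming that sub-strings are whitespace-separated tokens and that each field definition is a regular expression. Both are assumptions, since the abstract does not define either; the field names and patterns are hypothetical.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical field definitions, one per field of the digital form.
FIELD_DEFINITIONS: Dict[str, Pattern[str]] = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"\+?\d[\d\-]{6,}\d"),
    "zip": re.compile(r"\d{5}"),
}


def populate_form(
    character_string: str, field_definitions: Dict[str, Pattern[str]]
) -> Dict[str, Optional[str]]:
    """Fill each field with the first sub-string that complies with its definition."""
    form: Dict[str, Optional[str]] = {name: None for name in field_definitions}
    for sub_string in character_string.split():
        sub_string = sub_string.strip(",.;")  # drop trailing punctuation
        for name, pattern in field_definitions.items():
            if form[name] is None and pattern.fullmatch(sub_string):
                form[name] = sub_string
                break  # a sub-string populates at most one field
    return form


form = populate_form(
    "my email is ann@example.com and zip is 90210, phone 555-123-4567",
    FIELD_DEFINITIONS,
)
print(form)
```

Fields with no compliant sub-string stay `None`, which matches the abstract's wording that only some of the fields are populated.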
-
Publication No.: US20240422468A1
Publication Date: 2024-12-19
Application No.: US18815959
Application Date: 2024-08-27
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Libin ZHANG , Chang LIU
Abstract: Embodiments of the present disclosure provide a headset control method, a headset, a headset control apparatus, and a related storage medium. The method includes: collecting environment information, and determining key sound detection sensitivity based on the environment information; performing key sound detection in the environment information based on the key sound detection sensitivity; and if a key sound exists in the environment information, adjusting the headset to a hear through mode, or playing the key sound. The headset collects the environment information, determines the key sound detection sensitivity based on the environment information, and performs key sound detection in the environment information based on the key sound detection sensitivity. If the key sound exists, the headset is adjusted to the hear through mode, or the key sound is played. In this solution, the key sound detection sensitivity corresponding to the environment information is determined based on the environment information, to perform key sound detection.
-
Publication No.: US12161481B2
Publication Date: 2024-12-10
Application No.: US17415418
Application Date: 2019-12-16
Applicant: Szegedi Tudományegyetem
Inventor: Gábor Gosztolya , Ildikó Hoffmann , János Kálmán , Magdolna Pákáski , László Tóth , Veronika Vincze
Abstract: The invention is a method for automatic detection of neurocognitive impairment, comprising: generating, in a segmentation and labelling step (11), a labelled segment series (26) from a speech sample (22) using a speech recognition unit (24); and generating from the labelled segment series (26), in an acoustic parameter calculation step (12), acoustic parameters (30) characterizing the speech sample (22). The method is characterised by determining, in a probability analysis step (14), in a particular temporal division of the speech sample (22), respective probability values (38) corresponding to silent pauses, filled pauses and any types of pauses for respective temporal intervals thereof; calculating, in an additional parameter calculating step (15), a histogram by generating an additional histogram data set (42) from the determined probability values (38) by dividing a probability domain into subdomains and aggregating durations of the temporal intervals corresponding to the probability values falling into the respective subdomains; and generating, in an evaluation step (13), decision information (34) by feeding the acoustic parameters (30) and the additional histogram data set (42) into an evaluation unit (32), the evaluation unit (32) using a machine learning algorithm. The invention is furthermore a data processing system, a computer program product and a computer-readable storage medium for carrying out the method.
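The histogram step (dividing the probability domain into subdomains and aggregating interval durations per subdomain) can be sketched as follows. Equal-width subdomains over [0, 1] and the input format of (duration, probability) pairs are assumptions; the function name and sample values are illustrative.

```python
from typing import List, Tuple


def duration_histogram(intervals: List[Tuple[float, float]], num_bins: int) -> List[float]:
    """Sum the durations of intervals whose probability falls into each subdomain.

    intervals: (duration_in_seconds, pause_probability) pairs.
    num_bins:  number of equal-width subdomains of the probability domain [0, 1].
    """
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    histogram = [0.0] * num_bins
    for duration, probability in intervals:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        # A probability of exactly 1.0 belongs to the last subdomain.
        index = min(int(probability * num_bins), num_bins - 1)
        histogram[index] += duration
    return histogram


# Illustrative intervals, e.g. probabilities of a silent pause per interval.
intervals = [(0.5, 0.1), (0.25, 0.9), (0.25, 1.0), (1.0, 0.4)]
histogram = duration_histogram(intervals, num_bins=4)
print(histogram)
```

In the method described, one such data set would be built for each pause type and fed to the evaluation unit together with the acoustic parameters.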
-
Publication No.: US20240395240A1
Publication Date: 2024-11-28
Application No.: US18200889
Application Date: 2023-05-23
Applicant: Microsoft Technology Licensing, LLC
Inventor: Junkun Chen , Jinyu Li , Peidong Wang , Jian Xue
Abstract: A computer implemented method includes receiving speech data representative of speech in a first language. The speech data is divided into chunks of speech data, each chunk comprising multiple temporally consecutive frames of acoustic information. Each temporally consecutive chunk of data is processed using beam search on each frame to identify candidate language tokens representing a second language different from the first language. A best candidate language token(s) is selected for each chunk as processed. The selected best candidate language token or tokens for each chunk of data is committed as a prefix for a next temporally consecutive chunk of data.
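The chunk-wise beam search with prefix commitment can be sketched in a heavily simplified form. The sketch assumes one output token per frame and a hypothetical scoring function `toy_scorer` that returns log-probabilities given the current prefix. A real system would score tokens with a translation model, which is not described here.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, float]  # toy frame: candidate token -> base log-probability
Scorer = Callable[[List[str], Frame], Dict[str, float]]


def decode_chunks(chunks: List[List[Frame]], scorer: Scorer, beam_width: int = 2) -> List[str]:
    """Beam-search each chunk frame by frame, then commit the best hypothesis."""
    committed: List[str] = []
    for chunk in chunks:
        # Every hypothesis in a chunk starts from the committed prefix.
        beams: List[Tuple[float, List[str]]] = [(0.0, list(committed))]
        for frame in chunk:
            candidates = [
                (score + log_prob, tokens + [token])
                for score, tokens in beams
                for token, log_prob in scorer(tokens, frame).items()
            ]
            candidates.sort(key=lambda item: item[0], reverse=True)
            beams = candidates[:beam_width]
        committed = beams[0][1]  # best tokens become the prefix for the next chunk
    return committed


def toy_scorer(prefix: List[str], frame: Frame) -> Dict[str, float]:
    """Stand-in scorer: penalizes repeating the previous token."""
    return {
        token: log_prob - (1.0 if prefix and prefix[-1] == token else 0.0)
        for token, log_prob in frame.items()
    }


chunks = [
    [{"a": -0.1, "b": -0.5}, {"a": -0.2, "b": -0.3}],
    [{"b": -0.1, "c": -0.2}],
]
tokens = decode_chunks(chunks, toy_scorer, beam_width=2)
print(tokens)
```

Committing at each chunk boundary means later chunks cannot revise earlier tokens in this sketch, which keeps the search bounded for streaming input.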