Patent search ipc:"G10L15/04" Page 8

71.

Granted invention patent
Speech signal processing method and speech signal processing apparatus (in force)

Publication No.: US11308936B2

Publication date: 2022-04-19

Application No.: US16399211

Application date: 2019-04-30

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Tae-yoon Kim, Sang-ha Kim, Sung-Soo Kim, Jin-sik Lee, Chang-woo Han, Eun-kyoung Kim, Jae-won Lee

IPC: G10L15/06, G10L15/30, G10L15/04

Abstract: A speech signal processing method of a user terminal includes: receiving a speech signal, detecting a personalized information section including personal information in the speech signal, performing data processing on the personalized information section of the speech signal by using a personalized model generated based on the personal information, and receiving, from a server, a result of the data processing performed by the server on a general information section of the speech signal that is different than the personalized information section of the speech signal.

72.

Granted invention patent
Speech-to-analytics framework with support for large n-gram corpora (in force)

Publication No.: US11217233B1

Publication date: 2022-01-04

Application No.: US17370441

Application date: 2021-07-08

Applicant: SAS Institute Inc.

Inventor: Xiaozhuo Cheng, Xu Yang, Xiaolong Li, Biljana Belamaric Wilsey, Haipeng Liu, Jared Peterson

IPC: G06N3/02, G06N7/00, G10L15/04, G10L15/16, G10L15/18, G10L15/197, G10L15/22, G10L15/30

Abstract: An apparatus includes processor(s) to: generate a set of candidate n-grams based on probability distributions from an acoustic model for candidate graphemes of a next word most likely spoken following at least one preceding word spoken within speech audio; provide the set of candidate n-grams to multiple devices; provide, to each node device, an indication of which candidate n-grams are to be searched for within the n-gram corpus by each node device to enable searches for multiple candidate n-grams to be performed, independently and at least partially in parallel, across the node devices; receive, from each node device, an indication of a probability of occurrence of at least one candidate n-gram within the speech audio; based on the received probabilities of occurrence, identify the next word most likely spoken within the speech audio; and add the next word most likely spoken to a transcript of the speech audio.
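
The distribution step in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy `CORPUS`, the round-robin assignment, the use of threads to stand in for node devices, and the names `node_search` and `next_word` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy n-gram corpus: n-gram -> probability of occurrence.
CORPUS = {("the", "cat"): 0.6, ("the", "cap"): 0.3, ("the", "cab"): 0.1}


def node_search(candidates, assigned_indices):
    """One node receives the full candidate set but searches only its assigned entries."""
    return {candidates[i]: CORPUS.get(candidates[i], 0.0) for i in assigned_indices}


def next_word(candidates, n_nodes):
    """Spread the candidate n-grams over n_nodes workers and pick the likeliest next word."""
    # Round-robin assignment (an assumption) indicates which candidates each node searches.
    assignments = [range(node, len(candidates), n_nodes) for node in range(n_nodes)]
    # Threads stand in for separate node devices searching in parallel.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_results = list(
            pool.map(lambda indices: node_search(candidates, indices), assignments)
        )
    # Merge the per-node probabilities and keep the most probable n-gram.
    probabilities = {}
    for partial in partial_results:
        probabilities.update(partial)
    best_ngram = max(probabilities, key=probabilities.get)
    return best_ngram[-1]  # the last token of the n-gram is the candidate next word
```

For example, `next_word([("the", "cap"), ("the", "cat"), ("the", "cab")], n_nodes=2)` selects the word from whichever candidate the toy corpus scores highest.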

73.

Granted invention patent
Shift invariant loss for deep learning based image segmentation (in force)

Publication No.: US11200676B2

Publication date: 2021-12-14

Application No.: US16746340

Application date: 2020-01-17

Applicant: Verily Life Sciences LLC

Inventor: Cheng-Hsun Wu, Ali Behrooz

IPC: G06K9/00, G06T7/11, G06F17/15, G06F17/18, G06N3/08, G10L15/04, G10L15/16

Abstract: Systems and methods of improving alignment in dense prediction neural networks are disclosed. A method includes identifying, at a computing system, an input data set and a label data set with one or more first parts of the input data set corresponding to a label. The computing system processes the input data set using a neural network to generate a predicted label data set that identifies one or more second parts of the input data set predicted to correspond to the label. The computing system determines an alignment result using the predicted label data set and the label data set and a transformation of the one or more first parts, including a shift, rotation, scaling, and/or deformation, based on the alignment result. The computing system computes a loss score using the transformation, label data and the predicted label data set and updates the neural network based on the loss score.
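
The idea of aligning labels before computing the loss can be sketched in one dimension. This is a simplified illustration, not the method claimed in the patent: it assumes integer shifts only (no rotation, scaling, or deformation), zero padding at the edges, an overlap score as the alignment criterion, and mean squared error as the loss. The function names are hypothetical.

```python
def shift(seq, k):
    """Shift seq right by k positions (left if k < 0), padding with zeros."""
    n = len(seq)
    return [seq[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]


def best_shift(pred, label, max_shift):
    """Return the integer shift of the label that best overlaps the prediction."""
    def overlap(k):
        return sum(p * l for p, l in zip(pred, shift(label, k)))
    return max(range(-max_shift, max_shift + 1), key=overlap)


def shift_invariant_loss(pred, label, max_shift=2):
    """Mean squared error between the prediction and the best-aligned label."""
    k = best_shift(pred, label, max_shift)   # alignment result
    aligned = shift(label, k)                # transformed label
    return sum((p - a) ** 2 for p, a in zip(pred, aligned)) / len(pred)
```

With `label = [0, 1, 1, 0, 0]` and `pred = [0, 0, 0.9, 0.8, 0]`, the label is moved one position to the right before scoring, so a prediction that is correct but slightly offset is penalized far less than under a plain element-wise loss.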

74.

Granted invention patent
Sound recognition system and method (in force)

Publication No.: US11183204B2

Publication date: 2021-11-23

Application No.: US16503707

Application date: 2019-07-05

Applicant: HON HAI PRECISION INDUSTRY CO., LTD.

Inventor: Jung-Yi Lin

IPC: G10L15/00, G10L25/51, G10L15/22, G10L15/16, G06K9/00, G10L19/00, G10L15/04, G10L15/30

Abstract: A voice recognition system includes a computing device and at least one mobile terminal communicatively coupled to the computing device through a network. The computing device obtains an original sound from the at least one mobile terminal and converts the original sound into a digitized time-frequency map, performs compression segmentation on the time-frequency map to obtain a sound image corresponding to the time-frequency map, and uses an image recognition method to recognize the sound image, obtain an enhanced sound image, and search a preset database for sound information corresponding to the enhanced sound image.

75.

Invention patent application
METHOD AND SYSTEM FOR ACOUSTIC MODEL CONDITIONING ON NON-PHONEME INFORMATION FEATURES (in force)

Publication No.: US20210335340A1

Publication date: 2021-10-28

Application No.: US17224967

Application date: 2021-04-07

Applicant: SoundHound, Inc.

Inventor: Zizu GOWAYYED, Keyvan MOHAJER

IPC: G10L15/02, G10L15/04, G10L15/22

Abstract: A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.

76.

Granted invention patent
Dynamic model selection in speech-to-text processing (in force)

Publication No.: US11145309B1

Publication date: 2021-10-12

Application No.: US17205871

Application date: 2021-03-18

Applicant: SAS Institute Inc.

Inventor: Xu Yang

IPC: G10L15/16, G10L15/22, G10L15/26, G10L25/30, G10L15/04, G06N3/04, G10L25/78, G06N3/08

Abstract: An apparatus includes processor(s) to: use an acoustic model to generate a first set of probabilities of speech sounds uttered within speech audio; derive at least a first candidate word most likely spoken in the speech audio using the first set; analyze the first set to derive a degree of uncertainty therefor; compare the degree of uncertainty to a threshold; in response to at least the degree of uncertainty being less than the threshold, select the first candidate word as a next word most likely spoken in the speech audio; in response to at least the degree of uncertainty being greater than the threshold, select, as the next word most likely spoken in the speech audio, a second candidate word indicated as being most likely spoken based on a second set of probabilities generated by a language model; and add the next word most likely spoken to a transcript.
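
The threshold logic in this abstract can be sketched as follows. The sketch makes several assumptions that the abstract does not state: the degree of uncertainty is measured as Shannon entropy, both models are reduced to word-level probability dictionaries, and an uncertainty exactly equal to the threshold falls through to the language model. The function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_next_word(acoustic_probs, language_probs, threshold):
    """Trust the acoustic model unless its distribution is too uncertain."""
    # Degree of uncertainty of the acoustic model output (assumed: entropy).
    uncertainty = entropy(acoustic_probs.values())
    if uncertainty < threshold:
        # Confident enough: keep the acoustic model's top candidate.
        return max(acoustic_probs, key=acoustic_probs.get)
    # Otherwise defer to the language model's top candidate.
    return max(language_probs, key=language_probs.get)
```

For instance, a sharply peaked acoustic distribution such as `{"cat": 0.9, "cap": 0.1}` stays below a threshold of 0.8 bits and is used directly, while an even split between the two words has an entropy of 1 bit and hands the decision to the language model.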

77.

Invention patent application
METHOD AND DEVICE FOR PLAYING VOICE, ELECTRONIC DEVICE, AND STORAGE MEDIUM (in force)

Publication No.: US20210311699A1

Publication date: 2021-10-07

Application No.: US15733891

Application date: 2019-09-04

Applicant: BEIJING DAJIA INTERNET INFORMATION TECHNOLOGY CO., LTD.

Inventor: Yang ZHANG, Meizhuo LI

IPC: G06F3/16, G10L15/04, G10L25/57, G10L15/22, G10L15/30

Abstract: Embodiments of the present application provide a speech playback method and apparatus, an electronic device and a storage medium. The method specifically comprises: receiving speech data sent by first electronic devices to obtain a speech data set; receiving audio and video data sent by a second electronic device, the audio and video data comprising speech data selected for playback, and the speech data selected for playback comprising any one of the speech data clicked for playback in the speech data set; and pushing the audio and video data to each first electronic device. For a webcast system, audience users using second electronic devices can interact with anchor users by means of speech, so that audience users who type slowly or who cannot type can also easily express opinions in a webcast, thereby improving the user experience of the audience users and increasing the target audience of the webcast.

78.

Granted invention patent
Method and device for audio recognition using sample audio and a voting matrix (in force)

Publication No.: US11133022B2

Publication date: 2021-09-28

Application No.: US17142917

Application date: 2021-01-06

Applicant: ADVANCED NEW TECHNOLOGIES CO., LTD.

Inventor: Zhijun Du, Nan Wang

IPC: G10L25/51, G06F16/00, G10L15/02, G06F40/279, G06F17/16, G10L15/04

Abstract: A method may include dividing input audio into frames and calculating a characteristic value for each of the frames. The method may include establishing a voting matrix having a first dimension representing a quantity of segments of sample audio and a second dimension representing a quantity of frames of each segment. The method may include marking voting labels in the voting matrix corresponding to frames of the sample audio when the characteristic values of corresponding frames of the input audio and sample audio match. The method may include determining a frame to be a recognition result when a sum of the voting labels at a corresponding position is higher than a threshold.
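
One possible reading of the voting scheme is sketched below. The abstract leaves the details open, so the following are assumptions rather than the patented method: characteristic values are plain integers compared for equality, all sample segments have the same number of frames, and each match votes for the sample position at which the input audio would have to start. The names `build_voting_matrix` and `recognize` are hypothetical.

```python
def build_voting_matrix(input_values, sample_segments):
    """Count, per sample position, how many input frames agree with that alignment."""
    n_segments = len(sample_segments)
    frames_per_segment = len(sample_segments[0])
    # First dimension: segments of the sample audio; second: frames per segment.
    votes = [[0] * frames_per_segment for _ in range(n_segments)]
    flat = [value for segment in sample_segments for value in segment]
    for i, value in enumerate(input_values):
        for pos, sample_value in enumerate(flat):
            start = pos - i  # sample position where the input would begin
            if value == sample_value and start >= 0:
                seg, frame = divmod(start, frames_per_segment)
                votes[seg][frame] += 1  # mark a voting label
    return votes


def recognize(votes, threshold):
    """Return (segment, frame) positions whose vote total exceeds the threshold."""
    return [
        (s, f)
        for s, row in enumerate(votes)
        for f, count in enumerate(row)
        if count > threshold
    ]
```

With sample segments `[[1, 2, 3], [4, 5, 6]]` and input values `[3, 4, 5]`, all three input frames vote for the same starting position, which is the only one to exceed a threshold of 2.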

79.

Invention patent application
Fully Supervised Speaker Diarization (in force)

Publication No.: US20210280197A1

Publication date: 2021-09-09

Application No.: US17303283

Application date: 2021-05-26

Applicant: Google LLC

Inventor: Chong Wang, Aonan Zhang, Quan Wang, Zhenyao Zhu

IPC: G10L17/04, G10L15/04, G10L15/07, G10L17/02, G10L17/18, G10L15/26, G10L17/00

Abstract: A method includes receiving an utterance of speech and segmenting the utterance of speech into a plurality of segments. For each segment of the utterance of speech, the method also includes extracting a speaker-discriminative embedding from the segment and predicting a probability distribution over possible speakers for the segment using a probabilistic generative model configured to receive the extracted speaker-discriminative embedding as a feature input. The probabilistic generative model is trained on a corpus of training speech utterances, each segmented into a plurality of training segments. Each training segment includes a corresponding speaker-discriminative embedding and a corresponding speaker label. The method also includes assigning a speaker label to each segment of the utterance of speech based on the probability distribution over possible speakers for the corresponding segment.

80.

Invention patent application
METHOD AND DEVICE FOR LOGGING AN ITEM OF INFORMATION RELATING TO A RAIL VEHICLE (in force)

Publication No.: US20210256964A1

Publication date: 2021-08-19

Application No.: US17251958

Application date: 2019-05-23

Applicant: Siemens Mobility GmbH

Inventor: Georg Lohneis

IPC: G10L15/08, G10L15/04, B61L27/00, B61L15/00

Abstract: A method for logging an item of information relating to a rail vehicle includes recording a speech input having the item of information by a user of the rail vehicle and saving the recorded speech input as an audio file. The saved audio file is sent via a wireless communications network to a subscriber, remote from the rail vehicle, of the communications network. A device logs the subscriber, remote from a rail vehicle, of the communications network.
