-
Publication No.: US11308936B2
Publication Date: 2022-04-19
Application No.: US16399211
Filing Date: 2019-04-30
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Tae-yoon Kim, Sang-ha Kim, Sung-Soo Kim, Jin-sik Lee, Chang-woo Han, Eun-kyoung Kim, Jae-won Lee
Abstract: A speech signal processing method of a user terminal includes: receiving a speech signal, detecting a personalized information section including personal information in the speech signal, performing data processing on the personalized information section of the speech signal by using a personalized model generated based on the personal information, and receiving, from a server, a result of the data processing performed by the server on a general information section of the speech signal that is different than the personalized information section of the speech signal.
-
Publication No.: US11217233B1
Publication Date: 2022-01-04
Application No.: US17370441
Filing Date: 2021-07-08
Applicant: SAS Institute Inc.
Inventor: Xiaozhuo Cheng, Xu Yang, Xiaolong Li, Biljana Belamaric Wilsey, Haipeng Liu, Jared Peterson
Abstract: An apparatus includes processor(s) to: generate a set of candidate n-grams based on probability distributions from an acoustic model for candidate graphemes of a next word most likely spoken following at least one preceding word spoken within speech audio; provide the set of candidate n-grams to multiple devices; provide, to each node device, an indication of which candidate n-grams are to be searched for within the n-gram corpus by each node device to enable searches for multiple candidate n-grams to be performed, independently and at least partially in parallel, across the node devices; receive, from each node device, an indication of a probability of occurrence of at least one candidate n-gram within the speech audio; based on the received probabilities of occurrence, identify the next word most likely spoken within the speech audio; and add the next word most likely spoken to a transcript of the speech audio.
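The parallel n-gram search described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: worker threads stand in for the node devices, the relative frequency of an n-gram in a toy corpus stands in for its probability of occurrence, and the names `search_shard` and `pick_next_word` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(corpus_counts, total, shard):
    """One 'node device': look up its assigned candidate n-grams in the corpus.

    Assumption: probability of occurrence is approximated by relative frequency.
    """
    return {ngram: corpus_counts.get(ngram, 0) / total for ngram in shard}


def pick_next_word(candidates, corpus_counts, num_nodes=2):
    """Split candidate n-grams across nodes, search in parallel, keep the best."""
    total = sum(corpus_counts.values())
    # Tell each node which candidates it is responsible for (round-robin split).
    shards = [candidates[i::num_nodes] for i in range(num_nodes)]
    probabilities = {}
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        futures = [pool.submit(search_shard, corpus_counts, total, s) for s in shards]
        for future in futures:
            probabilities.update(future.result())
    best_ngram = max(probabilities, key=probabilities.get)
    # The last token of the winning n-gram is the next word for the transcript.
    return best_ngram[-1]


# Toy data: bigram counts and three candidates following the word "the".
toy_counts = {("the", "cat"): 8, ("the", "cap"): 2}
toy_candidates = [("the", "cat"), ("the", "cap"), ("the", "cab")]
transcript = ["the"]
transcript.append(pick_next_word(toy_candidates, toy_counts))
```

In a real deployment the shards would go to separate machines that each hold the n-gram corpus; the thread pool only mimics that independence.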
-
Publication No.: US11200676B2
Publication Date: 2021-12-14
Application No.: US16746340
Filing Date: 2020-01-17
Applicant: Verily Life Sciences LLC
Inventor: Cheng-Hsun Wu, Ali Behrooz
Abstract: Systems and methods of improving alignment in dense prediction neural networks are disclosed. A method includes identifying, at a computing system, an input data set and a label data set with one or more first parts of the input data set corresponding to a label. The computing system processes the input data set using a neural network to generate a predicted label data set that identifies one or more second parts of the input data set predicted to correspond to the label. The computing system determines an alignment result using the predicted label data set and the label data set and a transformation of the one or more first parts, including a shift, rotation, scaling, and/or deformation, based on the alignment result. The computing system computes a loss score using the transformation, label data and the predicted label data set and updates the neural network based on the loss score.
-
Publication No.: US11183204B2
Publication Date: 2021-11-23
Application No.: US16503707
Filing Date: 2019-07-05
Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
Inventor: Jung-Yi Lin
Abstract: A voice recognition system includes a computing device and at least one mobile terminal communicatively coupled to the computing device through a network. The computing device obtains an original sound from the at least one mobile terminal and converts the original sound into a digitized time-frequency map, performs compression segmentation on the time-frequency map to obtain a sound image corresponding to the time-frequency map, and uses an image recognition method to recognize the sound image, obtain an enhanced sound image, and search a preset database for sound information corresponding to the enhanced sound image.
-
Publication No.: US20210335340A1
Publication Date: 2021-10-28
Application No.: US17224967
Filing Date: 2021-04-07
Applicant: SoundHound, Inc.
Inventor: Zizu GOWAYYED, Keyvan MOHAJER
Abstract: A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.
-
Publication No.: US11145309B1
Publication Date: 2021-10-12
Application No.: US17205871
Filing Date: 2021-03-18
Applicant: SAS Institute Inc.
Inventor: Xu Yang
Abstract: An apparatus includes processor(s) to: use an acoustic model to generate a first set of probabilities of speech sounds uttered within speech audio; derive at least a first candidate word most likely spoken in the speech audio using the first set; analyze the first set to derive a degree of uncertainty therefor; compare the degree of uncertainty to a threshold; in response to at least the degree of uncertainty being less than the threshold, select the first candidate word as a next word most likely spoken in the speech audio; in response to at least the degree of uncertainty being greater than the threshold, select, as the next word most likely spoken in the speech audio, a second candidate word indicated as being most likely spoken based on a second set of probabilities generated by a language model; and add the next word most likely spoken to a transcript.
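The threshold rule in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: Shannon entropy is assumed as the "degree of uncertainty", both models are simplified to word-level probability tables, and the function names are hypothetical. The abstract does not say what happens when the uncertainty equals the threshold; the sketch falls back to the language model in that case.

```python
import math


def entropy(probabilities):
    """Shannon entropy in nats; an assumed stand-in for the degree of uncertainty."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_next_word(acoustic_probs, language_probs, threshold):
    """Choose between the acoustic model's and the language model's top candidate."""
    first_candidate = max(acoustic_probs, key=acoustic_probs.get)
    uncertainty = entropy(acoustic_probs.values())
    if uncertainty < threshold:
        # The acoustic model is confident enough: keep its candidate.
        return first_candidate
    # Otherwise defer to the word the language model rates most likely.
    return max(language_probs, key=language_probs.get)


# Toy distributions over the same three candidate words.
confident_acoustic = {"cat": 0.97, "cap": 0.02, "cab": 0.01}
uncertain_acoustic = {"cat": 0.34, "cap": 0.33, "cab": 0.33}
language_model = {"cat": 0.2, "cap": 0.7, "cab": 0.1}

transcript = []
transcript.append(select_next_word(confident_acoustic, language_model, threshold=0.5))
transcript.append(select_next_word(uncertain_acoustic, language_model, threshold=0.5))
```

Entropy is only one possible uncertainty measure; the margin between the top two probabilities would fit the same structure.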
-
Publication No.: US20210311699A1
Publication Date: 2021-10-07
Application No.: US15733891
Filing Date: 2019-09-04
Inventor: Yang ZHANG, Meizhuo LI
Abstract: Embodiments of the present application provide a speech playback method and apparatus, an electronic device and a storage medium. The method specifically comprises: receiving speech data sent by first electronic devices to obtain a speech data set; receiving audio and video data sent by a second electronic device, the audio and video data comprising speech data selected for playback, and the speech data selected for playback comprising any one of the speech data clicked for playback in the speech data set; and pushing the audio and video data to each first electronic device. For a webcast system, audience users using second electronic devices can interact with anchor users by means of speech, so that audience users who input text slowly or who cannot input text can also easily express opinions in a webcast, thereby improving the user experience of the audience users and increasing the target audience of the webcast.
-
Publication No.: US11133022B2
Publication Date: 2021-09-28
Application No.: US17142917
Filing Date: 2021-01-06
Applicant: ADVANCED NEW TECHNOLOGIES CO., LTD.
Abstract: A method may include dividing input audio into frames and calculating a characteristic value for each of the frames. The method may include establishing a voting matrix having a first dimension representing a quantity of segments of sample audio and a second dimension representing a quantity of frames of each segment. The method may include marking voting labels in the voting matrix corresponding to frames of the sample audio when the characteristic values of corresponding frames of the input audio and sample audio match. The method may include determining a frame to be a recognition result when a sum of the voting labels at a corresponding position is higher than a threshold.
-
Publication No.: US20210280197A1
Publication Date: 2021-09-09
Application No.: US17303283
Filing Date: 2021-05-26
Applicant: Google LLC
Inventor: Chong Wang, Aonan Zhang, Quan Wang, Zhenyao Zhu
Abstract: A method includes receiving an utterance of speech and segmenting the utterance of speech into a plurality of segments. For each segment of the utterance of speech, the method also includes extracting a speaker-discriminative embedding from the segment and predicting a probability distribution over possible speakers for the segment using a probabilistic generative model configured to receive the extracted speaker-discriminative embedding as a feature input. The probabilistic generative model is trained on a corpus of training speech utterances, each segmented into a plurality of training segments. Each training segment includes a corresponding speaker-discriminative embedding and a corresponding speaker label. The method also includes assigning a speaker label to each segment of the utterance of speech based on the probability distribution over possible speakers for the corresponding segment.
-
Publication No.: US20210256964A1
Publication Date: 2021-08-19
Application No.: US17251958
Filing Date: 2019-05-23
Applicant: Siemens Mobility GmbH
Inventor: Georg Lohneis
Abstract: A method for logging an item of information relating to a rail vehicle, includes recording a speech input having the item of information, by a user of the rail vehicle and saving the recorded speech input as an audio file. The saved audio file is sent via a wireless communications network to a subscriber, remote from the rail vehicle, of the communications network. A device logs the subscriber, remote from a rail vehicle, of the communications network.
-