Speech-to-analytics framework with support for large n-gram corpora

    Publication No.: US11217233B1

    Publication date: 2022-01-04

    Application No.: US17370441

    Filing date: 2021-07-08

    Abstract: An apparatus includes processor(s) to: generate a set of candidate n-grams based on probability distributions from an acoustic model for candidate graphemes of a next word most likely spoken following at least one preceding word spoken within speech audio; provide the set of candidate n-grams to multiple devices; provide, to each node device, an indication of which candidate n-grams are to be searched for within the n-gram corpus by each node device to enable searches for multiple candidate n-grams to be performed, independently and at least partially in parallel, across the node devices; receive, from each node device, an indication of a probability of occurrence of at least one candidate n-gram within the speech audio; based on the received probabilities of occurrence, identify the next word most likely spoken within the speech audio; and add the next word most likely spoken to a transcript of the speech audio.
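
The division of work described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: threads stand in for node devices, the round-robin assignment and the relative-frequency estimate of "probability of occurrence" are assumptions, and the names `assign_candidates`, `node_search`, and `next_word` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_candidates(candidates, num_nodes):
    """Round-robin split of candidate n-grams across node devices (assumed policy)."""
    return [candidates[i::num_nodes] for i in range(num_nodes)]


def node_search(assigned, corpus_counts, total):
    """One node: relative frequency of each assigned n-gram in the corpus."""
    return {ngram: corpus_counts.get(ngram, 0) / total for ngram in assigned}


def next_word(candidates, corpus_counts, num_nodes=2):
    """Search all candidates in parallel and return the most probable next word."""
    total = sum(corpus_counts.values())
    shares = assign_candidates(candidates, num_nodes)
    # Threads are a stand-in for independent node devices holding the corpus.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(lambda s: node_search(s, corpus_counts, total), shares))
    probs = {}
    for partial in results:
        probs.update(partial)
    best = max(probs, key=probs.get)
    # The last token of the winning n-gram is the next word for the transcript.
    return best[-1], probs


corpus_counts = {("the", "cat"): 6, ("the", "cap"): 3, ("the", "cab"): 1}
candidates = [("the", "cat"), ("the", "cap"), ("the", "cut")]
word, probs = next_word(candidates, corpus_counts)
```

Each worker only looks up the candidates it was told to search for, so the lookups are independent and can run at least partially in parallel, which is the property the abstract emphasizes.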

    Shift invariant loss for deep learning based image segmentation

    Publication No.: US11200676B2

    Publication date: 2021-12-14

    Application No.: US16746340

    Filing date: 2020-01-17

    Abstract: Systems and methods of improving alignment in dense prediction neural networks are disclosed. A method includes identifying, at a computing system, an input data set and a label data set with one or more first parts of the input data set corresponding to a label. The computing system processes the input data set using a neural network to generate a predicted label data set that identifies one or more second parts of the input data set predicted to correspond to the label. The computing system determines an alignment result using the predicted label data set and the label data set and a transformation of the one or more first parts, including a shift, rotation, scaling, and/or deformation, based on the alignment result. The computing system computes a loss score using the transformation, label data and the predicted label data set and updates the neural network based on the loss score.
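
A minimal sketch of the idea, assuming 1-D masks, integer shifts only, a brute-force search for the alignment, and mean squared error as the loss (none of which the abstract specifies). The function names `shift`, `mse`, and `shift_invariant_loss` are hypothetical.

```python
def shift(mask, k):
    """Shift a 1-D mask right by k positions (left if k < 0), padding with zeros."""
    n = len(mask)
    return [mask[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def shift_invariant_loss(pred, label, max_shift=2):
    """Align the label to the prediction, then score against the aligned label."""
    # Alignment step: pick the label shift that best matches the prediction.
    best_k = min(range(-max_shift, max_shift + 1),
                 key=lambda k: mse(pred, shift(label, k)))
    # Loss step: compare the prediction with the transformed label.
    return mse(pred, shift(label, best_k)), best_k


label = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
pred = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
loss, k = shift_invariant_loss(pred, label)
```

Here the prediction is the label displaced by one position, so the aligned loss is zero while the unaligned loss is not. A network trained with such a loss is not penalized for small registration errors in the annotation.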

    Sound recognition system and method

    Publication No.: US11183204B2

    Publication date: 2021-11-23

    Application No.: US16503707

    Filing date: 2019-07-05

    Inventor: Jung-Yi Lin

    Abstract: A voice recognition system includes a computing device and at least one mobile terminal communicatively coupled to the computing device through a network. The computing device obtains an original sound from the at least one mobile terminal and converts the original sound into a digitized time-frequency map, performs compression segmentation on the time-frequency map to obtain a sound image corresponding to the time-frequency map, and uses an image recognition method to recognize the sound image, obtain an enhanced sound image, and search a preset database for sound information corresponding to the enhanced sound image.

    METHOD AND SYSTEM FOR ACOUSTIC MODEL CONDITIONING ON NON-PHONEME INFORMATION FEATURES

    Publication No.: US20210335340A1

    Publication date: 2021-10-28

    Application No.: US17224967

    Filing date: 2021-04-07

    Abstract: A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.

    Dynamic model selection in speech-to-text processing

    Publication No.: US11145309B1

    Publication date: 2021-10-12

    Application No.: US17205871

    Filing date: 2021-03-18

    Inventor: Xu Yang

    Abstract: An apparatus includes processor(s) to: use an acoustic model to generate a first set of probabilities of speech sounds uttered within speech audio; derive at least a first candidate word most likely spoken in the speech audio using the first set; analyze the first set to derive a degree of uncertainty therefor; compare the degree of uncertainty to a threshold; in response to at least the degree of uncertainty being less than the threshold, select the first candidate word as a next word most likely spoken in the speech audio; in response to at least the degree of uncertainty being greater than the threshold, select, as the next word most likely spoken in the speech audio, a second candidate word indicated as being most likely spoken based on a second set of probabilities generated by a language model; and add the next word most likely spoken to a transcript.
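
The selection rule in this abstract can be sketched as below. The abstract does not say how the degree of uncertainty is measured; Shannon entropy is used here as an assumption. The sketch also simplifies the acoustic output to word-level probabilities, and the name `select_next_word` is hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; used here as the (assumed) uncertainty measure."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def select_next_word(acoustic_probs, language_probs, threshold):
    """Keep the acoustic model's word when it is confident, else defer to the language model."""
    if entropy(acoustic_probs) < threshold:
        return max(acoustic_probs, key=acoustic_probs.get)
    # The abstract leaves the equal-to-threshold case open; this sketch
    # treats it like the uncertain case and uses the language model.
    return max(language_probs, key=language_probs.get)


confident = {"cat": 0.9, "cap": 0.1}
ambiguous = {"cat": 0.5, "cap": 0.5}
language = {"cap": 0.7, "cat": 0.3}

first = select_next_word(confident, language, threshold=0.8)
second = select_next_word(ambiguous, language, threshold=0.8)
```

With a threshold of 0.8 bits, the confident distribution keeps the acoustic model's choice, while the evenly split one falls back to the language model's choice.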

    METHOD AND DEVICE FOR PLAYING VOICE, ELECTRONIC DEVICE, AND STORAGE MEDIUM

    Publication No.: US20210311699A1

    Publication date: 2021-10-07

    Application No.: US15733891

    Filing date: 2019-09-04

    Abstract: Embodiments of the present application provide a speech playback method and apparatus, an electronic device and a storage medium. The method specifically comprises: receiving speech data sent by first electronic devices to obtain a speech data set; receiving audio and video data sent by a second electronic device, the audio and video data comprising speech data selected for playback, and the speech data selected for playback comprising any one of the speech data clicked for playback in the speech data set; and pushing the audio and video data to each first electronic device. For a webcast system, audience users using second electronic devices can interact with anchor users by means of speech, so that audience users who input text slowly or who cannot input text can also easily express opinions in a webcast, thereby improving the user experience of the audience users and broadening the target audience of the webcast.

    Method and device for audio recognition using sample audio and a voting matrix

    Publication No.: US11133022B2

    Publication date: 2021-09-28

    Application No.: US17142917

    Filing date: 2021-01-06

    Inventors: Zhijun Du, Nan Wang

    Abstract: A method may include dividing input audio into frames and calculating a characteristic value for each of the frames. The method may include establishing a voting matrix having a first dimension representing a quantity of segments of sample audio and a second dimension representing a quantity of frames of each segment. The method may include marking voting labels in the voting matrix corresponding to frames of the sample audio when the characteristic values of corresponding frames of the input audio and sample audio match. The method may include determining a frame to be a recognition result when a sum of the voting labels at a corresponding position is higher than a threshold.
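
One possible reading of the voting scheme is sketched below. The abstract does not define the characteristic value or how a match maps to a matrix position, so this sketch makes two assumptions: characteristic values are plain integers compared for equality, and each match votes for the sample position at which the input audio would have to start. The names `build_votes` and `recognize` are hypothetical.

```python
def build_votes(input_vals, sample_segments):
    """Voting matrix: rows are sample segments, columns are frames per segment."""
    num_segments = len(sample_segments)
    frames_per_segment = len(sample_segments[0])
    votes = [[0] * frames_per_segment for _ in range(num_segments)]
    flat = [value for segment in sample_segments for value in segment]
    for i, value in enumerate(input_vals):
        for j, sample_value in enumerate(flat):
            start = j - i  # sample frame where the input would begin
            if sample_value == value and start >= 0:
                votes[start // frames_per_segment][start % frames_per_segment] += 1
    return votes


def recognize(votes, threshold):
    """Positions whose vote total is higher than the threshold."""
    return [(s, f) for s, row in enumerate(votes)
            for f, count in enumerate(row) if count > threshold]


sample_segments = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
input_vals = [5, 6, 7]
votes = build_votes(input_vals, sample_segments)
hits = recognize(votes, threshold=2)
```

All three input frames agree on the same starting position (segment 1, frame 1), so that cell collects three votes and is the only one above the threshold.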

    Fully Supervised Speaker Diarization

    Publication No.: US20210280197A1

    Publication date: 2021-09-09

    Application No.: US17303283

    Filing date: 2021-05-26

    Applicant: Google LLC

    Abstract: A method includes receiving an utterance of speech and segmenting the utterance of speech into a plurality of segments. For each segment of the utterance of speech, the method also includes extracting a speaker-discriminative embedding from the segment and predicting a probability distribution over possible speakers for the segment using a probabilistic generative model configured to receive the extracted speaker-discriminative embedding as a feature input. The probabilistic generative model is trained on a corpus of training speech utterances, each segmented into a plurality of training segments. Each training segment includes a corresponding speaker-discriminative embedding and a corresponding speaker label. The method also includes assigning a speaker label to each segment of the utterance of speech based on the probability distribution over possible speakers for the corresponding segment.
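
The supervised pipeline in this abstract can be illustrated with a much simpler stand-in model. The sketch below assumes one isotropic Gaussian per speaker with a uniform prior, fitted from labeled training embeddings, and treats segments independently. It is not the model claimed in the application, and the names `fit_speaker_means`, `speaker_posterior`, and `diarize` are hypothetical.

```python
import math
from collections import defaultdict


def fit_speaker_means(train_embeddings, train_labels):
    """Training: one mean embedding per labeled speaker."""
    sums, counts = {}, defaultdict(int)
    for emb, label in zip(train_embeddings, train_labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for d, x in enumerate(emb):
            acc[d] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def speaker_posterior(embedding, means, variance=1.0):
    """Probability distribution over speakers for one segment (Bayes, uniform prior)."""
    log_lik = {
        label: -sum((x - m) ** 2 for x, m in zip(embedding, mean)) / (2 * variance)
        for label, mean in means.items()
    }
    top = max(log_lik.values())  # subtract the max for numerical stability
    weights = {label: math.exp(v - top) for label, v in log_lik.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def diarize(segment_embeddings, means):
    """Assign each segment the speaker with the highest posterior probability."""
    labels = []
    for emb in segment_embeddings:
        post = speaker_posterior(emb, means)
        labels.append(max(post, key=post.get))
    return labels


train = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
means = fit_speaker_means(train, ["A", "A", "B", "B"])
labels = diarize([[1.0, 1.0], [9.0, 11.0], [0.0, 0.0]], means)
```

The stand-in keeps the three stages the abstract names (training on labeled segment embeddings, predicting a per-segment distribution over speakers, and assigning labels from that distribution) while ignoring any dependence between consecutive segments.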
