-
71.
Publication No.: US11948557B2
Publication date: 2024-04-02
Application No.: US17539282
Filing date: 2021-12-01
CPC classification: G10L15/1815, G06F3/04817, G06F40/30, G10L15/063, G10L15/16, G10L15/22, G10L15/30
Abstract: Aspects of the disclosure relate to using an apparatus for flagging and removing real-time workflows that produce sub-optimal results. Such an apparatus may include an utterance sentiment classifier. The apparatus stores a hierarchy of rules. Each of the rules is associated with one or more rule signals. In response to receiving the one or more utterance signals, the classifier iterates through the hierarchy of rules in sequential order to identify a first rule for which the one or more utterance signals are a superset of the rule's one or more rule signals. In response to receiving the one or more alternate utterance signals from the signal extractor, the classifier may iterate through the hierarchy of rules in sequential order to identify the first rule in the hierarchy for which the one or more alternate utterance signals are a superset of the first rule's one or more rule signals.
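The first-match lookup described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the rule names, the signal labels, and the `first_matching_rule` helper are all hypothetical, and the hierarchy is modeled simply as an ordered list.

```python
from typing import FrozenSet, Iterable, Optional, Sequence, Tuple

# Each rule is (name, rule_signals); list order stands in for the hierarchy.
Rule = Tuple[str, FrozenSet[str]]


def first_matching_rule(rules: Sequence[Rule],
                        utterance_signals: Iterable[str]) -> Optional[str]:
    """Return the first rule whose signals all appear in the utterance signals."""
    observed = set(utterance_signals)
    for name, rule_signals in rules:
        # Match when the utterance signals are a superset of the rule's signals.
        if observed >= rule_signals:
            return name
    return None


# Hypothetical hierarchy, most specific rule first.
RULES = [
    ("escalate", frozenset({"negation", "profanity"})),
    ("negative", frozenset({"negation"})),
    ("neutral", frozenset()),
]
```

Because iteration stops at the first match, more specific rules need to sit earlier in the list than the general rules they overlap with. The same function can be called a second time with the alternate utterance signals.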
-
72.
Publication No.: US20240105166A1
Publication date: 2024-03-28
Application No.: US18350111
Filing date: 2023-07-11
Inventors: Hoon CHUNG, Byung Ok KANG, Yoonhyung KIM
IPC classification: G10L15/16, G10L15/06, G10L15/065
CPC classification: G10L15/16, G10L15/063, G10L15/065
Abstract: Provided is a self-supervised learning method based on permutation invariant cross entropy. A self-supervised learning method based on permutation invariant cross entropy performed by an electronic device includes: defining a cross entropy loss function for pre-training of an end-to-end speech recognition model; configuring non-transcription speech corpus data composed only of speech as input data of the cross entropy loss function; setting all permutations of classes included in the non-transcription speech corpus data as an output target and calculating cross entropy losses for each class; and determining a minimum cross entropy loss among the calculated cross entropy losses for each class as a final loss.
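The loss described in this abstract can be sketched as a minimum over class permutations. This is a simplified sketch, not the method as claimed: it assumes per-frame class probabilities and integer class labels (for example, cluster ids), and the function names `cross_entropy` and `permutation_invariant_ce` are hypothetical.

```python
import itertools
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]],
                  targets: Sequence[int]) -> float:
    """Mean negative log-probability assigned to the target class of each frame."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def permutation_invariant_ce(probs: Sequence[Sequence[float]],
                             labels: Sequence[int],
                             num_classes: int) -> float:
    """Evaluate the loss under every relabeling of the classes and keep the minimum."""
    best = math.inf
    for perm in itertools.permutations(range(num_classes)):
        # Assumption: a permutation renames class c to perm[c] in the targets.
        remapped = [perm[c] for c in labels]
        best = min(best, cross_entropy(probs, remapped))
    return best
```

Enumerating all permutations costs `num_classes!` loss evaluations, so this brute-force form is only practical for a handful of classes.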
-
73.
Publication No.: US20240105162A1
Publication date: 2024-03-28
Application No.: US18257403
Filing date: 2021-11-18
Inventor: Kang WANG
CPC classification: G10L15/063, G10L15/005, G10L15/22
Abstract: The present disclosure relates to a method for training a model, a speech recognition method, an apparatus, a medium, and a device, the method including: acquiring training data, wherein the training data includes labeled data of at least two languages; ranking the languages in a descending order of a quantity of the labeled data of each language to obtain a training order corresponding to the languages; and sequentially acquiring, in accordance with ranking of the languages indicated by the training order, target data corresponding to each language to perform iterative training on a preset model, to obtain a target speech recognition model, wherein the target data is determined in accordance with the labeled data of language(s) from first ranking to current ranking in the training order.
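The training order in this abstract can be sketched as a sort followed by a cumulative pass. This sketch makes one loud assumption: the target data for each stage is taken to be the plain union of the labeled data of every language ranked so far, which the abstract does not spell out. The `training_schedule` helper and the sample data are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def training_schedule(labeled: Dict[str, List[str]]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (language, target_data) pairs in descending order of labeled-data size."""
    # Rank languages by quantity of labeled data, largest first.
    order = sorted(labeled, key=lambda lang: len(labeled[lang]), reverse=True)
    cumulative: List[str] = []
    for lang in order:
        # Assumption: each stage trains on data from the first through the current rank.
        cumulative = cumulative + labeled[lang]
        yield lang, list(cumulative)


# Hypothetical corpus: utterance ids per language.
corpus = {"en": ["a", "b", "c"], "fr": ["d"], "de": ["e", "f"]}
stages = list(training_schedule(corpus))
```

Each yielded stage would feed one round of iterative training on the preset model, ending with a stage that covers every language.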
-
74.
Publication No.: US11942077B2
Publication date: 2024-03-26
Application No.: US17949741
Filing date: 2022-09-21
Inventors: Kyoungbo Min, Seungdo Choi, Doohwa Hong
CPC classification: G10L15/063, G10L13/00, G10L15/16
Abstract: An electronic device for providing a text-to-speech (TTS) service and an operating method therefor are provided. The operating method of the electronic device includes obtaining target voice data based on an utterance input of a specific speaker, determining a number of learning steps of the target voice data, based on data features including a data amount of the target voice data, generating a target model by training a pre-trained model pre-trained to convert text into an audio signal, by using the target voice data as training data, based on the determined number of learning steps, generating output data obtained by converting input text into an audio signal, by using the generated target model, and outputting the generated output data.
-
75.
Publication No.: US11942076B2
Publication date: 2024-03-26
Application No.: US17651315
Filing date: 2022-02-16
Applicant: Google LLC
IPC classification: G10L15/30, G10L15/02, G10L15/06, G10L15/187, G10L15/193, G10L15/28, G10L15/32, G10L25/30
CPC classification: G10L15/063, G10L15/02, G10L15/187, G10L15/193, G10L15/285, G10L15/32, G10L25/30, G10L2015/025
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
-
76.
Publication No.: US20240096313A1
Publication date: 2024-03-21
Application No.: US17946523
Filing date: 2022-09-16
Inventors: Lavinia Andreea Danielescu, Timothy M. Shea, Kenneth Michael Stewart, Noah Gideon Pacik-Nelson, Eric Michael Gallo
CPC classification: G10L15/16, G10L15/063, G10L15/197, G10L15/22, G10L15/30, G10L25/21, G10L2015/0635, G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing speech using a spiking neural network acoustic model implemented on a neuromorphic processor are described. In one aspect, a method includes receiving, by a trained acoustic model implemented as a spiking neural network (SNN) on a neuromorphic processor of a client device, a set of feature coefficients that represent acoustic energy of input audio received from a microphone communicably coupled to the client device. The acoustic model is trained to predict speech sounds based on input feature coefficients. The acoustic model generates output data indicating predicted speech sounds corresponding to the set of feature coefficients that represent the input audio received from the microphone. The neuromorphic processor updates one or more parameters of the acoustic model using one or more learning rules and the predicted speech sounds of the output data.
-
77.
Publication No.: US20240096236A1
Publication date: 2024-03-21
Application No.: US18038520
Filing date: 2021-11-09
Applicant: ROLLS-ROYCE PLC
CPC classification: G09B21/00, G06F3/013, G10L13/033, G10L15/063, G10L15/18, G10L15/22
Abstract: A device for generating conversational replies, including a processor with a memory; a speech input module; a user input module; and a natural language processing module including one or more encoder-decoder modules; the device being configured to: record portions of a conversation through the speech input module, use a speech recognition module to identify words in the conversation, and when one or more words have been recognised: generate one or more responses based on the one or more words using the natural language processing module, select a group of the context sensitive responses, prompt the user via the user input module to select a response from the group, and output the selected response.
-
78.
Publication No.: US20240095491A1
Publication date: 2024-03-21
Application No.: US18527077
Filing date: 2023-12-01
Applicant: Quantiphi, Inc.
Inventors: Dagnachew Birru, Saisubramaniam Gopalakrishnan, Siva Prasad Sompalli, Varun V, Vishal Vaddina
IPC classification: G06N3/006, G06N3/0455, G06N3/0475, G10L15/06, G10L15/183, G10L15/22, H04L51/02
CPC classification: G06N3/006, G06N3/0455, G06N3/0475, G10L15/063, G10L15/183, G10L15/22, H04L51/02
Abstract: A method and system for multimodal response generation through a virtual agent is provided herein. The method comprises retrieving information related to an input received by the virtual agent. The virtual agent employs an Artificial Intelligence (AI) model. The method further comprises generating a response corresponding to the input based on the retrieved information. The method may further comprise generating a plurality of prompts based on user characteristics and the input. The method may further comprise modifying the response based on the plurality of prompts to generate a multimodal response.
-
79.
Publication No.: US20240086639A1
Publication date: 2024-03-14
Application No.: US17931911
Filing date: 2022-09-14
Inventors: Sanket Jain, Krishnasuri Narayanam, Ratnakar Behera, Avinash Tukaram Mane, ZHENG XIE, JOY PATRA
IPC classification: G06F40/30, G06F40/279, G10L15/06, G10L15/18, G10L15/22
CPC classification: G06F40/30, G06F40/279, G10L15/063, G10L15/1815, G10L15/22
Abstract: A method, computer program product, and computer system are provided. A model is trained in real time to identify likely duplicate questions. A level of duplication is identified between a question and a previously asked question in a meeting transcript. An asker is pointed to where in the meeting transcript the question was previously asked. All duplicate questions are arranged in a single point question by topic. A new meeting transcript is generated and displayed to attendees, including each individual question and each single point question.
-
80.
Publication No.: US20240079026A1
Publication date: 2024-03-07
Application No.: US18349796
Filing date: 2023-07-10
Inventors: Meryem Berrada, John Stavropoulos
CPC classification: G10L25/51, G10L15/063, G10L15/08, G10L15/22, G10L2015/088
Abstract: Methods, apparatus, systems and articles of manufacture to measure engagement of media consumers based on acoustic environment are disclosed. Example apparatus disclosed herein are to identify media device audio data and ambient environment audio data from sensed audio data collected from an environment, and determine classification data for the media device audio data and the ambient environment audio data. Disclosed example apparatus are also to process the classification data with a machine learning model to calculate an engagement metric. Disclosed example apparatus are further to determine whether at least one individual is engaged with media in the environment based on the engagement metric.
-