-
Publication No.: US12198685B2
Publication date: 2025-01-14
Application No.: US18184432
Application date: 2023-03-15
Applicant: PAYPAL, INC.
Inventor: Sandro Cavallari , Yuzhen Zhuo , Van Hoang Nguyen , Quan Jin Ferdinand Tang , Gautam Vasappanavara
IPC: G06F40/00 , G06F40/205 , G06F40/253 , G06F40/284 , G10L15/19 , G10L15/26
Abstract: Methods and systems are presented for translating informal utterances into formal texts. Informal utterances may include abbreviated words or typographical errors. The informal utterances may be processed by mapping each word in an utterance into a well-defined token. The mapping from the words to the tokens may be based on a context associated with the utterance, derived by analyzing the utterance on a character-by-character basis. The token that is mapped for each word can be one of a vocabulary token that corresponds to a formal word in a pre-defined word corpus, an unknown token that corresponds to an unknown word, or a masked token. Formal text may then be generated based on the mapped tokens. Through the processing of informal utterances using the techniques disclosed herein, the informal utterances are both normalized and sanitized.
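Illustrative sketch (not from the patent): the three-way mapping of words to vocabulary, unknown, or masked tokens might look roughly like the Python below. The abstract describes a mapping driven by context derived from character-level analysis; this sketch swaps in plain lookup tables and a digit rule instead. `VOCAB`, `ABBREVIATIONS`, the `[UNK]`/`[MASK]` strings, and the rule that digit-bearing words are masked are all assumptions made for illustration.

```python
import re

# Hypothetical tables; the abstract does not specify a corpus or abbreviation list.
VOCAB = {"please", "send", "money", "to", "my", "account", "thanks"}
ABBREVIATIONS = {"pls": "please", "plz": "please", "acct": "account", "thx": "thanks"}
UNK, MASK = "[UNK]", "[MASK]"


def map_word(word: str) -> str:
    """Map one informal word to a vocabulary token, [UNK], or [MASK]."""
    w = word.lower().strip(".,!?")
    # Assumption: any word containing a digit is treated as sensitive and masked.
    if re.search(r"\d", w):
        return MASK
    # Expand known abbreviations before the vocabulary lookup.
    w = ABBREVIATIONS.get(w, w)
    return w if w in VOCAB else UNK


def normalize(utterance: str) -> str:
    """Generate formal text by mapping every whitespace-separated word."""
    return " ".join(map_word(w) for w in utterance.split())


if __name__ == "__main__":
    print(normalize("pls send money to my acct 12345678 thx"))
```

A learned character-level model would replace `map_word`; the lookup version only shows the shape of the output, where each input word yields exactly one token.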
-
Publication No.: US20240428779A1
Publication date: 2024-12-26
Application No.: US18823383
Application date: 2024-09-03
Applicant: Comcast Cable Communications, LLC
Inventor: WENYAN LI , FERHAN TURE , JOSE CASILLAS , GEORGE THOMAS DES JARDINS
Abstract: Methods for automatically evaluating ASR outputs and providing annotations, including corrections, on the transcriptions, in order to improve recognition, may be based on an analysis of sessions of user voice queries, utilizing time-ordered ASR transcriptions of user voice queries (i.e., user utterances). This utterance-based approach may involve extracting both session-level and query-level characteristics from voice query sessions and identifying patterns of query reformulation in order to detect erroneous transcriptions and automatically determine an appropriate correction. Alternatively, or in addition, ASR outputs may be evaluated based on user behavior. The outcomes may be classified as positive or negative. An ASR transcription may be labeled using the description of the outcome. The labeled transcription may be used as training data to train a model to output improved transcriptions of voice queries.
-
Publication No.: US20240379097A1
Publication date: 2024-11-14
Application No.: US18464240
Application date: 2023-09-10
Applicant: Saltlux Inc.
Inventor: Kyung Il LEE , Jong Won LEE
Abstract: A method for controlling a response utterance being reproduced and predicting a user intention includes: a voice signal analysis step of, when a second voice signal is received from a user while a first response utterance for responding to a first voice signal is output, starting analysis on the second voice signal; an utterance control step of, when one of preset keywords is identified to be included in a first sentence corresponding to the second voice signal, controlling the first response utterance being output to correspond to the identified keyword; an intention prediction step of, when a third voice signal is received in a state where the first response utterance is controlled, analyzing the third voice signal to predict an intention of the user, which is reflected in a second sentence corresponding to the third voice signal; and a response utterance output step of outputting a response utterance.
-
Publication No.: US20240379092A1
Publication date: 2024-11-14
Application No.: US18783423
Application date: 2024-07-25
Applicant: SoundHound AI IP, LLC.
Inventor: Anton V. RELIN
Abstract: Systems for automatic speech recognition and/or natural language understanding automatically learn new words by finding subsequences of phonemes that, if they were a new word, would enable a successful tokenization of a phoneme sequence. Systems can learn alternate pronunciations of words by finding phoneme sequences with a small edit distance to existing pronunciations. Systems can learn the part of speech of words by finding part-of-speech variations that would enable parses by syntactic grammars. Systems can learn what types of entities a word describes by finding sentences that could be parsed by a semantic grammar but for the words not being on an entity list.
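Illustrative sketch (not from the patent): finding phoneme sequences with a small edit distance to existing pronunciations can be approximated with a standard Levenshtein distance over phoneme lists. The `LEXICON` entries, the ARPAbet-style symbols, and the `max_distance` threshold are assumptions for illustration; the abstract does not say which distance measure or threshold is used.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of phonemes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Hypothetical pronunciation lexicon.
LEXICON = {
    "tomato": ["T", "AH", "M", "EY", "T", "OW"],
    "potato": ["P", "AH", "T", "EY", "T", "OW"],
}


def closest_words(phonemes, max_distance=1):
    """Words whose known pronunciation is near, but not equal to, the observed one."""
    return sorted(
        word for word, known in LEXICON.items()
        if 0 < edit_distance(phonemes, known) <= max_distance
    )


if __name__ == "__main__":
    observed = ["T", "AH", "M", "AA", "T", "OW"]
    print(closest_words(observed))
```

An observed sequence that lands within the threshold of exactly one word is a candidate alternate pronunciation for that word.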
-
Publication No.: US12014144B2
Publication date: 2024-06-18
Application No.: US17390573
Application date: 2021-07-30
Applicant: INTUIT INC.
Inventor: Zhewen Fan , Byungkyu Kang , Wan Yu Zhang , Carlos A. Oliveira , Wenxin Xiao
CPC classification number: G06F40/30 , G06F16/38 , G06F18/22 , G06F40/279 , G10L15/19 , G10L15/22 , H04M3/51
Abstract: A processor may receive a call transcript including text and form a text string including at least a portion of the text. The processor may generate a situation description of the call transcript, which may comprise processing the text string using a transformer-based machine learning model. The processor may generate a trouble description of the call transcript, which may comprise creating a sentence embedding of the situation description, creating sentence embeddings for a plurality of utterances within the portion of the text, determining respective similarities between the sentence embedding of the situation description and each of the sentence embeddings for each respective one of the plurality of utterances, and selecting at least one of the plurality of utterances having at least one highest determined respective similarity as the trouble description. The processor may store a call summary comprising the situation description and the trouble description in a non-transitory memory.
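Illustrative sketch (not from the patent): the trouble-description step, which picks the utterance whose embedding is most similar to the embedding of the situation description, can be outlined as below. The `embed` function is a toy bag-of-words stand-in for a real sentence-embedding model, and cosine similarity is an assumed similarity measure; the abstract names neither.

```python
import math
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a sentence-embedding model."""
    cleaned = sentence.lower().replace(".", "").replace(",", "")
    return Counter(cleaned.split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
        math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def select_trouble_description(situation: str, utterances: list, top_k: int = 1) -> list:
    """Return the top_k utterances most similar to the situation description."""
    target = embed(situation)
    ranked = sorted(utterances, key=lambda u: cosine(target, embed(u)), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    situation = "Customer cannot log in to the payroll account."
    utterances = [
        "Hello, thanks for calling.",
        "I cannot log in to my payroll account.",
        "Have a nice day.",
    ]
    print(select_trouble_description(situation, utterances))
```

Only `embed` would change when swapping in a real model; the ranking logic stays the same as long as the embeddings support a similarity score.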
-
Publication No.: US20240127798A1
Publication date: 2024-04-18
Application No.: US18538957
Application date: 2023-12-13
Applicant: Sorenson IP Holdings, LLC
Inventor: David Thomson
CPC classification number: G10L15/065 , G06F21/602 , G10L15/063 , G10L15/19 , G10L25/51 , H04L9/0869
Abstract: A method may include obtaining a text string that is a transcription of audio data and selecting a sequence of words from the text string as a first word sequence. The method may further include encrypting the first word sequence and comparing the encrypted first word sequence to multiple encrypted word sequences. Each of the multiple encrypted word sequences may be associated with a corresponding one of multiple counters. The method may also include in response to the encrypted first word sequence corresponding to one of the multiple encrypted word sequences based on the comparison, incrementing a counter of the multiple counters associated with the one of the multiple encrypted word sequences and adapting a language model of an automatic transcription system using the multiple encrypted word sequences and the multiple counters.
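Illustrative sketch (not from the patent): counting word sequences without storing them in plaintext might be outlined as below. The abstract speaks of encrypting word sequences; this sketch substitutes a deterministic keyed hash (HMAC-SHA256) so that equal sequences produce equal opaque values that can be compared. `SECRET_KEY`, the bigram length, and the choice to start a new counter at 1 for unseen sequences are assumptions for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # hypothetical key; a real system would manage keys securely


def protect(words) -> str:
    """Deterministic keyed hash of a word sequence (stand-in for encryption)."""
    message = " ".join(words).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def count_ngrams(transcript: str, counters: dict, n: int = 2) -> dict:
    """Update per-sequence counters, keyed by protected n-grams."""
    words = transcript.lower().split()
    for i in range(len(words) - n + 1):
        key = protect(words[i:i + n])
        if key in counters:
            counters[key] += 1  # protected sequence matches a stored one
        else:
            counters[key] = 1   # assumption: unseen sequences start a new counter
    return counters


if __name__ == "__main__":
    counters = count_ngrams("please call me back", {})
    count_ngrams("please call again", counters)
    print(counters[protect(["please", "call"])])
```

The resulting counts could then feed n-gram statistics when adapting a language model, which is outside the scope of this sketch.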
-
Publication No.: US20240079002A1
Publication date: 2024-03-07
Application No.: US18262400
Application date: 2022-01-05
Applicant: BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
Inventor: Chunsai DU , Jingsheng YANG , Kojung CHEN , Xiang ZHENG , Wenming XU
Abstract: A meeting minutes processing method, a device, and a medium. The method comprises: acquiring meeting text of a meeting audio/video; inputting the meeting text into a to-do identification model, and determining initial to-do statements; inputting the initial to-do statements into a tense determination model, and determining tense results of the initial to-do statements; and determining a meeting to-do statement in the initial to-do statements on the basis of the tense results.
-
Publication No.: US11908463B1
Publication date: 2024-02-20
Application No.: US17361761
Application date: 2021-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Arjit Biswas , Shishir Bharathi , Anushree Venkatesh , Yun Lei , Ashish Kumar Agrawal , Siddhartha Reddy Jonnalagadda , Prakash Krishnan , Arindam Mandal , Raefer Christopher Gabriel , Abhay Kumar Jha , David Chi-Wai Tang , Savas Parastatidis
IPC: G10L15/22 , G06F40/35 , G10L15/183 , G10L15/18 , G06F40/279 , G06F40/295 , G10L15/19 , G06F40/30
CPC classification number: G10L15/183 , G06F40/279 , G10L15/1815 , G10L15/22 , G06F40/295 , G06F40/30 , G06F40/35 , G10L15/1822 , G10L15/19 , G10L2015/228
Abstract: Techniques for storing and using multi-session context are described. A system may store context data corresponding to a first interaction, where the context data may include action data, entity data and a profile identifier for a user. Later the stored context data may be retrieved during a second interaction corresponding to the entity of the second interaction. The second interaction may take place at a system different than the first interaction. The system may generate a response during the second interaction using the stored context data of the prior interaction.
-
Publication No.: US11875797B2
Publication date: 2024-01-16
Application No.: US17355023
Application date: 2021-06-22
Applicant: Pozotron Inc.
Inventor: Jakub Poznanski , Kostiantyn Hlushak
IPC: G10L15/26 , G10L15/08 , G10L15/19 , G10L15/187 , G06F3/04842 , G10L15/06
CPC classification number: G10L15/26 , G06F3/04842 , G10L15/063 , G10L15/083 , G10L15/187 , G10L15/19
Abstract: A scripted audio production system in which the scripted audio production computerized process decreases production time by improving computerized processes and technological systems for pronunciation research and script preparation, narration, editing, proofing and mastering. The system enables the user to upload their manuscript and recorded audio of the narration of the manuscript to the system. The system then compares the recorded audio against the previously uploaded manuscript, and any mistakes or deviations from the manuscript are highlighted or otherwise indicated to the user. The system automatically pieces together the last-read audio into a clean file without the need for significant user interaction. The process may also be performed on the recorded audio by the narrator first uploading the audio and manuscript to the scripted audio production technology system.
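Illustrative sketch (not from the patent): once the narration audio has been transcribed, comparing it against the manuscript and flagging deviations could be approximated with a word-level diff. The abstract does not describe the comparison method; this sketch assumes a transcript is already available as text and uses Python's standard `difflib.SequenceMatcher`. The function name `find_deviations` is hypothetical.

```python
import difflib


def find_deviations(manuscript: str, transcript: str) -> list:
    """List (operation, manuscript_words, spoken_words) for every mismatch."""
    script_words = manuscript.lower().split()
    spoken_words = transcript.lower().split()
    matcher = difflib.SequenceMatcher(a=script_words, b=spoken_words, autojunk=False)
    deviations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            deviations.append((
                tag,
                " ".join(script_words[i1:i2]),
                " ".join(spoken_words[j1:j2]),
            ))
    return deviations


if __name__ == "__main__":
    print(find_deviations(
        "the quick brown fox jumps",
        "the quick red fox jumps",
    ))
```

Each reported tuple identifies a span that could be highlighted for the user; mapping the spans back to audio timestamps would need word-level timing from the recognizer, which this sketch omits.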
-
Publication No.: US20230352009A1
Publication date: 2023-11-02
Application No.: US17732971
Application date: 2022-04-29
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Piyush BEHRE , Sharman W TAN , Shuangyu CHANG , Padma VARADHARAJAN , Sayan Dev PATHAK , Ravikant GUPTA
IPC: G10L15/19 , G10L15/04 , G10L15/22 , G06F40/58 , G06F40/103
CPC classification number: G10L15/19 , G10L15/04 , G10L15/22 , G06F40/58 , G06F40/103
Abstract: Systems generate segments of spoken language utterances based on different sets of segmentation boundaries. The systems are also configured to generate one or more formatted segments by assigning punctuation tags at segmentation boundaries and to generate one or more final sentences from the one or more segments.