-
Publication No.: US20240330413A1
Publication Date: 2024-10-03
Application No.: US18192130
Filing Date: 2023-03-29
Inventors: Mohammed Hamzeh; David C. White, JR.; Christopher Shaun Roberts; Magnus Mortensen; Kevin D. McCabe; Felipe De Mello; Deon Anthony Pillsbury
IPC Classification: G06F18/2415, G06N20/00, G10L15/02, G10L15/06, G10L15/197
CPC Classification: G06F18/2415, G06N20/00, G10L15/02, G10L15/063, G10L15/197
Abstract: The techniques described herein relate to a method including: providing input to a plurality of prediction models; obtaining an initial prediction from each of the plurality of prediction models; providing the input to one or more weight models; obtaining from the one or more weight models a weight for each initial prediction, wherein the weight for each initial prediction is based upon the input and behavior of each of the plurality of prediction models; and determining an output prediction from the initial predictions and the weights.
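The abstract leaves the combination rule open. The following is a minimal Python sketch, not the patented method, assuming each prediction model returns a class-probability vector, a single weight model returns one non-negative score per prediction model for the given input, and the output prediction is the normalized weighted average. The names `ensemble_predict`, `model_a`, `model_b`, and `trust_a` are hypothetical.

```python
from typing import Callable, Sequence

Prediction = list[float]  # class-probability vector from one model


def ensemble_predict(
    x: str,
    prediction_models: Sequence[Callable[[str], Prediction]],
    weight_model: Callable[[str], Sequence[float]],
) -> Prediction:
    """Combine per-model predictions using input-dependent weights (sketch)."""
    # Obtain an initial prediction from each prediction model.
    initial = [model(x) for model in prediction_models]
    # The weight model sees the same input; we assume it was trained to
    # reflect how reliable each prediction model is for inputs like x.
    raw = list(weight_model(x))
    if len(raw) != len(initial):
        raise ValueError("need exactly one weight per prediction model")
    if any(w < 0 for w in raw):
        raise ValueError("weights must be non-negative")
    total = sum(raw)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    weights = [w / total for w in raw]
    # Output prediction: weighted average of the initial probability vectors.
    n_classes = len(initial[0])
    return [sum(w * p[c] for w, p in zip(weights, initial)) for c in range(n_classes)]


# Toy usage: the weight model trusts model A three times as much as model B.
model_a = lambda x: [0.9, 0.1]
model_b = lambda x: [0.2, 0.8]
trust_a = lambda x: [3.0, 1.0]
print(ensemble_predict("turn on the lights", [model_a, model_b], trust_a))
```

Normalizing the weights keeps the output a valid probability vector whatever scale the weight model produces.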
-
Publication No.: US20240290327A1
Publication Date: 2024-08-29
Application No.: US18658132
Filing Date: 2024-05-08
Applicant: Google LLC
IPC Classification: G10L15/197, G10L13/02, G10L15/06, G10L15/22
CPC Classification: G10L15/197, G10L13/02, G10L15/063, G10L15/22
Abstract: A method includes obtaining an utterance from a user including a user query directed toward a digital assistant. The method includes generating, using a language model, a first prediction string based on the utterance and determining whether the first prediction string includes an application programming interface (API) call to invoke a program via an API. When the first prediction string includes the API call to invoke the program, the method includes calling, using the API call, the program via the API to retrieve a program result; receiving, via the API, the program result; updating a conversational context with the program result that includes the utterance; and generating, using the language model, a second prediction string based on the updated conversational context. When the first prediction string does not include the API call, the method includes providing an utterance response to the utterance based on the first prediction string.
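The control flow in this abstract can be sketched in Python as follows. This is an illustration only: the `API[name](argument)` marker format, `respond`, `toy_lm`, and the `programs` registry are hypothetical stand-ins, since the abstract does not say how an API call is encoded in the prediction string.

```python
import re
from typing import Callable

# Hypothetical marker the language model is assumed to emit for an API call,
# e.g. "API[weather](Paris)". The abstract does not specify a format.
API_CALL = re.compile(r"API\[(\w+)\]\((.*?)\)")


def respond(
    utterance: str,
    language_model: Callable[[list[str]], str],
    programs: dict[str, Callable[[str], str]],
) -> str:
    """One round of the predict / call-API / re-predict loop (sketch)."""
    context = [utterance]                 # conversational context
    first = language_model(context)       # first prediction string
    match = API_CALL.search(first)
    if match is None:
        return first                      # no API call: respond directly
    name, argument = match.group(1), match.group(2)
    result = programs[name](argument)     # invoke the program via the API
    context.append(result)                # update context with the result
    return language_model(context)        # second prediction string


# Toy stand-ins for the language model and the program behind the API.
def toy_lm(context: list[str]) -> str:
    if len(context) == 2:
        return f"It is {context[1]} in Paris."
    if "weather" in context[0]:
        return "API[weather](Paris)"
    return "Hi there!"


programs = {"weather": lambda city: "sunny"}
print(respond("what is the weather in Paris?", toy_lm, programs))
print(respond("hello", toy_lm, programs))
```

The sketch handles a single API round; a real assistant might loop if the second prediction string contains another call.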
-
Publication No.: US20240257804A1
Publication Date: 2024-08-01
Application No.: US18160085
Filing Date: 2023-01-26
Applicant: GONG.io Ltd.
Inventor: Ruth ALONI-LAVI
IPC Classification: G10L15/197, G10L15/065, G10L15/18, G10L15/22
CPC Classification: G10L15/197, G10L15/065, G10L15/1815, G10L15/22, G10L2015/228
Abstract: A system and method for automated speech recognition using customized language models. A method includes identifying a plurality of words among first content, wherein the first content corresponds to a use case; adjusting a language model based on the plurality of words in order to create a customized language model, wherein the customized language model is configured to output language predictions when applied to features extracted from audio content, wherein the language model is adjusted to increase a likelihood that the language model outputs the plurality of words as language predictions; applying the customized language model to second content in order to determine a plurality of outputs of the customized language model, wherein the second content is audio content corresponding to the use case; and determining speech recognition outputs based on the plurality of outputs of the customized language model.
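As a rough illustration of "adjusting a language model to increase the likelihood of use-case words", here is a Python sketch that boosts those words in an add-one-smoothed unigram model and then rescores an n-best list of ASR hypotheses. The word-selection heuristic, the `boost` factor, and n-best rescoring are swapped-in assumptions, not the patent's method; all function names are hypothetical.

```python
import math
from collections import Counter


def identify_use_case_words(text: str, min_count: int = 2) -> set[str]:
    """Pick words that recur in use-case text (a simple stand-in heuristic)."""
    counts = Counter(text.lower().split())
    return {w for w, c in counts.items() if c >= min_count}


def customize_unigram_lm(
    base_counts: Counter, use_case_words: set[str], boost: float = 5.0
) -> dict[str, float]:
    """Add-one-smoothed unigram LM with use-case words up-weighted."""
    vocab = set(base_counts) | use_case_words
    scores = {
        w: (base_counts[w] + 1) * (boost if w in use_case_words else 1.0)
        for w in vocab
    }
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def rescore(
    hypotheses: list[tuple[str, float]],
    lm: dict[str, float],
    lm_weight: float = 1.0,
    oov_prob: float = 1e-6,
) -> str:
    """Pick the hypothesis maximizing acoustic log-score + weighted LM log-prob."""
    def score(hyp: tuple[str, float]) -> float:
        text, acoustic = hyp
        lm_logp = sum(math.log(lm.get(w, oov_prob)) for w in text.split())
        return acoustic + lm_weight * lm_logp
    return max(hypotheses, key=score)[0]


base = Counter({"the": 100, "call": 10, "sails": 5, "sales": 1})
words = identify_use_case_words("sales pipeline sales quota")  # {"sales"}
hyps = [("the sails call", -10.0), ("the sales call", -10.2)]
print(rescore(hyps, customize_unigram_lm(base, set())))
print(rescore(hyps, customize_unigram_lm(base, words)))
```

With the generic model the acoustically preferred "sails" wins; after boosting the use-case word, the LM term tips the decision toward "sales".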
-
Publication No.: US12027160B2
Publication Date: 2024-07-02
Application No.: US18074691
Filing Date: 2022-12-05
Applicant: GOOGLE LLC
IPC Classification: G10L15/22, G10L15/06, G10L15/197, G10L15/08
CPC Classification: G10L15/197, G10L15/063, G10L15/22, G10L2015/088, G10L2015/223
Abstract: Techniques are described herein for improving performance of machine learning model(s) and thresholds utilized in determining whether automated assistant function(s) are to be initiated. A method includes: receiving, via one or more microphones of a client device, audio data that captures a spoken utterance of a user; processing the audio data using a machine learning model to generate a predicted output that indicates a probability of one or more hotwords being present in the audio data; determining that the predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in the audio data than is a primary threshold; in response to determining that the predicted output satisfies the secondary threshold, prompting the user to indicate whether or not the spoken utterance includes a hotword; receiving, from the user, a response to the prompting; and adjusting the primary threshold based on the response.
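The two-threshold logic can be sketched in Python as below. The threshold values, the fixed `step`, and the rule "lower the primary threshold only on a confirmed near miss, never below the secondary threshold" are illustrative assumptions; the abstract only says the primary threshold is adjusted based on the user's response.

```python
from dataclasses import dataclass


@dataclass
class HotwordThresholds:
    """Primary/secondary hotword thresholds with feedback-driven tuning (sketch)."""
    primary: float = 0.85    # at or above: invoke the assistant
    secondary: float = 0.60  # between secondary and primary: ask the user
    step: float = 0.05       # hypothetical adjustment size

    def decide(self, p: float) -> str:
        if p >= self.primary:
            return "trigger"
        if p >= self.secondary:
            return "ask_user"
        return "ignore"

    def update(self, user_said_hotword: bool) -> None:
        # A confirmed near miss suggests the primary threshold is too strict,
        # so lower it, but never below the secondary threshold. A denial
        # leaves it unchanged in this sketch.
        if user_said_hotword:
            self.primary = max(self.secondary, self.primary - self.step)


t = HotwordThresholds()
p = 0.72
if t.decide(p) == "ask_user":
    t.update(user_said_hotword=True)  # user answered "yes, I said it"
print(round(t.primary, 2))
```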
-
Publication No.: US20240212676A1
Publication Date: 2024-06-27
Application No.: US18087158
Filing Date: 2022-12-22
Inventor: Brandon Kevin ROPER
IPC Classification: G10L15/197, G06F16/638, G06F16/683, G06F40/166, G06F40/58, G10L15/04, G10L15/06, G10L15/22, G10L15/30
CPC Classification: G10L15/197, G06F16/638, G06F16/685, G06F40/166, G06F40/58, G10L15/04, G10L15/063, G10L15/22, G10L15/30, G10L2015/0635, G10L15/1822
Abstract: The present disclosure relates to systems and methods for using metadata to improve identifying desired information within a transcription. The systems and methods include receiving audio data, recognizing terms within the audio data, identifying alternate recognized terms within the audio data, the alternate recognized terms corresponding to the recognized terms within the audio data, generating a transcription based on the recognized terms, and generating metadata associated with the transcription, the metadata comprising the alternate recognized terms and links between the alternate recognized terms and the recognized terms within the transcription. The systems and methods also include receiving a search query for a string within the transcription, searching for the string within the transcription and within the metadata, and providing one or more search results based on the searching.
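A minimal Python sketch of the idea, assuming the recognizer yields a ranked list of (term, confidence) candidates per word position: the top candidate goes into the transcription, the rest become metadata keyed by position (the "link"), and a single-word search checks both. `Transcript`, `build_transcript`, and `search` are hypothetical names; real systems would also handle multi-word queries and timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    words: list[str]
    # Metadata: word position -> alternate recognized terms for that position.
    alternates: dict[int, list[str]] = field(default_factory=dict)


def build_transcript(candidates: list[list[tuple[str, float]]]) -> Transcript:
    """Top candidate per position goes in the text; the rest become metadata."""
    words: list[str] = []
    alternates: dict[int, list[str]] = {}
    for i, options in enumerate(candidates):
        ranked = sorted(options, key=lambda opt: opt[1], reverse=True)
        words.append(ranked[0][0])
        if len(ranked) > 1:
            alternates[i] = [term for term, _ in ranked[1:]]
    return Transcript(words, alternates)


def search(transcript: Transcript, query: str) -> list[tuple[int, str]]:
    """Return (position, source) hits for a single-word query."""
    q = query.lower()
    hits = []
    for i, word in enumerate(transcript.words):
        if word.lower() == q:
            hits.append((i, "transcription"))
        elif any(alt.lower() == q for alt in transcript.alternates.get(i, [])):
            hits.append((i, "metadata"))
    return hits


tr = build_transcript([
    [("call", 0.9)],
    [("mister", 0.6), ("Mr", 0.3)],
    [("Smyth", 0.55), ("Smith", 0.45)],
])
print(" ".join(tr.words))
print(search(tr, "smith"))
```

Searching for "smith" finds position 2 through the metadata even though the transcription itself says "Smyth".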
-
Publication No.: US20240153500A1
Publication Date: 2024-05-09
Application No.: US18500969
Filing Date: 2023-11-02
Inventors: Wenbiao ZHAO; Jinzhen LIN; Zhenzhe YING; Lanqing XUE; Weiqiang WANG; Ke XU; Qi LI
IPC Classification: G10L15/18, G06F3/01, G06N20/00, G10L15/06, G10L15/197
CPC Classification: G10L15/1815, G06F3/011, G06N20/00, G10L15/063, G10L15/197, G10L2015/088
Abstract: Implementations of the present specification provide a data processing method, apparatus, and device. The method includes: obtaining to-be-detected target data, and obtaining a target probability that the target data corresponds to each candidate user intention, where the target data includes input data of a user in a human-computer interaction process; dividing the target data to obtain a plurality of pieces of subdata, and obtaining, based on a predetermined gradient integration algorithm, a contribution of each piece of subdata to a correspondence between the target data and each candidate user intention; and determining a target user intention corresponding to the target data based on the target probability that the target data corresponds to each candidate user intention and the contribution of each piece of subdata to the correspondence between the target data and each candidate user intention.
-
Publication No.: US20240153497A1
Publication Date: 2024-05-09
Application No.: US18413354
Filing Date: 2024-01-16
Inventors: Gang XU; Tao YANG; Ming-Jung SEOW
IPC Classification: G10L15/16, G06F40/237, G06N20/00, G06V20/52, G06V40/20, G10L15/197
CPC Classification: G10L15/16, G06F40/237, G06N20/00, G06V20/52, G06V40/20, G10L15/197
Abstract: Techniques are disclosed to optimize feature selection in generating betas for a feature dictionary of a neuro-linguistic Cognitive AI System. A machine learning engine receives a sample vector of input data to be analyzed by the neuro-linguistic Cognitive AI System. The neuro-linguistic Cognitive AI System is configured to generate multiple betas for each of a plurality of sensors. The machine learning engine identifies a sensor specified in the sample vector and selects optimization parameters for generating betas based on the identified sensor.
-
Publication No.: US11978447B2
Publication Date: 2024-05-07
Application No.: US17279540
Filing Date: 2020-09-17
Inventors: Haifeng Wang; Jizhou Huang
IPC Classification: G10L15/22, G06F16/33, G10L15/06, G10L15/197
CPC Classification: G10L15/22, G06F16/3344, G10L15/063, G10L15/197, G10L2015/221, G10L2015/223
Abstract: The present disclosure provides a speech interaction method, apparatus, device, and computer storage medium, and relates to the field of artificial intelligence. A specific implementation solution is as follows: performing speech recognition and demand analysis for a first speech instruction input by a user; if the demand analysis fails, performing demand prediction for the first speech instruction to obtain at least one demand expression; returning at least one of the demand expressions to the user in the form of a question; and, if a second speech instruction confirming at least one of the demand expressions is received from the user, performing a service response with the demand analysis result corresponding to the demand expression confirmed by the user. The present disclosure can improve the efficiency of the user's interaction and enhance the user experience.
-
Publication No.: US11972752B2
Publication Date: 2024-04-30
Application No.: US17979715
Filing Date: 2022-11-02
Applicant: ActionPower Corp.
Inventor: Dongchan Shin
IPC Classification: G10L15/05, G10L15/16, G10L15/197, G10L15/22, G10L25/78
CPC Classification: G10L15/05, G10L15/197, G10L15/22, G10L25/78, G10L15/16
Abstract: Disclosed is a method for detecting a speech segment, which is performed by a computing device. The method may include: detecting a start point of a speech segment in an audio signal; and detecting an end point of the speech segment based on an offset threshold which is dynamically changed, wherein the dynamically changed offset threshold may be based on a length of the speech segment.
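The abstract does not say what the offset threshold measures or in which direction it changes. This Python sketch assumes per-frame speech probabilities, a fixed onset threshold for the start point, and an offset probability threshold that rises linearly with the current segment length (capped at `max_offset`), so longer segments end more readily. All parameter names and values are hypothetical.

```python
def detect_segments(
    probs: list[float],
    onset: float = 0.5,
    base_offset: float = 0.3,
    max_offset: float = 0.5,
    growth_per_frame: float = 0.01,
) -> list[tuple[int, int]]:
    """Return (start, end) frame indices of speech segments, end exclusive."""
    segments: list[tuple[int, int]] = []
    start = None
    for i, p in enumerate(probs):
        if start is None:
            if p >= onset:              # start point detected
                start = i
            continue
        # Offset threshold grows with the current segment length (assumption).
        length = i - start
        offset = min(max_offset, base_offset + growth_per_frame * length)
        if p < offset:                  # end point detected
            segments.append((start, i))
            start = None
    if start is not None:              # audio ended mid-segment
        segments.append((start, len(probs)))
    return segments


probs = [0.1, 0.9, 0.8, 0.35, 0.1]
print(detect_segments(probs, growth_per_frame=0.01))
print(detect_segments(probs, growth_per_frame=0.05))
```

With slow growth, the frame with probability 0.35 still counts as speech; with faster growth, the threshold has already passed 0.35 by that frame and the segment ends one frame earlier.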
-
Publication No.: US11967315B2
Publication Date: 2024-04-23
Application No.: US17660335
Filing Date: 2022-04-22
CPC Classification: G10L15/197, G06N3/08, G06N7/01, G10L15/005, G10L15/02, G10L15/16, G10L15/22, G10L2015/223
Abstract: A method includes performing, using at least one processor, feature extraction of input audio data to identify extracted features associated with the input audio data. The method also includes detecting, using the at least one processor, a language associated with the input audio data by processing the extracted features using a plurality of language models, where each language model is associated with a different language. The method further includes directing, using the at least one processor, the input audio data to one of a plurality of automatic speech recognition (ASR) models based on the language associated with the input audio data.
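The detect-then-route pipeline can be sketched in Python as follows. The per-language "language models" here are toy diagonal-Gaussian scorers over two toy features (mean and max of the samples), and the ASR models are placeholder lambdas; these are illustrative assumptions, not the patent's models or real acoustic features.

```python
import math
from typing import Callable, Mapping, Sequence

Features = Sequence[float]


def gaussian_scorer(mean: Sequence[float], var: Sequence[float]) -> Callable[[Features], float]:
    """Toy per-language model: diagonal-Gaussian log-likelihood of features."""
    def log_likelihood(x: Features) -> float:
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mean, var)
        )
    return log_likelihood


def route_to_asr(
    audio: Sequence[float],
    extract_features: Callable[[Sequence[float]], Features],
    language_models: Mapping[str, Callable[[Features], float]],
    asr_models: Mapping[str, Callable[[Sequence[float]], str]],
) -> tuple[str, str]:
    feats = extract_features(audio)                      # feature extraction
    scores = {lang: lm(feats) for lang, lm in language_models.items()}
    language = max(scores, key=scores.get)               # detected language
    return language, asr_models[language](audio)         # route to that ASR


def toy_features(audio: Sequence[float]) -> list[float]:
    return [sum(audio) / len(audio), max(audio)]


lang_models = {
    "en": gaussian_scorer([0.0, 0.0], [1.0, 1.0]),
    "es": gaussian_scorer([3.0, 3.0], [1.0, 1.0]),
}
asr = {"en": lambda a: "<english ASR output>", "es": lambda a: "<spanish ASR output>"}
print(route_to_asr([2.9, 3.1, 3.0], toy_features, lang_models, asr))
```

Keeping one scorer and one ASR model per language in parallel mappings makes adding a language a matter of registering two entries.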