Language Model Prediction of API Call Invocations and Verbal Response

    Publication No.: US20240290327A1

    Publication Date: 2024-08-29

    Application No.: US18658132

    Application Date: 2024-05-08

    Applicant: Google LLC

    Abstract: A method includes obtaining an utterance from a user including a user query directed toward a digital assistant. The method includes generating, using a language model, a first prediction string based on the utterance and determining whether the first prediction string includes an application programming interface (API) call to invoke a program via an API. When the first prediction string includes the API call to invoke the program, the method includes calling, using the API call, the program via the API to retrieve a program result; receiving, via the API, the program result; updating a conversational context that includes the utterance with the program result; and generating, using the language model, a second prediction string based on the updated conversational context. When the first prediction string does not include the API call, the method includes providing an utterance response to the utterance based on the first prediction string.
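
    The control flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the `API_CALL name(arg)` string format, the `respond` function, the list-of-strings conversational context, and the toy model are all hypothetical, and returning the second prediction string as the response is an assumption (the abstract only says that it is generated).

```python
import re
from typing import Callable, Dict, List

# Hypothetical convention: a prediction string of the form "API_CALL name(arg)"
# is treated as an API call. The abstract does not specify any format.
API_CALL_PATTERN = re.compile(r"^API_CALL\s+(\w+)\((.*)\)$")


def respond(
    utterance: str,
    language_model: Callable[[List[str]], str],
    programs: Dict[str, Callable[[str], str]],
) -> str:
    """Return a response to the utterance, invoking a program if the model asks."""
    # The conversational context starts with the user's utterance.
    context = [f"user: {utterance}"]

    # First prediction string, generated from the utterance.
    first_prediction = language_model(context)

    match = API_CALL_PATTERN.match(first_prediction.strip())
    if match is None:
        # No API call: respond based on the first prediction string.
        return first_prediction

    # API call found: invoke the program and receive its result.
    name, argument = match.group(1), match.group(2)
    program_result = programs[name](argument)

    # Update the conversational context with the program result.
    context.append(f"api_result: {program_result}")

    # Second prediction string, generated from the updated context.
    # Assumption: it is used directly as the response.
    return language_model(context)


def toy_model(context: List[str]) -> str:
    """Stand-in for a language model (hypothetical, rule-based)."""
    if any(line.startswith("api_result:") for line in context):
        result = context[-1].split(": ", 1)[1]
        return f"It is {result} in Paris."
    if "weather" in context[0]:
        return "API_CALL get_weather(Paris)"
    return "Hello!"
```

    With `programs = {"get_weather": lambda city: "sunny"}`, a weather query takes the API branch and a greeting takes the direct-response branch.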

LANGUAGE MODEL CUSTOMIZATION TECHNIQUES AND APPLICATIONS THEREOF

    Publication No.: US20240257804A1

    Publication Date: 2024-08-01

    Application No.: US18160085

    Application Date: 2023-01-26

    Applicant: GONG.io Ltd.

    Inventor: Ruth ALONI-LAVI

    Abstract: A system and method for automated speech recognition using customized language models. A method includes identifying a plurality of words among first content, wherein the first content corresponds to a use case; adjusting a language model based on the plurality of words in order to create a customized language model, wherein the customized language model is configured to output language predictions when applied to features extracted from audio content, wherein the language model is adjusted to increase a likelihood that the language model outputs the plurality of words as language predictions; applying the customized language model to second content in order to determine a plurality of outputs of the customized language model, wherein the second content is audio content corresponding to the use case; and determining speech recognition outputs based on the plurality of outputs of the customized language model.
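
    One simple way to picture "adjusting a language model to increase the likelihood of a plurality of words" is to boost those words in a unigram model and renormalize. The sketch below is a deliberately simplified illustration and not the method claimed in the application: the unigram model, the frequency-based word selection, the `boost` factor, and all function names are assumptions, and the acoustic side (features extracted from audio) is replaced by plain text hypotheses.

```python
import math
from collections import Counter
from typing import Dict, Set


def identify_words(first_content: str, min_count: int = 2) -> Set[str]:
    """Pick words that recur in the use-case content (hypothetical criterion)."""
    counts = Counter(first_content.lower().split())
    return {word for word, count in counts.items() if count >= min_count}


def customize(
    unigram_probs: Dict[str, float], words: Set[str], boost: float = 10.0
) -> Dict[str, float]:
    """Multiply the probability of each selected word by `boost`, then renormalize."""
    boosted = {
        word: prob * (boost if word in words else 1.0)
        for word, prob in unigram_probs.items()
    }
    total = sum(boosted.values())
    return {word: prob / total for word, prob in boosted.items()}


def score(hypothesis: str, lm: Dict[str, float], floor: float = 1e-6) -> float:
    """Log-probability of a text hypothesis under a unigram model."""
    return sum(math.log(lm.get(word, floor)) for word in hypothesis.lower().split())
```

    For example, with a base model `{"gong": 0.01, "gone": 0.04, "the": 0.95}` and the word set `{"gong"}`, the customized model ranks the hypothesis "the gong" above "the gone", whereas the base model ranks them the other way around.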

User mediation for hotword/keyword detection

    Publication No.: US12027160B2

    Publication Date: 2024-07-02

    Application No.: US18074691

    Application Date: 2022-12-05

    Applicant: GOOGLE LLC

    Abstract: Techniques are described herein for improving performance of machine learning model(s) and thresholds utilized in determining whether automated assistant function(s) are to be initiated. A method includes: receiving, via one or more microphones of a client device, audio data that captures a spoken utterance of a user; processing the audio data using a machine learning model to generate a predicted output that indicates a probability of one or more hotwords being present in the audio data; determining that the predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in the audio data than is a primary threshold; in response to determining that the predicted output satisfies the secondary threshold, prompting the user to indicate whether or not the spoken utterance includes a hotword; receiving, from the user, a response to the prompting; and adjusting the primary threshold based on the response.
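
    The two-threshold logic in this abstract might look something like the sketch below. It is an illustration under assumptions rather than the patented method: the `HotwordDetector` class, the default threshold values, the fixed adjustment `step`, the direction of each adjustment, and the decision to activate after a "yes" answer are all hypothetical, since the abstract states only that the primary threshold is adjusted based on the user's response.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HotwordDetector:
    primary_threshold: float = 0.8    # at or above this, activate without asking
    secondary_threshold: float = 0.6  # at or above this, ask the user
    step: float = 0.05                # hypothetical fixed adjustment size

    def handle(self, probability: float, ask_user: Callable[[], bool]) -> str:
        """Decide what to do with a predicted hotword probability.

        `ask_user` prompts the user and returns True if the utterance
        did include a hotword.
        """
        if probability >= self.primary_threshold:
            return "activate"

        if probability >= self.secondary_threshold:
            if ask_user():
                # Assumption: a near miss means the primary threshold was too
                # strict, so lower it (never below the secondary threshold).
                self.primary_threshold = max(
                    self.secondary_threshold, self.primary_threshold - self.step
                )
                return "activate"
            # Assumption: a confirmed non-hotword makes the detector stricter.
            self.primary_threshold = min(1.0, self.primary_threshold + self.step)
            return "ignore"

        # Below both thresholds: no prompt, no adjustment.
        return "ignore"
```

    In this sketch, a probability of 0.7 with the default thresholds triggers a prompt; if the user confirms the hotword, the primary threshold drops from 0.8 to roughly 0.75, so similar utterances are more likely to activate the assistant directly in the future.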