-
1.
Publication No.: US11646017B1
Publication Date: 2023-05-09
Application No.: US17193414
Application Date: 2021-03-05
Applicant: Meta Platforms, Inc.
Inventor: Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Yui-Hin Chan, Qiaochu Zhang, Duc Hoang Le, Michael Lewis Seltzer
IPC: G10L15/16, G10L15/183, G06N3/04, G10L15/22
CPC classification number: G10L15/183, G06N3/0445, G10L15/16, G10L15/22
Abstract: In one embodiment, a method includes accessing a machine-learning model configured to generate an encoding for an utterance by using a module to process data associated with each segment of the utterance in a series of iterations, performing operations associated with an i-th segment during an n-th iteration by the module, which include receiving an input comprising input contextual embeddings generated for the i-th segment in a preceding iteration and a memory bank storing memory vectors generated in the preceding iteration for segments preceding the i-th segment, generating attention outputs and a memory vector based on keys, values, and queries generated using the input, and generating output contextual embeddings for the i-th segment based on the attention outputs, providing the memory vector to the module for performing operations associated with the i-th segment in a next iteration, and performing speech recognition by decoding the encoding of the utterance.
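The abstract describes a segment-wise encoder in which each iteration processes segment i using its own embeddings from the preceding iteration plus a memory bank holding the preceding iteration's memory vectors for the earlier segments. A minimal pure-Python sketch of that data flow follows. It is not the patented implementation. It assumes identity projections (queries, keys and values are the input vectors themselves rather than learned projections), a mean-of-segment summary query to produce each segment's memory vector, and a residual connection to form the output contextual embeddings. The names `attend`, `process_segment` and `encode` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) * scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def process_segment(segment: List[Vector], memory_bank: List[Vector]):
    """One iteration of the (hypothetical) module for one segment.

    Assumption: identity projections, so queries come from the segment and
    keys/values come from the memory bank followed by the segment itself.
    """
    keys = memory_bank + segment
    values = memory_bank + segment
    # Attention outputs: one per frame of the segment.
    outputs = [attend(q, keys, values) for q in segment]
    # Assumed summary query (segment mean) yields this segment's memory vector.
    memory_vector = attend(mean(segment), keys, values)
    # Residual connection gives the output contextual embeddings.
    contextual = [[x + y for x, y in zip(inp, out)] for inp, out in zip(segment, outputs)]
    return contextual, memory_vector


def encode(segments: List[List[Vector]], num_iterations: int) -> List[List[Vector]]:
    """Run the module over all segments for num_iterations iterations."""
    current = segments
    prev_memory: List[Vector] = []  # memory vectors from the preceding iteration
    for _ in range(num_iterations):
        next_segments, new_memory = [], []
        for i, seg in enumerate(current):
            # Memory bank for segment i: preceding-iteration vectors of segments < i.
            bank = prev_memory[:i]
            out, mem = process_segment(seg, bank)
            next_segments.append(out)
            new_memory.append(mem)  # handed to segment i's next iteration
        current, prev_memory = next_segments, new_memory
    return current
```

Because segment i only ever sees memory vectors for earlier segments, changing a later segment cannot alter an earlier segment's encoding, which is what makes this style of encoder usable for streaming recognition.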
-
Publication No.: US20250157464A1
Publication Date: 2025-05-15
Application No.: US18940817
Application Date: 2024-11-07
Applicant: META PLATFORMS, INC.
Inventor: Chunyang Wu, Yassir Fathullah, Jay Kiran Mahadeokar, Egor Lakomkin, Ozlem Kalinli Akbacak, Christian Fuegen, Michael Lewis Seltzer
IPC: G10L15/183, G10L15/02, G10L15/22
Abstract: The present application is at least directed to a method including a step of receiving audio from a user. The method may further include a step of generating, via a trained encoder based upon the received audio, an audio embedding sequence. The method may even further include a step of receiving, via a trained large language model (LLM), the generated audio embedding sequence and a text embedding sequence. The text embedding sequence is arranged before or after the generated audio embedding sequence. The method may yet even further include a step of producing, via the trained LLM based upon text embedding sequence, a textual response associated with the audio received from the user. The method may still even further include a step of causing to display, via a user interface of the user, the produced textual response.
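The key structural point of this abstract is how the LLM's input is assembled: an audio embedding sequence from a trained encoder is combined with a text embedding sequence placed either before or after it. A minimal sketch of that arrangement step follows, under stated assumptions. `encode_audio` is a hypothetical stand-in for the trained encoder that simply averages groups of `stride` frames. `embed_text` looks up rows of a hypothetical embedding table. Both are assumed to emit vectors of the LLM's embedding width. The LLM itself is not modeled.

```python
from typing import List

Vector = List[float]


def encode_audio(frames: List[Vector], stride: int = 2) -> List[Vector]:
    """Hypothetical stand-in for the trained audio encoder.

    Averages non-overlapping groups of `stride` frames, so the audio
    embedding sequence is shorter than the input feature sequence.
    """
    out = []
    for start in range(0, len(frames), stride):
        group = frames[start:start + stride]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def embed_text(token_ids: List[int], table: List[Vector]) -> List[Vector]:
    """Look up text-token embeddings in a hypothetical embedding table."""
    return [table[t] for t in token_ids]


def build_llm_input(audio_emb: List[Vector], text_emb: List[Vector],
                    text_first: bool) -> List[Vector]:
    """Place the text embedding sequence before or after the audio sequence."""
    return text_emb + audio_emb if text_first else audio_emb + text_emb
```

For example, a text prompt such as an instruction can be placed first (`text_first=True`) so the LLM reads it before the audio, or after the audio so it conditions on the spoken content first; the abstract covers both orderings.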
-