Patent search ap:("Meta Platforms Page Inc.") AND inv:"Chunyang Wu"

1.

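Granted invention patent
Efficient memory transformer based acoustic model for low latency streaming speech recognition (in force)

Publication (announcement) number: US11646017B1

Publication (announcement) date: 2023-05-09

Application number: US17193414

Application date: 2021-03-05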

Applicant: Meta Platforms, Inc.

Inventor: Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Yui-Hin Chan, Qiaochu Zhang, Duc Hoang Le, Michael Lewis Seltzer

IPC: G10L15/16, G10L15/183, G06N3/04, G10L15/22

CPC classification number: G10L15/183, G06N3/0445, G10L15/16, G10L15/22

Abstract: In one embodiment, a method includes accessing a machine-learning model configured to generate an encoding for an utterance by using a module to process data associated with each segment of the utterance in a series of iterations, performing operations associated with an i-th segment during an n-th iteration by the module, which include receiving an input comprising input contextual embeddings generated for the i-th segment in a preceding iteration and a memory bank storing memory vectors generated in the preceding iteration for segments preceding the i-th segment, generating attention outputs and a memory vector based on keys, values, and queries generated using the input, and generating output contextual embeddings for the i-th segment based on the attention outputs, providing the memory vector to the module for performing operations associated with the i-th segment in a next iteration, and performing speech recognition by decoding the encoding of the utterance.
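The data flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes single-head attention, identity key/value/query projections, and a summary query taken as the mean of the segment's frames. The abstract specifies none of these, and the names `process_segment` and `encode` are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]

def process_segment(segment, memory_bank):
    """Operations for one segment in one iteration.

    segment: frame embeddings for this segment from the preceding iteration.
    memory_bank: memory vectors from the preceding iteration for earlier segments.
    Returns (output contextual embeddings, new memory vector).
    """
    d = len(segment[0])
    # Assumption: the memory vector comes from a summary query (mean of frames).
    summary = [sum(f[j] for f in segment) / len(segment) for j in range(d)]
    # Assumption: identity projections, so keys and values are the raw inputs.
    context = memory_bank + segment
    outputs = [attend(frame, context, context) for frame in segment]
    memory_vector = attend(summary, context, context)
    return outputs, memory_vector

def encode(segments, num_iterations):
    """Run all segments through num_iterations iterations."""
    prev_memories = []  # memory vectors produced in the preceding iteration
    for _ in range(num_iterations):
        new_segments, new_memories = [], []
        for i, seg in enumerate(segments):
            # Segment i only sees memory vectors of segments before it.
            out, mem = process_segment(seg, prev_memories[:i])
            new_segments.append(out)
            new_memories.append(mem)
        segments, prev_memories = new_segments, new_memories
    return segments
```

For example, `encode([[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]], num_iterations=2)` returns one list of output embeddings per segment, with the same shapes as the input. The decoding step that performs the actual speech recognition is outside this sketch.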

2.

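Invention patent application
TOWARDS END-TO-END SPEECH-INPUT CONVERSATIONAL LARGE LANGUAGE MODELS (in force)

Publication (announcement) number: US20250157464A1

Publication (announcement) date: 2025-05-15

Application number: US18940817

Application date: 2024-11-07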

Applicant: META PLATFORMS, INC.

Inventor: Chunyang Wu, Yassir Fathullah, Jay Kiran Mahadeokar, Egor Lakomkin, Ozlem Kalinli Akbacak, Christian Fuegen, Michael Lewis Seltzer

IPC: G10L15/183, G10L15/02, G10L15/22

Abstract: The present application is at least directed to a method including a step of receiving audio from a user. The method may further include a step of generating, via a trained encoder based upon the received audio, an audio embedding sequence. The method may even further include a step of receiving, via a trained large language model (LLM), the generated audio embedding sequence and a text embedding sequence. The text embedding sequence is arranged before or after the generated audio embedding sequence. The method may yet even further include a step of producing, via the trained LLM based upon text embedding sequence, a textual response associated with the audio received from the user. The method may still even further include a step of causing to display, via a user interface of the user, the produced textual response.
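The sequence arrangement described in this abstract can be sketched as follows. This is an illustrative outline only. The encoder, text embedder, and LLM are passed in as plain callables, and `toy_encoder`, `toy_embed_text`, and `toy_llm` are hypothetical stand-ins for trained models, not anything named in the application.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def build_llm_input(audio_emb: Sequence[Vector],
                    text_emb: Sequence[Vector],
                    text_first: bool = True) -> List[Vector]:
    """Place the text embedding sequence before or after the audio one."""
    if text_first:
        return list(text_emb) + list(audio_emb)
    return list(audio_emb) + list(text_emb)

def respond(audio: Sequence[float],
            prompt_tokens: Sequence[str],
            encoder: Callable[[Sequence[float]], List[Vector]],
            embed_text: Callable[[str], Vector],
            llm: Callable[[List[Vector]], str],
            text_first: bool = True) -> str:
    """Audio -> audio embeddings -> combined sequence -> textual response."""
    audio_emb = encoder(audio)
    text_emb = [embed_text(tok) for tok in prompt_tokens]
    return llm(build_llm_input(audio_emb, text_emb, text_first))

# Hypothetical stand-ins so the sketch runs end to end.
def toy_encoder(audio: Sequence[float]) -> List[Vector]:
    # Group consecutive sample pairs into 2-dimensional "embeddings".
    return [[audio[i], audio[i + 1]] for i in range(0, len(audio) - 1, 2)]

def toy_embed_text(token: str) -> Vector:
    return [float(len(token)), 0.0]

def toy_llm(embeddings: List[Vector]) -> str:
    return f"received {len(embeddings)} embeddings"
```

Calling `respond([0.1, 0.2, 0.3, 0.4], ["describe", "audio"], toy_encoder, toy_embed_text, toy_llm)` passes two text embeddings followed by two audio embeddings to the stand-in LLM. Displaying the response in a user interface, the final step in the abstract, would then be a matter of rendering the returned string.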
