Sparse Mixer Architecture
    Invention Application

    Publication No.: US20240386256A1

    Publication Date: 2024-11-21

    Application No.: US18318049

    Filing Date: 2023-05-16

    Applicant: Google LLC

    Abstract: Improved multi-layer machine learning model architectures are provided that exhibit increased accuracy, decreased training time, decreased inference compute cost, and/or increased training stability. These models comprise a plurality of sequential layers, each containing a mixing layer that feeds into a feedforward layer. The benefits are achieved by 'enhancing' a subset of the feedforward layers with mixture-of-experts or other sparse multi-network architectures, while 'degrading' a subset of the mixing layers to simple linear mixing layers (e.g., layers that multiply inputs by one or more mixing matrices) rather than more complicated attentional mixing mechanisms (e.g., mechanisms involving multiple matrix multiplications, dot products, and nonlinear operations). Combining these mixing-layer and feedforward-layer modifications in a single multi-layer model yields synergistic improvements in training time, inference computational cost, and training stability for a given level of model accuracy.
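The layer structure described in the abstract can be sketched in code. The following is a minimal, illustrative NumPy sketch (not the patented implementation): a "standard" layer pairs attention-style mixing with a dense feedforward block, while a "sparse mixer" layer pairs a single-matrix linear mixing step with a top-1 mixture-of-experts feedforward block. All names, shapes, and the top-1 routing scheme are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens, dim, hidden, n_experts = 8, 16, 32, 4

def linear_mix(x, mix_matrix):
    # 'Degraded' mixing layer: one matrix multiply over the token axis,
    # with no dot products, softmax, or other nonlinear operations.
    return mix_matrix @ x

def attention_mix(x, wq, wk, wv):
    # Conventional single-head self-attention mixing, shown for contrast:
    # several matrix multiplications plus a softmax nonlinearity.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def dense_ffn(x, w1, w2):
    # Ordinary two-layer feedforward block with a ReLU.
    return np.maximum(x @ w1, 0.0) @ w2

def moe_ffn(x, router_w, experts):
    # 'Enhanced' feedforward layer: each token is routed to a single
    # expert (top-1), so only a sparse subset of parameters is active.
    choice = (x @ router_w).argmax(axis=-1)
    out = np.zeros_like(x)
    for e, (w1, w2) in enumerate(experts):
        mask = choice == e
        if mask.any():
            out[mask] = dense_ffn(x[mask], w1, w2)
    return out

x = rng.normal(size=(tokens, dim))

# "Standard" layer: attention mixing feeding a dense feedforward block.
wq, wk, wv = [rng.normal(size=(dim, dim)) * 0.1 for _ in range(3)]
w1 = rng.normal(size=(dim, hidden)) * 0.1
w2 = rng.normal(size=(hidden, dim)) * 0.1
h = x + attention_mix(x, wq, wk, wv)
h = h + dense_ffn(h, w1, w2)

# "Sparse mixer" layer: linear mixing feeding a mixture-of-experts block.
mix = rng.normal(size=(tokens, tokens)) * 0.1
router = rng.normal(size=(dim, n_experts))
experts = [(rng.normal(size=(dim, hidden)) * 0.1,
            rng.normal(size=(hidden, dim)) * 0.1) for _ in range(n_experts)]
h = h + linear_mix(h, mix)
h = h + moe_ffn(h, router, experts)

print(h.shape)  # (8, 16)
```

The trade-off the abstract describes is visible here: the linear mixing step costs a single matrix multiply, while the MoE block increases total parameter count without increasing per-token compute, since each token only passes through one expert.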
