-
公开(公告)号:US20240386256A1
公开(公告)日:2024-11-21
申请号:US18318049
申请日:2023-05-16
Applicant: Google LLC
Inventor: James Lee Thorp , Joshua Timothy Ainslie
IPC: G06N3/0499
Abstract: Improved multi-layer machine learning model architectures are provided that exhibit increased accuracy, decreased training time, decreased inference compute cost, and/or increased stability while training. These improved models include a plurality of sequential layers, each layer comprising a mixing layer that feeds into a feedforward layer. These improved models achieve these benefits by ‘enhancing’ a subset of the feedforward layers with mixture-of-experts or other sparse multi-network architectures while ‘degrading’ a subset of the mixing layers to be simple linear mixing layers (e.g., that multiply inputs by one or more mixing matrices) rather than more complicated attentional mixing mechanisms (e.g., including a number of matrix multiplications, dot products, and nonlinear operations). Such a combination of mixing layer modifications and feedforward layer modifications in a single multi-layer model exhibits synergistic improvements with respect to training time, inference computational cost, and training stability for a given level of model accuracy.