-
Publication No.: US11238332B2
Publication Date: 2022-02-01
Application No.: US17341193
Filing Date: 2021-06-07
Applicant: Google LLC
Inventors: Joshua Timothy Ainslie, Santiago Ontañón, Philip Pham, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Amr Ahmed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset.
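The sparse attention mechanism described in this abstract (shared by all three related publications below) can be illustrated with a short sketch. This is a minimal, illustrative reading only, assuming the "first proper subset" is a small set of global positions that may attend to every input position, while the remaining positions attend only within a local window plus those global positions; the function names, window size, and choice of subset are assumptions made for illustration, not the exact mechanism claimed in the patent.

```python
# Minimal sketch of one possible sparse attention pattern: positions in the
# "first proper subset" attend globally; all other positions attend locally.
# Names, window size, and subset choice are illustrative assumptions.
import numpy as np

def sparse_attention_mask(seq_len, subset_positions, window=3):
    """Boolean mask: mask[i, j] is True if position i may attend to position j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    subset = set(subset_positions)
    for i in range(seq_len):
        if i in subset:
            # Positions in the first proper subset attend to all positions.
            mask[i, :] = True
        else:
            # Remaining positions attend only within a local window ...
            lo, hi = max(0, i - window), min(seq_len, i + window + 1)
            mask[i, lo:hi] = True
            # ... plus the subset positions, so information can still flow.
            for g in subset:
                mask[i, g] = True
    return mask

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted by the sparse mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)  # disallowed pairs get ~zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Tiny usage example: 8 input positions, with {0, 1} as the attended-globally subset.
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(8, 4))
out = masked_attention(q, k, v, sparse_attention_mask(8, [0, 1]))
print(out.shape)  # (8, 4)
```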
-
Publication No.: US20220156553A1
Publication Date: 2022-05-19
Application No.: US17589542
Filing Date: 2022-01-31
Applicant: Google LLC
Inventors: Joshua Timothy Ainslie, Santiago Ontañón, Philip Pham, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Amr Ahmed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset.
-
Publication No.: US20210383191A1
Publication Date: 2021-12-09
Application No.: US17341193
Filing Date: 2021-06-07
Applicant: Google LLC
Inventors: Joshua Timothy Ainslie, Santiago Ontañón, Philip Pham, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Amr Ahmed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset.
-