-
公开(公告)号:US20210089304A1
公开(公告)日:2021-03-25
申请号:US16581252
申请日:2019-09-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin HE , Michael MANTOR , Jiasheng CHEN , Jian HUANG
Abstract: A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
-
公开(公告)号:US20220171621A1
公开(公告)日:2022-06-02
申请号:US17574026
申请日:2022-01-12
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin HE , Jiasheng CHEN , Jian HUANG
Abstract: A graphics processing unit (GPU) sequences provision of operands to a set of operand registers, thereby allowing the GPU to share at least one of the operand registers between processing. The GPU includes a plurality of arithmetic logic units (ALUs) with at least one of the ALUs configured to perform double precision operations. The GPU further includes a set of operand registers configured to store single precision operands. For a plurality of executing threads that request double precision operations, the GPU stores the corresponding operands at the operand registers. Over a plurality of execution cycles, the GPU sequences transfer of operands from the set of operand registers to a designated double precision operand register. During each execution cycle, the double-precision ALU executes a double precision operation using the operand stored at the double precision operand register.
-
公开(公告)号:US20210157581A1
公开(公告)日:2021-05-27
申请号:US16696108
申请日:2019-11-26
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin HE , Jiasheng CHEN , Jian HUANG
Abstract: A graphics processing unit (GPU) sequences provision of operands to a set of operand registers, thereby allowing the GPU to share at least one of the operand registers between processing. The GPU includes a plurality of arithmetic logic units (ALUs) with at least one of the ALUs configured to perform double precision operations. The GPU further includes a set of operand registers configured to store single precision operands. For a plurality of executing threads that request double precision operations, the GPU stores the corresponding operands at the operand registers. Over a plurality of execution cycles, the GPU sequences transfer of operands from the set of operand registers to a designated double precision operand register. During each execution cycle, the double-precision ALU executes a double precision operation using the operand stored at the double precision operand register.
-
公开(公告)号:US20240111530A1
公开(公告)日:2024-04-04
申请号:US18243264
申请日:2023-09-07
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin HE , Michael MANTOR , Jiasheng CHEN , Jian HUANG
CPC classification number: G06F9/30036 , G06F9/30101 , G06F9/3877 , G06F9/544 , G06F17/16
Abstract: A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
-
公开(公告)号:US20210157588A1
公开(公告)日:2021-05-27
申请号:US16697660
申请日:2019-11-27
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Jiasheng CHEN , Bin HE , Jian HUANG , Michael MANTOR
Abstract: A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.
-
-
-
-