Patent search: ap:("ADVANCED MICRO DEVICES, INC.") AND inv:"Jiasheng Chen" (page 1)

1.

Granted invention patent
Stream processor with low power parallel matrix multiply pipeline (in force)

Publication No.: US12067401B2

Publication date: 2024-08-20

Application No.: US15855637

Filing date: 2017-12-27

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush

IPC classification: G06F9/38, G06F7/544, G06F9/30, G06F17/16

CPC classification: G06F9/3867, G06F7/5443, G06F9/3001, G06F9/30036, G06F9/30101, G06F17/16

Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
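The accumulation described in this abstract can be illustrated with a minimal Python sketch. It is not the patented hardware design: the function names, the `chunk` parameter, and the use of a plain list as the accumulator are all assumptions made for illustration. The inner dimension is processed in phases, and each phase adds its partial dot products to the results left by earlier phases.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matmul_phased(A, B, chunk):
    """Multiply A (m x k) by B (k x n), walking k in slices of `chunk`.

    `acc` loosely stands in for the second vector register file: it holds
    results from earlier phases and feeds them back into later ones.
    (Hypothetical model; names are not taken from the patent.)
    """
    m, k, n = len(A), len(B), len(B[0])
    acc = [[0] * n for _ in range(m)]
    for start in range(0, k, chunk):          # one phase per slice of k
        end = min(start + chunk, k)
        for i in range(m):
            for j in range(n):
                a_slice = A[i][start:end]
                b_slice = [B[r][j] for r in range(start, end)]
                # accumulate previous partial result with the current one
                acc[i][j] = acc[i][j] + dot(a_slice, b_slice)
    return acc
```

For example, `matmul_phased([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], 2)` runs two phases (two columns of A, then one) and returns the same product as a single-pass multiply.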

2.

Granted invention patent
Packed 16 bits instruction pipeline (in force)

Publication No.: US11880683B2

Publication date: 2024-01-23

Application No.: US15799560

Filing date: 2017-10-31

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Bin He, Yunxiao Zou, Michael J. Mantor, Radhakrishna Giduthuri, Eric J. Finger, Brian D. Emberling

IPC classification: G06F9/30, G06F7/483, G06F7/57

CPC classification: G06F9/30014, G06F7/483, G06F7/57, G06F9/30036, G06F9/30112, G06F2207/3812, G06F2207/3828

Abstract: Systems, apparatuses, and methods for efficiently processing arithmetic operations are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.
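The high/low operand selection in this abstract can be sketched in Python, assuming M = 32 and N = 16 with IEEE 754 half-precision values. The function names and the boolean select arguments are hypothetical; the abstract does not give the instruction encoding.

```python
import struct


def select_half(reg32, high):
    """Return the high or low 16 bits of a 32-bit register value."""
    return (reg32 >> 16) & 0xFFFF if high else reg32 & 0xFFFF


def half_to_float(bits16):
    """Decode a 16-bit pattern as an IEEE 754 half-precision number."""
    return struct.unpack('<e', struct.pack('<H', bits16))[0]


def float_to_half(value):
    """Encode a Python float as a 16-bit half-precision pattern."""
    return struct.unpack('<H', struct.pack('<e', value))[0]


def packed_add(src0, src1, sel0_high, sel1_high):
    """Add one half of src0 to one half of src1.

    sel0_high / sel1_high play the role of the per-operand select bits
    (hypothetical names; True picks the high 16 bits).
    """
    a = half_to_float(select_half(src0, sel0_high))
    b = half_to_float(select_half(src1, sel1_high))
    return float_to_half(a + b)
```

With `src0 = (0x3E00 << 16) | 0x4000` (1.5 in the high half, 2.0 in the low half) and `src1 = 0x4000`, the call `packed_add(src0, src1, True, False)` adds 1.5 and 2.0 and returns the half-precision encoding of 3.5.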

3.

Invention application
LOW POWER AND LOW LATENCY GPU COPROCESSOR FOR PERSISTENT COMPUTING (in force)

Publication No.: US20210201439A1

Publication date: 2021-07-01

Application No.: US17181300

Filing date: 2021-02-22

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor

IPC classification: G06T1/20, G06F9/54, G06F9/38, G06T1/60

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

4.

Invention application
STREAM PROCESSOR WITH HIGH BANDWIDTH AND LOW POWER VECTOR REGISTER FILE (pending, published)

Publication No.: US20180357064A1

Publication date: 2018-12-13

Application No.: US15644045

Filing date: 2017-07-07

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Bin He, Mark M. Leather, Michael J. Mantor, Yunxiao Zou

IPC classification: G06F9/38, G06F9/30, G06F12/0875, G06F12/0891

CPC classification: G06F9/3867, G06F9/3001, G06F9/30021, G06F9/30036, G06F9/3012, G06F9/30141, G06F9/3802, G06F9/3826, G06F9/383, G06F9/3832, G06F9/3857, G06F12/0804, G06F12/0855, G06F12/0875, G06F12/0891, G06F12/121, G06F2212/1008, G06F2212/1024, G06F2212/452

Abstract: Systems, apparatuses, and methods for implementing a high bandwidth, low power vector register file for use by a parallel processor are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of processing pipelines. The parallel processing unit includes a vector arithmetic logic unit and a high bandwidth, low power, vector register file. The vector register file includes multi-bank high density random-access memories (RAMs) to satisfy register bandwidth requirements. The parallel processing unit also includes an instruction request queue and an instruction operand buffer to provide enough local bandwidth for VALU instructions and vector I/O instructions. Also, the parallel processing unit is configured to leverage the RAM's output flops as a last level cache to reduce duplicate operand requests between multiple instructions. The parallel processing unit includes a vector destination cache to provide additional R/W bandwidth for the vector register file.
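The idea of reusing a RAM bank's output flops to avoid duplicate operand reads can be sketched as a toy Python model. The class and attribute names are hypothetical, writes are not modeled, and the real design is certainly more involved; the sketch only shows that a repeated request for the most recently read row need not touch the RAM array again.

```python
class RegisterBank:
    """Toy model of one register-file RAM bank (hypothetical names).

    The bank remembers the last row it read, standing in for the RAM's
    output flops. Writes and invalidation are deliberately left out.
    """

    def __init__(self, rows):
        self.rows = list(rows)
        self.last_index = None   # row currently held in the output flops
        self.last_value = None
        self.ram_reads = 0       # number of accesses to the RAM array

    def read(self, index):
        if index == self.last_index:
            # duplicate request: serve it from the output flops
            return self.last_value
        self.ram_reads += 1
        self.last_index = index
        self.last_value = self.rows[index]
        return self.last_value
```

Reading rows 3, 3, 5, 3 in that order from a fresh bank performs three RAM accesses rather than four, because the second request for row 3 is served from the held value.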

5.

Granted invention patent
Pairing SIMD lanes to perform double precision operations (in force)

Publication No.: US11409536B2

Publication date: 2022-08-09

Application No.: US15342809

Filing date: 2016-11-03

Applicant: Advanced Micro Devices, Inc.

Inventors: Bin He, YunXiao Zou, Jiasheng Chen, Michael Mantor

IPC classification: G06F9/38, G06F9/30

Abstract: A method and apparatus for performing a multi-precision computation in a plurality of arithmetic logic units (ALUs) includes pairing a first Single Instruction/Multiple Data (SIMD) block channel device with a second SIMD block channel device to create a first block pair having one-level staggering between the first and second channel devices. A third SIMD block channel device is paired with a fourth SIMD block channel device to create a second block pair having one-level staggering between the third and fourth channel devices. A plurality of source inputs are received at the first block pair and the second block pair. The first block pair computes a first result, and the second block pair computes a second result.

6.

Granted invention patent
Hybrid matrix multiplication pipeline (in force)

Publication No.: US11347827B2

Publication date: 2022-05-31

Application No.: US16287013

Filing date: 2019-02-27

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Qingcheng Wang, Yunxiao Zou

IPC classification: G06F17/16, G06N20/00, G06F5/01

Abstract: Systems, apparatuses, and methods implementing a hybrid matrix multiplication pipeline are disclosed. A hybrid matrix multiplication pipeline is able to execute a plurality of different types of instructions in a plurality of different formats by reusing execution circuitry in an efficient manner. For a first type of instruction for source operand elements of a first size, the pipeline uses N multipliers to perform N multiplication operations on N different sets of operands, where N is a positive integer greater than one. For a second type of instruction for source operand elements of a second size, the N multipliers work in combination to perform a single multiplication operation on a single set of operands, where the second size is greater than the first size. The pipeline also shifts element product results in an efficient manner when implementing a dot product operation.
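How several narrow multipliers might combine into one wider multiplication can be sketched in Python. The abstract does not say how the multipliers are combined, so the schoolbook decomposition below, the choice of N = 4, and the 8-bit and 16-bit operand sizes are assumptions made for illustration only.

```python
def mul8(a, b):
    """One narrow multiplier: unsigned 8-bit by 8-bit."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def four_small_products(pairs):
    """First instruction type: four independent 8-bit multiplications."""
    return [mul8(a, b) for a, b in pairs]


def one_wide_product(a, b):
    """Second instruction type: one unsigned 16-bit by 16-bit multiply
    built from the same four 8-bit multipliers, with the partial
    products shifted into place and summed (schoolbook decomposition,
    an assumed scheme)."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return ((mul8(a_hi, b_hi) << 16)
            + ((mul8(a_hi, b_lo) + mul8(a_lo, b_hi)) << 8)
            + mul8(a_lo, b_lo))
```

Both paths call `mul8` exactly four times, which is the sense in which the multiplier hardware is reused: `one_wide_product(0xABCD, 0x1234)` equals `0xABCD * 0x1234`.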

7.

Invention application
EXTREME-BANDWIDTH SCALABLE PERFORMANCE-PER-WATT GPU ARCHITECTURE (pending, published)

Publication No.: US20190196742A1

Publication date: 2019-06-27

Application No.: US15851476

Filing date: 2017-12-21

Applicant: Advanced Micro Devices, Inc.

Inventors: Dmitri Yudanov, Jiasheng Chen

IPC classification: G06F3/06

CPC classification: G06F3/0659, G06F3/0604, G06F3/0679, G06F9/3887, G06T1/20, G06T1/60

Abstract: A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.
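The local versus non-local decision in this abstract can be sketched as a small Python routing function. The address-to-unit mapping (contiguous regions of `region_size` bytes, one per execution unit) is purely an assumption; the abstract does not describe how locality is determined.

```python
def route_access(unit_id, address, region_size):
    """Classify a memory access as local or non-local.

    Assumes, hypothetically, that execution unit i owns the address
    range [i * region_size, (i + 1) * region_size).
    Returns "direct" for a local access and "fabric" for a non-local one.
    """
    owner = address // region_size
    return "direct" if owner == unit_id else "fabric"
```

Under this assumed mapping, `route_access(1, 0x1800, 0x1000)` is a local access for unit 1, while the same address requested by unit 0 would go through the cache fabric and interconnect bus.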

8.

Invention application
STREAM PROCESSOR WITH OVERLAPPING EXECUTION (pending, published)

Publication No.: US20190004807A1

Publication date: 2019-01-03

Application No.: US15657478

Filing date: 2017-07-24

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Qingcheng Wang, Yunxiao Zou, Bin He, Jian Yang, Michael J. Mantor, Brian D. Emberling

IPC classification: G06F9/38

Abstract: Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and utilized to initiate multiple instructions on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during the subsequent clock cycles to initiate execution of one or more second vector instructions on the second execution pipeline.

9.

Granted invention patent
Arithmetic logic unit register sequencing (in force)

Publication No.: US11789732B2

Publication date: 2023-10-17

Application No.: US17574026

Filing date: 2022-01-12

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Bin He, Jiasheng Chen, Jian Huang

IPC classification: G06F12/02, G06F9/30, G06F7/57, G06F9/48

CPC classification: G06F9/3001, G06F7/57, G06F9/3009, G06F9/30101, G06F9/4806

Abstract: A graphics processing unit (GPU) sequences provision of operands to a set of operand registers, thereby allowing the GPU to share at least one of the operand registers between processing. The GPU includes a plurality of arithmetic logic units (ALUs) with at least one of the ALUs configured to perform double precision operations. The GPU further includes a set of operand registers configured to store single precision operands. For a plurality of executing threads that request double precision operations, the GPU stores the corresponding operands at the operand registers. Over a plurality of execution cycles, the GPU sequences transfer of operands from the set of operand registers to a designated double precision operand register. During each execution cycle, the double-precision ALU executes a double precision operation using the operand stored at the double precision operand register.

10.

Granted invention patent
Low power and low latency GPU coprocessor for persistent computing (in force)

Publication No.: US10929944B2

Publication date: 2021-02-23

Application No.: US15360057

Filing date: 2016-11-23

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor

IPC classification: G06T1/20, G06F9/54, G06F9/38, G06T1/60

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
