MATRIX MULTIPLICATION UNIT WITH FLEXIBLE PRECISION OPERATIONS

    Publication Number: US20210089304A1

    Publication Date: 2021-03-25

    Application Number: US16581252

    Filing Date: 2019-09-24

    Abstract: A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
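
    The round-and-iteration dataflow described above can be illustrated with a short software sketch. The following Python snippet is a hedged model only, not the claimed hardware; the function name, tile size, and register layout are all hypothetical. It fetches one k-slice of each matrix per round and reuses it across every multiply/accumulate iteration before the next fetch.

        def blocked_matmul(A, B, tile=2):
            """Multiply A (m x k) by B (k x n), given as lists of lists."""
            m, k, n = len(A), len(A[0]), len(B[0])
            C = [[0.0] * n for _ in range(m)]        # output buffer of accumulators
            for k0 in range(0, k, tile):             # one "round" per k-slice
                # Fetch portions of A and B into the register file once per round.
                a_regs = [row[k0:k0 + tile] for row in A]
                b_regs = [row[:] for row in B[k0:k0 + tile]]
                # Iterations: multiply/accumulate subsets of the fetched portions
                # without fetching from memory again.
                for i in range(m):
                    for j in range(n):
                        acc = 0.0
                        for kk in range(len(b_regs)):
                            acc += a_regs[i][kk] * b_regs[kk][j]
                        C[i][j] += acc               # accumulate into the output buffer
            return C

        A = [[1, 2, 3, 4], [5, 6, 7, 8]]
        B = [[1, 0], [0, 1], [1, 1], [2, 2]]
        print(blocked_matmul(A, B))                  # [[12.0, 13.0], [28.0, 29.0]]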

    PROCESSING UNIT WITH MIXED PRECISION OPERATIONS

    Publication Number: US20200293286A1

    Publication Date: 2020-09-17

    Application Number: US16591031

    Filing Date: 2019-10-02

Abstract: A graphics processing unit (GPU) implements operations, with associated op codes, to perform mixed precision mathematical operations. The GPU includes an arithmetic logic unit (ALU) with different execution paths, wherein each execution path executes a different mixed precision operation. By implementing mixed precision operations at the ALU in response to designated op codes that delineate the operations, the GPU efficiently increases the precision of specified mathematical operations while reducing execution overhead.
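
    As a hedged illustration of what such an op code typically computes, the Python model below multiplies half-precision inputs and accumulates in single precision. It is a software analogue only; the function name and the particular precisions chosen are assumptions rather than details from the filing.

        import numpy as np

        def mixed_precision_fma(a, b, acc):
            """Multiply two fp16 inputs and accumulate into an fp32 accumulator."""
            product = np.float32(np.float16(a)) * np.float32(np.float16(b))
            return np.float32(acc) + product

        acc = np.float32(0.0)
        for a, b in [(1.5, 2.25), (0.1, 3.0), (0.001, 4.0)]:
            acc = mixed_precision_fma(a, b, acc)
        print(acc)   # the fp32 accumulator preserves precision the fp16 inputs lack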

    BIN STREAMOUT PREEMPTION IN A GRAPHICS PROCESSING PIPELINE

    Publication Number: US20190005604A1

    Publication Date: 2019-01-03

    Application Number: US15639980

    Filing Date: 2017-06-30

Abstract: A stage of a graphics pipeline in a graphics processing unit (GPU) detects an interrupt concurrently with the stage processing primitives in a first bin that represents a first portion of a first frame generated by a first application. The stage forwards a completed portion of the primitives to a subsequent stage of the graphics pipeline in response to the interrupt. The stage diverts a second bin that represents a second portion of the first frame from the stage to a memory in response to the interrupt. The stage processes primitives in a third bin that represents a portion of a second frame generated by a second application subsequent to diverting the second bin to the memory. The stage can then retrieve the second bin from the memory for additional processing in response to completing processing of the primitives in the third bin.
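
    A control-flow sketch may help fix the sequence of events. The Python function below is purely illustrative (the bin contents, the interrupt position, and all names are hypothetical): it forwards the work completed in the first bin, diverts the second bin to memory, runs the preempting bin, and then restores the diverted bin.

        def run_stage(bin1, bin2, bin3, interrupt_after, forward, memory):
            """Model of one pipeline stage handling a mid-bin preemption interrupt."""
            for i, prim in enumerate(bin1, start=1):
                forward.append(("frame1", prim))       # completed work goes downstream
                if i == interrupt_after:               # interrupt detected during bin 1
                    memory.append(bin2)                # divert the second bin to memory
                    for p in bin3:                     # process the preempting frame's bin
                        forward.append(("frame2", p))
                    for p in memory.pop():             # retrieve the diverted bin
                        forward.append(("frame1", p))
                    return

        forward, memory = [], []
        run_stage(["p0", "p1", "p2"], ["p3", "p4"], ["q0", "q1"],
                  interrupt_after=2, forward=forward, memory=memory)
        print(forward)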

    DEDICATED VECTOR SUB-PROCESSOR SYSTEM

    Publication Number: US20210157588A1

    Publication Date: 2021-05-27

    Application Number: US16697660

    Filing Date: 2019-11-27

    Abstract: A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.
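
    The split between high and low VGPR banks can be pictured with a small software model. In the sketch below, the class, the bank contents, and the 16-bit split are assumptions made for illustration: a per-thread operand gathering component reads matching entries from a high bank and a low bank and reassembles a full-width operand.

        class OperandGatherer:
            """Toy per-thread gatherer that joins high and low VGPR bank entries."""
            def __init__(self, high_bank, low_bank):
                self.high_bank = high_bank   # upper 16 bits of each register
                self.low_bank = low_bank     # lower 16 bits of each register
                self.thread_id = None        # assigned to one thread at a time

            def assign(self, thread_id):
                self.thread_id = thread_id

            def gather(self, reg_index):
                hi = self.high_bank[reg_index]
                lo = self.low_bank[reg_index]
                return (hi << 16) | lo       # reassemble the 32-bit operand

        gatherer = OperandGatherer(high_bank=[0x0001, 0x00AB], low_bank=[0x2345, 0xCDEF])
        gatherer.assign(thread_id=0)
        print(hex(gatherer.gather(1)))       # 0xabcdef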

    PRIMITIVE LEVEL PREEMPTION USING DISCRETE NON-REAL-TIME AND REAL TIME PIPELINES

    Publication Number: US20190164328A1

    Publication Date: 2019-05-30

    Application Number: US16238727

    Filing Date: 2019-01-03

Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate at least one of the fixed function hardware blocks. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing at the subsequent primitive on the basis of the first state information stored in the first memory element.
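
    The preemption sequence can be sketched in a few lines of Python. The model below is schematic only (the state dictionaries, workload shapes, and arrival point are hypothetical): the non-real-time pipeline stops at a primitive boundary, its state is retained, the real-time workload runs in the second pipeline, and the first pipeline then resumes from the retained state.

        def run_with_preemption(nrt_primitives, rt_arrives_at, rt_primitives):
            state1 = {"next": 0}     # first memory element (non-real-time pipeline)
            state2 = {"next": 0}     # second memory element (real-time pipeline)
            log = []
            while state1["next"] < len(nrt_primitives):
                i = state1["next"]
                if i == rt_arrives_at:
                    # Preempt at a primitive boundary; state1 is retained while the
                    # real-time workload executes in the second pipeline.
                    for p in rt_primitives:
                        log.append(("rt", p))
                        state2["next"] += 1
                log.append(("nrt", nrt_primitives[i]))
                state1["next"] = i + 1   # resume from the retained state
            return log

        print(run_with_preemption(["t0", "t1", "t2"], rt_arrives_at=1, rt_primitives=["r0"]))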

    MATRIX MULTIPLICATION UNIT WITH FLEXIBLE PRECISION OPERATIONS

    Publication Number: US20240111530A1

    Publication Date: 2024-04-04

    Application Number: US18243264

    Filing Date: 2023-09-07

    Abstract: A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.

    VERTICAL AND HORIZONTAL BROADCAST OF SHARED OPERANDS

    Publication Number: US20220100528A1

    Publication Date: 2022-03-31

    Application Number: US17032307

    Filing Date: 2020-09-25

    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
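
    A small Python sketch may make the broadcast pattern concrete. In the model below (purely illustrative; the grid size and the operation are assumptions), one operand vector is broadcast horizontally across each row of processing elements and another vertically down each column, so each element combines exactly one value from each broadcast.

        def broadcast_multiply(row_operands, col_operands):
            """Each PE at (r, c) multiplies the horizontally broadcast row_operands[r]
            with the vertically broadcast col_operands[c]."""
            return [[a * b for b in col_operands] for a in row_operands]

        print(broadcast_multiply([1, 2, 3], [10, 20]))
        # [[10, 20], [20, 40], [30, 60]]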

    PROCESSING UNIT WITH SMALL FOOTPRINT ARITHMETIC LOGIC UNIT

    Publication Number: US20210405968A1

    Publication Date: 2021-12-30

    Application Number: US17029836

    Filing Date: 2020-09-23

    Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.
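
    The staged narrowing described above can be approximated in software. The snippet below is only a rough analogue (the specific precisions and the function name are assumptions): the first stage executes at the instruction's indicated precision, the intermediate result is reduced to a smaller format, and a later stage continues with the narrowed data.

        import numpy as np

        def staged_multiply_add(a, b, c):
            stage1 = np.float32(a) * np.float32(b)        # execute at the indicated precision
            narrowed = np.float16(stage1)                 # reduce the result to a smaller size
            return np.float32(narrowed) + np.float32(c)   # later stage consumes the narrowed data

        print(staged_multiply_add(1.0005, 2.0, 0.5))      # shows the small rounding from narrowing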
