Patent search: ap:("Intel Corporation") AND inv:"Jiasheng Chen" (page 1)

1.

Invention publication
ARCHITECTURE FOR BLOCK SPARSE OPERATIONS ON A SYSTOLIC ARRAY [Under examination, published]

Publication number: US20240161227A1

Publication date: 2024-05-16

Application number: US18532245

Filing date: 2023-12-07

Applicant: Intel Corporation

Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray

IPC classification: G06T1/20, G06F7/544, G06F9/50, G06F12/0806, G06F15/80, G06F17/16, G06N3/048, G06N3/08, G06N3/084

CPC classification: G06T1/20, G06F7/5443, G06F9/5027, G06F12/0806, G06F15/8046, G06F17/16, G06N3/048, G06N3/08, G06N3/084

Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data-aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
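
As a rough illustration of the block-sparse idea named in this abstract, the Python sketch below multiplies two matrices tile by tile and skips any tile of the left operand that is entirely zero. The function name `block_sparse_matmul`, the tile size, and the skip counter are assumptions made for illustration; the abstract does not describe the instruction format or hardware behavior at this level, so this is not the patented design.

```python
def block_sparse_matmul(a, b, block=2):
    """Multiply a (m x k) by b (k x n), skipping all-zero block x block tiles of a.

    Returns the product and the number of tiles that were skipped.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    skipped = 0
    for i0 in range(0, m, block):
        for k0 in range(0, k, block):
            rows = range(i0, min(i0 + block, m))
            cols = range(k0, min(k0 + block, k))
            # A tile that holds only zeros contributes nothing to the result.
            if all(a[i][kk] == 0 for i in rows for kk in cols):
                skipped += 1
                continue
            for i in rows:
                for kk in cols:
                    for j in range(n):
                        out[i][j] += a[i][kk] * b[kk][j]
    return out, skipped


if __name__ == "__main__":
    a = [[1, 2, 0, 0],
         [3, 4, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 5, 6]]
    identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
    product, skipped = block_sparse_matmul(a, identity)
    print(product, skipped)
```

The result matches a dense multiply; only the amount of work differs, which is the point of operating on blocks rather than on individual elements.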

2.

Invention publication
HARDWARE ENHANCEMENTS FOR DOUBLE PRECISION SYSTOLIC SUPPORT [Under examination, published]

Publication number: US20240111826A1

Publication date: 2024-04-04

Application number: US17937252

Filing date: 2022-09-30

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Kevin Hurd, Changwon Rhee, Jorge Parra, Fangwen Fu, Theo Drane, William Zorn, Peter Caday, Gregory Henry, Guei-Yuan Lueh, Farzad Chehrazi, Amit Karande, Turbo Majumder, Xinmin Tian, Milind Girkar, Hong Jiang

IPC classification: G06F17/16, G06F7/544, G06T1/20

CPC classification: G06F17/16, G06F7/5443, G06T1/20

Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit is to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as a final result in the DP floating-point format.
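
The benefit of keeping intermediate sums in a wider format and rounding only once at the end can be sketched in Python. Here `fractions.Fraction` stands in for the "high precision intermediate format"; that substitution, the function name `dp_dot_two_pass`, and the `split` parameter that divides the work into two passes are all assumptions for illustration, since the abstract does not specify the intermediate format.

```python
from fractions import Fraction


def dp_dot_two_pass(a, b, c, split):
    """Compute c + sum(a[i] * b[i]) with exact intermediates and one final rounding.

    Pass 1 accumulates the global source operand c with the first `split` products.
    Pass 2 accumulates the intermediate result of pass 1 with the remaining products.
    Only the output of pass 2 is converted back to a double.
    """
    # Pass 1: start from the global source operand.
    acc = Fraction(c)
    for x, y in zip(a[:split], b[:split]):
        acc += Fraction(x) * Fraction(y)
    # Pass 2: continue from the unrounded intermediate result.
    for x, y in zip(a[split:], b[split:]):
        acc += Fraction(x) * Fraction(y)
    # Single down conversion and rounding to double precision.
    return float(acc)


if __name__ == "__main__":
    a = [1e16, 1.0, -1e16]
    b = [1.0, 1.0, 1.0]
    naive = 0.0
    for x, y in zip(a, b):
        naive += x * y  # rounds to double after every step
    print(naive, dp_dot_two_pass(a, b, 0.0, split=2))
```

With these inputs the step-by-step double accumulation loses the small term, while the single-rounding version keeps it.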

3.

Invention application
IMMEDIATE OFFSET OF LOAD STORE AND ATOMIC INSTRUCTIONS [Granted]

Publication number: US20230090973A1

Publication date: 2023-03-23

Application number: US17480528

Filing date: 2021-09-21

Applicant: Intel Corporation

Inventors: Joydeep Ray, Abhishek R. Appu, Timothy R. Bauer, James Valerio, Weiyu Chen, Subramaniam Maiyuran, Prasoonkumar Surti, Karthik Vaidyanathan, Carsten Benthin, Sven Woop, Jiasheng Chen

IPC classification: G06F9/30, G06F12/02, G06F13/16

Abstract: One embodiment provides a graphics processor including a processing resource including a register file, memory, a cache memory, and load/store/cache circuitry to process load, store, and prefetch messages from the processing resource. The circuitry includes support for an immediate address offset that will be used to adjust the address supplied for a memory access to be requested by the circuitry. Including support for the immediate address offset removes the need to execute additional instructions to adjust the address to be accessed prior to execution of the memory access instruction.
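
The instruction-count saving described in this abstract can be illustrated with a toy interpreter in Python. The two-opcode instruction set (`add`, `load`), the register names, and the `run` function are hypothetical and exist only for this sketch; they do not reflect any real Intel GPU instruction encoding.

```python
def run(program, regs, memory):
    """Execute a toy instruction list and return how many instructions ran."""
    for op, *args in program:
        if op == "add":
            # dst = src + constant (the separate address-adjust instruction)
            dst, src, const = args
            regs[dst] = regs[src] + const
        elif op == "load":
            # dst = memory[regs[base] + imm]; imm is the immediate address offset
            dst, base, imm = args
            regs[dst] = memory[regs[base] + imm]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return len(program)


if __name__ == "__main__":
    memory = list(range(100, 116))

    # Without an immediate offset: adjust the address first, then load.
    regs_a = {"r0": 4}
    count_a = run([("add", "r1", "r0", 3), ("load", "r2", "r1", 0)], regs_a, memory)

    # With an immediate offset: the load adjusts the address itself.
    regs_b = {"r0": 4}
    count_b = run([("load", "r2", "r0", 3)], regs_b, memory)

    print(regs_a["r2"], count_a, regs_b["r2"], count_b)
```

Both programs read the same memory location; the second does so without the extra address-adjust instruction.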

4.

Invention application
DUAL PIPELINE PARALLEL SYSTOLIC ARRAY [Granted]

Publication number: US20220414054A1

Publication date: 2022-12-29

Application number: US17304797

Filing date: 2021-06-25

Applicant: Intel Corporation

Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi

IPC classification: G06F15/80, G06F9/38

Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.

5.

Invention publication
ENHANCEMENTS FOR ACCUMULATOR USAGE AND INSTRUCTION FORWARDING IN MATRIX MULTIPLY PIPELINE IN GRAPHICS ENVIRONMENT [Under examination, published]

Publication number: US20240169021A1

Publication date: 2024-05-23

Application number: US18056930

Filing date: 2022-11-18

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Supratim Pal, Fangwen Fu, Guei-Yuan Lueh, Po-Yu Chen, Jiasheng Chen

IPC classification: G06F17/16, G06F7/544

CPC classification: G06F17/16, G06F7/5443

Abstract: An apparatus to facilitate enhancements for accumulator usage and instruction forwarding in a matrix multiply pipeline in a graphics environment is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units comprise: multiply-accumulate hardware to generate intermediate results of a matrix multiplication operation; intermediate accumulation hardware to store the intermediate results of the matrix multiplication operation and accumulate with other intermediate results generated by the multiply-accumulate hardware; a bypass data structure to cause a source operand to bypass the multiply-accumulate hardware; and an adder circuit to add an output from the multiply-accumulate hardware with at least one of the source operand or an output of the intermediate accumulation hardware to generate a final output.

6.

Invention publication
SUPPORTING AND LOAD BALANCING MULTIPLE DOUBLE PRECISION PIPELINES IN A GRAPHICS ENVIRONMENT [Under examination, published]

Publication number: US20240168764A1

Publication date: 2024-05-23

Application number: US18056820

Filing date: 2022-11-18

Applicant: Intel Corporation

Inventors: Supratim Pal, Jiasheng Chen, Vikranth Vemulapalli, Subramaniam Maiyuran

IPC classification: G06F9/30, G06F9/38

CPC classification: G06F9/30014, G06F9/3867

Abstract: An apparatus to facilitate supporting and load balancing multiple double precision pipelines in a graphics environment is disclosed. The apparatus includes a processing core having at least one processing resource comprising: a first double precision (DP) pipeline to support double float operations, the first DP pipeline comprising a first set of floating point units (FPUs) configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete; and a second DP pipeline to support the double float operations, wherein the second DP pipeline comprises a second set of FPUs configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete.

7.

Invention publication
MATRIX TRANSPOSITION IN MATRIX MULTIPLICATION ARRAY CIRCUITRY [Under examination, published]

Publication number: US20240168723A1

Publication date: 2024-05-23

Application number: US18056822

Filing date: 2022-11-18

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Supratim Pal, Jiasheng Chen

IPC classification: G06F7/78, G06F17/16

CPC classification: G06F7/78, G06F17/16

Abstract: An apparatus to facilitate matrix transposition in matrix multiplication array circuitry is disclosed. The apparatus includes a processor comprising matrix acceleration hardware comprising storage buffers and an array of data processing units (DPUs), wherein the matrix acceleration hardware is to: load data for a source matrix to the storage buffers; generate a transposed matrix comprising transposed elements of the source matrix; and input the transposed matrix to the array of DPUs for a matrix multiplication operation.
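
The three steps listed in this abstract (load, transpose, multiply) can be mirrored in a few lines of Python. The helper names `transpose`, `matmul`, and `transposed_matmul` are assumptions for illustration, and a plain list copy stands in for the storage buffers; the sketch shows only the data flow, not how the hardware performs it.

```python
def transpose(matrix):
    """Return a new matrix whose rows are the columns of the input."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Plain dense matrix multiply on lists of lists."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]


def transposed_matmul(source, other):
    """Load the source, transpose it, then feed it to the multiply."""
    buffer = [row[:] for row in source]  # step 1: load source data into a buffer
    transposed = transpose(buffer)       # step 2: generate the transposed matrix
    return matmul(transposed, other)     # step 3: multiply with the transposed input


if __name__ == "__main__":
    source = [[1, 2, 3],
              [4, 5, 6]]
    other = [[1, 1],
             [1, 1]]
    print(transposed_matmul(source, other))
```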

8.

Invention application
RANDOM SPARSITY HANDLING IN A SYSTOLIC ARRAY [Granted]

Publication number: US20220309124A1

Publication date: 2022-09-29

Application number: US17211627

Filing date: 2021-03-24

Applicant: Intel Corporation

Inventors: Chunhui Mei, Hong Jiang, Jiasheng Chen, Yongsheng Liu, Yan Li

IPC classification: G06F17/16, G06F17/11, G06F15/80, G06F7/544, G06F9/30

Abstract: Matrix multiply units can take advantage of input sparsity by zero-gating ALUs, which saves power, but compute throughput does not increase. To improve compute throughput from sparsity, processing resources in a matrix accelerator can skip computations that involve zeros in the input or output. If zeros in the input can be skipped, the processing units can focus calculations on generating meaningful non-zero output.
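
The difference between zero gating and zero skipping can be shown with a simple cycle count in Python. The one-cycle-per-slot cost model and the function names `gated_dot` and `skipping_dot` are assumptions for illustration only; real hardware timing is not given in the abstract.

```python
def gated_dot(a, b):
    """Zero gating: a zero pair does no arithmetic but still occupies a cycle."""
    total = 0
    cycles = 0
    for x, y in zip(a, b):
        cycles += 1               # every element pair takes a slot
        if x != 0 and y != 0:
            total += x * y        # the ALU is only active for non-zero pairs
    return total, cycles


def skipping_dot(a, b):
    """Zero skipping: pairs containing a zero are dropped before scheduling."""
    total = 0
    cycles = 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue              # no slot is spent on this pair
        total += x * y
        cycles += 1
    return total, cycles


if __name__ == "__main__":
    a = [0, 2, 0, 4]
    b = [1, 0, 3, 5]
    print(gated_dot(a, b), skipping_dot(a, b))
```

Both functions return the same sum; under this cost model only the skipping version spends fewer cycles, which corresponds to the throughput gain the abstract describes.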

9.

Invention application
ARCHITECTURE FOR BLOCK SPARSE OPERATIONS ON A SYSTOLIC ARRAY [Granted]

Publication number: US20210103550A1

Publication date: 2021-04-08

Application number: US17122905

Filing date: 2020-12-15

Applicant: Intel Corporation

Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray

IPC classification: G06F15/80, G06F7/544, G06F9/50, G06F17/16, G06N3/08, G06N3/04

Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data-aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.

10.

Invention publication
INCREASING PROCESSING RESOURCES IN PROCESSING CORES OF A GRAPHICS ENVIRONMENT [Under examination, published]

Publication number: US20240160478A1

Publication date: 2024-05-16

Application number: US17987185

Filing date: 2022-11-15

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Chunhui Mei, Ben J. Ashbaugh, Naveen Matam, Joydeep Ray, Timothy Bauer, Guei-Yuan Lueh, Vasanth Ranganathan, Prashant Chaudhari, Vikranth Vemulapalli, Nishanth Reddy Pendluru, Piotr Reiter, Jain Philip, Marek Rudniewski, Christopher Spencer, Parth Damani, Prathamesh Raghunath Shinde, John Wiegert, Fataneh Ghodrat

IPC classification: G06F9/50, G06F12/0875

CPC classification: G06F9/5016, G06F12/0875, G06F2212/452

Abstract: An apparatus to facilitate increasing processing resources in processing cores of a graphics environment is disclosed. The apparatus includes a plurality of processing resources to execute one or more execution threads; a plurality of message arbiter-processing resource (MA-PR) routers, wherein a respective MA-PR router of the plurality of MA-PR routers corresponds to a pair of processing resources of the plurality of processing resources and is to arbitrate routing of a thread control message from a message arbiter between the pair of processing resources; a plurality of local shared cache (LSC) sequencers to provide an interface between at least one LSC of the processing core and the plurality of processing resources; and a plurality of instruction caches (ICs) to store instructions of the one or more execution threads, wherein a respective IC of the plurality of ICs interfaces with a portion of the plurality of processing resources.
