-
Publication No.: US20240161227A1
Publication date: 2024-05-16
Application No.: US18532245
Filing date: 2023-12-07
Applicant: Intel Corporation
Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
IPC classification: G06T1/20, G06F7/544, G06F9/50, G06F12/0806, G06F15/80, G06F17/16, G06N3/048, G06N3/08, G06N3/084
CPC classification: G06T1/20, G06F7/5443, G06F9/5027, G06F12/0806, G06F15/8046, G06F17/16, G06N3/048, G06N3/08, G06N3/084
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
-
Publication No.: US20240111826A1
Publication date: 2024-04-04
Application No.: US17937252
Filing date: 2022-09-30
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Kevin Hurd, Changwon Rhee, Jorge Parra, Fangwen Fu, Theo Drane, William Zorn, Peter Caday, Gregory Henry, Guei-Yuan Lueh, Farzad Chehrazi, Amit Karande, Turbo Majumder, Xinmin Tian, Milind Girkar, Hong Jiang
CPC classification: G06F17/16, G06F7/5443, G06T1/20
Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit is to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as a final result in the DP floating-point format.
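
The accumulate-wide, round-once idea in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: it assumes Python's exact `Fraction` type as a stand-in for the "high precision intermediate format", and the names `dp_dot_two_pass`, `naive_dot`, and the `split` parameter (where the first pass ends) are hypothetical.

```python
from fractions import Fraction


def dp_dot_two_pass(a, b, c, split):
    """Compute c + sum(a[i] * b[i]) with a single rounding at the end.

    Fraction is an assumed stand-in for a wide intermediate format;
    real hardware would use a fixed-width format, not exact rationals.
    """
    # Pass 1: accumulate the first products together with the extra source operand c.
    acc = Fraction(c)
    for x, y in zip(a[:split], b[:split]):
        acc += Fraction(x) * Fraction(y)
    # Pass 2: continue from the unrounded intermediate result of pass 1.
    for x, y in zip(a[split:], b[split:]):
        acc += Fraction(x) * Fraction(y)
    # Down-convert: round once to a double-precision float.
    return float(acc)


def naive_dot(a, b, c):
    """Reference: round to double precision after every multiply and add."""
    acc = c
    for x, y in zip(a, b):
        acc += x * y
    return acc


a = [1e16, 1.0, -1e16, 1.0]
b = [1.0, 1.0, 1.0, 1.0]
wide = dp_dot_two_pass(a, b, 0.0, split=2)
narrow = naive_dot(a, b, 0.0)
```

With these inputs the per-step rounding in `naive_dot` loses one of the small terms, while the wide accumulator keeps both until the final rounding.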
-
Publication No.: US20230090973A1
Publication date: 2023-03-23
Application No.: US17480528
Filing date: 2021-09-21
Applicant: Intel Corporation
Inventors: Joydeep Ray, Abhishek R. Appu, Timothy R. Bauer, James Valerio, Weiyu Chen, Subramaniam Maiyuran, Prasoonkumar Surti, Karthik Vaidyanathan, Carsten Benthin, Sven Woop, Jiasheng Chen
Abstract: One embodiment provides a graphics processor including a processing resource including a register file, memory, a cache memory, and load/store/cache circuitry to process load, store, and prefetch messages from the processing resource. The circuitry includes support for an immediate address offset that will be used to adjust the address supplied for a memory access to be requested by the circuitry. Including support for the immediate address offset removes the need to execute additional instructions to adjust the address to be accessed prior to execution of the memory access instruction.
-
Publication No.: US20220414054A1
Publication date: 2022-12-29
Application No.: US17304797
Filing date: 2021-06-25
Applicant: Intel Corporation
Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.
-
Publication No.: US20240169021A1
Publication date: 2024-05-23
Application No.: US18056930
Filing date: 2022-11-18
Applicant: Intel Corporation
Inventors: Jorge Eduardo Parra Osorio, Supratim Pal, Fangwen Fu, Guei-Yuan Lueh, Po-Yu Chen, Jiasheng Chen
CPC classification: G06F17/16, G06F7/5443
Abstract: An apparatus to facilitate enhancements for accumulator usage and instruction forwarding in a matrix multiply pipeline in a graphics environment is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units comprise: multiply-accumulate hardware to generate intermediate results of a matrix multiplication operation; intermediate accumulation hardware to store the intermediate results of the matrix multiplication operation and accumulate with other intermediate results generated by the multiply-accumulate hardware; a bypass data structure to cause a source operand to bypass the multiply-accumulate hardware; and an adder circuit to add an output from the multiply-accumulate hardware with at least one of the source operand or an output of the intermediate accumulation hardware to generate a final output.
-
Publication No.: US20240168764A1
Publication date: 2024-05-23
Application No.: US18056820
Filing date: 2022-11-18
Applicant: Intel Corporation
CPC classification: G06F9/30014, G06F9/3867
Abstract: An apparatus to facilitate supporting and load balancing multiple double precision pipelines in a graphics environment is disclosed. The apparatus includes a processing core having at least one processing resource comprising: a first double precision (DP) pipeline to support double float operations, the first DP pipeline comprising a first set of floating point units (FPUs) configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete; and a second DP pipeline to support the double float operations, wherein the second DP pipeline comprises a second set of FPUs configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete.
-
Publication No.: US20240168723A1
Publication date: 2024-05-23
Application No.: US18056822
Filing date: 2022-11-18
Applicant: Intel Corporation
Abstract: An apparatus to facilitate matrix transposition in matrix multiplication array circuitry is disclosed. The apparatus includes a processor comprising matrix acceleration hardware comprising storage buffers and an array of data processing units (DPUs), wherein the matrix acceleration hardware is to: load data for a source matrix to the storage buffers; generate a transposed matrix comprising transposed elements of the source matrix; and input the transposed matrix to the array of DPUs for a matrix multiplication operation.
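
The three steps named in this abstract (load, transpose, multiply) can be mirrored in a few lines of software. This is only an illustrative sketch under stated assumptions: plain Python lists stand in for the storage buffers and the DPU array, and the function names `transpose`, `matmul`, and `transposed_matmul` are hypothetical.

```python
def transpose(buffer):
    """Return the transpose of a row-major matrix held as a list of rows."""
    return [list(column) for column in zip(*buffer)]


def matmul(lhs, rhs):
    """Plain row-by-column matrix product (stand-in for the DPU array)."""
    return [
        [sum(l * r for l, r in zip(row, column)) for column in zip(*rhs)]
        for row in lhs
    ]


def transposed_matmul(source, other):
    """Compute transpose(source) @ other."""
    # Step 1: load the source matrix into a (software) storage buffer.
    buffer = [row[:] for row in source]
    # Step 2: generate the transposed matrix from the buffered data.
    transposed = transpose(buffer)
    # Step 3: feed the transposed matrix into the multiplication.
    return matmul(transposed, other)


source = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
other = [[1, 0], [0, 1], [1, 1]]    # 3 x 2
result = transposed_matmul(source, other)  # 2 x 2
```

Doing the transpose on the buffered copy means the caller never has to materialize a transposed matrix in memory before issuing the multiply, which is the convenience the abstract points at.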
-
Publication No.: US20220309124A1
Publication date: 2022-09-29
Application No.: US17211627
Filing date: 2021-03-24
Applicant: Intel Corporation
Inventors: Chunhui Mei, Hong Jiang, Jiasheng Chen, Yongsheng Liu, Yan Li
Abstract: Matrix multiply units can take advantage of input sparsity by zero gating ALUs, which saves power, but compute throughput does not increase. To improve compute throughput from sparsity, processing resources in a matrix accelerator can skip computations that involve a zero in the input or output. If zeros in the input can be skipped, the processing units can focus calculations on generating meaningful non-zero output.
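
The difference between zero gating and zero skipping can be shown with a toy cost model. The sketch below is an assumption-laden illustration, not the patented design: it counts one "cycle" per operand pair issued to a single multiplier, and the names `gated_dot` and `skipping_dot` are hypothetical.

```python
def gated_dot(a, b):
    """Zero gating: every operand pair still occupies a cycle.

    Pairs containing a zero leave the multiplier idle (saving power),
    but the cycle count is unchanged.
    """
    total, cycles = 0, 0
    for x, y in zip(a, b):
        cycles += 1
        if x != 0 and y != 0:
            total += x * y
    return total, cycles


def skipping_dot(a, b):
    """Zero skipping: only pairs with two non-zero operands are issued."""
    useful = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    total = sum(x * y for x, y in useful)
    return total, len(useful)


a = [0, 3, 0, 0, 2, 0, 0, 1]
b = [5, 4, 7, 0, 6, 1, 0, 0]
gated = gated_dot(a, b)        # same result, 8 cycles
skipped = skipping_dot(a, b)   # same result, 2 cycles
```

Both functions return the same dot product; only the cycle count differs, which is the throughput gain the abstract attributes to skipping rather than gating.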
-
Publication No.: US20210103550A1
Publication date: 2021-04-08
Application No.: US17122905
Filing date: 2020-12-15
Applicant: Intel Corporation
Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
-
Publication No.: US20240160478A1
Publication date: 2024-05-16
Application No.: US17987185
Filing date: 2022-11-15
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Chunhui Mei, Ben J. Ashbaugh, Naveen Matam, Joydeep Ray, Timothy Bauer, Guei-Yuan Lueh, Vasanth Ranganathan, Prashant Chaudhari, Vikranth Vemulapalli, Nishanth Reddy Pendluru, Piotr Reiter, Jain Philip, Marek Rudniewski, Christopher Spencer, Parth Damani, Prathamesh Raghunath Shinde, John Wiegert, Fataneh Ghodrat
IPC classification: G06F9/50, G06F12/0875
CPC classification: G06F9/5016, G06F12/0875, G06F2212/452
Abstract: An apparatus to facilitate increasing processing resources in processing cores of a graphics environment is disclosed. The apparatus includes a plurality of processing resources to execute one or more execution threads; a plurality of message arbiter-processing resource (MA-PR) routers, wherein a respective MA-PR router of the plurality of MA-PR routers corresponds to a pair of processing resources of the plurality of processing resources and is to arbitrate routing of a thread control message from a message arbiter between the pair of processing resources; a plurality of local shared cache (LSC) sequencers to provide an interface between at least one LSC of the processing core and the plurality of processing resources; and a plurality of instruction caches (ICs) to store instructions of the one or more execution threads, wherein a respective IC of the plurality of ICs interfaces with a portion of the plurality of processing resources.
-