-
公开(公告)号:US20240111826A1
公开(公告)日:2024-04-04
申请号:US17937252
申请日:2022-09-30
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Kevin Hurd , Changwon Rhee , Jorge Parra , Fangwen Fu , Theo Drane , William Zorn , Peter Caday , Gregory Henry , Guei-Yuan Lueh , Farzad Chehrazi , Amit Karande , Turbo Majumder , Xinmin Tian , Milind Girkar , Hong Jiang
CPC classification number: G06F17/16 , G06F7/5443 , G06T1/20
Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including a multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as final result in the DP floating-point format.
-
公开(公告)号:US20220416999A1
公开(公告)日:2022-12-29
申请号:US17358897
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Wajdi Feghali , Changwon Rhee , Wei-Yu Chen , Timothy R. Bauer , Alexander Lyashevsky
Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.
-
公开(公告)号:US12066946B2
公开(公告)日:2024-08-20
申请号:US17704340
申请日:2022-03-25
Applicant: Intel Corporation
Inventor: Xiaodong Qiu , Yong Jiang , Changwon Rhee , Cui Tang , Shuangpeng Zhou , Lei Chen , Danyu Bi , Peiqing Jiang , Chengxi Wu
IPC: G06F12/084 , G06F9/48
CPC classification number: G06F12/084 , G06F9/4818 , G06F2212/604
Abstract: Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprising: a priority controller to: receive a memory access request to request data; and set a priority flag for the memory access request based on an accumulated access amount of data stored in a memory block to be accessed by the memory access request to dynamically change a priority level of the requested data.
-
4.
公开(公告)号:US20240103810A1
公开(公告)日:2024-03-28
申请号:US17935787
申请日:2022-09-27
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Supratim Pal , Changwon Rhee , Hong Jiang , Kevin Hurd , Shuai Mu
CPC classification number: G06F7/5443 , G06F7/57 , G06F17/16
Abstract: An apparatus to facilitate supporting vector multiply add with double accumulator access in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprising two source matrices to be multiplied as part of the matrix multiplication operation; and issue a multiply and add vector (MADV) instruction for the multiplication operation utilizing a double accumulator access output, wherein the MADV instruction to multiply two vectors of the two source matrices in a single floating point (FP) pipeline of the processor.
-
公开(公告)号:US12236238B2
公开(公告)日:2025-02-25
申请号:US17358867
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Li-An Tang , Changwon Rhee , Timothy R. Bauer , Alexander Lyashevsky , Jiasheng Chen
Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.
-
公开(公告)号:US20240370372A1
公开(公告)日:2024-11-07
申请号:US18763009
申请日:2024-07-03
Applicant: Intel Corporation
Inventor: Xiaodong Qiu , Yong Jiang , Changwon Rhee , Cui Tang , Shuangpeng Zhou , Lei Chen , Danyu Bi , Peiqing Jiang , Chengxi Wu
IPC: G06F12/084 , G06F9/48
Abstract: Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprising: a priority controller to: receive a memory access request to request data; and set a priority flag for the memory access request based on an accumulated access amount of data stored in a memory block to be accessed by the memory access request to dynamically change a priority level of the requested data.
-
公开(公告)号:US20240232088A9
公开(公告)日:2024-07-11
申请号:US17973203
申请日:2022-10-25
Applicant: Intel Corporation
Inventor: John A. Wiegert , Joydeep Ray , Vasanth Ranganathan , Biju George , Fangwen Fu , Abhishek R. Appu , Chunhui Mei , Changwon Rhee
IPC: G06F12/0855
CPC classification number: G06F12/0857 , G06F2212/1016
Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory; and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
-
公开(公告)号:US20240134797A1
公开(公告)日:2024-04-25
申请号:US17973203
申请日:2022-10-24
Applicant: Intel Corporation
Inventor: John A. Wiegert , Joydeep Ray , Vasanth Ranganathan , Biju George , Fangwen Fu , Abhishek R. Appu , Chunhui Mei , Changwon Rhee
IPC: G06F12/0855
CPC classification number: G06F12/0857 , G06F2212/1016
Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory; and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
-
公开(公告)号:US20240111825A1
公开(公告)日:2024-04-04
申请号:US17937229
申请日:2022-09-30
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Changwon Rhee , Kevin Hurd , Gregory Henry , Peter Caday , Kristopher Wong
Abstract: An apparatus to facilitate single precision support for systolic pipeline in a graphics environment is disclosed. The apparatus includes a processor comprising systolic array hardware including a plurality of data processing units, wherein the systolic array hardware is to: receive data for performance of a matrix multiplication operation in a first precision format; convert an original value of the data into two split values with a second precision format having a lower precision than the first precision format; perform the matrix multiplication operation using the two split values in the second precision format, the matrix multiplication operation comprising a split-term operation that utilizes two passes through the systolic array hardware with feedback wiring and local reduction; and generate an emulated result for the matrix multiplication operation in the first precision format.
-
公开(公告)号:US20230086275A1
公开(公告)日:2023-03-23
申请号:US17482166
申请日:2021-09-22
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Changwon Rhee , Sabareesh Ganapathy , Gregory Henry , Fangwen Fu
Abstract: Emulating floating point calculation using lower precision format calculations is described. An example of a processor includes a floating point unit (FPU) to provide a native floating point operation in a first precision format; and systolic array hardware including multiple data processing units, wherein the processor is to receive data for performance of a matrix multiplication operation in the first precision format; enable an emulated floating point multiplication operation using one or more values with a second precision format, the second precision format having a lower precision than the first precision format, the emulated floating point multiplication including operation of the systolic array hardware; and generate an emulated result for the matrix multiplication operation.
-
-
-
-
-
-
-
-
-