-
Publication No.: US12067394B2
Publication Date: 2024-08-20
Application No.: US18170696
Application Date: 2023-02-17
Applicant: Intel Corporation
Inventor: Shuai Mu , Cristina S. Anderson , Subramaniam Maiyuran
CPC classification number: G06F9/3001 , G06F7/5443 , G06T1/20
Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
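The getexp/getmant/scale decomposition this abstract relies on can be illustrated in software. Below is a minimal C sketch, assuming IEEE-754 binary64 inputs; the helper names are hypothetical, frexp/log1p are used only for brevity, and the GPU instructions described above additionally resolve special-case inputs in the FMA unit's pre-processing stage, which this sketch omits.

```c
/* Minimal software model of getexp/getmant-style helpers, assuming IEEE-754
 * binary64 inputs; the hardware instructions additionally resolve special
 * cases (zero, infinity, NaN, denormals) in a pre-processing stage. */
#include <math.h>
#include <stdio.h>

/* getexp-style helper: unbiased exponent of x, i.e. floor(log2(|x|)). */
static double get_exponent(double x) {
    int e;
    frexp(fabs(x), &e);              /* x = f * 2^e with f in [0.5, 1) */
    return (double)(e - 1);
}

/* getmant-style helper: mantissa of x normalized into [1, 2). */
static double get_mantissa(double x) {
    int e;
    return 2.0 * frexp(fabs(x), &e);
}

/* Branch-free core of log(x) for positive finite x:
 * log(x) = e * ln(2) + log(m), with m = getmant(x) in [1, 2). */
static double log_emulated(double x) {
    const double ln2 = 0.6931471805599453;
    double e = get_exponent(x);
    double m = get_mantissa(x);
    /* A real library would evaluate log(m) with a polynomial; log1p keeps
     * this sketch short and correct. */
    return e * ln2 + log1p(m - 1.0);
}

int main(void) {
    printf("%.15f vs %.15f\n", log_emulated(12.5), log(12.5));
    return 0;
}
```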
-
Publication No.: US12164884B2
Publication Date: 2024-12-10
Application No.: US17003334
Application Date: 2020-08-26
Applicant: Intel Corporation
Inventor: Shuai Mu , Cristina S. Anderson , Subramaniam Maiyuran
Abstract: Examples described herein relate to instructions that request performance of tanh and sigmoid operations. For example, a compiler can generate native tanh instructions to perform tanh. In some examples, a tanh function can be compiled into instructions that include an instruction to compute either tanh(input) or tanh(input)/input, depending on the value of the input, to generate an intermediate result; an instruction to generate a scale factor based on the input; and an instruction to multiply the intermediate result by the scale factor. For example, a sigmoid function can be compiled to cause a math pipeline to perform a range check and perform operations based on the range.
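The three-instruction tanh sequence can be modeled in plain C. In the sketch below, the range cutoff, the scale-factor rule, and the function names are assumptions made only to illustrate the decomposition the abstract describes.

```c
#include <math.h>
#include <stdio.h>

#define SMALL_RANGE 0.125f   /* assumed cutoff for using the tanh(x)/x form */

/* Instruction 1: tanh(x) or tanh(x)/x, selected by the input's magnitude;
 * the limit of tanh(x)/x at x == 0 is 1. */
static float tanh_core(float x) {
    if (fabsf(x) < SMALL_RANGE)
        return x == 0.0f ? 1.0f : tanhf(x) / x;
    return tanhf(x);
}

/* Instruction 2: matching scale factor for the same input. */
static float tanh_scale(float x) {
    return fabsf(x) < SMALL_RANGE ? x : 1.0f;
}

/* Instruction 3: multiply the intermediate result by the scale factor. */
static float tanh_emulated(float x) {
    return tanh_core(x) * tanh_scale(x);
}

int main(void) {
    printf("%f vs %f\n", tanh_emulated(0.01f), tanhf(0.01f));
    printf("%f vs %f\n", tanh_emulated(2.0f), tanhf(2.0f));
    return 0;
}
```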
-
Publication No.: US20220405096A1
Publication Date: 2022-12-22
Application No.: US17353984
Application Date: 2021-06-22
Applicant: Intel Corporation
Inventor: Shuai Mu , Cristina S. Anderson , Subramaniam Maiyuran
Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
-
Publication No.: US20250036412A1
Publication Date: 2025-01-30
Application No.: US18358308
Application Date: 2023-07-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Jiasheng Chen , Christopher Spencer , Jorge E. Parra Osorio , Kevin Hurd , Guei-Yuan Lueh , Pradeep K. Golconda , Fangwen Fu , Wei Xiong , Hongzheng Li , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Shuai Mu , Clifford Gibson , Buqi Cheng
Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a plurality of processing resources. A processing resource of the plurality of processing resources includes a source crossbar communicatively coupled with a register file, the source crossbar to reorder data elements of a source operand, and a format conversion pipeline to convert a plurality of input data elements specified by the source operand from a first format of a plurality of datatype formats to a second format of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
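A lane-parallel sketch of the two stages named in the abstract, a source crossbar that reorders register-file elements followed by a format-conversion pipeline, is shown below; the 8-lane width, the int8 source format, and the fp32 destination format are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

#define LANES 8

/* Crossbar stage: gather source elements in the order given by 'select'. */
static void crossbar_reorder(const int8_t *src, const uint8_t *select, int8_t *out) {
    for (int lane = 0; lane < LANES; lane++)
        out[lane] = src[select[lane]];
}

/* Conversion stage: widen each reordered int8 element to fp32. */
static void convert_int8_to_fp32(const int8_t *in, float *out) {
    for (int lane = 0; lane < LANES; lane++)
        out[lane] = (float)in[lane];
}

int main(void) {
    int8_t src[LANES] = {10, -20, 30, -40, 50, -60, 70, -80};
    uint8_t select[LANES] = {7, 6, 5, 4, 3, 2, 1, 0};   /* reverse the lanes */
    int8_t reordered[LANES];
    float converted[LANES];
    crossbar_reorder(src, select, reordered);
    convert_int8_to_fp32(reordered, converted);
    for (int lane = 0; lane < LANES; lane++)
        printf("%.1f ", converted[lane]);
    putchar('\n');
    return 0;
}
```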
-
Publication No.: US20250036361A1
Publication Date: 2025-01-30
Application No.: US18358304
Application Date: 2023-07-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Jiasheng Chen , Kevin Hurd , Jorge E. Parra Osorio , Christopher Spencer , Guei-Yuan Lueh , Pradeep K. Golconda , Fangwen Fu , Wei Xiong , Hongzheng Li , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Shuai Mu , Clifford Gibson , Buqi Cheng
IPC: G06F7/483
Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a multi-lane parallel floating-point unit and a multi-lane parallel integer unit. The multi-lane parallel integer unit includes an integer pipeline including a plurality of parallel integer logic units configured to perform integer compute operations on a plurality of input data elements, and a format conversion pipeline including a plurality of parallel format conversion units configured to convert a plurality of input data elements from a first one of a plurality of datatype formats to a second one of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
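A corresponding sketch of the multi-lane integer pipeline alongside a format-conversion pipeline follows; the lane count, the add operation, and the truncating fp32-to-int32 conversion are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

#define LANES 8

/* Integer pipeline: each lane's logic unit performs the same integer op. */
static void int_pipeline_add(const int32_t *a, const int32_t *b, int32_t *out) {
    for (int lane = 0; lane < LANES; lane++)
        out[lane] = a[lane] + b[lane];
}

/* Format-conversion pipeline: each lane converts fp32 to int32 (truncating). */
static void convert_fp32_to_int32(const float *in, int32_t *out) {
    for (int lane = 0; lane < LANES; lane++)
        out[lane] = (int32_t)in[lane];
}

int main(void) {
    float f[LANES] = {1.9f, -2.9f, 3.1f, -4.1f, 5.5f, -6.5f, 7.0f, -8.0f};
    int32_t a[LANES], b[LANES] = {1, 2, 3, 4, 5, 6, 7, 8}, sum[LANES];
    convert_fp32_to_int32(f, a);   /* the conversion stage feeds the integer stage */
    int_pipeline_add(a, b, sum);
    for (int lane = 0; lane < LANES; lane++)
        printf("%d ", sum[lane]);
    putchar('\n');
    return 0;
}
```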
-
Publication No.: US12210905B2
Publication Date: 2025-01-28
Application No.: US17358650
Application Date: 2021-06-25
Applicant: Intel Corporation
Inventor: Chandra Gurram , Wei-Yu Chen , Vikranth Vemulapalli , Subramaniam Maiyuran , Jorge Eduardo Parra Osorio , Shuai Mu , Guei-Yuan Lueh , Supratim Pal
Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
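The dispatch flow in the abstract can be sketched as a scan for a processing resource with enough free registers and an open thread slot; the structure names, register counts, and slot counts below are assumptions.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_RESOURCES       4
#define SLOTS_PER_RESOURCE  8

typedef struct {
    int  free_regs;
    bool slot_busy[SLOTS_PER_RESOURCE];
} ProcessingResource;

/* Returns the resource index on success, -1 if no resource can host the thread. */
static int dispatch_thread(ProcessingResource *res, int regs_needed) {
    for (int r = 0; r < NUM_RESOURCES; r++) {
        if (res[r].free_regs < regs_needed)
            continue;                        /* not enough register space */
        for (int s = 0; s < SLOTS_PER_RESOURCE; s++) {
            if (!res[r].slot_busy[s]) {      /* first available thread slot */
                res[r].slot_busy[s] = true;
                res[r].free_regs -= regs_needed;
                return r;
            }
        }
    }
    return -1;
}

int main(void) {
    ProcessingResource res[NUM_RESOURCES] = {
        {.free_regs = 64}, {.free_regs = 256},
        {.free_regs = 512}, {.free_regs = 512},
    };
    printf("thread placed on resource %d\n", dispatch_thread(res, 128));
    return 0;
}
```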
-
Publication No.: US20240103810A1
Publication Date: 2024-03-28
Application No.: US17935787
Application Date: 2022-09-27
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Supratim Pal , Changwon Rhee , Hong Jiang , Kevin Hurd , Shuai Mu
CPC classification number: G06F7/5443 , G06F7/57 , G06F17/16
Abstract: An apparatus to facilitate supporting vector multiply add with double accumulator access in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprise two source matrices to be multiplied as part of the matrix multiplication operation; and issue a multiply and add vector (MADV) instruction for the multiplication operation utilizing a double accumulator access output, wherein the MADV instruction is to multiply two vectors of the two source matrices in a single floating point (FP) pipeline of the processor.
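A scalar C sketch of one multiply-and-add-vector step follows; reading and writing the same accumulator within the step is how "double accumulator access" is interpreted here, which is an assumption, as are the vector length and function name.

```c
#include <stdio.h>

#define VEC_LEN 8

/* acc is read, updated with the dot product of a and b, and written back. */
static float madv(float acc, const float *a, const float *b) {
    for (int i = 0; i < VEC_LEN; i++)
        acc += a[i] * b[i];       /* one FMA per element in a single FP pipeline */
    return acc;
}

int main(void) {
    float a[VEC_LEN] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[VEC_LEN] = {8, 7, 6, 5, 4, 3, 2, 1};
    float acc = 0.0f;
    acc = madv(acc, a, b);        /* one row-by-column step of a matrix multiply */
    printf("acc = %f\n", acc);
    return 0;
}
```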
-
Publication No.: US20240256274A1
Publication Date: 2024-08-01
Application No.: US18618648
Application Date: 2024-03-27
Applicant: Intel Corporation
Inventor: Naveen Mellempudi , Subramaniam Maiyuran , Varghese George , Fangwen Fu , Shuai Mu , Supratim Pal , Wei Xiong
CPC classification number: G06F9/30014 , G06F9/3818 , G06F9/4843 , G06F17/16 , G06N20/00
Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating-point data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, wherein each systolic layer comprises one or more sets of interconnected multipliers, shifters, and adders, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.
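An FP8 dot product of the kind a systolic layer would compute can be sketched as below; the E4M3 layout (1 sign, 4 exponent, 3 mantissa bits, bias 7) is one common 8-bit floating-point format and is an assumption here, and NaN handling is omitted.

```c
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Decode one E4M3 byte to fp32 (normals and subnormals only). */
static float fp8_e4m3_to_fp32(uint8_t v) {
    int sign = (v >> 7) & 1;
    int exp  = (v >> 3) & 0xF;
    int man  = v & 0x7;
    float mag;
    if (exp == 0)
        mag = ldexpf((float)man / 8.0f, -6);              /* subnormal */
    else
        mag = ldexpf(1.0f + (float)man / 8.0f, exp - 7);  /* normal */
    return sign ? -mag : mag;
}

/* One systolic-style step: multiply FP8 pairs and accumulate in fp32. */
static float fp8_dot(const uint8_t *a, const uint8_t *b, int n, float acc) {
    for (int i = 0; i < n; i++)
        acc += fp8_e4m3_to_fp32(a[i]) * fp8_e4m3_to_fp32(b[i]);
    return acc;
}

int main(void) {
    uint8_t a[4] = {0x38, 0x40, 0xB8, 0x30};  /* 1.0, 2.0, -1.0, 0.5 */
    uint8_t b[4] = {0x40, 0x38, 0x30, 0x40};  /* 2.0, 1.0, 0.5, 2.0 */
    printf("dot = %f\n", fp8_dot(a, b, 4, 0.0f));
    return 0;
}
```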
-
Publication No.: US20240403044A1
Publication Date: 2024-12-05
Application No.: US18677140
Application Date: 2024-05-29
Applicant: Intel Corporation
Inventor: Shuai Mu , Cristina S. Anderson , Subramaniam Maiyuran
Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
-
Publication No.: US20230315447A1
Publication Date: 2023-10-05
Application No.: US18170696
Application Date: 2023-02-17
Applicant: Intel Corporation
Inventor: Shuai Mu , Cristina S. Anderson , Subramaniam Maiyuran
CPC classification number: G06F9/3001 , G06T1/20 , G06F7/5443
Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
-