Patent search ap:("Intel Corporation") AND inv:"Xiaoming Chen" Page 4

31.

发明公开
DYNAMIC PRECISION FOR NEURAL NETWORK COMPUTE OPERATIONS 审中-公开

公开(公告)号：US20240005136A1

公开(公告)日：2024-01-04

申请号：US18351124

申请日：2023-07-12

Applicant: Intel Corporation

Inventor： Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045

CPC classification number: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30014 , G06T15/005 , G06F15/78 , G06F15/76 , G06F9/30036 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045 , G06T1/60

Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.

32.

发明公开
COMPUTE OPTIMIZATIONS FOR NEURAL NETWORKS 审中-公开

公开(公告)号：US20230359461A1

公开(公告)日：2023-11-09

申请号：US18315625

申请日：2023-05-11

Applicant: Intel Corporation

Inventor： Kevin Nealis , Anbang Yao , Xiaoming Chen , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha

IPC: G06F9/30 , G06F9/38 , G06N3/063 , G06N3/084 , G06T1/20 , G06N3/044 , G06N3/045

CPC classification number: G06F9/3001 , G06F9/3851 , G06F9/3887 , G06F9/3893 , G06N3/063 , G06N3/084 , G06T1/20 , G06N3/044 , G06N3/045 , G06F2207/4824

Abstract: One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a one-bit weight associated with a neural network, as well as an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a fused operation including an exclusive not OR (XNOR) operation and a population count operation. The adder is configured to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.

33.

发明公开
MULTICORE PROCESSOR WITH EACH CORE HAVING INDEPENDENT FLOATING POINT DATAPATH AND INTEGER DATAPATH 审中-公开

公开(公告)号：US20230315481A1

公开(公告)日：2023-10-05

申请号：US18312079

申请日：2023-05-04

Applicant: Intel Corporation

Inventor： ELMOUSTAPHA OULD-AHMED-VALL , BARATH LAKSHMANAN , TATIANA SHPEISMAN , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma

IPC: G06F9/38 , G06F9/30 , G06F13/42 , G06F13/40 , G06N20/00 , G06T1/20 , G06N3/063 , G06N3/084 , G06N20/10 , G06N3/044 , G06N3/045 , G06F9/50 , G06F15/80 , G06N3/00

CPC classification number: G06F9/3887 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/30094 , G06F9/30109 , G06F9/30112 , G06F9/3016 , G06F9/3851 , G06F9/3891 , G06F9/50 , G06F13/4068 , G06F13/4282 , G06F15/80 , G06N3/00 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N20/00 , G06N20/10 , G06T1/20 , G06F2213/0026

Abstract: Described herein is a general-purpose graphics processing unit including a multiprocessor having a single instruction, multiple thread, SIMT, architecture. The multiprocessor comprises multiple sets of compute units each having a first logic unit configured to perform floating-point operations and a second logic unit configured to perform integer operations, with a thread of the floating-point instruction being executed in parallel with a thread of the integer instruction.

34.

发明授权
Convolutional neural network optimization mechanism 有权

公开(公告)号：US11727246B2

公开(公告)日：2023-08-15

申请号：US16283021

申请日：2019-02-22

Applicant: Intel Corporation

Inventor： Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu

IPC: G06N3/04 , G06N3/082 , G06N3/063 , G06T1/20 , G06N3/044 , G06N3/045

CPC classification number: G06N3/04 , G06N3/063 , G06N3/082 , G06T1/20 , G06N3/044 , G06N3/045

Abstract: Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format. Processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an integer format. Quantizing the weights includes generating a quantization table to enable non-uniform quantization of the weights and quantizing the weights from the floating-point format to the integer format using the quantization table. The operations additionally comprise performing an inference operation utilizing the processed CNN with the integer format weights.

35.

发明申请
COMPUTE OPTIMIZATIONS FOR LOW PRECISION MACHINE LEARNING OPERATIONS 有权

公开(公告)号：US20230061331A1

公开(公告)日：2023-03-02

申请号：US17960611

申请日：2022-10-05

Applicant: Intel Corporation

Inventor： Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland

IPC: G06T1/20 , G06F7/483 , G06N3/08 , G06F9/30 , G06N3/04 , G06N3/063 , G06F9/50 , G06F9/38 , G06N20/00

Abstract: One embodiment provides a multi-chip module accelerator usable to execute tensor data processing operations a multi-chip module. The multi-chip module may include a memory stack including multiple memory dies and parallel processor circuitry communicatively coupled to the memory stack. The parallel processor circuitry may include multiprocessor cores to execute matrix multiplication and accumulate operations. The matrix multiplication and accumulate operations may include floating-point operations that are configurable to include two-dimensional matrix multiply and accumulate operations involving inputs that have differing floating-point precisions. The floating-point operations may include a first operation at a first precision and a second operation at a second precision. The first operation may include a multiply having at least one 16-bit floating-point input and the second operation may include an accumulate having a 32-bit floating-point input.

36.

发明授权
Instructions and logic to perform floating point and integer operations for machine learning 有权

公开(公告)号：US11360767B2

公开(公告)日：2022-06-14

申请号：US17305355

申请日：2021-07-06

Applicant: Intel Corporation

Inventor： Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/04 , G06N3/063 , G06N3/08 , G06T15/00 , G06N20/00 , G06F17/16

Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

37.

发明申请
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION 有权

公开(公告)号：US20220114430A1

公开(公告)日：2022-04-14

申请号：US17558285

申请日：2021-12-21

Applicant: Intel Corporation

Inventor： Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang

IPC: G06N3/063 , G06N3/04 , G06F9/30 , G06F9/38 , G06T1/20 , G06N3/08 , G06F16/17

Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.

38.

发明授权
Compute optimization mechanism 有权

公开(公告)号：US11270405B2

公开(公告)日：2022-03-08

申请号：US16983078

申请日：2020-08-03

Applicant: Intel Corporation

Inventor： Abhishek R. Appu , Altug Koker , Linda L. Hurd , Dukhwan Kim , Mike B. Macpherson , John C. Weast , Feng Chen , Farshad Akhbari , Narayan Srinivasa , Nadathur Rajagopalan Satish , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman

IPC: G06T1/20 , G06F3/14 , G06F9/30 , G06F9/38 , G06N3/04 , G06N3/063 , G06N3/08 , G06T15/00 , G09G5/36 , G06T15/04

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core to perform a mixed precision multi-dimensional matrix multiply and accumulate operation on 8-bit and/or 32 bit signed or unsigned integer elements.

39.

发明申请
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING 有权

公开(公告)号：US20220019431A1

公开(公告)日：2022-01-20

申请号：US17305355

申请日：2021-07-06

Applicant: Intel Corporation

Inventor： Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06F9/30 , G06N3/04 , G06F9/38 , G06F7/544 , G06N3/08 , G06N3/063 , G06F7/483 , G09G5/393 , G06T15/00 , G06F17/16 , G06N20/00

Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

40.

发明授权
Programmable coarse grained and sparse matrix compute hardware with advanced scheduling 有权

公开(公告)号：US11210760B2

公开(公告)日：2021-12-28

申请号：US16928353

申请日：2020-07-14

Applicant: Intel Corporation

Inventor： Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao

IPC: G06T1/20 , G06N3/04 , G06N3/063 , G06F9/38 , G06F9/30 , G06N3/08

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification