-
Publication Number: US11138048B2
Publication Date: 2021-10-05
Application Number: US15391549
Application Date: 2016-12-27
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Stephan A. Herhut , Jaswanth Sreeram , Tatiana Shpeisman , Richard L. Hudson
Abstract: A work stealer apparatus includes a determination module. The determination module is to determine to steal work from a first hardware computation unit of a first type for a second hardware computation unit of a second type that is different than the first type. The work is to be queued in a first work queue, which is to correspond to the first hardware computation unit, and which is to be stored in a shared memory that is to be shared by the first and second hardware computation units. A synchronized work stealer module is to steal the work through a synchronized memory access to the first work queue. The synchronized memory access is to be synchronized relative to memory accesses to the first work queue from the first hardware computation unit.
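The abstract describes a queue discipline more than a circuit, so a short software analogue can make it concrete. The following C++ sketch models the two hardware computation units as threads sharing one work queue: the owner pushes and pops at one end, the thief steals from the other, and a single mutex provides the synchronized memory access. The thread model, the mutex, and the integer work items are illustrative assumptions, not the patented hardware mechanism.

// Minimal software sketch of synchronized work stealing between two
// "computation units" (modeled here as threads); the patent targets
// heterogeneous hardware units sharing memory, but the queue discipline
// is the same. All names are illustrative.
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <cstdio>

struct WorkQueue {                       // lives in memory shared by both units
    std::deque<int> items;
    std::mutex lock;                     // synchronizes owner and thief accesses

    void push(int w) {                   // owner enqueues at the back
        std::lock_guard<std::mutex> g(lock);
        items.push_back(w);
    }
    std::optional<int> pop_back() {      // owner dequeues from the back
        std::lock_guard<std::mutex> g(lock);
        if (items.empty()) return std::nullopt;
        int w = items.back(); items.pop_back();
        return w;
    }
    std::optional<int> steal_front() {   // thief steals from the opposite end
        std::lock_guard<std::mutex> g(lock);
        if (items.empty()) return std::nullopt;
        int w = items.front(); items.pop_front();
        return w;
    }
};

int main() {
    WorkQueue q;                         // queue owned by the "first" unit
    for (int i = 0; i < 8; ++i) q.push(i);

    std::thread thief([&] {              // "second" unit steals when idle
        while (auto w = q.steal_front())
            std::printf("stolen: %d\n", *w);
    });
    while (auto w = q.pop_back())        // owner drains its own queue
        std::printf("owned:  %d\n", *w);
    thief.join();
}

Stealing from the end opposite the owner is the usual reason work stealers operate on a double-ended queue: it keeps the thief away from the items the owner is most likely to touch next.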
-
Publication Number: US20210294649A1
Publication Date: 2021-09-23
Application Number: US17206514
Application Date: 2021-03-19
Applicant: Intel Corporation
Inventor: Chandrasekaran Sakthivel , Prasoonkumar Surti , John C. Weast , Sara S. Baghsorkhi , Justin E. Gottschlich , Abhishek R. Appu , Nicolas C. Galoppo Von Borries , Joydeep Ray , Narayan Srinivasa , Feng Chen , Ben J. Ashbaugh , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Eriko Nurvitadhi , Balaji Vembu , Altug Koker
Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
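As a rough illustration of the claimed data flow, the C++ sketch below has each cache module expose a few coherency counters to a stand-in "model" that turns them into a contention score. The counters, the linear weights, and the score itself are invented for illustration; the abstract does not specify what the machine learning model consumes or predicts.

// Illustrative-only sketch: per-cache coherency counters are shared with a
// tiny linear "model" that scores how contended a cache appears. Everything
// here (counters, weights, score) is assumed purely to show the data flow.
#include <array>
#include <cstddef>
#include <cstdio>

struct CoherencyStats {        // data a cache module might share with the model
    double invalidations;      // snoop-invalidate events observed
    double shared_reads;       // read-shared transitions
    double writebacks;         // modified-line writebacks
};

// Stand-in for the machine learning model: a fixed linear scorer.
double contention_score(const CoherencyStats& s) {
    const std::array<double, 3> w{0.6, 0.1, 0.3};   // assumed weights
    return w[0] * s.invalidations + w[1] * s.shared_reads + w[2] * s.writebacks;
}

int main() {
    std::array<CoherencyStats, 2> caches{{{12, 40, 5}, {2, 55, 1}}};
    for (std::size_t i = 0; i < caches.size(); ++i)
        std::printf("cache %zu contention score: %.1f\n", i, contention_score(caches[i]));
}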
-
Publication Number: US11080046B2
Publication Date: 2021-08-03
Application Number: US17169232
Application Date: 2021-02-05
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/302 , G06F7/483 , G06N3/04 , G06F17/16 , G06F9/30 , G09G5/393 , G06F7/544 , G06F9/38 , G06N3/08 , G06N3/063 , G06N20/00 , G06T15/00
Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
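A software analogue of the claimed behavior, assuming an eight-lane SIMT group for illustration: each lane performs one fused multiply-add (a single-rounding a*b + c via std::fma), which is the per-thread work the abstract says a single instruction issues in parallel. The lane count and operand values are assumptions.

// One fused multiply-add per "lane", mimicking what a single hardware
// instruction would do across multiple threads of a multiprocessor.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int lanes = 8;                          // assumed SIMT width
    std::vector<float> a(lanes, 1.5f), b(lanes, 2.0f), c(lanes, 0.25f), d(lanes);
    for (int i = 0; i < lanes; ++i)               // one FMA per lane
        d[i] = std::fma(a[i], b[i], c[i]);
    for (int i = 0; i < lanes; ++i)
        std::printf("lane %d: %.2f\n", i, d[i]);
}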
-
Publication Number: US10943325B2
Publication Date: 2021-03-09
Application Number: US16930841
Application Date: 2020-07-16
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Nicolas C. Galoppo Von Borries
IPC: G06F17/16 , G06T1/20 , G06F9/30 , G06T1/60 , G06K9/62 , G06F12/0888 , G06F12/0815 , H03M7/30 , G06F9/48 , G06T15/00 , G06N3/04 , G06F9/38 , G06F12/0831 , G06F12/0811 , G06N3/08 , G06N20/00
Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
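The optimization amounts to fusing an element-wise operation into the matrix-multiply kernel so the raw product never makes a round trip through the memory hierarchy. The C++ sketch below applies a ReLU (chosen arbitrarily for illustration) to each output element as it is produced; the matrix sizes and the specific element-wise operation are assumptions.

// Sketch of the fusion idea: apply an element-wise operation to each
// matrix-multiply result before it is written out, instead of storing the
// raw product and making a second pass over memory.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int M = 2, N = 2, K = 3;
    std::vector<float> A = {1, -2, 3,  4, 5, -6};      // M x K, row-major
    std::vector<float> B = {1, 0,  0, 1,  -1, 1};      // K x N, row-major
    std::vector<float> C(M * N);

    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = std::max(acc, 0.0f);        // fused element-wise op (ReLU)
        }

    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) std::printf("%6.1f ", C[i * N + j]);
        std::printf("\n");
    }
}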
-
Publication Number: US10706496B2
Publication Date: 2020-07-07
Application Number: US16282553
Application Date: 2019-02-22
Applicant: Intel Corporation
Inventor: Brian T. Lewis , Rajkishore Barik , Tatiana Shpeisman
Abstract: Generally, this disclosure provides systems, devices, methods and computer readable media for implementing function callback requests between a first processor (e.g., a GPU) and a second processor (e.g., a CPU). The system may include a shared virtual memory (SVM) coupled to the first and second processors, the SVM configured to store at least one double-ended queue (Deque). An execution unit (EU) of the first processor may be associated with a first of the Deques and configured to push the callback requests to that first Deque. A request handler thread executing on the second processor may be configured to: pop one of the callback requests from the first Deque; execute a function specified by the popped callback request; and generate a completion signal to the EU in response to completion of the function.
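A minimal software model of the callback path, with threads standing in for the two processors and ordinary process memory standing in for the shared virtual memory: the producer pushes a callback request onto a shared deque, the handler thread pops it, runs the specified function, and signals completion through a promise/future pair. The type and variable names are illustrative, not the disclosure's own.

// "GPU-side" code pushes callback requests onto a shared deque; a "CPU-side"
// request handler thread pops each request, executes the specified function,
// and signals completion back to the requester.
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

struct CallbackRequest {
    std::function<void()> fn;        // function the EU wants the CPU to run
    std::promise<void> done;         // completion signal back to the EU
};

int main() {
    std::deque<CallbackRequest> deq; // the shared Deque
    std::mutex m;
    std::condition_variable cv;
    bool stop = false;

    std::thread handler([&] {        // request handler thread (second processor)
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return stop || !deq.empty(); });
            if (deq.empty()) return;
            CallbackRequest req = std::move(deq.front());
            deq.pop_front();
            lk.unlock();
            req.fn();                // execute the requested function
            req.done.set_value();    // signal completion to the EU
        }
    });

    // "EU" side: push one callback request and wait for its completion.
    std::future<void> completed;
    {
        CallbackRequest req;
        req.fn = [] { std::printf("callback executed on CPU side\n"); };
        completed = req.done.get_future();
        std::lock_guard<std::mutex> g(m);
        deq.push_back(std::move(req));
    }
    cv.notify_one();
    completed.wait();

    { std::lock_guard<std::mutex> g(m); stop = true; }
    cv.notify_one();
    handler.join();
}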
-
Publication Number: US10417731B2
Publication Date: 2019-09-17
Application Number: US15494886
Application Date: 2017-04-24
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type.
-
Publication Number: US20190243764A1
Publication Date: 2019-08-08
Application Number: US16277267
Application Date: 2019-02-15
Applicant: Intel Corporation
Inventor: Chandrasekaran Sakthivel , Prasoonkumar Surti , John C. Weast , Sara S. Baghsorkhi , Justin E. Gottschlich , Abhishek R. Appu , Nicolas C. Galoppo Von Borries , Joydeep Ray , Narayan Srinivasa , Feng Chen , Ben J. Ashbaugh , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Eriko Nurvitadhi , Balaji Vembu , Altug Koker
IPC: G06F12/0837 , G06N20/00 , G06T1/20 , G06N3/08
CPC classification number: G06F12/0837 , G06F12/0815 , G06F2212/62 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/08 , G06N3/084 , G06N3/088 , G06N20/00 , G06T1/20
Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
-
Publication Number: US10353706B2
Publication Date: 2019-07-16
Application Number: US15819152
Application Date: 2017-11-21
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/04 , G09G5/393 , G06N3/08 , G06N3/063 , G06T15/00 , G06N20/00
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
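The essential numeric claim here is the widening discipline: 16-bit operands are multiplied into a 32-bit intermediate product and accumulated with a 32-bit sum. Portable C++ has no 16-bit floating-point type before C++23, so the sketch below shows the same pattern with 16-bit integers; the matrices are arbitrary illustration data, not the half-precision operands the abstract describes.

// Multiply 16-bit operands into a 32-bit intermediate product and keep a
// 32-bit running sum, so precision is not lost element by element.
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const int M = 2, N = 2, K = 4;
    std::vector<int16_t> A = {300, -200, 150, 40,   25, 600, -70, 10};  // M x K
    std::vector<int16_t> B = {12, 3,  7, -8,  5, 9,  -4, 20};           // K x N
    std::vector<int32_t> C(M * N, 0);                                   // 32-bit accumulators

    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k) {
                int32_t prod = int32_t(A[i * K + k]) * int32_t(B[k * N + j]); // 32-bit product
                C[i * N + j] += prod;                                         // 32-bit sum
            }

    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) std::printf("%8d ", C[i * N + j]);
        std::printf("\n");
    }
}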
-
Publication Number: US20190080429A1
Publication Date: 2019-03-14
Application Number: US16185965
Application Date: 2018-11-09
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Tatiana Shpeisman , Brian T. Lewis , Rashid Kaleem
CPC classification number: G06T1/20 , G06F3/14 , G09G5/001 , G09G5/363 , G09G2360/08
Abstract: Generally, this disclosure provides systems, devices, methods and computer readable media for adaptive scheduling of task assignment among heterogeneous processor cores. The system may include any number of CPUs, a graphics processing unit (GPU) and memory configured to store a pool of work items to be shared by the CPUs and GPU. The system may also include a GPU proxy profiling module associated with one of the CPUs to profile execution of a first portion of the work items on the GPU. The system may further include profiling modules, each associated with one of the CPUs, to profile execution of a second portion of the work items on each of the CPUs. The measured profiling information from the CPU profiling modules and the GPU proxy profiling module is used to calculate a distribution ratio for execution of a remaining portion of the work items between the CPUs and the GPU.
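Only the last step of the scheme, turning the measured profiling information into a distribution ratio and splitting the remaining pool, is sketched below in C++. The throughput-proportional split is an assumed formula for illustration; the disclosure says only that a ratio is calculated from the CPU and GPU proxy profiling measurements.

// Given measured throughputs from the GPU proxy profiling phase and from each
// CPU's profiling phase, split the remaining work items in proportion to
// throughput. The numbers are illustrative.
#include <cstdio>
#include <vector>

int main() {
    double gpu_items_per_ms = 48.0;                              // measured on the GPU portion
    std::vector<double> cpu_items_per_ms = {6.0, 6.5, 5.5, 6.0}; // one per CPU core

    double cpu_total = 0.0;
    for (double r : cpu_items_per_ms) cpu_total += r;

    int remaining = 10000;                                       // work items still in the pool
    double gpu_share = gpu_items_per_ms / (gpu_items_per_ms + cpu_total);
    int gpu_items = int(remaining * gpu_share + 0.5);
    int cpu_items = remaining - gpu_items;

    std::printf("distribution ratio (GPU share): %.2f\n", gpu_share);
    std::printf("GPU gets %d items, CPUs share %d items\n", gpu_items, cpu_items);
}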
-
Publication Number: US10186011B2
Publication Date: 2019-01-22
Application Number: US15581182
Application Date: 2017-04-28
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.