Patent search: ap:("Intel Corporation") AND inv:"Abhijit Davare"

1.

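Granted invention patent
Apparatus and method for adaptable and efficient lane-wise tensor processing (Active)

Publication (announcement) No.: US10776110B2

Publication (announcement) date: 2020-09-15

Application No.: US16147696

Application date: 2018-09-29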

Applicant: Intel Corporation

Inventor: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Asit Mishra, Steven Burns, Desmond Kirkpatrick, Andrey Ayupov, Anton Alexandrovich Sorokin, Eriko Nurvitadhi

IPC: G06F9/30, G06F9/38, G06F17/16, G06F7/57, G06F12/0831, G06F12/084

Abstract: An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule a plurality of matrix operations responsive to a tensor matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, each lane comprising: first, second, and third tile registers to store blocks of a first matrix (A), second matrix (B), and third matrix (C), respectively; at least one tensor arithmetic logic unit (TALU) to multiply a block of elements of the first matrix with a block of elements of the second matrix to generate a product and to accumulate the product with a block of elements of the third matrix, wherein each lane is to multiply one or more different blocks of the first and second matrix and to accumulate the resulting one or more products with one or more different blocks of the third matrix; and broadcast circuitry to broadcast one or more invariant matrix blocks to different tile registers within a lane and/or different tile registers across different lanes.
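The blocked multiply-accumulate described in the abstract above can be illustrated in software. The following is a minimal sketch, not the patented hardware: the tile size, the function names (`get_tile`, `matmul_acc_tile`, `lanewise_gemm`), and the choice to give each "lane" one column block of C are all assumptions made for illustration. The shared A block stands in for the invariant block that the abstract says is broadcast.

```python
# Illustrative sketch only. TILE, get_tile, matmul_acc_tile and lanewise_gemm
# are hypothetical names; none of them come from the patent.
TILE = 2  # assumed tile edge length; matrix dimensions must be multiples of it


def get_tile(m, r, c):
    """Copy the TILE x TILE block of m whose top-left corner is (r, c)."""
    return [row[c:c + TILE] for row in m[r:r + TILE]]


def matmul_acc_tile(a, b, c):
    """Return c + a @ b for TILE x TILE blocks (one multiply-accumulate step)."""
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(TILE))
             for j in range(TILE)]
            for i in range(TILE)]


def lanewise_gemm(A, B, C):
    """Compute C + A @ B block by block without modifying the inputs."""
    n, kdim, m = len(A), len(B), len(B[0])
    out = [row[:] for row in C]
    for i in range(0, n, TILE):
        for k in range(0, kdim, TILE):
            # The A block does not change across lanes, so it is fetched once
            # and reused by every lane (the "broadcast" of an invariant block).
            a_blk = get_tile(A, i, k)
            for j in range(0, m, TILE):  # each j plays the role of one lane
                acc = matmul_acc_tile(a_blk, get_tile(B, k, j), get_tile(out, i, j))
                for di in range(TILE):
                    out[i + di][j:j + TILE] = acc[di]
    return out
```

In this sketch the lanes run sequentially; in the claimed processor they would execute in parallel, each with its own tile registers.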

2.

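Invention patent application
ARCHITECTURE AND METHOD FOR DATA PARALLEL SINGLE PROGRAM MULTIPLE DATA (SPMD) EXECUTION (Under examination, published)

Publication (announcement) No.: US20200104139A1

Publication (announcement) date: 2020-04-02

Application No.: US16147692

Application date: 2018-09-29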

Applicant: Intel Corporation

Inventor: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Andrey Ayupov

IPC: G06F9/38, G06F9/30

Abstract: An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.
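The scheduling idea in the abstract above, ganging microthreads into fragments based on their instruction pointer (IP) values, can be sketched as follows. This is an illustration under stated assumptions, not the patented scheduler: the abstract does not give the grouping policy, so grouping by equal IP, capping a fragment at the number of lanes, and the name `gang_by_ip` are all hypothetical.

```python
from collections import defaultdict


def gang_by_ip(microthreads, num_lanes):
    """Group (thread_id, ip) pairs into fragments of threads sharing one IP.

    Assumption: a fragment holds at most num_lanes microthreads, so it can be
    issued across the execution lanes in one step.
    """
    by_ip = defaultdict(list)
    for tid, ip in microthreads:
        by_ip[ip].append(tid)

    fragments = []
    for ip in sorted(by_ip):  # deterministic order, chosen for the example only
        tids = by_ip[ip]
        for start in range(0, len(tids), num_lanes):
            fragments.append((ip, tids[start:start + num_lanes]))
    return fragments


# Four microthreads, three of them at the same IP, on a 2-lane cluster.
print(gang_by_ip([(0, 0x10), (1, 0x20), (2, 0x10), (3, 0x10)], num_lanes=2))
```

Microthreads that diverge to different IPs end up in different fragments, which is why the IP is a natural grouping key for SPMD execution.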

3.

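Granted invention patent
Apparatus and method for adaptable and efficient lane-wise tensor processing (Active)

Publication (announcement) No.: US11379229B2

Publication (announcement) date: 2022-07-05

Application No.: US16987838

Application date: 2020-08-07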

Applicant: INTEL CORPORATION

Inventor: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Debbie Marr, Abhijit Davare, Asit Mishra, Steven Burns, Desmond A. Kirkpatrick, Andrey Ayupov, Anton Alexandrovich Sorokin, Eriko Nurvitadhi

IPC: G06F9/30, G06F9/38, G06F17/16, G06F7/57, G06F12/0831, G06F12/084

Abstract: An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule matrix operations responsive to a matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, wherein a lane comprises an arithmetic logic unit to multiply a block of a first matrix with a block of a second matrix to generate a product and to accumulate the product with a block of a third matrix, and wherein the matrix blocks are to be stored in registers within the lane; and broadcast circuitry to broadcast one or more invariant matrix blocks to at least one of different registers within the lane and different registers across different lanes.

4.

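Granted invention patent
Hardware offload circuitry (Active)

Publication (announcement) No.: US12197601B2

Publication (announcement) date: 2025-01-14

Application No.: US17560193

Application date: 2021-12-22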

Applicant: Intel Corporation

Inventor: Ren Wang, Sameh Gobriel, Somnath Paul, Yipeng Wang, Priya Autee, Abhirupa Layek, Shaman Narayana, Edwin Verplanke, Mrittika Ganguli, Jr-Shian Tsai, Anton Sorokin, Suvadeep Banerjee, Abhijit Davare, Desmond Kirkpatrick, Rajesh M. Sankaran, Jaykant B. Timbadiya, Sriram Kabisthalam Muthukumar, Narayan Ranganathan, Nalini Murari, Brinda Ganesh, Nilesh Jain

IPC: G06F15/78, G06F9/50, G06F21/62, G06F21/72

Abstract: Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN), similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash based sparse matrix multiplication, data encode, data decode, or embedding lookup.

5.

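Invention patent application
METHODS AND APPARATUS FOR DATA ENHANCED AUTOMATED MODEL GENERATION (Active)

Publication (announcement) No.: US20220114451A1

Publication (announcement) date: 2022-04-14

Application No.: US17559730

Application date: 2021-12-22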

Applicant: Intel Corporation

Inventor: Chaunté W. Lacewell, Juan Pablo Muñoz, Rajesh Poornachandran, Nilesh Jain, Anahita Bhiwandiwalla, Eriko Nurvitadhi, Abhijit Davare

IPC: G06N3/08, G06N3/063, G06N3/04

Abstract: Methods, apparatus, systems, and articles of manufacture for data enhanced automated model generation are disclosed. Example instructions, when executed, cause at least one processor to access a request to generate a machine learning model to perform a selected task, generate task knowledge based on a previously generated machine learning model, create a search space based on the task knowledge, and generate a machine learning model using neural architecture search, the neural architecture search beginning based on the search space.

6.

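Invention patent application
APPARATUS AND METHOD FOR ADAPTABLE AND EFFICIENT LANE-WISE TENSOR PROCESSING (Under examination, published)

Publication (announcement) No.: US20200104126A1

Publication (announcement) date: 2020-04-02

Application No.: US16147696

Application date: 2018-09-29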

Applicant: Intel Corporation

Inventor: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Asit Mishra, Steven Burns, Desmond Kirkpatrick, Andrey Ayupov, Anton Alexandrovich Sorokin, Eriko Nurvitadhi

IPC: G06F9/30, G06F9/38, G06F17/16, G06F12/0831, G06F12/084, G06F7/57

Abstract: An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule a plurality of matrix operations responsive to a tensor matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, each lane comprising: first, second, and third tile registers to store blocks of a first matrix (A), second matrix (B), and third matrix (C), respectively; at least one tensor arithmetic logic unit (TALU) to multiply a block of elements of the first matrix with a block of elements of the second matrix to generate a product and to accumulate the product with a block of elements of the third matrix, wherein each lane is to multiply one or more different blocks of the first and second matrix and to accumulate the resulting one or more products with one or more different blocks of the third matrix; and broadcast circuitry to broadcast one or more invariant matrix blocks to different tile registers within a lane and/or different tile registers across different lanes.

7.

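Invention patent application
APPARATUS, ARTICLES OF MANUFACTURE, AND METHODS FOR COMPOSABLE MACHINE LEARNING COMPUTE NODES (Active)

Publication (announcement) No.: US20220114495A1

Publication (announcement) date: 2022-04-14

Application No.: US17558284

Application date: 2021-12-21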

Applicant: Intel Corporation

Inventor: Eriko Nurvitadhi, Rajesh Poornachandran, Abhijit Davare, Nilesh Jain, Chaunte Lacewell, Anahita Bhiwandiwalla, Juan Pablo Munoz, Andrew Boutros, Yash Akhauri

IPC: G06N20/00, G06N5/02, G06F17/16, G06F9/50

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for composable machine learning compute nodes. An example apparatus includes interface circuitry to receive a workload, instructions in the apparatus, and processor circuitry to at least one of execute or instantiate the instructions to generate a first configuration of one or more machine-learning models based on a workload, generate a second configuration of hardware, determine an evaluation parameter based on an execution of the workload, the execution of the workload based on the first configuration and the second configuration, and, in response to the evaluation parameter satisfying a threshold, execute the one or more machine-learning models in the first configuration on the hardware in the second configuration, the one or more machine-learning models and the hardware to execute the workload.

8.

Granted invention patent
Gather-scatter cache architecture having plurality of tag and data banks and arbiter for single program multiple data (SPMD) processor (Active)

Publication (announcement) No.: US10896141B2

Publication (announcement) date: 2021-01-19

Application No.: US16364725

Application date: 2019-03-26

Applicant: Intel Corporation

Inventor: Jeffrey J. Cook, Jonathan D. Pearce, Srikanth T. Srinivasan, Rishiraj A. Bheda, David B. Sheffield, Abhijit Davare, Anton Alexandrovich Sorokin

IPC: G06F13/16, G06F9/38, H04L9/06, G06F12/0815

Abstract: In one embodiment, a cache memory includes: a plurality of data banks, each of the plurality of data banks having a plurality of entries each to store a portion of a cache line distributed across the plurality of data banks; and a plurality of tag banks decoupled from the plurality of data banks, wherein a tag for a cache line is to be assigned to one of the plurality of tag banks. Other embodiments are described and claimed.
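The bank organization in the abstract above, a cache line striped across data banks with its tag held in a separate tag bank, can be pictured with a small model. This is a sketch under assumptions only: the line size, the bank counts, the contiguous striping, and the modulo tag-bank selection are illustrative choices, and `split_line` and `tag_bank_for` are hypothetical names that the patent does not define.

```python
LINE_BYTES = 64      # assumed cache line size
NUM_DATA_BANKS = 8   # assumed number of data banks
NUM_TAG_BANKS = 4    # assumed number of tag banks (decoupled from data banks)


def split_line(line):
    """Return the portion of a cache line that each data bank would store."""
    if len(line) != LINE_BYTES:
        raise ValueError("expected exactly one cache line")
    portion = LINE_BYTES // NUM_DATA_BANKS
    return [line[b * portion:(b + 1) * portion] for b in range(NUM_DATA_BANKS)]


def tag_bank_for(addr):
    """Pick the single tag bank that holds the tag for the line containing addr.

    Assumption: the line index modulo the tag-bank count selects the bank.
    """
    return (addr // LINE_BYTES) % NUM_TAG_BANKS


line = bytes(range(LINE_BYTES))
portions = split_line(line)
print(len(portions), len(portions[0]), tag_bank_for(320))
```

Because every data bank holds part of every line, elements gathered from different lines can be read from the banks at once, while each line's tag still lives in exactly one tag bank.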

9.

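Granted invention patent
Architecture and method for data parallel single program multiple data (SPMD) execution (Active)

Publication (announcement) No.: US10831505B2

Publication (announcement) date: 2020-11-10

Application No.: US16147692

Application date: 2018-09-29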

Applicant: Intel Corporation

Inventor: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Andrey Ayupov

IPC: G06F9/38, G06F9/30

Abstract: An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.

10.

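Invention patent application
Gather-Scatter Cache Architecture For Single Program Multiple Data (SPMD) Processor (Under examination, published)

Publication (announcement) No.: US20200310992A1

Publication (announcement) date: 2020-10-01

Application No.: US16364725

Application date: 2019-03-26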

Applicant: Intel Corporation

Inventor: Jeffrey J. Cook, Jonathan D. Pearce, Srikanth T. Srinivasan, Rishiraj A. Bheda, David B. Sheffield, Abhijit Davare, Anton Alexandrovich Sorokin

IPC: G06F13/16, G06F9/38, G06F12/0815, H04L9/06

Abstract: In one embodiment, a cache memory includes: a plurality of data banks, each of the plurality of data banks having a plurality of entries each to store a portion of a cache line distributed across the plurality of data banks; and a plurality of tag banks decoupled from the plurality of data banks, wherein a tag for a cache line is to be assigned to one of the plurality of tag banks. Other embodiments are described and claimed.
