-
1.
Publication No.: US20240013040A1
Publication Date: 2024-01-11
Application No.: US18474464
Application Date: 2023-09-26
Applicant: Intel Corporation
IPC Class: G06N3/063, G06N3/048, G06N3/0464
CPC Class: G06N3/063, G06N3/048, G06N3/0464
Abstract: A drain module may drain activations in an output tensor of a convolution from a PE array that performs the convolution. The drain module may extract activations generated in a collection of PE columns. The activations generated in the PE columns of the collection may be concatenated: activations generated in the first PE column of the collection may be followed by activations generated in the second PE column, and so on. The activations in the output tensor may be rearranged into activation vectors. Each activation vector may include activations in different output channels of the deep learning operation, and the activations in each activation vector may have the same (X, Y) coordinate in the output tensor. The drain module may determine a memory address for an activation based on the activation's (X, Y, Z) coordinate in the output tensor and write the activation to that memory address.
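To make the coordinate-to-address mapping concrete, here is a minimal Python sketch; the Z-major (channel-innermost) layout, the function name, and all parameters are illustrative assumptions, not details taken from the publication.

```python
# Sketch of a (X, Y, Z)-to-address mapping (assumption: Z-major layout in
# which all output channels of one (X, Y) position are stored contiguously).

def activation_address(x: int, y: int, z: int,
                       width: int, num_channels: int,
                       base: int = 0, elem_bytes: int = 1) -> int:
    """Return the byte address of the activation at (x, y, z)."""
    # Rows first, then columns, with the channel (Z) dimension innermost,
    # so an activation vector (all channels at one (X, Y)) is contiguous.
    linear = (y * width + x) * num_channels + z
    return base + linear * elem_bytes

# Example: activation at (x=3, y=2, z=5) in a 16-wide tensor with 64 channels.
addr = activation_address(3, 2, 5, width=16, num_channels=64)
```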
-
2.
Publication No.: US20230229507A1
Publication Date: 2023-07-20
Application No.: US18180415
Application Date: 2023-03-08
Applicant: Intel Corporation
CPC Class: G06F9/5027, G06N3/04, H04L41/16
Abstract: Computations in processing elements (PEs) executing a deep neural network are scheduled by a computation scheduler based on sparsity in the input data of the computations to reduce voltage droops. Each PE may perform a computation on an input operand and a weight operand. The computation scheduler may predict the workload of a PE for a computation based on a combined sparsity bitmap, which may be generated from the sparsity bitmap of the input operand and the sparsity bitmap of the weight operand. The computation scheduler can schedule the start of each PE's computation based on the predicted workloads: the PE with the highest workload may be instructed to start first, and the other PEs to start later. In some embodiments, the computations in the PEs may end in the same clock cycle.
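The bitmap-based prediction lends itself to a short sketch. The Python below is an assumed illustration: a bit set to 1 marks a nonzero element, the combined bitmap is the bitwise AND of the two operand bitmaps, and its popcount approximates the PE's effectual work.

```python
# Illustrative workload prediction from sparsity bitmaps (all names assumed).

def combined_bitmap(input_bitmap: int, weight_bitmap: int) -> int:
    # A multiply is only needed where both the activation and the weight
    # are nonzero, so the combined bitmap is a bitwise AND.
    return input_bitmap & weight_bitmap

def predicted_workload(input_bitmap: int, weight_bitmap: int) -> int:
    # The popcount of the combined bitmap approximates the number of
    # effectual multiply-accumulates the PE must perform.
    return bin(combined_bitmap(input_bitmap, weight_bitmap)).count("1")

# Scheduling idea: the PE with the heaviest predicted workload starts first;
# lighter PEs are delayed so all computations can end in the same cycle.
bitmaps = [(0b1011, 0b1110), (0b0001, 0b1111), (0b1111, 0b1111)]
order = sorted(range(len(bitmaps)),
               key=lambda i: predicted_workload(*bitmaps[i]), reverse=True)
```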
-
3.
Publication No.: US20220188638A1
Publication Date: 2022-06-16
Application No.: US17684764
Application Date: 2022-03-02
Applicant: Intel Corporation
Abstract: An apparatus for convolution operations is provided. The apparatus includes a PE array, a datastore, writing modules, reading modules, and a controlling module. The PE array performs MAC operations. The datastore includes databanks, each of which stores data to be used by a column of the PE array. The writing modules transfer data from a memory to the datastore, and the reading modules transfer data from the datastore to the PE array; each reading module may transfer data to a particular column of the PE array. The controlling module can determine the rounds of a convolution operation, where each round includes MAC operations based on a weight. The controlling module controls the writing modules and reading modules so that the same data in a databank can be reused across multiple rounds. For different rounds, the controlling module can give a reading module access to different databanks.
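As a rough illustration of the reuse scheme, the toy Python below (all names, and the round-robin bank assignment in particular, are assumed) shows a controller pointing reading modules at different databanks in different rounds, so data staged once can be reused instead of refetched.

```python
# Toy sketch of per-round databank reuse; the real controlling module is
# hardware, and this rotation policy is only an assumed example.

datastore = {bank_id: f"data_for_bank_{bank_id}" for bank_id in range(4)}

def feed_pe_column(column: int, data: str):
    print(f"column {column} <- {data}")

def run_convolution(num_rounds: int, num_readers: int):
    for rnd in range(num_rounds):
        for reader in range(num_readers):
            # Each round, the controller can give a reading module access
            # to a different databank, so data written once by a writing
            # module serves multiple rounds.
            bank_id = (reader + rnd) % len(datastore)
            feed_pe_column(reader, datastore[bank_id])

run_convolution(num_rounds=2, num_readers=4)
```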
-
4.
Publication No.: US20240111830A1
Publication Date: 2024-04-04
Application No.: US18534035
Application Date: 2023-12-08
Applicant: Intel Corporation
Inventors: Umer Iftikhar Cheema, Robert Simofi, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru
CPC Class: G06F17/17, G06F1/0307
Abstract: A non-linear activation function in a neural network may be approximated by one or more linear functions. The input range may be divided into input segments, each of which corresponds to a different exponent in the input range of the activation function and includes the input data elements having that exponent. Target accuracies may be assigned to the identified exponents based on a statistical analysis of the input data elements. The target accuracy of an input segment may be used to determine one or more linear functions that approximate the activation function for that segment, such that the error of the approximation stays within the target accuracy. The parameters of the linear functions may be stored in a look-up table (LUT). During execution of the neural network, the LUT may be used to execute the activation function.
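A small sketch of the exponent-keyed approximation, assuming one (slope, intercept) pair per power-of-two segment and using sigmoid as a stand-in for the unnamed activation function (positive inputs only, for brevity):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def build_lut(exponents):
    """Fit one linear function per exponent segment [2**e, 2**(e+1))."""
    lut = {}
    for e in exponents:
        lo, hi = 2.0 ** e, 2.0 ** (e + 1)
        slope = (sigmoid(hi) - sigmoid(lo)) / (hi - lo)
        intercept = sigmoid(lo) - slope * lo
        lut[e] = (slope, intercept)
    return lut

def approx(x: float, lut):
    # math.frexp gives x = m * 2**e with m in [0.5, 1), so the segment
    # exponent floor(log2(x)) is e - 1 (valid for positive x).
    e = math.frexp(x)[1] - 1
    slope, intercept = lut[e]
    return slope * x + intercept

lut = build_lut(range(-4, 4))
print(approx(1.3, lut), sigmoid(1.3))   # approximation vs. exact value
```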
-
5.
Publication No.: US20220188075A1
Publication Date: 2022-06-16
Application No.: US17688131
Application Date: 2022-03-07
Applicant: Intel Corporation
Inventors: Arnab Raha, Mark A. Anders, Raymond Jit-Hung Sung, Debabrata Mohapatra, Deepak Abraham Mathaikutty, Ram K. Krishnamurthy, Himanshu Kaul
Abstract: An FPMAC operation has two operands: an input operand and a weight operand. The operands may have a format of FP16, BF16, or INT8. Each operand is split into two portions, which are stored in separate storage units. The operands are then transferred to the register files of a PE, with each register file storing the bits of an operand sequentially. The PE performs the FPMAC operation on the operands. The PE may include an FPMAC unit configured to compute an individual partial sum of the PE, and an FP adder to accumulate the individual partial sum with other data, such as an output from another PE or from another PE array. The FP adder may be fused with the FPMAC unit in a single circuit that can perform speculative alignment and has separate critical paths for alignment and normalization.
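Splitting an operand into two separately stored portions can be sketched in a few lines; the byte-level FP16 split below is an assumed illustration of the idea, not the circuit's actual storage scheme.

```python
import struct

# Assumed illustration: a 16-bit operand is split into two byte-sized
# portions for separate storage units, then reassembled for the PE
# (FP16 shown; BF16 and INT8 would split the same way).

def split_fp16(value: float) -> tuple[int, int]:
    raw = struct.unpack("<H", struct.pack("<e", value))[0]  # FP16 bit pattern
    return raw & 0xFF, (raw >> 8) & 0xFF    # low byte, high byte

def join_fp16(lo: int, hi: int) -> float:
    raw = (hi << 8) | lo
    return struct.unpack("<e", struct.pack("<H", raw))[0]

lo, hi = split_fp16(1.5)         # the two portions go to separate units
assert join_fp16(lo, hi) == 1.5  # the PE sees the reassembled operand
```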
-
6.
Publication No.: US20240160695A1
Publication Date: 2024-05-16
Application No.: US18392618
Application Date: 2023-12-21
Applicant: Intel Corporation
CPC Class: G06F17/17, G06F1/0356
Abstract: A non-linear activation function may be approximated by linear functions. The input range of the activation function may be divided into input segments, and one or more input segments may be selected based on a statistical analysis of the input data elements in the input range. A parameter of a first linear function that approximates the activation function for at least part of a selected input segment may be stored in a first portion of a first look-up table (LUT); this portion of the first LUT is dedicated to a first group of post-processing engines (PPEs). A parameter of a second linear function that approximates the activation function for at least part of an unselected input segment may be stored in a shared pool of LUT entries, which comprises a second portion of the first LUT and a portion of a second LUT and is shared by multiple groups of PPEs.
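The two-level lookup, with a dedicated per-group portion and a shared fallback pool, might look like the following assumed Python sketch.

```python
# Sketch of the dedicated-plus-shared LUT organization (all structures,
# keys, and values here are assumed for illustration).

dedicated = {                     # per-group portion of the first LUT
    0: {-1: (0.24, 0.50)},        # group 0: segment key -> (slope, intercept)
    1: {-1: (0.24, 0.50)},
}
shared_pool = {                   # rest of LUT 1 plus part of LUT 2,
    2: (0.02, 0.88),              # shared by all PPE groups
}

def lookup(group: int, segment: int):
    # Selected (frequently hit) segments resolve in the group's dedicated
    # entries; unselected segments fall back to the shared pool.
    return dedicated[group].get(segment) or shared_pool[segment]

print(lookup(0, -1))   # dedicated hit
print(lookup(1, 2))    # shared-pool fallback
```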
-
7.
Publication No.: US20230394312A1
Publication Date: 2023-12-07
Application No.: US18453715
Application Date: 2023-08-22
Applicant: Intel Corporation
IPC Class: G06N3/082, G06N3/0464
CPC Class: G06N3/082, G06N3/0464
Abstract: Activations (e.g., output activations) or weights of intermediate layers of a deep neural network (DNN) can be pruned to increase sparsity and reduce the amount of computation required in those layers or subsequent layers. A pruning threshold may be determined, e.g., through an iterative process, and activations or weights whose absolute values fall below the threshold may be set to zero. A first pruning threshold may be used to prune an output tensor or kernel of a layer, and the resulting loss in DNN accuracy may be measured. A second pruning threshold may then be determined based on the first pruning threshold and the accuracy loss. The DNN may be modified by adding a pruning operation to the layer, which prunes the layer's output tensors or kernels based on the second pruning threshold.
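One plausible form of the iterative process is a binary search over thresholds against an accuracy budget; the sketch below, including its toy accuracy proxy, is an assumption rather than the publication's procedure.

```python
# Assumed sketch of an iterative pruning-threshold search.

def prune(tensor, threshold):
    return [0.0 if abs(v) < threshold else v for v in tensor]

def find_threshold(tensor, evaluate, baseline, max_loss,
                   lo=0.0, hi=1.0, iters=10):
    # Search for the largest threshold whose accuracy loss stays within
    # max_loss, deriving each new threshold from the previous one and
    # its measured loss.
    for _ in range(iters):
        mid = (lo + hi) / 2
        loss = baseline - evaluate(prune(tensor, mid))
        if loss <= max_loss:
            lo = mid    # within budget: try pruning more aggressively
        else:
            hi = mid    # too much accuracy loss: back off
    return lo

weights = [0.01, -0.3, 0.05, 0.8, -0.02, 0.4]
total = sum(abs(w) for w in weights)
evaluate = lambda t: sum(abs(v) for v in t) / total   # toy accuracy proxy
print(find_threshold(weights, evaluate, baseline=1.0, max_loss=0.05))
```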
-
8.
Publication No.: US20230229917A1
Publication Date: 2023-07-20
Application No.: US18184101
Application Date: 2023-03-15
Applicant: Intel Corporation
CPC Class: G06N3/08, G06F7/5443
Abstract: A compute block can perform hybrid multiply-accumulate (MAC) operations. The compute block may include a weight compression module and a processing element (PE) array. The weight compression module may select a first group of one or more weights and a second group of one or more weights from a weight tensor of a deep neural network (DNN) layer. A weight in the first group is quantized to a power-of-two value; a weight in the second group is quantized to an integer. The integer and the exponent of the power-of-two value may be stored in memory in lieu of the weights' original values. A PE in the PE array includes a shifter configured to shift an activation of the layer by the exponent of the power-of-two value and a multiplier configured to multiply the integer with another activation of the layer.
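The shifter/multiplier pairing reduces to one line each in a sketch; the framing below (power-of-two weight applied by a left shift, integer weight by a multiply) is assumed for illustration.

```python
# Assumed sketch of the hybrid MAC: one activation is scaled by a
# power-of-two weight via the shifter path, another by an integer
# weight via the multiplier path.

def hybrid_mac(acc: int,
               activation_a: int, exponent: int,     # weight = 2**exponent
               activation_b: int, integer_weight: int) -> int:
    acc += activation_a << exponent        # shifter path: x * 2**e
    acc += activation_b * integer_weight   # multiplier path: x * w
    return acc

# Only the exponent (here 3, for weight 8) and the integer (here 5) are
# stored, in lieu of the original weight values.
print(hybrid_mac(0, activation_a=7, exponent=3,
                 activation_b=2, integer_weight=5))   # 7*8 + 2*5 = 66
```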
-
9.
Publication No.: US20230073661A1
Publication Date: 2023-03-09
Application No.: US18055315
Application Date: 2022-11-14
Applicant: Intel Corporation
Abstract: A DNN (deep neural network) accelerator may accelerate deep learning operations, such as convolutions in frontend layers, through a scheduler that loads the data to be processed. The DNN accelerator may store, in a memory, the input tensor of a convolutional layer of the DNN. The convolutional layer may be the first layer, or a layer arranged before one or more other convolutional layers in the DNN, so that data processed by the layer can be efficiently reused across data load rounds. The input tensor includes one or more channels, each comprising activations arranged in rows and columns. The DNN accelerator may read at least a portion of the input tensor from the memory into a datastore comprising a number of databanks, and may provide a vector of one or more activations to a processing element for operations such as multiplications on the vector.
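A toy sketch of the datastore path, with channel rows striped across databanks and a short activation vector handed to a PE, might look as follows (the striping policy and every name here are assumed).

```python
# Toy sketch: stage an input tensor into databanks, then draw an
# activation vector from a bank for a processing element.

input_tensor = [[[c * 100 + r * 10 + col for col in range(4)]
                 for r in range(4)] for c in range(2)]   # [channel][row][col]

def load_datastore(tensor, num_banks=2):
    # Stripe the rows of every channel across the databanks.
    banks = [[] for _ in range(num_banks)]
    for channel in tensor:
        for i, row in enumerate(channel):
            banks[i % num_banks].append(row)
    return banks

def read_vector(banks, bank_id, entry, length=4):
    # A vector of activations handed to one processing element.
    return banks[bank_id][entry][:length]

banks = load_datastore(input_tensor)
print(read_vector(banks, bank_id=0, entry=1))
```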
-
10.
Publication No.: US20220075659A1
Publication Date: 2022-03-10
Application No.: US17530156
Application Date: 2021-11-18
Applicant: Intel Corporation
Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Abraham Mathaikutty, Raymond Jit-Hung Sung, Cormac Michael Brick
Abstract: Disclosed are a system and method of performing an artificial intelligence (AI) inference, including: programming an AI accelerator circuit to solve an AI problem with a plurality of layer-specific register file (RF) size allocations, wherein the AI accelerator circuit comprises processing elements (PEs) with respective associated RFs, wherein each RF is divided into K sub-banks of size B bytes (B and K being integers), wherein the RFs include circuitry to individually allocate a sub-bank to one of input feature (IF), output feature (OF), or filter weight (FL), and wherein programming the plurality of layer-specific RF size allocations comprises accounting for sparse data within each layer; and causing the AI accelerator circuit to execute the AI problem, including applying the layer-specific RF size allocations at run time.
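For illustration, a layer-specific sub-bank allocation that accounts for sparsity could be sketched as below; the density-weighted heuristic is an assumption, not the claimed method.

```python
# Assumed sketch: split K sub-banks of B bytes among IF, OF, and FL per
# layer, shrinking the footprint of sparse (compressed) tensors first.

K, B = 8, 64   # sub-banks per RF, bytes per sub-bank

def allocate_rf(if_bytes: int, of_bytes: int, fl_bytes: int,
                if_density: float, fl_density: float) -> dict:
    # Sparse tensors are stored compressed, so their effective demand
    # scales with density before sub-banks are apportioned.
    demand = {"IF": if_bytes * if_density,
              "OF": of_bytes,
              "FL": fl_bytes * fl_density}
    total = sum(demand.values())
    alloc = {k: max(1, round(K * v / total)) for k, v in demand.items()}
    # Trim or pad so exactly K sub-banks are assigned.
    while sum(alloc.values()) > K:
        alloc[max(alloc, key=alloc.get)] -= 1
    while sum(alloc.values()) < K:
        alloc[min(alloc, key=alloc.get)] += 1
    return alloc

# Layer with highly sparse weights: FL receives fewer sub-banks.
print(allocate_rf(2048, 1024, 2048, if_density=0.9, fl_density=0.3))
```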