Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Vinod Sharma"

1.

发明授权
Efficient utilization of processing element array 有权

公开(公告)号：US12198041B2

公开(公告)日：2025-01-14

申请号：US18352768

申请日：2023-07-14

Applicant: Amazon Technologies, Inc.

Inventor： Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic

IPC: G06N3/063 , G06N3/04 , G06N3/045 , G06N3/08

Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.

2.

发明申请
EFFICIENT UTILIZATION OF PROCESSING ELEMENT ARRAY 有权

公开(公告)号：US20210158132A1

公开(公告)日：2021-05-27

申请号：US16698461

申请日：2019-11-27

Applicant: Amazon Technologies, Inc.

Inventor： Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic

IPC: G06N3/063 , G06N3/04

Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.

3.

发明公开
EFFICIENT UTILIZATION OF PROCESSING ELEMENT ARRAY 审中-公开

公开(公告)号：US20230359876A1

公开(公告)日：2023-11-09

申请号：US18352768

申请日：2023-07-14

Applicant: Amazon Technologies, Inc.

Inventor： Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic

IPC: G06N3/063 , G06N3/04

CPC classification number: G06N3/063 , G06N3/04

Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.

4.

发明授权
Efficient utilization of processing element array 有权

公开(公告)号：US11741350B2

公开(公告)日：2023-08-29

申请号：US16698461

申请日：2019-11-27

Applicant: Amazon Technologies, Inc.

Inventor： Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic

IPC: G06N3/063 , G06N3/04 , G06N3/08

CPC classification number: G06N3/063 , G06N3/04

Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.

5.

发明授权
Analytical model to optimize deep learning models 有权

公开(公告)号：US12293299B1

公开(公告)日：2025-05-06

申请号：US17338047

申请日：2021-06-03

Applicant: Amazon Technologies, Inc.

Inventor： Vinod Sharma , Yao Wang , Xingyu Zhou , Yanming Wang , Yong Wu , Rui Li

IPC: G06N3/10 , G06F9/38 , G06N3/04

Abstract: Techniques for optimizing and deploying deep neural network (CNN) machine learning models for inference using static analysis are described. A method includes obtaining a deep neural network (DNN) machine learning (ML) model, generating an intermediate representation for the ML model, the intermediate representation including one or more nodes corresponding to one or more operators utilized by the ML model, identifying, for at least one node of the intermediate representation, an optimized schedule for at least one operator corresponding to the at least one node using a static analysis that is based on a hardware-specific cost model, generating an optimized intermediate representation using the optimized schedule that is optimized for execution on a hardware platform, and generating code corresponding to the ML model based at least in part on the optimized intermediate representation, wherein the code is specific to the hardware platform.

6.

发明授权
Performing hardware operator fusion 有权

公开(公告)号：US11809981B1

公开(公告)日：2023-11-07

申请号：US16698753

申请日：2019-11-27

Applicant: Amazon Technologies, Inc.

Inventor： Animesh Jain , Tobias Joseph Kastulus Edler von Koch , Yizhi Liu , Taemin Kim , Jindrich Zejda , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang

IPC: G06N3/063 , G06F9/30 , G06F9/54

CPC classification number: G06N3/063 , G06F9/30007 , G06F9/545

Abstract: A method of generating executable instructions for a computing system is provided. The method comprises: receiving a first set of instructions including a kernel of a first operator and a kernel of a second operator, the kernel of the first operator including instructions of the first operator and write instructions to a virtual data node, the kernel of the second operator including instructions of the second operator and read instructions to the virtual data node; determining, based on a mapping between the write instructions and read instructions, instructions of data transfer operations between the first operator and the second operator; and generating a second set of instructions representing a fused operator of the first operator and the second operator, the second set of instructions including the instructions of the first operator, the instructions of the second operator, and the instructions of the data transfer operations.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification