-
Publication No.: US20230359876A1
Publication Date: 2023-11-09
Application No.: US18352768
Application Date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined by detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows than it would be without the modification.
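The row-splitting idea in this abstract can be illustrated with a short sketch. This is a minimal illustration, not the patented method: the names plan_convolution, num_array_rows, and utilization_threshold are invented here, and a real compiler would emit hardware instructions rather than split NumPy arrays.

```python
import numpy as np

def plan_convolution(input_matrices, num_array_rows=128,
                     utilization_threshold=0.5):
    """Assign each input matrix to a PE-array row; if too few rows
    would be active concurrently, split matrices across extra rows."""
    rows_used = len(input_matrices)            # one matrix per row
    if rows_used >= utilization_threshold * num_array_rows:
        return input_matrices                  # utilization is adequate

    # Under-utilized: split each input matrix into chunks so that a
    # single matrix is processed in parallel across several rows.
    split_factor = max(1, num_array_rows // rows_used)
    plan = []
    for matrix in input_matrices:
        plan.extend(np.array_split(matrix, split_factor, axis=0))
    return plan
```

With four 64x64 input matrices and a 128-row array, this sketch fans each matrix out across 32 rows, filling all 128 rows instead of only four.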
-
Publication No.: US12182688B2
Publication Date: 2024-12-31
Application No.: US16698236
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang
Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators, which are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the compiler's ability to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning identifies operators that are supported by the compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed within the machine learning framework.
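The three-tier split described above lends itself to a small sketch. This is a minimal sketch under stated assumptions: the predicates accelerator_supports and host_compilable are hypothetical stand-ins for the compiler's real capability checks, which the abstract does not specify.

```python
def partition_operators(operators, accelerator_supports, host_compilable):
    """Split operators into three tiers: acceleration engine, host
    processor, and machine-learning-framework fallback."""
    tiers = {"accelerator": [], "host": [], "framework": []}
    for op in operators:
        if accelerator_supports(op):
            tiers["accelerator"].append(op)    # compiler can map it
        elif host_compilable(op):
            tiers["host"].append(op)           # compile for the host CPU
        else:
            tiers["framework"].append(op)      # leave it in the framework
    return tiers
```

The hierarchy matters because each fallback tier is slower than the one above it: an operator only falls back to the host or the framework when no higher tier can run it.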
-
Publication No.: US11809981B1
Publication Date: 2023-11-07
Application No.: US16698753
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Animesh Jain, Tobias Joseph Kastulus Edler von Koch, Yizhi Liu, Taemin Kim, Jindrich Zejda, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang
CPC classification number: G06N3/063, G06F9/30007, G06F9/545
Abstract: A method of generating executable instructions for a computing system is provided. The method comprises: receiving a first set of instructions including a kernel of a first operator and a kernel of a second operator, the kernel of the first operator including instructions of the first operator and write instructions to a virtual data node, the kernel of the second operator including instructions of the second operator and read instructions from the virtual data node; determining, based on a mapping between the write instructions and the read instructions, instructions of data transfer operations between the first operator and the second operator; and generating a second set of instructions representing a fused operator of the first operator and the second operator, the second set of instructions including the instructions of the first operator, the instructions of the second operator, and the instructions of the data transfer operations.
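The write/read matching on the virtual data node can be sketched as follows. Everything here is illustrative: the tuple-based instruction encoding, the "VNODE" marker, and the "transfer" placeholder are assumptions, since the abstract does not define an instruction format.

```python
def fuse_operators(kernel_a, kernel_b):
    """kernel_a and kernel_b are lists of (opcode, operand) tuples.
    Writes to and reads from the virtual node "VNODE" are matched
    pairwise and replaced by explicit data-transfer instructions."""
    writes = [i for i in kernel_a if i == ("write", "VNODE")]
    reads = [i for i in kernel_b if i == ("read", "VNODE")]
    transfers = [("transfer", "a_to_b")] * min(len(writes), len(reads))

    body_a = [i for i in kernel_a if i != ("write", "VNODE")]
    body_b = [i for i in kernel_b if i != ("read", "VNODE")]
    # The fused operator: the first operator's compute, the derived
    # data transfers, then the second operator's compute.
    return body_a + transfers + body_b
```

The virtual data node never exists at runtime; it is a compile-time placeholder that lets the compiler derive concrete transfer instructions from the producer/consumer mapping.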
-
Publication No.: US12159218B1
Publication Date: 2024-12-03
Application No.: US16940199
Application Date: 2020-07-27
Applicant: Amazon Technologies, Inc.
Inventor: Jiading Gai, Hongbin Zheng, Animesh Jain, Randy Renfu Huang, Vignesh Vivekraja
Abstract: A single instruction multiple data (SIMD) processor is used to implement a dropout layer between a first layer and a second layer of a neural network. The SIMD processor can implement the dropout layer by setting one or more elements in an output tensor of the first layer to zero before providing it as an input tensor to the second layer. Setting the one or more elements to zero is based on a dropout rate and on pseudo-random numbers generated by a random number generator in the SIMD processor.
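The mechanism described here corresponds closely to standard inverted dropout, which the following NumPy sketch shows. NumPy's generator stands in for the SIMD processor's random number generator, and the rescaling of surviving elements is a conventional detail assumed here rather than stated in the abstract.

```python
import numpy as np

def dropout_layer(output_tensor, dropout_rate, seed=0):
    """Zero elements of the first layer's output with probability
    dropout_rate, then rescale survivors (inverted dropout)."""
    rng = np.random.default_rng(seed)      # stand-in for the SIMD RNG
    keep = rng.random(output_tensor.shape) >= dropout_rate
    return np.where(keep, output_tensor / (1.0 - dropout_rate), 0.0)
```

For example, dropout_layer(np.ones((2, 4)), 0.5) zeroes roughly half the elements and scales the rest to 2.0, so the expected activation magnitude is preserved for the second layer.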
-
Publication No.: US11741350B2
Publication Date: 2023-08-29
Application No.: US16698461
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes fewer than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
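The two-instruction scheme can be sketched as follows. The encoding is invented for illustration: generate_row_instructions and the ("row", r, "filter_elem", e) tuples are assumptions standing in for whatever instruction format the compiler actually emits.

```python
def generate_row_instructions(filter_elements, num_input_maps,
                              num_array_rows, row_threshold):
    """Emit one instruction per PE-array row. When too few rows would
    be active, different filter elements of the same filter are spread
    across extra rows, so each input map is processed by several rows."""
    if num_input_maps >= row_threshold:
        # Enough rows are busy; every row applies the same filter element.
        return [("row", r, "filter_elem", 0) for r in range(num_input_maps)]

    instructions = []
    for e in range(len(filter_elements)):
        for m in range(num_input_maps):
            row = e * num_input_maps + m
            if row >= num_array_rows:
                return instructions
            instructions.append(("row", row, "filter_elem", e))
    return instructions
```

With 3 input feature maps, a 9-element filter, and a 128-row array, the sketch occupies 27 rows instead of 3, since rows handling different filter elements of the same filter can run concurrently.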
-
Publication No.: US20210158131A1
Publication Date: 2021-05-27
Application No.: US16698236
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang
Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators, which are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the compiler's ability to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning identifies operators that are supported by the compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed within the machine learning framework.
-
Publication No.: US12198041B2
Publication Date: 2025-01-14
Application No.: US18352768
Application Date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined by detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows than it would be without the modification.
-
Publication No.: US20210158132A1
Publication Date: 2021-05-27
Application No.: US16698461
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes fewer than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.