-
Publication No.: US11144291B1
Publication Date: 2021-10-12
Application No.: US16698320
Filing Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Hongbin Zheng , Preston Pengra Briggs , Tobias Joseph Kastulus Edler von Koch , Taemin Kim , Randy Renfu Huang
Abstract: Methods of accelerating the execution of neural networks are disclosed. A description of a neural network may be received. A plurality of operators may be identified based on the description of the neural network. A plurality of symbolic models associated with the plurality of operators may be generated. For each symbolic model, a nested loop associated with an operator may be identified, a loop order may be defined, and a set of data dependencies may be defined. A set of inter-operator dependencies may be extracted based on the description of the neural network. The plurality of symbolic models and the set of inter-operator dependencies may be analyzed to identify a combinable pair of nested loops. The combinable pair of nested loops may be combined to form a combined nested loop.
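The abstract describes building a symbolic model (loop order, bounds, data dependencies) per operator and then combining compatible nested loops. A minimal sketch of that idea, assuming a toy representation in which a loop is "combinable" with another when their iteration spaces match and they write disjoint tensors (the patent's actual dependency analysis is more involved; all names here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class SymbolicLoop:
    """Symbolic model of one operator's nested loop."""
    op_name: str
    bounds: tuple                       # iteration-space extents, outermost first
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

def combinable(a: SymbolicLoop, b: SymbolicLoop) -> bool:
    """Candidate pair check: identical iteration spaces and no write-write
    conflict.  A producer->consumer read (b reading what a wrote at the
    same index) is allowed, which is the common element-wise fusion case."""
    return a.bounds == b.bounds and a.writes.isdisjoint(b.writes)

def fuse(a: SymbolicLoop, b: SymbolicLoop) -> SymbolicLoop:
    """Merge two combinable loops into one combined nested loop."""
    return SymbolicLoop(f"{a.op_name}+{b.op_name}", a.bounds,
                        a.reads | b.reads, a.writes | b.writes)
```

Fusing a `relu` into a following `add` this way halves the number of passes over the tensor, which is the acceleration the abstract targets.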
-
Publication No.: US12198041B2
Publication Date: 2025-01-14
Application No.: US18352768
Filing Date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
-
Publication No.: US12079734B1
Publication Date: 2024-09-03
Application No.: US17878824
Filing Date: 2022-08-01
Applicant: Amazon Technologies, Inc.
Inventor: Hongbin Zheng , Randy Renfu Huang , Richard John Heaton
Abstract: Techniques for reducing a compilation time for compiling a neural network are disclosed. A description of a neural network is received by a compiler. A plurality of operators are identified based on the description of the neural network. A plurality of subgraphs are formed, each including one or more operators. For each subgraph, a performance factor is calculated based on a compute usage and a memory usage associated with the operators included in the subgraph. The performance factor is compared to a threshold. Based on the comparison, either the subgraph is classified as a compute bound subgraph and a set of memory optimizations are suppressed or the subgraph is classified as a memory bound subgraph and a set of compute optimizations are suppressed.
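The classification step can be sketched as follows. The abstract does not give the performance-factor formula; the compute-to-memory ratio below is an assumed stand-in, and the returned labels just indicate which optimization set a compiler would suppress.

```python
def classify_subgraph(compute_usage: float, memory_usage: float,
                      threshold: float = 1.0) -> str:
    """Classify one subgraph by a performance factor (here: ratio of
    estimated compute cost to estimated memory traffic) so the compiler
    can skip optimization passes that cannot help, reducing compile time."""
    factor = compute_usage / memory_usage
    if factor > threshold:
        return "compute-bound"          # suppress memory optimizations
    return "memory-bound"               # suppress compute optimizations
```

The point of the scheme is that a matmul-heavy subgraph gains nothing from memory-layout passes, and a copy-heavy subgraph gains nothing from compute scheduling, so each can safely skip roughly half the pass pipeline.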
-
Publication No.: US11782706B1
Publication Date: 2023-10-10
Application No.: US17361992
Filing Date: 2021-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Hongbin Zheng , Drazen Borkovic , Haichen Li
Abstract: In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.
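For the simplest case of a linear operator chain, merging runs of element-wise operators (each run a trivial single-entry-single-exit subgraph) can be sketched as below. Real SESE detection on a general dataflow graph needs dominator-style analysis; this handles only a chain, and the operator names are illustrative.

```python
# Operators with one input tensor and one output tensor, applied element-wise.
ELEMENTWISE = {"relu", "sigmoid", "tanh", "add_const", "mul_const"}

def merge_sese_chains(ops: list) -> list:
    """Collapse each maximal run of element-wise operators in a linear
    chain into one merged operator; other operators pass through.  A
    compiler could then target merged operators at one execution unit
    and the rest at another."""
    merged, run = [], []

    def flush():
        if len(run) > 1:
            merged.append("fused(" + "+".join(run) + ")")
        else:
            merged.extend(run)          # a lone operator is not worth merging
        run.clear()

    for op in ops:
        if op in ELEMENTWISE:
            run.append(op)
        else:
            flush()
            merged.append(op)
    flush()
    return merged
```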
-
Publication No.: US11461662B1
Publication Date: 2022-10-04
Application No.: US16829887
Filing Date: 2020-03-25
Applicant: Amazon Technologies, Inc.
Inventor: Hongbin Zheng , Randy Renfu Huang , Richard John Heaton
Abstract: Techniques for reducing a compilation time for compiling a neural network are disclosed. A description of a neural network is received by a compiler. A plurality of operators are identified based on the description of the neural network. A plurality of subgraphs are formed, each including one or more operators. For each subgraph, a performance factor is calculated based on a compute usage and a memory usage associated with the operators included in the subgraph. The performance factor is compared to a threshold. Based on the comparison, either the subgraph is classified as a compute bound subgraph and a set of memory optimizations are suppressed or the subgraph is classified as a memory bound subgraph and a set of compute optimizations are suppressed.
-
Publication No.: US20210158132A1
Publication Date: 2021-05-27
Application No.: US16698461
Filing Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
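The effect of the two generated instructions — two rows applying different filter elements of the same filter to (shifted copies of) the same input — can be illustrated with a 1-D, 2-tap convolution. This is a numerical model of the dataflow, not the actual instruction format:

```python
def conv1d_two_rows(x: list, w: list) -> list:
    """1-D convolution with a 2-tap filter, computed the way two PE-array
    rows would after the modification: row 0 applies filter element w[0]
    (first instruction), row 1 applies w[1] to a shifted copy of the same
    input (second instruction), and the column sums the partial products."""
    assert len(w) == 2
    n = len(x) - len(w) + 1
    row0 = [x[i] * w[0] for i in range(n)]          # first instruction's row
    row1 = [x[i + 1] * w[1] for i in range(n)]      # second instruction's row
    return [a + b for a, b in zip(row0, row1)]
```

Unmodified, a single-channel input would keep only one row busy; distributing the filter elements across rows doubles concurrency here while producing the same convolution output.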