HIERARCHICAL PARTITIONING OF OPERATORS

    Publication Number: US20210158131A1

    Publication Date: 2021-05-27

    Application Number: US16698236

    Filing Date: 2019-11-27

    Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators, which are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the ability to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network: it identifies operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
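
    As a concrete picture of the three-tier split described above, here is a minimal sketch in Python. It is an illustration only, not the patented method; the operator names and the ACCELERATOR_OPS/HOST_OPS support sets are hypothetical.

```python
# Minimal sketch of three-tier operator partitioning (illustrative only).
# The support sets below are hypothetical, not any real compiler's tables.

ACCELERATOR_OPS = {"conv2d", "matmul", "relu"}  # mappable to the acceleration engine
HOST_OPS = {"softmax", "batch_norm"}            # compilable for the host processor

def partition_operators(graph_ops):
    """Assign each framework-level operator to the lowest tier that supports it."""
    placement = {}
    for op in graph_ops:
        if op in ACCELERATOR_OPS:
            placement[op] = "accelerator"
        elif op in HOST_OPS:
            placement[op] = "host"
        else:
            placement[op] = "framework"  # fall back to the ML framework itself
    return placement

print(partition_operators(["conv2d", "relu", "softmax", "my_custom_op"]))
# {'conv2d': 'accelerator', 'relu': 'accelerator',
#  'softmax': 'host', 'my_custom_op': 'framework'}
```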

    Converting quasi-affine expressions to matrix operations

    Publication Number: US12175222B1

    Publication Date: 2024-12-24

    Application Number: US16949958

    Filing Date: 2020-11-20

    Abstract: A computer-implemented method includes generating, based on a representation of a tensor mapping between an input tensor and an output tensor, a list of mappings from elements of the input tensor to elements of the output tensor, and generating groups of mappings from the list of mappings, where each of the groups of mappings corresponds to a respective set of matrix multiplications, a matrix transpose, or both. The computer-implemented method also includes generating a respective expression for each of the groups of mappings and generating code for summing results of the respective expressions, where each respective expression includes the respective set of matrix multiplications, the matrix transpose, or both.
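
    As a toy illustration of the idea, the sketch below enumerates the element-to-element mappings of a one-dimensional index reversal, out[i] = in[n-1-i], and realizes the whole mapping as a single matrix multiplication. Real quasi-affine expressions, the grouping into matrix multiplications and transposes, and the summation step are far more general; everything here is an assumption for demonstration.

```python
# Toy sketch: turn enumerated element mappings into one matrix operation.
import numpy as np

def enumerate_mappings(n, index_expr):
    """List (input_element, output_element) pairs for out[i] = in[index_expr(i)]."""
    return [(index_expr(i), i) for i in range(n)]

def mappings_to_matrix(n, mappings):
    """Build the 0/1 matrix M such that out = M @ in realizes the mappings."""
    M = np.zeros((n, n))
    for src, dst in mappings:
        M[dst, src] = 1.0
    return M

n = 5
mappings = enumerate_mappings(n, lambda i: n - 1 - i)  # reversal mapping
M = mappings_to_matrix(n, mappings)
x = np.arange(n, dtype=float)
print(M @ x)  # [4. 3. 2. 1. 0.] -- the index mapping executed as a matmul
```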

    Global modulo allocation in neural network compilation

    Publication Number: US11809849B1

    Publication Date: 2023-11-07

    Application Number: US17326175

    Filing Date: 2021-05-20

    CPC classification number: G06F8/452 G06F9/3853 G06F13/28 G06N3/04

    Abstract: In one example, a method performed by a compiler comprises: receiving a dataflow graph of a neural network, the neural network comprising a neural network operator; receiving information about the computation resources and memory resources of a neural network hardware accelerator intended to execute the neural network operator; determining, based on the dataflow graph, iterations of an operation on elements of a tensor included in the neural network operator; determining, based on the information, a mapping between the elements of the tensor and addresses in a portion of the local memory, and a number of the iterations of the operation to be included in a batch, wherein the iterations in the batch are to be executed in parallel by the neural network hardware accelerator; and generating a schedule of execution of the batches of the iterations of the operation.
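
    A much-simplified picture of modulo allocation, under assumed numbers: iteration i is mapped to local-memory slot i % num_slots, and consecutive iterations that land on distinct slots form one batch that can run in parallel. This sketch ignores the dataflow analysis and resource information the abstract describes.

```python
# Simplified sketch of modulo address allocation and batch scheduling.
# Slot counts and iteration counts are illustrative assumptions.

def modulo_allocate(num_iterations, num_slots):
    """Map each iteration's tensor element to local-memory slot i % num_slots."""
    return {i: i % num_slots for i in range(num_iterations)}

def schedule_batches(num_iterations, num_slots):
    """Group iterations into batches of num_slots; within a batch every
    iteration touches a distinct slot, so the batch can execute in parallel."""
    batches = []
    for start in range(0, num_iterations, num_slots):
        batches.append(list(range(start, min(start + num_slots, num_iterations))))
    return batches

print(modulo_allocate(10, 4))    # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, ...}
print(schedule_batches(10, 4))   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```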

    Hierarchical partitioning of operators

    Publication Number: US12182688B2

    Publication Date: 2024-12-31

    Application Number: US16698236

    Filing Date: 2019-11-27

    Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators, which are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the ability to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network: it identifies operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.

    Reconfigurable neural network processing based on subgraph recognition

    Publication Number: US12045611B1

    Publication Date: 2024-07-23

    Application Number: US18231024

    Filing Date: 2023-08-07

    Abstract: In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.
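
    Very roughly, the merging step can be pictured as collapsing maximal runs of element-wise operators in a linearized operator sequence into single merged operators, leaving the remaining operators for the second execution unit. The representation and operator names below are hypothetical stand-ins for the actual SESE-subgraph analysis.

```python
# Rough sketch of element-wise operator fusion over a linear op sequence.
# ELEMENTWISE membership and op names are hypothetical.

ELEMENTWISE = {"add_const", "mul_const", "relu", "exp"}

def merge_elementwise_chains(op_sequence):
    """Collapse maximal runs of element-wise ops into single merged operators."""
    result, current = [], []
    for op in op_sequence:
        if op in ELEMENTWISE:
            current.append(op)  # extend the current single-entry-single-exit run
        else:
            if current:
                result.append("merged:" + "+".join(current))
                current = []
            result.append(op)   # non-element-wise op targets the other unit
    if current:
        result.append("merged:" + "+".join(current))
    return result

ops = ["matmul", "add_const", "relu", "exp", "matmul", "relu"]
print(merge_elementwise_chains(ops))
# ['matmul', 'merged:add_const+relu+exp', 'matmul', 'merged:relu']
```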

    Compilation with caching of code analysis result

    Publication Number: US11941383B1

    Publication Date: 2024-03-26

    Application Number: US17654059

    Filing Date: 2022-03-08

    CPC classification number: G06F8/443 G06F8/427

    Abstract: Techniques to speed up code compilation may include caching code analysis results such that the analysis of subsequent code having a similar structure can be omitted. For example, a loop-nest construct in the code can be parsed, and an execution statement in the loop-nest construct can be analyzed by a compiler to generate an analysis result indicating a set of execution conditions for the execution statement. A lookup key can be generated from the control statements bounding the execution statement, and the analysis result can be stored with the lookup key in a cache entry of the cache. The execution statement is then modified according to the analysis result for optimization. Instead of analyzing a subsequent execution statement bounded by the same control statements from scratch, the cached analysis result can be retrieved and used to modify the subsequent execution statement.
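
    The caching scheme can be approximated as a dictionary keyed by the control statements bounding an execution statement. The sketch below uses a placeholder analysis; only the key construction and the cache-hit path reflect the description above.

```python
# Minimal sketch of caching an analysis result keyed by the enclosing
# control statements. The "analysis" itself is a placeholder.

analysis_cache = {}

def analyze(stmt, bounding_controls):
    """Return execution conditions for stmt, reusing cached results when the
    bounding control statements match a previously analyzed statement."""
    key = tuple(bounding_controls)       # lookup key from the control statements
    if key in analysis_cache:
        return analysis_cache[key]       # cache hit: re-analysis is omitted
    result = {"conditions": list(bounding_controls)}  # placeholder analysis
    analysis_cache[key] = result
    return result

loops = ["for i in range(N)", "if i % 2 == 0"]
r1 = analyze("a[i] = b[i] * 2", loops)   # analyzed and cached
r2 = analyze("c[i] = d[i] + 1", loops)   # same control context: cache hit
print(r1 is r2)  # True -- second statement reused the cached analysis
```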

    EFFICIENT UTILIZATION OF PROCESSING ELEMENT ARRAY

    Publication Number: US20230359876A1

    Publication Date: 2023-11-09

    Application Number: US18352768

    Filing Date: 2023-07-14

    CPC classification number: G06N3/063 G06N3/04

    Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix, and each input matrix is assigned to a respective row in the processing element array. Under-utilization can be detected when fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows than it would be without the modification.
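
    A back-of-the-envelope version of the check: compare the number of rows the convolution would occupy against a threshold fraction of the array, and compute a split factor when it falls short. The array size, threshold, and splitting rule below are assumptions for illustration.

```python
# Illustrative under-utilization check for a processing element array.
# PE_ROWS and THRESHOLD are assumed values, not any real accelerator's.

PE_ROWS = 128    # rows in the processing element array (assumption)
THRESHOLD = 0.5  # minimum acceptable fraction of rows in concurrent use

def plan_row_usage(num_input_matrices):
    """Return how many PE rows each input matrix should span."""
    rows_used = num_input_matrices  # one row per input matrix by default
    if rows_used >= THRESHOLD * PE_ROWS:
        return 1  # utilization is acceptable; no modification needed
    # Under-utilized: spread each input matrix across additional rows
    # so more of the array works on the convolution in parallel.
    return PE_ROWS // num_input_matrices

print(plan_row_usage(16))   # 8 -> each input matrix processed across 8 rows
print(plan_row_usage(100))  # 1 -> already uses enough rows concurrently
```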

    State buffer memloc reshaping

    Publication Number: US11494321B1

    Publication Date: 2022-11-08

    Application Number: US17449586

    Filing Date: 2021-09-30

    Abstract: A computer-implemented method includes identifying, from instruction code for executing by a computing system to implement a neural network, a first instruction for allocating a first region of a local memory of an accelerator of the computing system to a tensor, and a first direct memory access (DMA) load instruction for loading the tensor from a location of a system memory of the computing system to a second region of the local memory; adding a first tensor copy instruction in the instruction code to save the tensor in the first region of the local memory to a third region of the local memory that has dimensions different from dimensions of the first region; and replacing the first DMA load instruction with a second tensor copy instruction for saving data in the third region of the local memory to the second region of the local memory.
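
    The rewrite can be sketched as a pass over a toy instruction list: after the allocation of the first region, insert a copy of the tensor into a reshaped third region, and replace the DMA load with a local-to-local copy out of that region. The instruction encoding and region names here are hypothetical.

```python
# Schematic sketch of the instruction rewrite on a toy instruction list.
# Instruction dicts and region names are hypothetical stand-ins.

def reshape_memloc(instrs):
    out = []
    for ins in instrs:
        if ins["op"] == "alloc":
            out.append(ins)
            # First tensor copy: save the tensor into a third region
            # whose dimensions differ from the first region's.
            out.append({"op": "tensor_copy",
                        "src": ins["region"], "dst": "region3_reshaped"})
        elif ins["op"] == "dma_load":
            # Replace the DMA load from system memory with a second tensor
            # copy from the reshaped third region to the second region.
            out.append({"op": "tensor_copy",
                        "src": "region3_reshaped", "dst": ins["dst"]})
        else:
            out.append(ins)
    return out

program = [
    {"op": "alloc", "region": "region1", "tensor": "t"},
    {"op": "dma_load", "src": "system_mem", "dst": "region2"},
]
for ins in reshape_memloc(program):
    print(ins)
```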
