-
Publication Number: US12159218B1
Publication Date: 2024-12-03
Application Number: US16940199
Application Date: 2020-07-27
Applicant: Amazon Technologies, Inc.
Inventor: Jiading Gai , Hongbin Zheng , Animesh Jain , Randy Renfu Huang , Vignesh Vivekraja
Abstract: A single instruction multiple data (SIMD) processor is used to implement a dropout layer between a first layer and a second layer of a neural network. The SIMD processor can implement the dropout layer by setting one or more elements in an output tensor of the first layer to zero before providing it as an input tensor to the second layer. Setting the one or more elements to zero is based on a dropout rate and on pseudo-random numbers generated by a random number generator in the SIMD processor.
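The masking step this abstract describes can be pictured in a few lines of NumPy. This is a minimal sketch, assuming an element-wise comparison against the dropout rate; the function name apply_dropout and the use of NumPy's generator in place of the SIMD processor's hardware random number generator are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def apply_dropout(layer_output: np.ndarray, dropout_rate: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Zero elements of the first layer's output tensor before it is fed
    to the second layer as input, keeping each element with probability
    (1 - dropout_rate)."""
    # Per-element pseudo-random draw, standing in for the random number
    # generator inside the SIMD processor.
    keep_mask = rng.random(layer_output.shape) >= dropout_rate
    return layer_output * keep_mask

rng = np.random.default_rng(seed=0)
hidden = rng.standard_normal((4, 8))     # output tensor of the first layer
dropped = apply_dropout(hidden, dropout_rate=0.5, rng=rng)
print(np.count_nonzero(dropped == 0.0))  # roughly half the 32 elements are zero
```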
-
Publication Number: US12093801B1
Publication Date: 2024-09-17
Application Number: US18142952
Application Date: 2023-05-03
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Randy Renfu Huang , Ron Diamant
IPC: G06F16/00 , G06F9/30 , G06F9/48 , G06F16/901 , G06N3/04
CPC classification number: G06N3/04 , G06F9/30003 , G06F9/4881 , G06F16/9024
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions. The system further includes a compiler configured to: identify a computational subgraph from a computational graph of a neural network model; compute a subgraph identifier for the computational subgraph; based on whether the subgraph identifier is included in the plurality of subgraph identifiers, either obtain, from the database, first instructions associated with the subgraph identifier or generate second instructions representing the computational subgraph; and provide the first instructions or the second instructions for execution by a neural network processor to perform computation operations for the neural network model.
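A toy rendering of this caching flow follows. A dict stands in for the database, and an SHA-256 digest over a canonical serialization stands in for the subgraph identifier computation; these stand-ins, like compile_subgraph, are assumptions for illustration rather than the patented system.

```python
import hashlib
import json

# Dict standing in for the database of subgraph identifiers -> instructions.
instruction_db: dict[str, list[str]] = {}

def subgraph_identifier(subgraph: dict) -> str:
    """Compute a stable identifier from a canonical serialization."""
    canonical = json.dumps(subgraph, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def compile_subgraph(subgraph: dict) -> list[str]:
    """Stand-in for real code generation for the neural network processor."""
    return [f"exec {op}" for op in subgraph["ops"]]

def instructions_for(subgraph: dict) -> list[str]:
    sid = subgraph_identifier(subgraph)
    if sid in instruction_db:            # hit: reuse the first instructions
        return instruction_db[sid]
    second = compile_subgraph(subgraph)  # miss: generate second instructions
    instruction_db[sid] = second         # cache for future compilations
    return second

sg = {"ops": ["matmul", "relu"], "shapes": [[128, 128]]}
print(instructions_for(sg))  # compiled on the first call
print(instructions_for(sg))  # served from the database on the second call
```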
-
Publication Number: US11741350B2
Publication Date: 2023-08-29
Application Number: US16698461
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
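A schematic sketch of the row-assignment decision follows. ARRAY_ROWS, the threshold, and the instruction strings are invented for illustration and do not reflect the processor's actual instruction set; the point is that when few input channels leave rows idle, one instruction is emitted per filter element so different rows apply different elements of the same filter.

```python
ARRAY_ROWS = 128

def plan_filter_element_split(num_input_channels: int, filter_height: int,
                              filter_width: int,
                              threshold: int = ARRAY_ROWS // 2) -> list[str]:
    """If applying one filter element per pass would leave most rows idle,
    emit one instruction per filter element, each targeting its own rows."""
    rows_per_element = num_input_channels   # one row per input channel
    if rows_per_element >= threshold:
        return ["conv(all_filter_elements)"]  # baseline single instruction
    instructions = []
    for e in range(filter_height * filter_width):
        row_base = (e * rows_per_element) % ARRAY_ROWS
        instructions.append(
            f"conv(filter_element={e}, "
            f"rows={row_base}..{row_base + rows_per_element - 1})")
    return instructions

# A 3x3 convolution with 3 input channels would use only 3 of 128 rows
# per pass, so the planner spreads the 9 filter elements across 27 rows.
for instr in plan_filter_element_split(3, 3, 3):
    print(instr)
```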
-
Publication Number: US20230100930A1
Publication Date: 2023-03-30
Application Number: US17449576
Application Date: 2021-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan , Paul Gilbert Meyer , Gennady Pekhimenko , Randy Renfu Huang
Abstract: Techniques for compressing a neural network model by mixing compression ratios (sparsity patterns) are described. The weight tensor of a neural network model is divided into weight groups. The pruning cost of compressing the weight values according to a compression ratio is determined for each weight group, and a pruning cost distribution for the compression ratio is generated from the pruning costs of the weight groups. A cost threshold can then be selected from the pruning cost distribution, and weight groups having a pruning cost below the selected cost threshold are compressed according to the compression ratio. The remaining weight groups can be compressed using one or more less aggressive compression ratios. The cost threshold can be adjusted to tune the overall sparsity and accuracy of the compressed neural network.
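A NumPy sketch of the two-ratio scheme follows, assuming that the pruning cost of a weight group is the L1 weight magnitude removed and that the cost threshold is drawn as a percentile of the cost distribution; both choices are assumptions made for illustration.

```python
import numpy as np

def mixed_ratio_prune(weights: np.ndarray, group_size: int,
                      aggressive_keep: float, mild_keep: float,
                      cost_percentile: float) -> np.ndarray:
    """Prune each weight group at one of two compression ratios, chosen
    from the group's pruning cost under the aggressive ratio."""
    flat = weights.reshape(-1, group_size)

    def cost(group: np.ndarray, keep: float) -> float:
        k = max(1, int(round(keep * group.size)))
        kept = np.sort(np.abs(group))[::-1][:k]
        return np.abs(group).sum() - kept.sum()  # magnitude lost by pruning

    costs = np.array([cost(g, aggressive_keep) for g in flat])
    # Cost threshold selected from the pruning cost distribution.
    threshold = np.percentile(costs, cost_percentile)
    pruned = flat.copy()
    for i, g in enumerate(flat):
        keep = aggressive_keep if costs[i] <= threshold else mild_keep
        k = max(1, int(round(keep * g.size)))
        cutoff = np.sort(np.abs(g))[::-1][k - 1]
        pruned[i] = np.where(np.abs(g) >= cutoff, g, 0.0)
    return pruned.reshape(weights.shape)

rng = np.random.default_rng(1)
w = rng.standard_normal((8, 16))
sparse_w = mixed_ratio_prune(w, group_size=16, aggressive_keep=0.25,
                             mild_keep=0.5, cost_percentile=70.0)
print(1.0 - np.count_nonzero(sparse_w) / sparse_w.size)  # overall sparsity
```

Raising cost_percentile pushes more groups into the aggressive ratio, which is the tuning knob between overall sparsity and accuracy the abstract describes.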
-
Publication Number: US11500962B1
Publication Date: 2022-11-15
Application Number: US16917033
Application Date: 2020-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
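The specific constraint on the fine-grained sparse matrices is defined by the patent claims and is not reproduced here. The sketch below assumes, purely for illustration, a constraint of at most one nonzero per four-element row segment, and greedily peels off conforming matrices whose sum reconstructs the original weight matrix.

```python
import numpy as np

def split_into_constrained_matrices(w: np.ndarray, block: int = 4,
                                    max_nnz_per_block: int = 1) -> list:
    """Greedily split a sparse weight matrix into matrices that each hold
    at most `max_nnz_per_block` nonzeros in every `block`-wide row segment,
    so each piece fits an array tuned for fine-grained sparsity."""
    assert w.shape[1] % block == 0
    remaining = w.copy()
    pieces = []
    while np.count_nonzero(remaining):
        piece = np.zeros_like(remaining)
        for r in range(remaining.shape[0]):
            for b in range(0, remaining.shape[1], block):
                seg = remaining[r, b:b + block]          # view into remaining
                idx = np.flatnonzero(seg)[:max_nnz_per_block]
                piece[r, b + idx] = seg[idx]
                seg[idx] = 0.0                           # consume moved weights
        pieces.append(piece)
    return pieces

w = np.array([[0.0, 1.5, -2.0, 0.0, 3.0, 0.0, 0.0, 0.0],
              [4.0, 0.0, 0.0, 0.5, 0.0, 0.0, -1.0, 2.5]])
pieces = split_into_constrained_matrices(w)
print(len(pieces), np.allclose(sum(pieces), w))  # pieces sum back to w
```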
-
Publication Number: US11314842B1
Publication Date: 2022-04-26
Application Number: US16987830
Application Date: 2020-08-07
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang , Mohammad El-Shabani , Sundeep Amirineni , Kenneth Wayne Patton , Willis Wang
Abstract: Methods and systems for performing hardware computations of mathematical functions are provided. In one example, a system comprises a mapping table that maps each base value of a plurality of base values to parameters related to a mathematical function; a selection module configured to select, based on an input value, a first base value and first parameters mapped to the first base value in the mapping table; and arithmetic circuits configured to: receive, from the mapping table, the first base value and the first parameters; and compute, based on a relationship between the input value and the first base value, and based on the first parameters, an estimated output value of the mathematical function for the input value.
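A Python sketch of estimation from such a mapping table follows, assuming first-order parameters (the function's value and derivative at each base value) and exp as the mathematical function; the table contents and estimate_exp are illustrative assumptions.

```python
import numpy as np

# Mapping table: each base value maps to parameters of the function there
# (here the value and derivative of exp, for a first-order estimate).
BASES = np.linspace(-4.0, 4.0, 33)
TABLE = {b: (np.exp(b), np.exp(b)) for b in BASES}  # (f(base), f'(base))

def estimate_exp(x: float) -> float:
    """Select the nearest base value at or below x, then extrapolate from
    the stored parameters: f(x) ~= f(b) + f'(b) * (x - b).
    Valid for x within [BASES[0], BASES[-1])."""
    base = BASES[np.searchsorted(BASES, x, side="right") - 1]
    value, slope = TABLE[base]
    return value + slope * (x - base)

x = 1.37
print(estimate_exp(x), np.exp(x))  # estimate vs. true value
```

Denser base values shrink the distance (x - base) and hence the estimation error, which is the usual accuracy/table-size trade-off for this kind of circuit.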
-
Publication Number: US20210158131A1
Publication Date: 2021-05-27
Application Number: US16698236
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Animesh Jain , Yizhi Liu , Hongbin Zheng , Jeffrey T. Huynh , Haichen Li , Drazen Borkovic , Jindrich Zejda , Richard John Heaton , Randy Renfu Huang , Zhi Chen , Yida Wang
Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the compiler's ability to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
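A minimal sketch of the three-way partition follows, assuming toy operator support sets; the real sets come from the compiler and the acceleration engine's capabilities, and the operator names are invented for illustration.

```python
# Illustrative operator tiers; real support sets come from the compiler.
ACCELERATOR_OPS = {"conv2d", "matmul", "relu", "maxpool"}
HOST_COMPILABLE_OPS = {"softmax", "transpose"}

def partition_operators(graph_ops: list) -> dict:
    """First take operators the accelerator compiler supports, then those
    compilable for the host processor; whatever remains executes in the
    machine learning framework itself."""
    partitions = {"accelerator": [], "host": [], "framework": []}
    for op in graph_ops:
        if op in ACCELERATOR_OPS:
            partitions["accelerator"].append(op)
        elif op in HOST_COMPILABLE_OPS:
            partitions["host"].append(op)
        else:
            partitions["framework"].append(op)  # e.g. a brand-new framework op
    return partitions

print(partition_operators(["conv2d", "relu", "softmax", "custom_attention"]))
```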
-
Publication Number: US11016775B2
Publication Date: 2021-05-25
Application Number: US16453478
Application Date: 2019-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Drazen Borkovic , Jindrich Zejda , Randy Renfu Huang , Ron Diamant
Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
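One way to picture the reordering is a greedy scheduler that alternates execution engines when dependencies allow, so one engine's work overlaps another's instead of queuing behind it. The Op structure and engine names below are assumptions, not the compiler's internal representation.

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    engine: str        # which execution engine performs the operation
    depends_on: set    # names of operations that must finish first

def reorder_for_overlap(ops: list) -> list:
    """Whenever consecutive ops target the same engine, pull forward a
    later op for a different engine whose dependencies are satisfied."""
    scheduled, done = [], set()
    pending = list(ops)
    while pending:
        last_engine = scheduled[-1].engine if scheduled else None
        pick = next((op for op in pending
                     if op.depends_on <= done and op.engine != last_engine),
                    None)
        if pick is None:  # fall back to the original order of performance
            pick = next(op for op in pending if op.depends_on <= done)
        pending.remove(pick)
        scheduled.append(pick)
        done.add(pick.name)
    return scheduled

ops = [Op("load_a", "dma", set()), Op("load_b", "dma", set()),
       Op("mm_a", "pe_array", {"load_a"}), Op("mm_b", "pe_array", {"load_b"})]
print([op.name for op in reorder_for_overlap(ops)])
# ['load_a', 'mm_a', 'load_b', 'mm_b']: the DMA transfer of load_b now
# overlaps mm_a instead of both matmuls waiting on both loads.
```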
-
Publication Number: US12106222B2
Publication Date: 2024-10-01
Application Number: US18112036
Application Date: 2023-02-21
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
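A toy NumPy rendering of this schedule follows, for a two-layer network with a mean squared error loss (both assumptions): the loss gradient operation runs first, forward propagation is then rerun to regenerate the intermediate outputs, and backward propagation combines the two to produce weight gradients for the host to apply.

```python
import numpy as np

rng = np.random.default_rng(2)
w1 = rng.standard_normal((4, 5)) * 0.1
w2 = rng.standard_normal((5, 3)) * 0.1
x = rng.standard_normal((8, 4))
y = rng.standard_normal((8, 3))

def forward(x, w1, w2):
    h = np.maximum(x @ w1, 0.0)        # intermediate output (ReLU layer)
    return h, h @ w2

# Loss gradient operation: produce data gradients at the network output.
_, out = forward(x, w1, w2)
data_grad = 2.0 * (out - y) / len(x)   # gradient of mean squared error

# After the loss gradient completes, rerun forward propagation to
# regenerate the intermediate outputs rather than holding them in memory.
h, _ = forward(x, w1, w2)

# Backward propagation combines the data gradients with the regenerated
# intermediates to produce weight gradients.
w2_grad = h.T @ data_grad
h_grad = (data_grad @ w2.T) * (h > 0.0)
w1_grad = x.T @ h_grad

# The host receives the weight gradients and updates the weights.
w1 -= 0.1 * w1_grad
w2 -= 0.1 * w2_grad
```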
-
Publication Number: US20230359876A1
Publication Date: 2023-11-09
Application Number: US18352768
Application Date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined by detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
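A schematic sketch of the under-utilization check and the added split instructions follows; the instruction strings, split_factor, and threshold are invented for illustration and do not reflect the processor's actual instruction set.

```python
ARRAY_ROWS = 128

def maybe_add_split_instructions(num_input_matrices: int, threshold: int,
                                 split_factor: int) -> list:
    """If fewer rows than the threshold would be active concurrently, add
    instructions that split each input matrix across `split_factor` rows
    so more of the processing element array works in parallel."""
    rows_used = num_input_matrices  # baseline: one input matrix per row
    if rows_used >= threshold:
        return [f"matmul(input={m}, row={m})"
                for m in range(num_input_matrices)]
    program = []
    for m in range(num_input_matrices):
        for part in range(split_factor):
            row = m * split_factor + part
            program.append(f"matmul(input={m}, slice={part}, row={row})")
    program.append("accumulate_partial_sums()")  # recombine the split work
    return program

# Three input matrices on a 128-row array: the modified program uses 12
# rows concurrently instead of 3.
for instr in maybe_add_split_instructions(3, threshold=ARRAY_ROWS // 2,
                                          split_factor=4):
    print(instr)
```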