-
Publication number: US12073199B2
Publication date: 2024-08-27
Application number: US16433786
Filing date: 2019-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja, Randy Renfu Huang, Yu Zhou, Ron Diamant, Richard John Heaton
CPC classification number: G06F8/4441, G06N3/04, G06N3/10
Abstract: In various implementations, provided are systems and methods for reducing neural network processing. A compiler may generate instructions from source code for a neural network having a repeatable set of operations. The instructions may include a plurality of blocks. The compiler may add an overwrite instruction to the plurality of blocks that, when executed by one or more execution engines, triggers an overwrite action. The overwrite action causes the instructions of subsequent blocks to be overwritten with NOP instructions. The overwrite action is triggered only when a condition is satisfied.
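Example (illustrative): a minimal Python sketch of the block-skipping mechanism described in the abstract, modeling instruction blocks as lists of tuples. The opcode names, the early-exit condition, and the block layout are assumptions for illustration, not the patent's instruction format.

    NOP = ("NOP",)

    def compile_blocks(source_ops, num_repeats):
        # One instruction block per repetition of the operation set, each
        # ending with an overwrite instruction that targets all later blocks.
        blocks = [[("EXEC", op) for op in source_ops] for _ in range(num_repeats)]
        for i, block in enumerate(blocks):
            block.append(("OVERWRITE_IF", i + 1))
        return blocks

    def execute(blocks, condition):
        for i, block in enumerate(blocks):
            for instr in block:
                if instr[0] == "OVERWRITE_IF" and condition(i):
                    # Condition satisfied: overwrite every later block with
                    # NOPs so the execution engines skip those repetitions.
                    for later in blocks[instr[1]:]:
                        later[:] = [NOP] * len(later)

    blocks = compile_blocks(["matmul", "activation"], num_repeats=4)
    execute(blocks, condition=lambda i: i >= 1)  # e.g. result converged early
    print(sum(instr == NOP for b in blocks for instr in b))  # 6 instructions NOP'd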
-
Publication number: US11868895B2
Publication date: 2024-01-09
Application number: US18154576
Filing date: 2023-01-13
Applicant: Amazon Technologies, Inc.
Inventor: Randy Renfu Huang, Ron Diamant, Richard John Heaton
Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, dividing the tensor operation into a set of sub-operations, and generating instructions for performing a plurality of sub-operations of the set of sub-operations on respective computing engines of a plurality of computing engines on a same integrated circuit device or on different integrated circuit devices. Each sub-operation of the set of sub-operations generates a portion of a final output of the tensor operation. An inference is made based on a result of a sub-operation of the plurality of sub-operations, or based on results of the plurality of sub-operations.
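Example (illustrative): a short NumPy sketch of dividing one tensor operation into sub-operations whose results are portions of the final output; the column-wise split and the simulated engines are assumptions for illustration.

    import numpy as np

    def split_matmul(A, B, num_engines):
        # Divide C = A @ B column-wise; each sub-operation yields a portion
        # of the final output and can run on a separate computing engine.
        col_groups = np.array_split(np.arange(B.shape[1]), num_engines)
        return [(A, B[:, idx], idx) for idx in col_groups]

    def run_on_engines(sub_ops, out_shape):
        C = np.empty(out_shape)
        for A, B_slice, idx in sub_ops:   # in hardware these run in parallel
            C[:, idx] = A @ B_slice       # each engine writes its output slice
        return C

    A = np.random.rand(8, 16)
    B = np.random.rand(16, 32)
    C = run_on_engines(split_matmul(A, B, num_engines=4), out_shape=(8, 32))
    assert np.allclose(C, A @ B)          # stitched portions equal the full result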
-
Publication number: US11809849B1
Publication date: 2023-11-07
Application number: US17326175
Filing date: 2021-05-20
Applicant: Amazon Technologies, Inc.
Inventor: Hongbin Zheng, Randy Renfu Huang, Robert Geva
CPC classification number: G06F8/452, G06F9/3853, G06F13/28, G06N3/04
Abstract: In one example, a method performed by a compiler comprises: receiving a dataflow graph of a neural network, the neural network comprising a neural network operator; receiving information about the computation and memory resources of a neural network hardware accelerator intended to execute the neural network operator; determining, based on the dataflow graph, iterations of an operation on elements of a tensor included in the neural network operator; determining, based on the information, a mapping between the elements of the tensor and addresses in a portion of the accelerator's local memory, as well as a number of the iterations of the operation to be included in a batch, wherein the iterations in the batch are to be executed in parallel by the neural network hardware accelerator; and generating a schedule of execution of the batches of the iterations of the operation.
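Example (illustrative): a small Python sketch of the batching decision, choosing how many loop iterations fit in local memory at once and emitting a schedule of batches. The sizes and the address map are invented for illustration.

    def plan_batches(num_iters, bytes_per_iter, local_mem_bytes):
        # Iterations per batch is bounded by local-memory capacity.
        batch = max(1, local_mem_bytes // bytes_per_iter)
        batch = min(batch, num_iters)
        # Map each iteration's tensor elements to a local-memory address.
        addr_map = {i: (i % batch) * bytes_per_iter for i in range(num_iters)}
        # Iterations within one batch execute in parallel on the accelerator.
        schedule = [list(range(s, min(s + batch, num_iters)))
                    for s in range(0, num_iters, batch)]
        return addr_map, schedule

    addr_map, schedule = plan_batches(num_iters=10, bytes_per_iter=4096,
                                      local_mem_bytes=16384)
    print(schedule)   # batches of 4 iterations each, plus a final partial batch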
-
Publication number: US11714992B1
Publication date: 2023-08-01
Application number: US16219760
Filing date: 2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton, Randy Renfu Huang, Ron Diamant
IPC: G06F16/00, G06N3/04, G06F9/30, G06F16/901, G06F9/48
CPC classification number: G06N3/04, G06F9/4881, G06F9/30003, G06F16/9024
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions. The system further includes a compiler configured to: identify a computational subgraph from a computational graph of a neural network model; compute a subgraph identifier for the computational subgraph; based on whether the subgraph identifier is included in the plurality of subgraph identifiers, either obtain, from the database, first instructions associated with the subgraph identifier, or generate second instructions representing the computational subgraph; and provide the first instructions or the second instructions for execution by a neural network processor to perform computation operations for the neural network model.
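Example (illustrative): a Python sketch of the identifier-based reuse described in the abstract, hashing a canonical form of the subgraph and consulting a cache before compiling. The hashing scheme and the stand-in code generator are assumptions, not the patent's identifier computation.

    import hashlib

    instruction_db = {}  # subgraph identifier -> compiled instructions

    def subgraph_id(subgraph):
        # Canonicalize edges so equivalent subgraphs hash identically.
        canonical = "|".join(sorted(f"{op}->{dst}" for op, dst in subgraph))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def compile_subgraph(subgraph):
        return [f"EXEC {op}" for op, _ in subgraph]  # stand-in code generation

    def get_instructions(subgraph):
        sid = subgraph_id(subgraph)
        if sid in instruction_db:
            return instruction_db[sid]        # first instructions: cache hit
        instrs = compile_subgraph(subgraph)   # second instructions: compile
        instruction_db[sid] = instrs
        return instrs

    g = [("conv2d", "relu"), ("relu", "pool")]
    assert get_instructions(g) is get_instructions(g)  # second call hits cache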
-
Publication number: US11562554B1
Publication date: 2023-01-24
Application number: US16949749
Filing date: 2020-11-12
Applicant: Amazon Technologies, Inc.
Inventor: Abinash Mohanty, Randy Renfu Huang
IPC: G06V10/75, G06N3/04, G06V10/22, G06V10/26, G06V30/194
Abstract: A technique for reducing the computation time of a non-maximum suppression operation may include receiving a request to perform a non-maximum suppression operation on a set of candidate predictions of a computing task, and performing a statistical analysis on a set of confidence scores corresponding to the set of candidate predictions to determine a standard deviation of the set of confidence scores. A confidence score threshold can be determined based on the standard deviation. Candidate predictions having a confidence score below the confidence score threshold can then be discarded to form a reduced set of candidate predictions. Additional candidate predictions can be discarded from the reduced set of candidate predictions based on an intersection-over-union overlap metric, and the remaining candidate predictions from the reduced set of candidate predictions can be provided as a result of the non-maximum suppression operation.
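Example (illustrative): a NumPy sketch of the two-stage filtering, with a statistical pre-filter derived from the score standard deviation followed by the usual IoU suppression. The exact threshold formula used here (mean + k·std) is an assumption.

    import numpy as np

    def iou(a, b):
        # Intersection-over-union of two [x1, y1, x2, y2] boxes.
        x1, y1 = np.maximum(a[:2], b[:2]); x2, y2 = np.minimum(a[2:], b[2:])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    def fast_nms(boxes, scores, k=0.5, iou_thresh=0.5):
        thresh = scores.mean() + k * scores.std()   # statistical pre-filter
        keep = scores >= thresh
        boxes, scores = boxes[keep], scores[keep]   # reduced candidate set
        order, result = scores.argsort()[::-1], []
        while len(order):                           # standard IoU suppression
            i, order = order[0], order[1:]
            result.append(boxes[i])
            order = np.array([j for j in order
                              if iou(boxes[i], boxes[j]) < iou_thresh], dtype=int)
        return result

    boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
    scores = np.array([0.9, 0.85, 0.2])
    print(len(fast_nms(boxes, scores)))  # low-score and overlapping boxes pruned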
-
Publication number: US11294599B1
Publication date: 2022-04-05
Application number: US16891438
Filing date: 2020-06-03
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Randy Renfu Huang, Sundeep Amirineni, Jeffrey T. Huynh
Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
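Example (illustrative): a Python model of the register behavior, where a parallel load pulls one value per memory bank into the registers and a serial store drains them into a single bank, effecting a many-banks-to-one move. Bank counts and addresses are invented for illustration.

    class Banks:
        def __init__(self, n, depth):
            self.mem = [[0] * depth for _ in range(n)]
            self.regs = [0] * n

        def parallel_load(self, addr):
            # Each register reads from its own bank at the same address.
            self.regs = [bank[addr] for bank in self.mem]

        def serial_store(self, bank_idx, start_addr):
            # Registers drain one by one into consecutive addresses of one bank.
            for i, value in enumerate(self.regs):
                self.mem[bank_idx][start_addr + i] = value

    b = Banks(n=4, depth=16)
    for i in range(4):
        b.mem[i][0] = i + 1          # scatter data across all banks
    b.parallel_load(addr=0)          # one element per bank, in parallel
    b.serial_store(bank_idx=0, start_addr=4)
    print(b.mem[0][4:8])             # [1, 2, 3, 4] gathered into bank 0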
-
Publication number: US11182314B1
Publication date: 2021-11-23
Application number: US16698761
Filing date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic, Ilya Minkin, Vignesh Vivekraja, Richard John Heaton, Randy Renfu Huang
Abstract: An integrated circuit device implementing a neural network accelerator may have a peripheral bus interface to interface with a host memory, and neural network models can be loaded from the host memory onto the state buffer of the neural network accelerator for execution by the array of processing elements. The neural network accelerator may also have a memory interface to interface with a local memory. The local memory may store neural network models from the host memory, and the models can be loaded from the local memory into the state buffer with reduced latency as compared to loading from the host memory. In systems with multiple accelerators, the models in the local memory can also be shared amongst different accelerators.
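Example (illustrative): a Python sketch of the two-tier loading path, filling the state buffer from local memory when a model is cached there and falling back to the host interface otherwise. The dictionaries standing in for the memories are assumptions for illustration.

    host_memory = {"resnet50": b"<weights...>"}
    local_memory = {}  # shared cache, visible to all accelerators

    def load_model(name, state_buffer):
        if name in local_memory:              # fast path: local memory
            state_buffer[name] = local_memory[name]
        else:                                 # slow path: peripheral bus to host
            weights = host_memory[name]
            local_memory[name] = weights      # cache for later, lower-latency loads
            state_buffer[name] = weights

    sb = {}
    load_model("resnet50", sb)   # first load goes over the host interface
    load_model("resnet50", sb)   # subsequent loads hit local memory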
-
Publication number: US10831693B1
Publication date: 2020-11-10
Application number: US16145122
Filing date: 2018-09-27
Applicant: Amazon Technologies, Inc.
Inventor: Randy Renfu Huang, Ron Diamant
Abstract: Provided are integrated circuit devices and methods for operating integrated circuit devices. In various examples, an integrated circuit device can include a master port operable to send transactions to target components of the device. The master port can have point-to-point connections with each of the targets. The master port can be configured with a first address range for a first target, a second address range for a second target, and a multicast address range for both the first and second targets. When the master port receives a request with an address that is in the multicast address range, the master port can generate, for the one request, a transaction for each of the first and second targets.
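Example (illustrative): a Python model of the multicast decode, matching an incoming address against configured ranges and fanning the one request out into one transaction per mapped target. Address values and target names are invented for illustration.

    RANGES = [
        (0x0000, 0x0FFF, ["target0"]),             # unicast to first target
        (0x1000, 0x1FFF, ["target1"]),             # unicast to second target
        (0x2000, 0x2FFF, ["target0", "target1"]),  # multicast range
    ]

    def send(addr, payload):
        for lo, hi, targets in RANGES:
            if lo <= addr <= hi:
                # One incoming request fans out into one transaction per target.
                return [(t, addr, payload) for t in targets]
        raise ValueError(f"address {addr:#x} not mapped")

    print(send(0x2004, "write 42"))  # two transactions, one per target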
-
Publication number: US12198041B2
Publication date: 2025-01-14
Application number: US18352768
Filing date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined by detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
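Example (illustrative): a Python sketch of the utilization check, splitting each input matrix across more processing-element rows when fewer than a threshold fraction of rows would be used. The array size, threshold, and splitting rule are assumptions for illustration.

    PE_ROWS = 128

    def plan_conv(num_input_matrices, util_threshold=0.5):
        rows_used = num_input_matrices            # one row per input matrix
        if rows_used >= util_threshold * PE_ROWS:
            return {"split_factor": 1, "rows_used": rows_used}
        # Under-utilized: process each input matrix across several rows.
        split = min(PE_ROWS // rows_used, 8)      # cap splitting for overhead
        return {"split_factor": split, "rows_used": rows_used * split}

    print(plan_conv(3))    # few inputs: split each across 8 rows
    print(plan_conv(100))  # already well utilized: no modification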
-
Publication number: US12079734B1
Publication date: 2024-09-03
Application number: US17878824
Filing date: 2022-08-01
Applicant: Amazon Technologies, Inc.
Inventor: Hongbin Zheng , Randy Renfu Huang , Richard John Heaton
Abstract: Techniques for reducing the compilation time of a neural network are disclosed. A description of a neural network is received by a compiler. A plurality of operators are identified based on the description of the neural network. A plurality of subgraphs are formed, each including one or more operators. For each subgraph, a performance factor is calculated based on a compute usage and a memory usage associated with the operators included in the subgraph. The performance factor is compared to a threshold. Based on the comparison, either the subgraph is classified as a compute-bound subgraph and a set of memory optimizations are suppressed, or the subgraph is classified as a memory-bound subgraph and a set of compute optimizations are suppressed.
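Example (illustrative): a Python sketch of the classification step, using arithmetic intensity (FLOPs per byte moved) as the performance factor and suppressing whichever optimization set cannot help. The factor definition and threshold value are assumptions, not the patent's formula.

    def classify_subgraph(flops, bytes_moved, threshold=10.0):
        factor = flops / max(bytes_moved, 1)      # performance factor
        if factor > threshold:
            # Compute bound: memory optimizations won't pay off, skip them.
            return "compute_bound", {"suppress": "memory_optimizations"}
        # Memory bound: compute optimizations won't pay off, skip them.
        return "memory_bound", {"suppress": "compute_optimizations"}

    print(classify_subgraph(flops=2e9, bytes_moved=5e7))   # compute bound
    print(classify_subgraph(flops=1e8, bytes_moved=5e7))   # memory bound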