-
公开(公告)号:US11645075B1
公开(公告)日:2023-05-09
申请号:US17305152
申请日:2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Barak Wasserstrom , Adi Habusha , Ron Diamant , Erez Sabbag
CPC classification number: G06F9/30058 , G06F9/3836 , G06F9/45558 , G06K9/6256 , G06N3/08
Abstract: Execution flows of a program can be characterized by a series of execution events. The rates at which these execution events occur for a particular program can be collected periodically, and the execution events statistics can be utilized for both training a machine learning model, and later on for making classification inferences to determine whether a program run contains any abnormality. When an abnormality is encountered, an alert can be generated and provided to supervisory logic of a computing system to indicate that an abnormal program flow has been detected.
-
公开(公告)号:US11500962B1
公开(公告)日:2022-11-15
申请号:US16917033
申请日:2020-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
-
公开(公告)号:US11494326B1
公开(公告)日:2022-11-08
申请号:US17301273
申请日:2021-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Ron Diamant
Abstract: To perform complex arithmetic operations in neural networks without compromising the performance of the neural network accelerator, a programmable computation unit is integrated with a direct memory access (DMA) engine that is used to exchange neural network parameters between the neural network accelerator and system memory. The DMA engine may include a calculation circuit operable to perform a multiply-and-add calculation on a set of operands, and an operand selector circuit operable to select a source for each operand of the calculation circuit. The DMA engine may also include a control circuit operable to retrieve a meta-descriptor for performing a computation, configure the operand selector circuit based on the meta-descriptor, and use the calculation circuit to perform the computation based on the meta-descriptor to generate a computation result.
-
公开(公告)号:US11461631B2
公开(公告)日:2022-10-04
申请号:US15933225
申请日:2018-03-22
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant , Thomas A. Volpe , Randy Huang
IPC: G06N3/08
Abstract: Disclosed herein are techniques for scheduling and executing multi-layer neural network computations for multiple contexts. In one embodiment, a method comprises determining a set of computation tasks to be executed, the set of computation tasks including a first computation task and a second computation task, as well as a third computation task and a fourth computation task to provide input data for the first and second computation tasks; determining a first execution batch comprising the first and second computation tasks; determining a second execution batch comprising at least the third computation task to be executed before the first execution batch; determining whether to include the fourth computation task in the second execution batch based on whether the memory device has sufficient capacity to hold input data and output data of both of the third and fourth computation; executing the second execution batch followed by the first execution batch.
-
公开(公告)号:US11435941B1
公开(公告)日:2022-09-06
申请号:US16911127
申请日:2020-06-24
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu , Paul Gilbert Meyer , Ron Diamant
Abstract: In one example, an apparatus comprises: a memory array having an array of memory elements arranged in rows and columns, each memory element being configured to store a data element; and a memory access circuit configured to: perform a row write operation to store a first group of data elements at a first row of the array of memory elements; perform a column read operation at a first column of the array of memory elements to obtain a second group of data elements; and perform a column write operation to store a third group of data elements at the first column of the array of memory elements to replace the second group of data elements.
-
公开(公告)号:US11314842B1
公开(公告)日:2022-04-26
申请号:US16987830
申请日:2020-08-07
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang , Mohammad El-Shabani , Sundeep Amirineni , Kenneth Wayne Patton , Willis Wang
Abstract: Methods and systems for performing hardware computations of mathematical functions are provided. In one example, a system comprises a mapping table that maps each base value of a plurality of base values to parameters related to a mathematical function; a selection module configured to select, based on an input value, a first base value and first parameters mapped to the first base value in the mapping table; and arithmetic circuits configured to: receive, from the mapping table, the first base value and the first plurality of parameters; and compute, based on a relationship between the input value and the first base value, and based on the first parameters, an estimated output value of the mathematical function for the input value.
-
公开(公告)号:US11175919B1
公开(公告)日:2021-11-16
申请号:US16219610
申请日:2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Ilya Minkin , Ron Diamant , Drazen Borkovic , Jindrich Zejda , Dana Michelle Vantrease
Abstract: Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.
-
公开(公告)号:US20210303988A1
公开(公告)日:2021-09-30
申请号:US16835161
申请日:2020-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Patricio Kaplan , Ron Diamant
Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
-
公开(公告)号:US11119787B1
公开(公告)日:2021-09-14
申请号:US16368263
申请日:2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Mohammad El-Shabani , Ron Diamant , Samuel Jacob , Ilya Minkin , Richard John Heaton
IPC: G06F9/44 , G06F8/41 , G06F11/30 , G06F9/38 , G06F11/22 , G06F9/455 , G06F11/36 , G06F9/445 , G06F11/34 , G06F9/30
Abstract: Systems and methods for non-intrusive hardware profiling are provided. In some cases integrated circuit devices can be manufactured without native support for performance measurement and/or debugging capabilities, thereby limiting visibility into the integrated circuit device. Understanding the timing of operations can help to determine whether the hardware of the device is operating correctly and, when the device is not operating correctly, provide information that can be used to debug the device. In order to measure execution time of various tasks performed by the integrated circuit device, program instructions may be inserted to generate notifications that provide tracing information, including timestamps, for operations executed by the integrated circuit device.
-
公开(公告)号:US11061654B1
公开(公告)日:2021-07-13
申请号:US16217797
申请日:2018-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic , Jindrich Zejda , Taemin Kim , Ron Diamant
IPC: G06F9/44 , G06F8/41 , G06F9/30 , G06N3/02 , G06F12/1081
Abstract: Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
-
-
-
-
-
-
-
-
-