Emulating fine-grained sparsity in a systolic array

    Publication Number: US11500962B1

    Publication Date: 2022-11-15

    Application Number: US16917033

    Application Date: 2020-06-30

    Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
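The conversion described above can be sketched in plain Python. The exact sparsity constraint is not specified in the abstract, so this sketch assumes a hypothetical "one nonzero per group of g columns" pattern; the function name and the round-robin assignment are illustrative, not the patent's method:

```python
def to_constrained_sparse(W, g=4):
    """Split weight matrix W (a list of rows) into a set of matrices
    that each have at most one nonzero per g-column group in every row,
    and that sum back to W. The '1-in-g' constraint here is an assumed
    stand-in for the patent's fine-grained sparsity condition."""
    rows, cols = len(W), len(W[0])
    assert cols % g == 0
    # Number of pieces needed = nonzero count of the densest group in W.
    max_nnz = max(
        sum(1 for c in range(c0, c0 + g) if W[r][c] != 0)
        for r in range(rows) for c0 in range(0, cols, g)
    )
    parts = [[[0] * cols for _ in range(rows)] for _ in range(max_nnz)]
    for r in range(rows):
        for c0 in range(0, cols, g):
            k = 0  # round-robin: spread this group's nonzeros across pieces
            for c in range(c0, c0 + g):
                if W[r][c] != 0:
                    parts[k][r][c] = W[r][c]
                    k += 1
    return parts
```

Each resulting piece satisfies the constrained pattern, so each sparse multiplication can run on the specialized systolic array, and summing the partial products recovers the dense result.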

    Programmable computations in direct memory access engine

    Publication Number: US11494326B1

    Publication Date: 2022-11-08

    Application Number: US17301273

    Application Date: 2021-03-30

    Inventors: Kun Xu; Ron Diamant

    Abstract: To perform complex arithmetic operations in neural networks without compromising the performance of the neural network accelerator, a programmable computation unit is integrated with a direct memory access (DMA) engine that is used to exchange neural network parameters between the neural network accelerator and system memory. The DMA engine may include a calculation circuit operable to perform a multiply-and-add calculation on a set of operands, and an operand selector circuit operable to select a source for each operand of the calculation circuit. The DMA engine may also include a control circuit operable to retrieve a meta-descriptor for performing a computation, configure the operand selector circuit based on the meta-descriptor, and use the calculation circuit to perform the computation based on the meta-descriptor to generate a computation result.
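A behavioral model of that datapath might look as follows. The meta-descriptor layout, the operand-source names, and the class names are assumptions for illustration; only the multiply-and-add circuit and the operand selector are taken from the abstract:

```python
from dataclasses import dataclass

# Assumed operand sources the selector circuit can route to the calculator.
IMMEDIATE, MEMORY = "imm", "mem"

@dataclass
class MetaDescriptor:
    """Hypothetical layout: three (source, value) selectors feeding a
    fused multiply-and-add, result = a * b + c."""
    a: tuple
    b: tuple
    c: tuple

class DmaComputeUnit:
    def __init__(self, memory):
        self.memory = memory  # system memory the DMA engine already accesses

    def _select(self, source, value):
        # Operand selector circuit: choose where each operand comes from.
        if source == IMMEDIATE:
            return value
        if source == MEMORY:
            return self.memory[value]  # value is interpreted as an address
        raise ValueError(source)

    def execute(self, md: MetaDescriptor):
        # Control circuit: configure selectors from the meta-descriptor,
        # then run the calculation circuit.
        a = self._select(*md.a)
        b = self._select(*md.b)
        c = self._select(*md.c)
        return a * b + c  # multiply-and-add calculation
```

Because the unit lives inside the DMA engine, the multiply-and-add can be applied to parameters as they stream between system memory and the accelerator, without occupying the accelerator itself.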

    Scheduling neural network computations based on memory capacity

    Publication Number: US11461631B2

    Publication Date: 2022-10-04

    Application Number: US15933225

    Application Date: 2018-03-22

    Abstract: Disclosed herein are techniques for scheduling and executing multi-layer neural network computations for multiple contexts. In one embodiment, a method comprises determining a set of computation tasks to be executed, the set of computation tasks including a first computation task and a second computation task, as well as a third computation task and a fourth computation task to provide input data for the first and second computation tasks; determining a first execution batch comprising the first and second computation tasks; determining a second execution batch comprising at least the third computation task to be executed before the first execution batch; determining whether to include the fourth computation task in the second execution batch based on whether the memory device has sufficient capacity to hold the input data and output data of both the third and fourth computation tasks; and executing the second execution batch followed by the first execution batch.
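The capacity test that decides the second batch's membership can be sketched directly from the abstract. The task representation (`input_size`/`output_size` fields) is a made-up convenience; only the decision rule comes from the text:

```python
def build_second_batch(task3, task4, memory_capacity):
    """Include task4 alongside task3 in the second execution batch only
    if the memory device can hold the input and output data of both
    tasks at once; otherwise task4 must run in a later batch.
    Tasks are dicts with assumed 'input_size'/'output_size' fields."""
    needed = (task3["input_size"] + task3["output_size"]
              + task4["input_size"] + task4["output_size"])
    batch = [task3]
    if needed <= memory_capacity:
        batch.append(task4)
    return batch
```

Batching the producer tasks together lets their consumers (the first batch) run back-to-back, at the cost of needing both producers' inputs and outputs resident at the same time, which is exactly what the capacity check guards.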

    Matrix transpose hardware acceleration

    Publication Number: US11435941B1

    Publication Date: 2022-09-06

    Application Number: US16911127

    Application Date: 2020-06-24

    Abstract: In one example, an apparatus comprises: a memory array having an array of memory elements arranged in rows and columns, each memory element being configured to store a data element; and a memory access circuit configured to: perform a row write operation to store a first group of data elements at a first row of the array of memory elements; perform a column read operation at a first column of the array of memory elements to obtain a second group of data elements; and perform a column write operation to store a third group of data elements at the first column of the array of memory elements to replace the second group of data elements.
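The access pattern in the abstract amounts to a streaming transpose buffer: rows are written in, columns are read out (already transposed), and each freed column is immediately refilled with the next group. A minimal functional model, with the square dimensions and single-port behavior as assumptions:

```python
class TransposeBuffer:
    """Sketch of the abstract's memory array. Row writes store incoming
    groups; a column read returns a transposed group; a column write
    replaces the group just read, so the next matrix streams in while
    the current one streams out."""
    def __init__(self, n):
        self.cells = [[0] * n for _ in range(n)]

    def write_row(self, r, values):          # row write operation
        self.cells[r] = list(values)

    def read_column(self, c):                # column read operation
        return [row[c] for row in self.cells]

    def write_column(self, c, values):       # column write operation
        for r, v in enumerate(values):
            self.cells[r][c] = v
```

Alternating the read/write orientation between matrices is what makes the transpose double-buffered in a single array rather than needing two.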

    Multi-model training pipeline in distributed systems

    Publication Number: US20210303988A1

    Publication Date: 2021-09-30

    Application Number: US16835161

    Application Date: 2020-03-30

    Abstract: A first worker node of a distributed system computes a first set of gradients using a first neural network model and a first set of weights associated with the first neural network model. The first set of gradients are transmitted from the first worker node to a second worker node of the distributed system. The second worker node computes a first set of synchronized gradients based on the first set of gradients. While the first set of synchronized gradients are being computed, the first worker node computes a second set of gradients using a second neural network model and a second set of weights associated with the second neural network model. The second set of gradients are transmitted from the first worker node to the second worker node. The second worker node computes a second set of synchronized gradients based on the second set of gradients.
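The overlap described above, synchronizing one model's gradients while computing the next model's, can be sketched with threads standing in for the two worker nodes. The callables and data shapes are placeholders; only the pipelining structure reflects the abstract:

```python
import threading

def pipelined_train_step(compute_grads, synchronize, models):
    """For each model, the first worker computes gradients, then hands
    them to the second worker for synchronization on a background
    thread, so the next model's gradient computation overlaps with the
    previous model's synchronization."""
    sync_threads = []
    for model in models:
        grads = compute_grads(model)                 # first worker node
        t = threading.Thread(target=synchronize,     # second worker node,
                             args=(model, grads))    # runs concurrently
        t.start()
        sync_threads.append(t)
    for t in sync_threads:
        t.join()  # all synchronized gradients ready before the next step
```

The benefit is that the communication latency of gradient synchronization for one model is hidden behind useful compute for the other.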

    Synchronization of concurrent computation engines

    Publication Number: US11061654B1

    Publication Date: 2021-07-13

    Application Number: US16217797

    Application Date: 2018-12-12

    Abstract: Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
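The set-event / wait-on-event pairing maps naturally onto an event primitive. This sketch models the two execution engines as callables sharing a `threading.Event`; the names and the single shared value are illustrative:

```python
import threading

def make_engines(shared):
    """Two execution engines synchronized around a data dependency.
    The first engine's 'set-event instruction' is event.set(); the
    second engine's 'wait-on-event instruction' is event.wait()."""
    event = threading.Event()

    def first_engine():
        shared["data"] = 42   # produce the data the second engine needs
        event.set()           # set-event: signal the dependency is met

    def second_engine(out):
        event.wait()          # wait-on-event: block until first engine signals
        out.append(shared["data"])

    return first_engine, second_engine
```

Because the wait is attached to a specific event rather than to global execution order, each engine can otherwise run its instruction stream independently, which is the point of the scheme.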
