-
Publication Number: US12106102B1
Publication Date: 2024-10-01
Application Number: US18221640
Application Date: 2023-07-13
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
CPC classification number: G06F9/30036 , G06F1/10 , G06F9/3001 , G06F9/3836 , G06N3/08
Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes is determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating a type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes, the vector clock for a particular node being calculated based on the vector clock of each connected preceding node and the edge type of the edge that connects that preceding node to the particular node.
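As a rough sketch of this construction (not the patent's exact scheme), the Python below merges predecessor clocks according to a hypothetical two-valued edge type: a fully blocking synchronization edge propagates the predecessor's whole clock, while a weaker signaling edge propagates only the predecessor's own engine component. All names (`EdgeType`, `compute_vector_clocks`, `engine_of`) are illustrative.

```python
from enum import Enum

class EdgeType(Enum):
    BLOCKING = 1    # full synchronization between the connected nodes
    SIGNALING = 2   # weaker ordering: only the predecessor's own progress

def compute_vector_clocks(nodes, preds, engine_of, num_engines):
    """nodes must be in topological order; preds maps a node to a list
    of (predecessor, edge_type) pairs."""
    clocks = {}
    for node in nodes:
        clock = [0] * num_engines
        for pred, etype in preds[node]:
            pc = clocks[pred]
            if etype is EdgeType.BLOCKING:
                # Everything the predecessor has seen happens-before
                # this node: merge the whole clock component-wise.
                clock = [max(a, b) for a, b in zip(clock, pc)]
            else:
                # Only the predecessor's own engine is known to have
                # progressed before this node runs.
                e = engine_of[pred]
                clock[e] = max(clock[e], pc[e])
        clock[engine_of[node]] += 1   # one more step on this node's engine
        clocks[node] = clock
    return clocks
```

Two nodes can then be modeled as concurrent exactly when neither of their resulting clocks dominates the other component-wise.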
-
Publication Number: US12093806B1
Publication Date: 2024-09-17
Application Number: US16459501
Application Date: 2019-07-01
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda , Ron Diamant , Jeffrey T. Huynh , Drazen Borkovic , Randy Renfu Huang , Richard John Heaton
Abstract: Static memory allocation may be performed for weight values across multiple processing units executing a neural network. A neural network may be received for execution across multiple processing units. A partitioning scheme may be applied to divide the neural network into subgraphs. The subgraphs may be assigned to different processing units. The weights for the operations of each subgraph may be statically allocated in the dedicated caches of the assigned processing units as part of the instructions to execute the neural network across the processing units.
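A minimal sketch of the static-allocation step, assuming one subgraph per processing unit and a fixed per-unit cache budget; `allocate_weights` and the dict-based subgraph shape are illustrative, not the patent's representation.

```python
def allocate_weights(subgraphs, units, cache_budget):
    """Statically lay out each subgraph's weights in the dedicated
    cache of its assigned unit, returning name -> offset per unit."""
    layouts = {u: {} for u in units}
    offsets = {u: 0 for u in units}
    for sg, unit in zip(subgraphs, units):   # one subgraph per unit
        for name, size in sg["weights"].items():
            if offsets[unit] + size > cache_budget:
                raise MemoryError(f"weight cache overflow on unit {unit}")
            layouts[unit][name] = offsets[unit]   # address fixed at compile time
            offsets[unit] += size
    return layouts

# e.g. two subgraphs on two units, 8 KiB of dedicated cache each:
subgraphs = [{"weights": {"w0": 1024, "w1": 2048}},
             {"weights": {"w2": 4096}}]
print(allocate_weights(subgraphs, units=[0, 1], cache_budget=8192))
```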
-
Publication Number: US11182314B1
Publication Date: 2021-11-23
Application Number: US16698761
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic , Ilya Minkin , Vignesh Vivekraja , Richard John Heaton , Randy Renfu Huang
Abstract: An integrated circuit device implementing a neural network accelerator may have a peripheral bus interface to interface with a host memory, and neural network models can be loaded from the host memory onto the state buffer of the neural network accelerator for execution by the array of processing elements. The neural network accelerator may also have a memory interface to interface with a local memory. The local memory may store neural network models from the host memory, and the models can be loaded from the local memory into the state buffer with reduced latency as compared to loading from the host memory. In systems with multiple accelerators, the models in the local memory can also be shared amongst different accelerators.
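The caching behavior described here might be sketched as a simple two-level lookup; the `ModelLoader` interface and its dict-backed memories are purely hypothetical stand-ins for the host memory, local memory, and state buffer.

```python
class ModelLoader:
    """Prefer the accelerator's local memory over the slower host
    memory reached across the peripheral bus."""

    def __init__(self, host_mem, local_mem):
        self.host_mem = host_mem     # model_id -> bytes, high latency
        self.local_mem = local_mem   # model_id -> bytes, low latency

    def load_into_state_buffer(self, model_id, state_buffer):
        blob = self.local_mem.get(model_id)
        if blob is None:
            # Miss: fetch over the peripheral bus once, then cache
            # locally so later loads (including by other accelerators
            # sharing this local memory) avoid the bus.
            blob = self.host_mem[model_id]
            self.local_mem[model_id] = blob
        state_buffer[:len(blob)] = blob   # e.g. a bytearray state buffer
```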
-
Publication Number: US12182688B2
Publication Date: 2024-12-31
Application Number: US16698236
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Animesh Jain , Yizhi Liu , Hongbin Zheng , Jeffrey T. Huynh , Haichen Li , Drazen Borkovic , Jindrich Zejda , Richard John Heaton , Randy Renfu Huang , Zhi Chen , Yida Wang
Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the ability to map the newly developed operators onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
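A toy version of the three-tier classification, assuming the compiler exposes simple membership sets for accelerator- and host-supported operators; both set names and the operator strings are illustrative.

```python
def partition_operators(ops, accel_supported, host_compilable):
    """Assign each operator to the first tier that can run it:
    accelerator, then host processor, then the ML framework itself."""
    placement = {}
    for op in ops:
        if op in accel_supported:
            placement[op] = "accelerator"
        elif op in host_compilable:
            placement[op] = "host"
        else:
            placement[op] = "framework"   # newest operators fall back here
    return placement

# e.g. a freshly introduced framework operator lands in the last tier:
print(partition_operators(["conv2d", "gelu", "brand_new_op"],
                          accel_supported={"conv2d"},
                          host_compilable={"gelu"}))
```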
-
Publication Number: US12159217B1
Publication Date: 2024-12-03
Application Number: US16829331
Application Date: 2020-03-25
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
Abstract: Methods for simplifying a dependency graph in a neural network accelerator are provided. Computations and data movements for the neural network accelerator may be described with a flow graph, where graph nodes represent computation or data movement operations and graph edges represent dependencies between operations. A flow graph may contain redundant edges that can be removed while retaining the reachability of each of the nodes in the graph. To identify redundant edges, a compiler may generate vector clocks to track the relationships of operations performed by various execution engines prior to execution of a program reaching a given node or operation. Redundant edges may be identified and removed based on the relative values of the vector clocks to reduce the complexity of the graph.
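One way to phrase the redundancy test with vector clocks, under the simplifying assumption that an edge (u, v) is implied whenever some other predecessor of v already "sees" all of u's progress; `dominates` and `prune_redundant_edges` are illustrative names, not the patent's exact criterion.

```python
def dominates(a, b):
    """Clock a >= clock b in every component."""
    return all(x >= y for x, y in zip(a, b))

def prune_redundant_edges(edges, preds, clocks):
    """Drop edge (u, v) when another predecessor p of v already
    dominates u's clock, so u -> v is implied transitively and
    reachability is preserved without it."""
    kept = []
    for u, v in edges:
        implied = any(p != u and dominates(clocks[p], clocks[u])
                      for p in preds[v])
        if not implied:
            kept.append((u, v))
    return kept
```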
-
Publication Number: US11556342B1
Publication Date: 2023-01-17
Application Number: US17031495
Application Date: 2020-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Ravi Kumar , Drazen Borkovic
IPC: G06F9/30 , G06F8/40 , G06F8/41 , G06F1/3206 , G06F11/30
Abstract: Techniques are disclosed for utilizing configurable delays in an instruction stream. A set of instructions to be executed on a set of engines is generated. The set of engines is distributed across a set of hardware elements. A set of configurable delays is inserted into the set of instructions. Each of the configurable delays includes an adjustable delay amount that delays the execution of the set of instructions on the set of engines. The adjustable delay amount is adjustable by a runtime application that facilitates the execution of the set of instructions on the set of engines. The runtime application is configured to determine a runtime condition associated with the execution of the set of instructions on the set of engines and to adjust the set of configurable delays based on the runtime condition.
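A minimal sketch of the compile-time/runtime split, with a mutable `DELAY` placeholder standing in for a real delay instruction; the opcode encoding, the every-n insertion policy, and the hot/cold adjustment rule are all assumptions.

```python
DELAY = "DELAY"   # placeholder opcode; its amount is patched at runtime

def insert_delays(instructions, every_n, initial_amount=0):
    """Compile-time pass: insert an adjustable DELAY slot after every
    n instructions."""
    out = []
    for i, ins in enumerate(instructions, 1):
        out.append(ins)
        if i % every_n == 0:
            out.append([DELAY, initial_amount])   # mutable slot
    return out

def runtime_adjust(stream, condition_hot):
    """Runtime pass: stretch the delays when some observed condition
    (say, a power or thermal limit) holds, shrink them otherwise."""
    for ins in stream:
        if isinstance(ins, list) and ins[0] == DELAY:
            ins[1] = 100 if condition_hot() else 0   # delay amount in cycles
```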
-
Publication Number: US11467946B1
Publication Date: 2022-10-11
Application Number: US16368351
Application Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Samuel Jacob , Drazen Borkovic , Yu Zhou , Mohammad El-Shabani
Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
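The coordination loop might look roughly like the following, against an entirely hypothetical engine interface (`set_halt_point`, `run`, `is_running`, `pc`, `step`); the busy-wait is a stand-in for whatever halt notification the hardware provides.

```python
def break_at_layer(engine_a, engine_b, halt_a, halt_b):
    """Stop two engines at halt points set for the same target layer."""
    engine_a.set_halt_point(halt_a)
    engine_b.set_halt_point(halt_b)
    engine_a.run()
    engine_b.run()
    while engine_a.is_running() or engine_b.is_running():
        pass   # wait until both engines have stopped (halt or stall)
    if engine_a.pc() == halt_a:
        # A reached its halt point; single-step B up to its own.
        while engine_b.pc() != halt_b:
            engine_b.step()
    else:
        while engine_a.pc() != halt_a:
            engine_a.step()
```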
-
Publication Number: US11003429B1
Publication Date: 2021-05-11
Application Number: US16266915
Application Date: 2019-02-04
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda , Jeffrey T. Huynh , Tobias Joseph Kastulus Edler von Koch , Drazen Borkovic , Taemin Kim
IPC: G06F8/41 , G06F16/901 , G06F15/80
Abstract: Scheduling of the operations of an integrated circuit device such as a hardware accelerator, including scheduling of the movement of data into and out of the accelerator, can be performed by a compiler that produces program code for the accelerator. The compiler can produce a graph that represents the operations to be performed by the accelerator. Using the graph, the compiler can determine estimated execution times for the operations represented by each node in the graph. The compiler can schedule operations by determining an estimated execution time for the set of dependent operations that depends on each operation. The compiler can then select, from among a set of candidate operations, the operation that has the shortest estimated execution time and whose set of dependent operations has the longest estimated execution time compared to the other sets of dependent operations.
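A compact sketch of the selection heuristic, assuming per-operation time estimates are already available; `total_dependent_time` and `pick_next` are illustrative names, and the tie-breaking order is a guess at the described criterion.

```python
def total_dependent_time(op, succs, exec_time):
    """Estimated time of everything that transitively depends on op."""
    seen, stack, total = set(), list(succs[op]), 0
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            total += exec_time[d]
            stack.extend(succs[d])
    return total

def pick_next(ready_ops, exec_time, succs):
    """Prefer the op whose dependent work is largest, breaking ties by
    the op's own shortest estimated time, so long chains start early."""
    return max(ready_ops,
               key=lambda op: (total_dependent_time(op, succs, exec_time),
                               -exec_time[op]))
```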
-
Publication Number: US11775299B1
Publication Date: 2023-10-03
Application Number: US17215912
Application Date: 2021-03-29
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
CPC classification number: G06F9/30036 , G06F1/10 , G06F9/3001 , G06F9/3836 , G06N3/08
Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes is determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating a type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes, the vector clock for a particular node being calculated based on the vector clock of each connected preceding node and the edge type of the edge that connects that preceding node to the particular node.
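This abstract matches US12106102B1 above (the two patents share a family). Rather than repeat the construction sketch, the fragment below shows the standard way such clocks are consumed for concurrency modeling: two nodes are concurrent exactly when neither clock dominates the other. The function names are illustrative.

```python
def happens_before(a, b):
    """The node holding clock a is ordered before the node holding b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    # Concurrent iff neither clock dominates the other.
    return not happens_before(a, b) and not happens_before(b, a)

assert happens_before([1, 0], [2, 3])   # ordered across two engines
assert concurrent([2, 0], [0, 3])       # neither dominates: concurrent
```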
-
Publication Number: US11748622B1
Publication Date: 2023-09-05
Application Number: US16292236
Application Date: 2019-03-04
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic , Se jong Oh
IPC: G06N3/082 , G06F16/901 , G06N3/063 , G06F8/41
CPC classification number: G06N3/082 , G06F8/433 , G06F16/9024 , G06N3/063
Abstract: A computing system is configured to access intermediate outputs of a neural network by augmenting a data flow graph generated for the neural network. The data flow graph includes a plurality of nodes interconnected by connections, each node representing an operation to be executed by the neural network. To access the intermediate output, the data flow graph is augmented by inserting a node representing an operation that saves the output of a node which produces the intermediate output. The node representing the save operation is inserted while maintaining all existing nodes and connections in the data flow graph, thereby preserving the behavior of the data flow graph. The augmenting can be performed using a compiler that generates the data flow graph from program code.
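A minimal sketch of the augmentation, on a toy edge-list graph; the real compiler operates on its own intermediate representation, so `insert_save_node` and the dict shape are illustrative only.

```python
def insert_save_node(graph, target):
    """Add a node that saves `target`'s output. Existing nodes and
    edges are untouched, so the graph's behavior is preserved."""
    save = f"save_{target}"
    graph["nodes"].append(save)
    graph["edges"].append((target, save))   # only a new tap edge is added
    return save

g = {"nodes": ["conv", "relu"], "edges": [("conv", "relu")]}
insert_save_node(g, "conv")   # the intermediate output of "conv" is now observable
```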