Vector clocks for highly concurrent execution engines

    Publication number: US12106102B1

    Publication date: 2024-10-01

    Application number: US18221640

    Application date: 2023-07-13

    Inventor: Drazen Borkovic

    CPC classification number: G06F9/30036 G06F1/10 G06F9/3001 G06F9/3836 G06N3/08

    Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes is determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating a type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes, the vector clock for a particular node being calculated based on the vector clock of each connected preceding node and the edge type of the edge that connects that preceding node to the particular node.
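The merge rule described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the two edge types ("event" for cross-engine synchronization and "program" for same-engine program order) and the function name `node_clock` are assumptions chosen for the example.

```python
def node_clock(preds, engine, engines):
    """Compute a node's vector clock from its predecessors.

    preds: list of (pred_clock, edge_type, pred_engine) tuples.
    Edge types (hypothetical, for illustration):
      - "event":   cross-engine synchronization; merge the full clock
                   with an element-wise max.
      - "program": same-engine program order; only the predecessor
                   engine's own component is ordered.
    """
    clock = {e: 0 for e in engines}
    for pred_clock, edge_type, pred_engine in preds:
        if edge_type == "event":
            for e in engines:
                clock[e] = max(clock[e], pred_clock[e])
        elif edge_type == "program":
            clock[pred_engine] = max(clock[pred_engine], pred_clock[pred_engine])
    clock[engine] += 1  # tick this engine's component for the node's own event
    return clock
```

The key point the example shows is that the edge type controls how much of a predecessor's history propagates: a synchronization edge carries the whole clock, while a weaker ordering carries only one component.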

    Low latency neural network model loading

    Publication number: US11182314B1

    Publication date: 2021-11-23

    Application number: US16698761

    Application date: 2019-11-27

    Abstract: An integrated circuit device implementing a neural network accelerator may have a peripheral bus interface to interface with a host memory, and neural network models can be loaded from the host memory onto the state buffer of the neural network accelerator for execution by the array of processing elements. The neural network accelerator may also have a memory interface to interface with a local memory. The local memory may store neural network models from the host memory, and the models can be loaded from the local memory into the state buffer with reduced latency as compared to loading from the host memory. In systems with multiple accelerators, the models in the local memory can also be shared amongst different accelerators.

    Hierarchical partitioning of operators

    Publication number: US12182688B2

    Publication date: 2024-12-31

    Application number: US16698236

    Application date: 2019-11-27

    Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the compiler's ability to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
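The three-tier classification in the abstract can be sketched as a simple placement pass. The tier names and the `partition` interface are assumptions for illustration; the real compiler's support checks would inspect operator attributes, not just names.

```python
def partition(ops, accel_supported, host_supported):
    """Assign each operator to the most capable tier that supports it.

    Assumed tier preference: accelerator > compiled host code >
    framework fallback (the framework can always execute its own ops).
    """
    placement = {}
    for op in ops:
        if op in accel_supported:
            placement[op] = "accelerator"
        elif op in host_supported:
            placement[op] = "host"
        else:
            placement[op] = "framework"
    return placement
```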

    Using vector clocks to simplify a dependency graph in a neural network accelerator

    Publication number: US12159217B1

    Publication date: 2024-12-03

    Application number: US16829331

    Application date: 2020-03-25

    Inventor: Drazen Borkovic

    Abstract: Methods for simplifying a dependency graph in a neural network accelerator are provided. Computations and data movements for the neural network accelerator may be described with a flow graph, where graph nodes represent computation or data movement operations and graph edges represent dependencies between operations. A flow graph may contain redundant edges that can be removed while retaining the reachability of each of the nodes in the graph. To identify redundant edges, a compiler may generate vector clocks to track the relationships of operations performed by various execution engines prior to execution of a program reaching a given node or operation. Redundant edges may be identified and removed based on the relative values of the vector clocks to reduce the complexity of the graph.
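The redundancy test described above can be sketched with component-wise clock comparison: an edge (u, v) is removable when some other predecessor of v already "covers" u's history, so v remains reachable from u through that predecessor. The helper names and graph encoding are assumptions for illustration.

```python
def dominates(a, b):
    """True if clock a happens-before-or-equals clock b, component-wise."""
    return all(a[e] <= b[e] for e in a)

def redundant_edges(edges, clocks, preds):
    """Find edges that can be removed while preserving reachability.

    edges:  list of (u, v) pairs.
    clocks: vector clock per node, as computed by the compiler.
    preds:  predecessors of each node.
    """
    redundant = []
    for u, v in edges:
        for w in preds[v]:
            # Another predecessor w already depends on u (transitively),
            # so the direct edge u -> v adds no ordering.
            if w != u and dominates(clocks[u], clocks[w]):
                redundant.append((u, v))
                break
    return redundant
```

For a chain a → b → c with a shortcut edge a → c, b's clock dominates a's, so only the shortcut is flagged; the chain edges are kept.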

    Configurable delay insertion in compiled instructions

    Publication number: US11556342B1

    Publication date: 2023-01-17

    Application number: US17031495

    Application date: 2020-09-24

    Abstract: Techniques are disclosed for utilizing configurable delays in an instruction stream. A set of instructions to be executed on a set of engines are generated. The set of engines are distributed between a set of hardware elements. A set of configurable delays are inserted into the set of instructions. Each of the set of configurable delays includes an adjustable delay amount that delays an execution of the set of instructions on the set of engines. The adjustable delay amount is adjustable by a runtime application that facilitates the execution of the set of instructions on the set of engines. The runtime application is configured to determine a runtime condition associated with the execution of the set of instructions on the set of engines and to adjust the set of configurable delays based on the runtime condition.
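A minimal sketch of the runtime adjustment step: delay placeholders are embedded in the instruction stream at compile time, and a runtime pass sets their amounts before launch. The `Delay` type and `adjust_delays` interface are assumptions for illustration, not the patented instruction format.

```python
from dataclasses import dataclass

@dataclass
class Delay:
    cycles: int  # adjustable delay amount, set by the runtime

def adjust_delays(instructions, condition_cycles):
    """Set every configurable delay based on an observed runtime
    condition (here reduced to a single cycle count for simplicity)."""
    for instr in instructions:
        if isinstance(instr, Delay):
            instr.cycles = condition_cycles
    return instructions
```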

    Breakpoints in neural network accelerator

    Publication number: US11467946B1

    Publication date: 2022-10-11

    Application number: US16368351

    Application date: 2019-03-28

    Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
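The two-engine halt sequence can be sketched as follows. The `Engine` model (a bare program counter) and `sync_to_breakpoint` are simplifications assumed for illustration; real engines would be halted and single-stepped through debugger hardware.

```python
class Engine:
    """Toy execution engine: just a program counter over n instructions."""
    def __init__(self, n_instructions):
        self.pc = 0
        self.n = n_instructions
    def step(self):
        if self.pc < self.n:
            self.pc += 1

def sync_to_breakpoint(first, second, halt_first, halt_second):
    """Bring both engines to halt points for the same target layer.

    Run the first engine to its halt point, then step the second engine
    forward until it reaches its own, so both stop at a consistent state.
    """
    while first.pc < halt_first:
        first.step()
    while second.pc < halt_second:
        second.step()
    return first.pc, second.pc
```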

    Compile-time scheduling
    Invention grant

    Publication number: US11003429B1

    Publication date: 2021-05-11

    Application number: US16266915

    Application date: 2019-02-04

    Abstract: Scheduling of the operations of an integrated circuit device such as a hardware accelerator, including scheduling of movement of data into and out of the accelerator, can be performed by a compiler that produces program code for the accelerator. The compiler can produce a graph that represents operations to be performed by the accelerator. Using the graph, the compiler can determine estimated execution times for the operations represented by each node in the graph. The compiler can schedule operations by determining an estimated execution time for the set of dependent operations that depends from each operation. The compiler can then select, from among a set of candidate operations, the operation that has the shortest estimated execution time and whose set of dependent operations has the longest estimated execution time compared to the other sets of dependent operations.
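The selection heuristic in the abstract can be sketched as a two-key comparison: prefer the candidate whose dependent chain is longest (most work still gated behind it), breaking ties by the candidate's own shortest execution time. The function name and the exact key ordering are assumptions based on a plain reading of the abstract.

```python
def pick_next(ready_ops, exec_time, dependents_time):
    """Choose the next operation to schedule.

    exec_time:       estimated execution time of each op itself.
    dependents_time: total estimated time of the ops depending on it.
    Negating dependents_time makes min() prefer the longest dependent
    chain first, then the cheapest op among those.
    """
    return min(ready_ops, key=lambda op: (-dependents_time[op], exec_time[op]))
```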

    Vector clocks for highly concurrent execution engines

    Publication number: US11775299B1

    Publication date: 2023-10-03

    Application number: US17215912

    Application date: 2021-03-29

    Inventor: Drazen Borkovic

    CPC classification number: G06F9/30036 G06F1/10 G06F9/3001 G06F9/3836 G06N3/08

    Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes is determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating a type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes, the vector clock for a particular node being calculated based on the vector clock of each connected preceding node and the edge type of the edge that connects that preceding node to the particular node.

    Saving intermediate outputs of a neural network

    Publication number: US11748622B1

    Publication date: 2023-09-05

    Application number: US16292236

    Application date: 2019-03-04

    CPC classification number: G06N3/082 G06F8/433 G06F16/9024 G06N3/063

    Abstract: A computing system is configured to access intermediate outputs of a neural network by augmenting a data flow graph generated for the neural network. The data flow graph includes a plurality of nodes interconnected by connections, each node representing an operation to be executed by the neural network. To access the intermediate output, the data flow graph is augmented by inserting a node representing an operation that saves the output of a node which produces the intermediate output. The node representing the save operation is inserted while maintaining all existing nodes and connections in the data flow graph, thereby preserving the behavior of the data flow graph. The augmenting can be performed using a compiler that generates the data flow graph from program code.
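The augmentation step can be sketched as a purely additive graph edit: a save node and one new edge are appended, and nothing existing is removed, which is what preserves the graph's behavior. The edge-list encoding and `add_save_node` name are assumptions for illustration.

```python
def add_save_node(nodes, edges, target):
    """Augment a data flow graph to save `target`'s intermediate output.

    Only adds a node and an edge; all existing nodes and connections
    are kept, so the original computation is unchanged.
    """
    save = f"save_{target}"
    return nodes + [save], edges + [(target, save)]
```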
