Synchronization of concurrent computation engines

    Publication Number: US11061654B1

    Publication Date: 2021-07-13

    Application Number: US16217797

    Application Date: 2018-12-12

    Abstract: Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
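
    As a minimal sketch of the set-event / wait-on-event pairing, the following models the two execution engines as Python threads; the event name and print statements are illustrative assumptions, not the patented instruction set.

    import threading

    # Shared event register, standing in for a hardware event (assumed name).
    conv_done = threading.Event()

    def first_engine():
        # ... produce the data the second engine depends on ...
        print("engine A: result written")
        conv_done.set()    # analogous to the set-event instruction

    def second_engine():
        conv_done.wait()   # analogous to the wait-on-event instruction
        print("engine B: dependency satisfied, proceeding")

    b = threading.Thread(target=second_engine)
    a = threading.Thread(target=first_engine)
    b.start(); a.start()
    a.join(); b.join()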

    Hierarchical partitioning of operators

    Publication Number: US20210158131A1

    Publication Date: 2021-05-27

    Application Number: US16698236

    Application Date: 2019-11-27

    Abstract: Methods and apparatuses are provided for hierarchical partitioning of the operators of a neural network for execution on an acceleration engine. Neural networks are built in machine learning frameworks using neural network operators, which are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the compiler's ability to map them onto the acceleration engine. To enable such neural networks to execute on the acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network: it identifies operators supported by the compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed by the machine learning framework.
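
    A hypothetical sketch of the three-tier partitioning decision, assuming toy operator-support tables (the sets and operator names are not the actual compiler tables):

    # Assumed support tables, for illustration only.
    ACCELERATOR_OPS = {"conv2d", "matmul", "relu"}
    HOST_COMPILABLE_OPS = {"topk", "nonzero"}

    def partition(operators):
        tiers = {"accelerator": [], "host": [], "framework": []}
        for op in operators:
            if op in ACCELERATOR_OPS:
                tiers["accelerator"].append(op)  # runs on the acceleration engine
            elif op in HOST_COMPILABLE_OPS:
                tiers["host"].append(op)         # compiled for the host processor
            else:
                tiers["framework"].append(op)    # falls back to the ML framework
        return tiers

    print(partition(["conv2d", "topk", "my_custom_op", "relu"]))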

    Neural network operation reordering for parallel execution

    Publication Number: US11016775B2

    Publication Date: 2021-05-25

    Application Number: US16453478

    Application Date: 2019-06-26

    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
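
    The following toy sketch illustrates one way such a reordering could work: a greedy scheduler that, among dependency-ready operations, prefers one assigned to a different engine than the previous pick, so independent work overlaps rather than serializing on one engine. The operation list, engine names, and heuristic are assumptions for illustration.

    ops = [
        {"name": "conv1", "engine": "PE_ARRAY", "deps": []},
        {"name": "conv2", "engine": "PE_ARRAY", "deps": ["conv1"]},
        {"name": "pool1", "engine": "POOLING",  "deps": ["conv1"]},
    ]

    def reorder(ops):
        done, order, pending = set(), [], list(ops)
        while pending:
            # Operations whose dependencies are already scheduled.
            ready = [o for o in pending if all(d in done for d in o["deps"])]
            prev = order[-1]["engine"] if order else None
            # Prefer a ready op on a different engine than the previous one.
            pick = next((o for o in ready if o["engine"] != prev), ready[0])
            order.append(pick); done.add(pick["name"]); pending.remove(pick)
        return order

    print([o["name"] for o in reorder(ops)])  # ['conv1', 'pool1', 'conv2']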

    Efficient utilization of processing element array

    Publication Number: US12198041B2

    Publication Date: 2025-01-14

    Application Number: US18352768

    Application Date: 2023-07-14

    Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
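
    A sketch of the under-utilization check and split decision, assuming a 128-row array and an arbitrary split cap (both numbers are illustrative, not taken from the patent):

    PE_ROWS = 128  # assumed number of rows in the processing element array

    def plan_rows(num_input_channels, threshold=0.5):
        # Baseline mapping: one PE-array row per input channel (input matrix).
        rows_used = num_input_channels
        if rows_used >= threshold * PE_ROWS:
            return {"split_factor": 1, "rows_used": rows_used}
        # Under-utilized: split each input matrix across more rows so
        # portions of the same channel are processed in parallel.
        split = min(PE_ROWS // rows_used, 4)  # cap chosen for illustration
        return {"split_factor": split, "rows_used": rows_used * split}

    print(plan_rows(3))   # first layer of an RGB image: only 3 of 128 rows
    print(plan_rows(64))  # deeper layer: already above the threshold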

    DMA synchronization using alternating semaphores

    Publication Number: US11847507B1

    Publication Date: 2023-12-19

    Application Number: US17247180

    Application Date: 2020-12-02

    Inventor: Drazen Borkovic

    CPC classification number: G06F9/52 G06F13/28

    Abstract: Two or more semaphores can be used per queue for synchronization of direct memory access (DMA) transfers between a DMA engine and various computational engines by alternating the semaphores across sequential sets of consecutive DMA transfers in the queue. The DMA engine can increment a first semaphore after performing each DMA transfer of a first set of consecutive DMA transfers and a second semaphore after performing each DMA transfer of a second set of consecutive DMA transfers that is after the first set of consecutive DMA transfers in the queue. Each semaphore can be reset when all the computational engines that are dependent on the respective set of consecutive DMA transfers are done waiting on the given semaphore before performing respective operations. After reset, the first semaphore or the second semaphore can be reused for the next set of consecutive DMA transfers in the queue.
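
    A minimal sketch of the alternating-semaphore pattern, with Python threads standing in for the DMA engine and a single computational engine; the set size and transfer counts are assumptions. Python's counting semaphores decrement on acquire, which approximates the reset-and-reuse behavior described above.

    import threading

    sems = [threading.Semaphore(0), threading.Semaphore(0)]
    SET_SIZE = 4   # assumed number of consecutive DMA transfers per set

    def dma_engine(num_sets):
        for s in range(num_sets):
            for _ in range(SET_SIZE):
                # ... perform one DMA transfer ...
                sems[s % 2].release()  # increment the semaphore for this set

    def compute_engine(num_sets):
        for s in range(num_sets):
            for _ in range(SET_SIZE):
                sems[s % 2].acquire()  # wait on this set's semaphore
            # All transfers in set s have landed; safe to consume them.

    d = threading.Thread(target=dma_engine, args=(3,))
    c = threading.Thread(target=compute_engine, args=(3,))
    d.start(); c.start(); d.join(); c.join()
    print("all DMA sets consumed")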

    Reconfigurable neural network processing based on subgraph recognition

    Publication Number: US11782706B1

    Publication Date: 2023-10-10

    Application Number: US17361992

    Application Date: 2021-06-29

    Abstract: In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.
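
    An illustrative sketch of the chain-finding step, simplified to linear chains of element-wise operators (a single-exit walk; full single-entry-single-exit detection is more involved). The graph encoding and operator set are assumptions.

    ELEMENTWISE = {"add", "mul", "relu", "sigmoid"}  # assumed operator set

    def find_mergeable_chains(nodes, children):
        """nodes: node id -> op name; children: node id -> list of child ids."""
        chains, visited = [], set()
        for nid in nodes:
            if nid in visited or nodes[nid] not in ELEMENTWISE:
                continue
            chain = [nid]
            # Extend the chain while each node has exactly one child that is
            # also an element-wise operator.
            while len(children.get(chain[-1], [])) == 1:
                nxt = children[chain[-1]][0]
                if nodes.get(nxt) not in ELEMENTWISE:
                    break
                chain.append(nxt)
            visited.update(chain)
            if len(chain) > 1:  # candidate for a single merged operator
                chains.append(chain)
        return chains

    nodes = {0: "matmul", 1: "add", 2: "relu", 3: "sigmoid"}
    children = {0: [1], 1: [2], 2: [3], 3: []}
    print(find_mergeable_chains(nodes, children))  # [[1, 2, 3]]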

    Neural network operation reordering for parallel execution

    Publication Number: US11567778B2

    Publication Date: 2023-01-31

    Application Number: US17243415

    Application Date: 2021-04-28

    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
