Allocation and placement of resources for network computation

    Publication Number: US11561833B1

    Publication Date: 2023-01-24

    Application Number: US16021866

    Filing Date: 2018-06-28

    Abstract: Techniques for operating a computing system to perform neural network operations are disclosed. In one example, a method comprises receiving a neural network model, determining a sequence of neural network operations based on data dependency in the neural network model, and determining a set of instructions to map the sequence of neural network operations to the processing resources of the neural network processor. The method further comprises determining, based on a set of memory access operations included in the set of instructions, a first set of memory references associated with a first location of an external memory to store the input data and a second set of memory references associated with a second location of the external memory to store the output data, and generating an instruction file including the set of instructions, the first set of memory references and the second set of memory references.
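
    For illustration only, a minimal Python sketch of the flow this abstract outlines: order operations by data dependency, then attach external-memory references for the inputs and outputs. The instruction format, `build_instruction_file`, and the fixed 64-byte stride are invented stand-ins, not details from the patent.

```python
from collections import defaultdict, deque

def topo_order(deps):
    """deps: {op: [ops it depends on]} -> ops in executable order."""
    children = defaultdict(list)
    indegree = {op: len(d) for op, d in deps.items()}
    for op, ds in deps.items():
        for d in ds:
            children[d].append(op)
    ready = deque(op for op, n in indegree.items() if n == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for child in children[op]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order

def build_instruction_file(deps, input_base=0x1000, output_base=0x8000):
    """Emit instructions plus input/output memory references per operation."""
    instructions = [("EXEC", op) for op in topo_order(deps)]
    input_refs = [input_base + 64 * i for i in range(len(instructions))]
    output_refs = [output_base + 64 * i for i in range(len(instructions))]
    return {"instructions": instructions,
            "input_refs": input_refs, "output_refs": output_refs}

print(build_instruction_file({"conv": [], "relu": ["conv"], "pool": ["relu"]}))
```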

    Event assignment for synchronization of concurrent execution engines

    Publication Number: US11442794B1

    Publication Date: 2022-09-13

    Application Number: US16585575

    Filing Date: 2019-09-27

    Inventor: Drazen Borkovic

    Abstract: Techniques for synchronizing operations of execution engines of an integrated circuit device are disclosed. A description of a plurality of operations to be performed by the execution engines may be obtained. The plurality of operations may be connected through a plurality of edges. A dependency vector may be generated for each operation of the plurality of operations. The dependency vector of a corresponding operation may include a set of values that are calculated based on the set of values of one or more dependency vectors calculated for one or more immediately preceding operations of the plurality of operations. An event register of a plurality of event registers may be assigned, for each edge of one or more of the plurality of edges, to the corresponding edge based on the dependency vector generated for a start operation associated with the corresponding edge.
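
    A hedged sketch of the dependency-vector idea, assuming operations arrive in topological order: each operation's vector is the elementwise maximum over the vectors of its immediately preceding operations, with the operation's own engine slot incremented. The round-robin register pool at the end is an assumed stand-in policy, not the patented assignment rule.

```python
def dependency_vectors(ops, preds, engine_of, num_engines):
    """ops: operations already in topological order;
    preds: op -> immediately preceding ops; engine_of: op -> engine index."""
    vec = {}
    for op in ops:
        v = [0] * num_engines
        for p in preds.get(op, []):
            v = [max(a, b) for a, b in zip(v, vec[p])]
        v[engine_of[op]] += 1          # bump this op's own engine slot
        vec[op] = v
    return vec

def assign_event_registers(edges, num_registers):
    """Stand-in policy: cycle through the register pool, one per edge."""
    return {edge: i % num_registers for i, edge in enumerate(edges)}

ops = ["load", "matmul", "act"]
preds = {"matmul": ["load"], "act": ["matmul"]}
engine_of = {"load": 0, "matmul": 1, "act": 2}
print(dependency_vectors(ops, preds, engine_of, num_engines=3))
print(assign_event_registers([("load", "matmul"), ("matmul", "act")], 4))
```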

    Synchronization of DMA transfers for large number of queues

    Publication Number: US11221979B1

    Publication Date: 2022-01-11

    Application Number: US17247016

    Filing Date: 2020-11-24

    Inventor: Drazen Borkovic

    Abstract: Synchronization of a plurality of aggregate DMA transfers across a large number of DMA queues can be achieved using a small number of semaphores. One or more semaphores from M semaphores can be assigned to each aggregate DMA transfer in round-robin fashion or by another suitable method. Each aggregate DMA transfer can comprise N DMA transfers, where M is smaller than N. Each DMA transfer can be assigned to one of the assigned one or more semaphores from the M semaphores. Each DMA engine of N DMA engines can increment the assigned semaphore after performing a respective DMA transfer of the N DMA transfers. A computational engine waiting on completion of a certain aggregate DMA transfer can perform an operation based upon the one or more assigned semaphores reaching respective threshold values.
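
    The counting scheme lends itself to a short sketch. Here Python threads stand in for DMA engines, `CountingEvent` is an invented stand-in for a hardware semaphore, and M = 2 semaphores cover an aggregate of N = 8 transfers:

```python
import threading

class CountingEvent:
    def __init__(self):
        self.count, self.cond = 0, threading.Condition()

    def increment(self):
        with self.cond:
            self.count += 1
            self.cond.notify_all()

    def wait_for(self, threshold):
        with self.cond:
            self.cond.wait_for(lambda: self.count >= threshold)

M = 2
semaphores = [CountingEvent() for _ in range(M)]

def run_aggregate(transfer_id, n_dma):
    sem = semaphores[transfer_id % M]          # round-robin assignment
    workers = [threading.Thread(target=sem.increment) for _ in range(n_dma)]
    for w in workers:
        w.start()
    sem.wait_for(n_dma)                        # compute engine blocks here
    for w in workers:
        w.join()
    print(f"aggregate {transfer_id}: {n_dma} DMA transfers complete")

run_aggregate(0, n_dma=8)
```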

    EFFICIENT UTILIZATION OF PROCESSING ELEMENT ARRAY

    Publication Number: US20210158132A1

    Publication Date: 2021-05-27

    Application Number: US16698461

    Filing Date: 2019-11-27

    Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
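
    A minimal sketch of the utilization check, assuming the common mapping in which each input feature map (channel) feeds one row of the array; the threshold value and the two-instruction split below are illustrative assumptions, not the claimed method:

```python
def plan_instructions(num_input_channels, array_rows, threshold=0.5):
    rows_used = num_input_channels          # one row per input feature map
    if rows_used >= threshold * array_rows:
        return [("conv", "filter_elem_0", list(range(rows_used)))]
    # Under-utilized: emit two instructions over disjoint row groups, each
    # applying a different filter element of the same filter.
    return [
        ("conv", "filter_elem_0", list(range(0, rows_used))),
        ("conv", "filter_elem_1", list(range(rows_used, 2 * rows_used))),
    ]

for instr in plan_instructions(num_input_channels=3, array_rows=128):
    print(instr)
```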

    FAST CONTEXT SWITCHING FOR COMPUTATIONAL NETWORKS

    Publication Number: US20190179795A1

    Publication Date: 2019-06-13

    Application Number: US15839157

    Filing Date: 2017-12-12

    Abstract: Provided are systems, methods, and an integrated-circuit neural network processor that can execute a fast context switch between one neural network and another. In various implementations, a neural network processor can include a plurality of memory banks storing a first set of weight values for a first neural network. When the neural network processor receives first input data, the neural network processor can compute a first result using the first set of weight values and the first input data. While computing the first result, the neural network processor can store, in the memory banks, a second set of weight values for a second neural network. When the neural network processor receives second input data, the neural network processor can compute a second result using the second set of weight values and the second input data, where the computation occurs upon completion of computation of the first result.
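
    A toy sketch of the weight double-buffering described above: while the processor computes with one network's weights, the next network's weights are staged into spare banks, so the switch itself is just a pointer swap. The bank structure and names are assumptions:

```python
import threading

banks = {"active": "weights_net1", "staged": None}

def compute(input_data):
    return f"result({input_data}, {banks['active']})"

def prefetch(next_weights):
    banks["staged"] = next_weights     # runs concurrently with compute()

loader = threading.Thread(target=prefetch, args=("weights_net2",))
loader.start()
r1 = compute("input_1")                # first network still computing
loader.join()
banks["active"], banks["staged"] = banks["staged"], None   # the fast switch
r2 = compute("input_2")
print(r1, r2)
```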

    Breakpoints in neural network accelerator

    Publication Number: US12210438B1

    Publication Date: 2025-01-28

    Application Number: US17947949

    Filing Date: 2022-09-19

    Abstract: Techniques are disclosed for setting a breakpoint for debugging a neural network. User input is received by a debugger program executable by a host processor indicating a target layer of a neural network at which to halt execution of the neural network. The neural network includes a first set of instructions to be executed by a first execution engine and a second set of instructions to be executed by a second execution engine. A first halt point is set within the first set of instructions and a second halt point is set within the second set of instructions. It is then determined that operation of the first execution engine and the second execution engine has halted. It is then determined that the first execution engine has reached the first halt point. The second execution engine is then caused to move through instructions until reaching the second halt point.
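
    A simplified sketch of the coordinated halt, with engine internals reduced to a program counter; the stepping policy mirrors the abstract, but everything else is invented for illustration:

```python
class Engine:
    def __init__(self, name, num_instructions):
        self.name, self.pc, self.end = name, 0, num_instructions
        self.halt_at = None

    def run_until_halt(self):
        while self.pc < self.end and self.pc != self.halt_at:
            self.pc += 1               # "execute" one instruction

    def step(self):
        self.pc += 1

def break_at_layer(engine_a, engine_b, halt_a, halt_b):
    engine_a.halt_at, engine_b.halt_at = halt_a, halt_b
    engine_a.run_until_halt()          # both engines stop somewhere
    engine_b.run_until_halt()
    assert engine_a.pc == halt_a       # A has reached its halt point
    while engine_b.pc < engine_b.halt_at:
        engine_b.step()                # move B through to its halt point
    print(f"{engine_a.name} halted at {engine_a.pc}; "
          f"{engine_b.name} halted at {engine_b.pc}")

break_at_layer(Engine("pe_array", 100), Engine("activation", 80), 40, 32)
```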

    Reconfigurable neural network processing based on subgraph recognition

    Publication Number: US12045611B1

    Publication Date: 2024-07-23

    Application Number: US18231024

    Filing Date: 2023-08-07

    Abstract: In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.
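
    The recognition pass can be sketched as a chain-collection walk over the graph. The graph encoding and the element-wise operator set below are assumptions, and the actual compiler pass is presumably more general:

```python
ELEMENTWISE = {"add", "mul", "relu", "sigmoid"}

def merge_sese_chains(nodes, succ):
    """nodes: name -> op type; succ: name -> list of successor names."""
    preds = {n: [] for n in nodes}
    for s, outs in succ.items():
        for d in outs:
            preds[d].append(s)

    def mid_of_chain(n):
        """True if n continues a chain started by its single predecessor."""
        return (len(preds[n]) == 1 and nodes[preds[n][0]] in ELEMENTWISE
                and len(succ.get(preds[n][0], [])) == 1)

    merged = []
    for n in nodes:
        if nodes[n] not in ELEMENTWISE or mid_of_chain(n):
            continue                   # only start chains at their root
        chain, cur = [n], n
        while (len(succ.get(cur, [])) == 1 and len(preds[succ[cur][0]]) == 1
               and nodes[succ[cur][0]] in ELEMENTWISE):
            cur = succ[cur][0]
            chain.append(cur)
        if len(chain) > 1:
            merged.append(("merged_" + "_".join(nodes[c] for c in chain), chain))
    return merged

print(merge_sese_chains(
    {"a": "conv", "b": "add", "c": "relu", "d": "matmul"},
    {"a": ["b"], "b": ["c"], "c": ["d"]},
))   # -> [('merged_add_relu', ['b', 'c'])]
```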

    EFFICIENT UTILIZATION OF PROCESSING ELEMENT ARRAY

    Publication Number: US20230359876A1

    Publication Date: 2023-11-09

    Application Number: US18352768

    Filing Date: 2023-07-14

    CPC Classification Number: G06N3/063; G06N3/04

    Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
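
    A worked example of why such a split preserves results, using a 1-D convolution with numpy standing in for the processing element array: each filter element runs on its own row group, and the shifted partial products sum to the single-row result. The decomposition shown is an illustrative analogy, not the patent's exact instruction sequence.

```python
import numpy as np

def conv1d_single_row(x, w):
    """Baseline: one row applies the whole filter serially."""
    return np.convolve(x, w[::-1], mode="valid")

def conv1d_split_rows(x, w):
    """Split: each filter element gets its own row; the shifted partial
    products are summed, matching the single-row result."""
    n_out = len(x) - len(w) + 1
    partial = [w[k] * x[k:k + n_out] for k in range(len(w))]  # one row each
    return np.sum(partial, axis=0)

x = np.arange(8, dtype=float)
w = np.array([1.0, 2.0, 3.0])
assert np.allclose(conv1d_single_row(x, w), conv1d_split_rows(x, w))
print(conv1d_split_rows(x, w))
```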

    Time-based memory allocation for neural network inference

    Publication Number: US11610102B1

    Publication Date: 2023-03-21

    Application Number: US16698425

    Filing Date: 2019-11-27

    Abstract: Techniques for time-based memory allocation for a neural network inference are disclosed. A description of a neural network comprising a plurality of operations to be executed across a set of accelerators is received. A plurality of interconnect times at a plurality of partition points within the neural network are calculated. Each of the plurality of interconnect times corresponds to a duration of time for transferring an output feature map from one of the set of accelerators to another of the set of accelerators to be used as an input feature map. A partitioning scheme that divides the plurality of operations into a set of subgraphs is determined based on the plurality of interconnect times. Each of the set of subgraphs is assigned to a different accelerator of the set of accelerators in accordance with the partitioning scheme.
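
    A hedged sketch of the timing calculation: the transfer time at each candidate partition point is the output feature map size divided by interconnect bandwidth, and a simple (assumed) policy places cuts at the cheapest points. The bandwidth and sizes are made-up numbers:

```python
def interconnect_times(fmap_bytes, bandwidth=12.5e9):
    """fmap_bytes[i]: output feature map size at partition point i, in bytes;
    bandwidth in bytes/s (12.5 GB/s is an assumed interconnect speed)."""
    return [b / bandwidth for b in fmap_bytes]

def pick_cuts(fmap_bytes, num_accelerators):
    times = interconnect_times(fmap_bytes)
    ranked = sorted(range(len(times)), key=times.__getitem__)
    return sorted(ranked[:num_accelerators - 1])   # cheapest transfer points

fmaps = [4e6, 1e6, 16e6, 0.5e6, 8e6]   # bytes at five candidate points
print(pick_cuts(fmaps, num_accelerators=3))        # -> [1, 3]
```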

    Efficient race-condition detection

    Publication Number: US11354130B1

    Publication Date: 2022-06-07

    Application Number: US16824404

    Filing Date: 2020-03-19

    Inventor: Drazen Borkovic

    Abstract: Techniques for detecting a data race condition between multiple execution engines of an integrated circuit device are provided. Computations and data movements involving execution engines of an integrated circuit may be described with a flow graph, where graph nodes represent computation or data movement operations and graph edges represent dependencies between the operations. When a graph has incorrect dependencies, data races may result. To detect data race conditions, compiler-generated vector clocks that track the relationships of operations performed by various execution engines may be used to determine concurrent operations between nodes of different execution engines, and memory access patterns for the operations may be compared to determine if the concurrent operations access the same memory address.
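
    The happens-before test on vector clocks is standard and easy to sketch: two operations are concurrent when neither clock dominates the other, and a race is flagged when concurrent operations touch overlapping addresses with at least one write. The data layout here is illustrative, not the patented encoding:

```python
def happens_before(vc_a, vc_b):
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

def concurrent(vc_a, vc_b):
    return not happens_before(vc_a, vc_b) and not happens_before(vc_b, vc_a)

def overlaps(r1, r2):
    (s1, e1), (s2, e2) = r1, r2
    return s1 < e2 and s2 < e1          # half-open address ranges intersect

def find_races(ops):
    """ops: list of (name, vector_clock, (start, end), is_write)."""
    races = []
    for i in range(len(ops)):
        for j in range(i + 1, len(ops)):
            n1, vc1, rng1, w1 = ops[i]
            n2, vc2, rng2, w2 = ops[j]
            if concurrent(vc1, vc2) and overlaps(rng1, rng2) and (w1 or w2):
                races.append((n1, n2))
    return races

ops = [
    ("matmul_store", [1, 0], (0x000, 0x100), True),   # engine 0 writes
    ("act_load",     [0, 1], (0x080, 0x180), False),  # engine 1 reads
]
print(find_races(ops))   # -> [('matmul_store', 'act_load')]
```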
