-
Publication No.: US20190114533A1
Publication Date: 2019-04-18
Application No.: US15785679
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventors: Aaron Ng , Jindrich Zejda , Elliott Delaye , Xiao Teng , Sonal Santan , Soren T. Soe , Ashish Sirasao , Ehsan Ghasemi , Sean Settle
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., an FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage, each of which corresponds to a different thread. When receiving a task from the neural network application, the library generates a packet that includes the information required by the different stages in the pipeline to perform the task. Because the stages correspond to different threads, the library can process multiple packets in parallel, which can increase the utilization of the neural network accelerator on the hardware system.
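The staged pipeline described in the abstract can be sketched in software as a chain of worker threads connected by queues, so that several task packets are in flight at once. This is a minimal illustration, not the patented library: the `Packet` class, stage functions, and queue layout are all assumed names, and the FPGA execution stage is stubbed with plain arithmetic.

```python
import queue
import threading

class Packet:
    """Illustrative task packet carrying what each pipeline stage needs."""
    def __init__(self, task_id, data):
        self.task_id = task_id
        self.data = data
        self.result = None

def pre_process(pkt):
    pkt.data = [x * 2 for x in pkt.data]      # stand-in for input formatting

def fpga_execute(pkt):
    pkt.result = sum(pkt.data)                # stand-in for the accelerator call

def post_process(pkt):
    pkt.result = pkt.result / len(pkt.data)   # stand-in for output formatting

def run_pipeline(tasks):
    # One queue between each pair of adjacent stages; None is the stop signal.
    qs = [queue.Queue() for _ in range(4)]
    stages = [pre_process, fpga_execute, post_process]
    threads = []
    for i, fn in enumerate(stages):
        def stage(inq=qs[i], outq=qs[i + 1], fn=fn):
            while True:
                pkt = inq.get()
                if pkt is None:
                    outq.put(None)
                    return
                fn(pkt)
                outq.put(pkt)
        t = threading.Thread(target=stage)
        t.start()
        threads.append(t)
    for i, data in enumerate(tasks):
        qs[0].put(Packet(i, list(data)))
    qs[0].put(None)
    results = {}
    while (pkt := qs[-1].get()) is not None:
        results[pkt.task_id] = pkt.result
    for t in threads:
        t.join()
    return results
```

Because each stage runs on its own thread, packet N can be post-processed while packet N+1 executes and packet N+2 is pre-processed, which is the utilization benefit the abstract claims.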
-
Publication No.: US12079158B2
Publication Date: 2024-09-03
Application No.: US17814817
Filing Date: 2022-07-25
Applicant: Xilinx, Inc.
Inventors: Sanket Pandit , Jorn Tuyls , Xiao Teng , Rajeev Patwari , Ehsan Ghasemi , Elliott Delaye , Aaron Ng
CPC Classification: G06F15/8053 , G06F9/45533
Abstract: An integrated circuit includes a plurality of kernels and a virtual machine coupled to the plurality of kernels. The virtual machine is configured to interpret instructions directed to different ones of the plurality of kernels. The virtual machine is configured to control operation of the different ones of the plurality of kernels responsive to the instructions.
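The interpret-and-dispatch idea can be illustrated with a tiny software model: one instruction stream where each instruction names its target kernel, and a "virtual machine" loop that dispatches to the matching kernel. The `Kernel` interface and the `START`/`WAIT` opcodes are illustrative assumptions, not taken from the patent.

```python
class Kernel:
    """Stand-in for a hardware kernel; records the commands it receives."""
    def __init__(self, name):
        self.name = name
        self.log = []

    def start(self, args):
        self.log.append(("start", args))

    def wait(self):
        self.log.append(("wait", None))

class KernelVM:
    """Interprets one instruction stream that controls several kernels."""
    def __init__(self, kernels):
        self.kernels = {k.name: k for k in kernels}

    def run(self, program):
        # Each instruction is (opcode, target kernel name, arguments).
        for op, target, args in program:
            kernel = self.kernels[target]
            if op == "START":
                kernel.start(args)
            elif op == "WAIT":
                kernel.wait()
            else:
                raise ValueError(f"unknown opcode {op!r}")
```

The point of the indirection is that heterogeneous kernels can all be driven from a single interpreted instruction stream rather than each needing bespoke control logic.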
-
Publication No.: US20240045692A1
Publication Date: 2024-02-08
Application No.: US17818309
Filing Date: 2022-08-08
Applicant: Xilinx, Inc.
Inventors: Xiao Teng , Tejus Siddagangaiah , Bryan Lozano , Ehsan Ghasemi , Rajeev Patwari , Elliott Delaye , Jorn Tuyls , Aaron Ng , Sanket Pandit , Pramod Peethambaran , Satyaprakash Pareek
CPC Classification: G06F9/3814 , G06F9/467 , G06F9/3004
Abstract: Controlling a data processing (DP) array includes creating a replica of a register address space of the DP array based on the design and the DP array. A sequence of instructions, including write instructions and read instructions, is received. The write instructions correspond to buffer descriptors specifying runtime data movements for a design targeting the DP array. The write instructions are converted into transaction instructions and the read instructions are converted into wait instructions based on the replica of the register address space. The transaction instructions and the wait instructions are included in an instruction buffer. The instruction buffer is provided to a microcontroller configured to execute the transaction instructions and the wait instructions to implement the runtime data movements for the design as implemented in the DP array. In another aspect, the instruction buffer is stored in a file for subsequent execution by the microcontroller.
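The conversion step can be sketched as a small compiler pass: writes become transaction instructions and reads become wait instructions, with the register-space replica resolving raw addresses. All names (`RegisterReplica`, the instruction tuples) are hypothetical, chosen only to mirror the abstract's vocabulary.

```python
class RegisterReplica:
    """Stand-in for the replica of the DP array's register address space;
    here it simply maps addresses to symbolic register names."""
    def __init__(self, reg_map):
        self.reg_map = reg_map      # address -> register name

    def name(self, addr):
        return self.reg_map[addr]

def compile_sequence(instructions, replica):
    """Convert ("write", addr, value) into TRANSACTION instructions and
    ("read", addr, expected) into WAIT instructions, building the
    instruction buffer the microcontroller would later execute."""
    buf = []
    for insn in instructions:
        if insn[0] == "write":
            _, addr, value = insn
            buf.append(("TRANSACTION", replica.name(addr), value))
        elif insn[0] == "read":
            _, addr, expected = insn
            buf.append(("WAIT", replica.name(addr), expected))
        else:
            raise ValueError(f"unknown instruction {insn!r}")
    return buf
```

Because the resulting buffer is just data, it can equally be handed to a microcontroller immediately or serialized to a file for later replay, matching the two aspects in the abstract.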
-
Publication No.: US20230401480A1
Publication Date: 2023-12-14
Application No.: US17806906
Filing Date: 2022-06-14
Applicant: Xilinx, Inc.
Inventors: Ehsan Ghasemi , Rajeev Patwari , Elliott Delaye , Jorn Tuyls , Ephrem C. Wu , Xiao Teng , Sanket Pandit
IPC Classification: G06N20/00
CPC Classification: G06N20/00
Abstract: Hardware acceleration of machine learning (ML) designs includes translating an ML primitive into an intermediate representation. The intermediate representation is subdivided to specify a functional compute block. The functional compute block is sized according to a compute node primitive adapted for implementing the ML primitive on target hardware. An overlay is generated for the ML primitive, at least in part, by mapping the functional compute block to the compute node primitive. The overlay is synthesizable to implement the ML primitive on the target hardware. The overlay can be scheduled for operation within the target hardware as part of an ML design including the ML primitive.
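The subdivision step can be pictured with a toy example: an ML primitive such as a matrix multiply is tiled into functional compute blocks, each sized to a fixed compute-node primitive. The tiling function below is an assumed illustration of that sizing idea only, not the patent's mapping algorithm.

```python
def tile_matmul(m, n, tile):
    """Cover an m x n output with (row-range, col-range) blocks, each at
    most tile x tile, i.e. sized to the compute-node primitive."""
    blocks = []
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            blocks.append(((i, min(i + tile, m)), (j, min(j + tile, n))))
    return blocks
```

Once every block fits the primitive, mapping each block onto a hardware instance of that primitive yields the overlay the abstract describes.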
-
Publication No.: US11568218B2
Publication Date: 2023-01-31
Application No.: US15786288
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventors: Aaron Ng , Jindrich Zejda , Elliott Delaye , Xiao Teng , Ashish Sirasao
Abstract: A disclosed neural network processing system includes a host computer system, RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that when executed causes the host computer system to write input data and work requests to the RAMs. Each work request specifies a subset of neural network operations to perform and the memory locations in a RAM of the input data and parameters. A graph of dependencies among neural network operations is built and additional dependencies added. The operations are partitioned into coarse-grain tasks and fine-grain subtasks for optimal scheduling for parallel execution. The subtasks are scheduled to accelerator kernels of matching capabilities. Each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters.
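The dependency-graph scheduling idea can be sketched as a topological traversal that dispatches each ready subtask to a kernel whose capability matches it. The data shapes and the first-match kernel selection below are illustrative assumptions, not the patent's scheduler.

```python
from collections import deque

def schedule(ops, deps, kernels):
    """ops: {op: capability}; deps: {op: set of prerequisite ops};
    kernels: {kernel: capability}. Returns (op, kernel) pairs in a
    dependency-respecting order."""
    indeg = {op: len(deps.get(op, ())) for op in ops}
    children = {op: [] for op in ops}
    for op, pres in deps.items():
        for p in pres:
            children[p].append(op)
    ready = deque(op for op, d in indeg.items() if d == 0)
    order = []
    while ready:
        op = ready.popleft()
        # Dispatch to any kernel whose capability matches the subtask.
        kernel = next(k for k, cap in kernels.items() if cap == ops[op])
        order.append((op, kernel))
        for c in children[op]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return order
```

In the patented system the "matching capabilities" constraint matters because different accelerator kernels implement different operation subsets, so a subtask can only be placed where its operation type is supported.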
-
Publication No.: US11386644B2
Publication Date: 2022-07-12
Application No.: US15786267
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventors: Elliott Delaye , Ashish Sirasao , Aaron Ng , Yongjun Wu , Jindrich Zejda
Abstract: An example preprocessor circuit includes: a first buffer configured to store rows of image data and output a row thereof; a second buffer, coupled to the first buffer, including storage locations to store respective image samples of the row output by the first buffer; shift registers; an interconnect network including connections, each connection coupling a respective one of the shift registers to more than one of the storage locations, one or more of the storage locations being coupled to more than one of the connections; and a control circuit configured to load the shift registers with the image samples based on the connections and shift the shift registers to output streams of image samples.
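A rough software model of the idea: a buffered row of image samples feeds several shift registers through overlapping connections, and each register shifts out one parallel stream. The function below only illustrates the overlap (one storage location feeding multiple streams); it is an assumed simplification, not the circuit's interconnect.

```python
def row_to_streams(row, num_streams, window):
    """Each stream s is loaded with row[s], row[s+1], ... row[s+window-1],
    so adjacent streams share samples, mirroring storage locations that
    are coupled to more than one connection."""
    return [row[s:s + window] for s in range(num_streams)]
```

Overlapping windows like these are what a convolution engine consumes, which is why the preprocessor replicates samples across streams instead of partitioning the row disjointly.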
-
Publication No.: US11204747B1
Publication Date: 2021-12-21
Application No.: US15786395
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventors: Jindrich Zejda , Elliott Delaye , Yongjun Wu , Aaron Ng , Ashish Sirasao , Khang K. Dao , Christopher J. Case
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator that operate on two heterogeneous computing systems. For example, the neural network application may execute on a central processing unit (CPU) in a computing system while the neural network accelerator executes on an FPGA. As a result, when moving a software-hardware boundary between the two heterogeneous systems, changes may be made to both the neural network application (using software code) and to the accelerator (using RTL). The embodiments herein describe a software defined approach where shared interface code is used to express both sides of the interface between the two heterogeneous systems in a single abstraction (e.g., a software class).
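The single-abstraction idea can be sketched as one class that declares the interface's field layout once, from which both the host-side packing code and a hardware-side port description are derived. Everything here (field model, packing scheme, emitted port syntax) is a hypothetical illustration of "both sides from one description", not the patented mechanism.

```python
class InterfaceField:
    """One field of the software/hardware interface."""
    def __init__(self, name, bits):
        self.name, self.bits = name, bits

class SharedInterface:
    """Single source of truth for the interface between the CPU-side
    application and the FPGA-side accelerator."""
    def __init__(self, fields):
        self.fields = fields

    def pack(self, **values):
        """Host side: pack field values into one little-endian bit word."""
        word, shift = 0, 0
        for f in self.fields:
            word |= (values[f.name] & ((1 << f.bits) - 1)) << shift
            shift += f.bits
        return word

    def verilog_ports(self):
        """Hardware side: emit port declarations from the same description."""
        return [f"input [{f.bits - 1}:0] {f.name}" for f in self.fields]
```

With one declaration driving both sides, moving the software-hardware boundary only requires editing the shared description rather than keeping hand-written C++ and RTL in sync.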
-
Publication No.: US11106968B1
Publication Date: 2021-08-31
Application No.: US15989075
Filing Date: 2018-05-24
Applicant: Xilinx, Inc.
Inventors: Ehsan Ghasemi , Elliott Delaye , Ashish Sirasao
IPC Classification: G06N3/04
Abstract: A circuit arrangement includes a buffer, a height traversal circuit configured to generate a sequence of IFM height values in response to first control signals, a width traversal circuit configured to generate a sequence of IFM width values in response to second control signals, a control circuit, and an address generation circuit. The control circuit is configured to input an OFM height, an OFM width, a kernel height, and a kernel width; generate the first control signals at times based on the OFM height and the kernel height; and generate the second control signals at times based on the OFM width and the kernel width. The address generation circuit is configured to generate a sequence of addresses based on the sequences of IFM height values and IFM width values, provide the sequence of addresses to the buffer, and enable reading from the buffer.
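The traversal can be modeled as nested loops: output feature map (OFM) coordinates and kernel coordinates together determine the input feature map (IFM) coordinates, which are then combined into buffer read addresses. The stride-1, no-padding convolution and the row-major address formula are assumptions for illustration; the patent defines the traversal via control signals rather than loops.

```python
def ifm_addresses(ofm_h, ofm_w, k_h, k_w, ifm_w):
    """Generate the buffer read-address sequence for a stride-1,
    unpadded convolution over a row-major IFM of width ifm_w."""
    addrs = []
    for oh in range(ofm_h):              # height traversal circuit
        for ow in range(ofm_w):          # width traversal circuit
            for kh in range(k_h):        # kernel window rows
                for kw in range(k_w):    # kernel window columns
                    ih, iw = oh + kh, ow + kw
                    addrs.append(ih * ifm_w + iw)
    return addrs
```

Generating this sequence in hardware lets the buffer stream exactly the samples each output position needs, with no software address computation in the inner loop.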
-
Publication No.: US20190114499A1
Publication Date: 2019-04-18
Application No.: US15786267
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventors: Elliott Delaye , Ashish Sirasao , Aaron Ng , Yongjun Wu , Jindrich Zejda
Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a first buffer configured to store a plurality of rows of the image data and output a row of the plurality of rows; a second buffer, coupled to the first buffer, including a plurality of storage locations to store a respective plurality of image samples of the row output by the first buffer; a plurality of shift registers; an interconnect network including a plurality of connections, each connection coupling a respective one of the plurality of shift registers to more than one of the plurality of storage locations, one or more of the plurality of storage locations being coupled to more than one of the plurality of connections; and a control circuit configured to load the plurality of shift registers with the plurality of image samples based on the plurality of connections and shift the plurality of shift registers to output the plurality of streams of image samples.
-
Publication No.: US11222256B2
Publication Date: 2022-01-11
Application No.: US15785685
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventors: Xiao Teng , Aaron Ng , Ashish Sirasao , Elliott Delaye
Abstract: At least one neural network accelerator performs operations of a first subset of layers of a neural network on an input data set, generates an intermediate data set, and stores the intermediate data set in a shared memory queue in a shared memory. A first processor element of a host computer system provides input data to the neural network accelerator and signals the neural network accelerator to perform the operations of the first subset of layers of the neural network on the input data set. A second processor element of the host computer system reads the intermediate data set from the shared memory queue, performs operations of a second subset of layers of the neural network on the intermediate data set, and generates an output data set while the neural network accelerator is performing the operations of the first subset of layers of the neural network on another input data set.
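The split execution can be modeled as a producer-consumer pair around a bounded queue: one thread stands in for the accelerator running the first layer subset, another for the host processor element running the second, so the two halves overlap across successive inputs. The layer computations are stubbed with trivial arithmetic; the names are assumptions for illustration only.

```python
import queue
import threading

def accelerator_half(inputs, shared_q):
    """Stand-in for the accelerator: first subset of layers, results queued."""
    for x in inputs:
        shared_q.put([v + 1 for v in x])   # stub for the first layer subset
    shared_q.put(None)                     # end-of-stream marker

def host_half(shared_q, outputs):
    """Stand-in for the second processor element: remaining layers."""
    while (mid := shared_q.get()) is not None:
        outputs.append(sum(mid))           # stub for the second layer subset

def run(inputs):
    shared_q = queue.Queue(maxsize=4)      # bounded, like a shared-memory queue
    outputs = []
    prod = threading.Thread(target=accelerator_half, args=(inputs, shared_q))
    cons = threading.Thread(target=host_half, args=(shared_q, outputs))
    prod.start()
    cons.start()
    prod.join()
    cons.join()
    return outputs
```

While the consumer is finishing input N's second half, the producer is already deep into input N+1's first half, which is the overlap the abstract's last sentence describes.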
-