    Machine learning runtime library for neural network acceleration

    Publication No.: US20190114533A1

    Publication Date: 2019-04-18

    Application No.: US15785679

    Application Date: 2017-10-17

    Applicant: Xilinx, Inc.

    Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., an FPGA. The library operates a pipeline for submitting tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage, each of which corresponds to a different thread. When receiving a task from the neural network application, the library generates a packet that includes the information required by the different stages in the pipeline to perform the task. Because the stages correspond to different threads, the library can process multiple packets in parallel, which can increase the utilization of the neural network accelerator on the hardware system.
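
    A minimal sketch of the threaded three-stage pipeline this abstract describes, using Python threads and queues; the stage functions (preprocess, run_on_fpga, postprocess) and packet layout are invented here for illustration and are not the library's actual API.

```python
import threading
import queue

# Hypothetical stage functions; the real pre/post-processing and the FPGA
# invocation are defined by the library and accelerator, not shown here.
def preprocess(task):
    return {"task": task, "prepared": True}

def run_on_fpga(pkt):
    pkt["result"] = f"fpga({pkt['task']})"
    return pkt

def postprocess(pkt):
    return pkt["result"]

pre_q, exec_q, post_q, done_q = (queue.Queue() for _ in range(4))

def stage(in_q, out_q, fn):
    while True:
        pkt = in_q.get()
        if pkt is None:          # sentinel: shut this stage down
            out_q.put(None)
            break
        out_q.put(fn(pkt))

# One thread per pipeline stage, so packets from different tasks can be
# in flight concurrently and keep the accelerator busy.
stages = [
    threading.Thread(target=stage, args=(pre_q, exec_q, preprocess)),
    threading.Thread(target=stage, args=(exec_q, post_q, run_on_fpga)),
    threading.Thread(target=stage, args=(post_q, done_q, postprocess)),
]
for t in stages:
    t.start()

for task in ["img0", "img1", "img2"]:   # tasks submitted by the application
    pre_q.put(task)
pre_q.put(None)

results = []
while (r := done_q.get()) is not None:
    results.append(r)
print(results)
```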

    Neural network processing system having multiple processors and a neural network accelerator

    Publication No.: US11222256B2

    Publication Date: 2022-01-11

    Application No.: US15785685

    Application Date: 2017-10-17

    Applicant: Xilinx, Inc.

    Abstract: At least one neural network accelerator performs operations of a first subset of layers of a neural network on an input data set, generates an intermediate data set, and stores the intermediate data set in a shared memory queue in a shared memory. A first processor element of a host computer system provides input data to the neural network accelerator and signals the neural network accelerator to perform the operations of the first subset of layers of the neural network on the input data set. A second processor element of the host computer system reads the intermediate data set from the shared memory queue, performs operations of a second subset of layers of the neural network on the intermediate data set, and generates an output data set while the neural network accelerator is performing the operations of the first subset of layers of the neural network on another input data set.
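
    A rough illustration of the pipelining described above, with a multiprocessing queue standing in for the shared memory queue and placeholder functions (accelerator_layers, host_layers) standing in for the first and second subsets of layers; none of these names come from the patent.

```python
import multiprocessing as mp

def accelerator_layers(x):
    # Stand-in for the first subset of layers run on the accelerator.
    return x * 2

def host_layers(x):
    # Stand-in for the second subset of layers run on the host processor.
    return x + 1

def producer(inputs, shared_q):
    # The first processor element feeds inputs; the "accelerator" produces
    # intermediate data sets and pushes them into the shared queue.
    for x in inputs:
        shared_q.put(accelerator_layers(x))
    shared_q.put(None)                      # signal completion

def consumer(shared_q, out_q):
    # The second processor element drains the queue and runs the remaining
    # layers while the producer works on the next input data set.
    while (item := shared_q.get()) is not None:
        out_q.put(host_layers(item))
    out_q.put(None)

if __name__ == "__main__":
    shared_q, out_q = mp.Queue(), mp.Queue()
    p = mp.Process(target=producer, args=([1, 2, 3], shared_q))
    c = mp.Process(target=consumer, args=(shared_q, out_q))
    p.start(); c.start()
    outputs = []
    while (y := out_q.get()) is not None:
        outputs.append(y)
    p.join(); c.join()
    print(outputs)                          # [3, 5, 7]
```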

    Inline image preprocessing for convolution operations using a matrix multiplier on an integrated circuit

    Publication No.: US10460416B1

    Publication Date: 2019-10-29

    Application No.: US15786244

    Application Date: 2017-10-17

    Applicant: Xilinx, Inc.

    Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; and control circuitry configured to generate addresses for the plurality of memory banks, control the multiplexer circuitry to select among outputs of the plurality of memory banks, control the first plurality of registers to store outputs of the multiplexer circuitry, and control the second plurality of registers to store outputs of the first plurality of registers.
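
    A behavioral software model of the datapath sketched in the abstract (memory banks, multiplexer selection, two register stages feeding output streams), purely to illustrate the data flow; the bank count, depths, and addressing below are made-up values, and a real implementation would be logic on the device rather than software.

```python
NUM_BANKS = 4          # assumed number of memory banks
SAMPLES_PER_BANK = 8   # assumed depth of each bank

# Image data striped across memory banks (stand-in for the stored image).
banks = [[b * 100 + i for i in range(SAMPLES_PER_BANK)] for b in range(NUM_BANKS)]

def preprocess_streams():
    """Model the control logic: generate addresses, select a bank per output
    lane through the mux, and move samples through two register stages before
    they appear on the output streams."""
    stage1 = [0] * NUM_BANKS     # first plurality of registers
    streams = [[] for _ in range(NUM_BANKS)]

    for addr in range(SAMPLES_PER_BANK):
        # The second register stage captures the previous cycle's values and
        # drives the output streams (skip the initial, empty cycle).
        stage2 = stage1[:]
        if addr > 0:
            for lane in range(NUM_BANKS):
                streams[lane].append(stage2[lane])
        # Mux selection: here lane i simply reads bank i at the generated
        # address; a real circuit can reorder banks per lane.
        stage1 = [banks[lane][addr] for lane in range(NUM_BANKS)]

    # Flush the final values still held in the first register stage.
    for lane in range(NUM_BANKS):
        streams[lane].append(stage1[lane])
    return streams

print(preprocess_streams()[0][:4])   # first few samples of stream 0
```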

    Circuit arrangements and methods for dividing a three-dimensional input feature map

    Publication No.: US10411709B1

    Publication Date: 2019-09-10

    Application No.: US16045657

    Application Date: 2018-07-25

    Applicant: Xilinx, Inc.

    Abstract: Disclosed circuits and methods include N line buffers. Each line buffer is configured for storage of M data elements of a three-dimensional (3-D) input feature map (IFM). A request generator circuit is coupled to the N line buffers and to a memory configured for storage of the 3-D IFM. The request generator circuit divides the 3-D IFM into a plurality of IFM sub-volumes based on the values of N and M and the dimensions of the 3-D IFM. The request generator circuit reads, from the memory, the data elements at the addresses of an unprocessed one of the IFM sub-volumes and stores those data elements in the N line buffers. In response to a completion signal, the request generator circuit repeats the reading of another unprocessed IFM sub-volume and the storing of its data elements in the N line buffers.
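
    A sketch of the kind of tiling a request generator might perform, assuming a simple row-wise split in which each sub-volume occupies at most N line buffers and each buffered row of width*depth elements must fit in one buffer of M elements; the actual partitioning rules in the patent may differ.

```python
def divide_ifm(height, width, depth, n_line_buffers, m_elements):
    """Split a 3-D IFM of shape (height, width, depth) into sub-volumes whose
    rows fit in the line buffers: at most n_line_buffers rows per sub-volume,
    each row holding width*depth data elements."""
    assert width * depth <= m_elements, "one IFM row must fit in a line buffer"
    tiles = []
    for r0 in range(0, height, n_line_buffers):
        r1 = min(r0 + n_line_buffers, height)
        tiles.append({"rows": (r0, r1), "cols": (0, width), "channels": (0, depth)})
    return tiles

# Example: a 32x16x8 IFM with 8 line buffers of 256 elements each
# yields four sub-volumes of 8 rows apiece.
for tile in divide_ifm(32, 16, 8, n_line_buffers=8, m_elements=256):
    print(tile)
```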

    Parallelizing timing-based operations for circuit designs

    Publication No.: US10303833B1

    Publication Date: 2019-05-28

    Application No.: US15429014

    Application Date: 2017-02-09

    Applicant: Xilinx, Inc.

    Abstract: Parallelizing operations for implementing a circuit design can include dividing, using a processor, the circuit design into a plurality of partitions, wherein each partition is stored as a separate file; generating, using the processor, a timing arc file specifying boundary delays for each partition; and generating, using the processor, a partition design file specifying the interfaces of the partitions. Using the processor, a plurality of processes executing in parallel can be initiated. Each process is adapted to operate on a selected partition using the partition design file and the timing arc files of the other partitions to generate an updated file for the selected partition.
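
    A schematic sketch of this flow using a Python process pool; the file names, JSON layout, and per-partition processing below are illustrative only, since the abstract does not specify the actual tool interfaces.

```python
import json
from multiprocessing import Pool
from pathlib import Path

WORKDIR = Path("partitions")   # assumed layout: one set of files per partition

def write_inputs(partitions):
    """Write each partition file, its timing arc file (boundary delays), and a
    shared partition design file describing the partition interfaces."""
    WORKDIR.mkdir(exist_ok=True)
    for name, data in partitions.items():
        (WORKDIR / f"{name}.part.json").write_text(json.dumps(data["netlist"]))
        (WORKDIR / f"{name}.arcs.json").write_text(json.dumps(data["boundary_delays"]))
    interfaces = {name: data["ports"] for name, data in partitions.items()}
    (WORKDIR / "design.json").write_text(json.dumps(interfaces))

def process_partition(name):
    """Operate on one partition using the design file and the timing arc files
    of the *other* partitions, then emit an updated file for this partition."""
    design = json.loads((WORKDIR / "design.json").read_text())
    other_arcs = {
        p.stem.split(".")[0]: json.loads(p.read_text())
        for p in WORKDIR.glob("*.arcs.json")
        if not p.name.startswith(name)
    }
    netlist = json.loads((WORKDIR / f"{name}.part.json").read_text())
    updated = {"netlist": netlist, "interface": design[name], "seen_arcs": list(other_arcs)}
    (WORKDIR / f"{name}.updated.json").write_text(json.dumps(updated))
    return name

if __name__ == "__main__":
    partitions = {
        "p0": {"netlist": ["lut0"], "boundary_delays": {"out": 1.2}, "ports": ["out"]},
        "p1": {"netlist": ["ff1"], "boundary_delays": {"in": 0.4}, "ports": ["in"]},
    }
    write_inputs(partitions)
    with Pool() as pool:                      # one process per partition, in parallel
        print(pool.map(process_partition, list(partitions)))
```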

    Selecting predefined circuit implementations in a circuit design system

    Publication No.: US09460253B1

    Publication Date: 2016-10-04

    Application No.: US14482945

    Application Date: 2014-09-10

    Applicant: Xilinx, Inc.

    Abstract: In an example, a method of processing a circuit design includes: determining a first partition in a description of the circuit design having a hierarchy of design objects, the first partition including at least one design object in the hierarchy of design objects; generating a signature for the first partition; querying a database with the signature of the first partition to identify a plurality of predefined implementations of the first partition; and generating an implementation of the circuit design for a target integrated circuit (IC) based on a selected predefined implementation of the plurality of predefined implementations for the first partition.
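
    A toy illustration of the signature-and-lookup idea, using a hash over a canonicalized partition description and an in-memory dictionary standing in for the database of predefined implementations; the patent does not specify the signature algorithm or selection criteria, so those are assumptions here.

```python
import hashlib
import json

def partition_signature(partition):
    """Canonicalize the partition's design objects and hash them, so that
    structurally identical partitions produce the same signature."""
    canonical = json.dumps(partition, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

# In-memory stand-in for the database of predefined implementations,
# keyed by partition signature.
implementation_db = {}

def register_implementation(partition, impl_name, metrics):
    sig = partition_signature(partition)
    implementation_db.setdefault(sig, []).append({"impl": impl_name, "metrics": metrics})

def select_implementation(partition, prefer="area"):
    """Query the database with the partition's signature and select one of the
    predefined implementations (here: smallest value of the preferred metric)."""
    candidates = implementation_db.get(partition_signature(partition), [])
    return min(candidates, key=lambda c: c["metrics"][prefer], default=None)

adder = {"objects": ["add16"], "ports": ["a", "b", "sum"]}
register_implementation(adder, "carry_lookahead", {"area": 120, "delay": 1.1})
register_implementation(adder, "ripple_carry", {"area": 80, "delay": 2.3})
print(select_implementation(adder, prefer="area"))   # picks ripple_carry
```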

    Data-driven pattern matching in synthesis of circuit designs

    Publication No.: US08938700B1

    Publication Date: 2015-01-20

    Application No.: US13762251

    Application Date: 2013-02-07

    Applicant: Xilinx, Inc.

    CPC classification number: G06F17/505

    Abstract: Data-driven processing of a circuit design includes converting each pattern of one or more input patterns from a first format into a second format. Each pattern identifies one or more inputs and one or more outputs and specifies each function that generates each of the one or more outputs from the one or more inputs. Each pattern of the second format is stored in a database. An input circuit design is searched for circuit design elements that match patterns in the database. Data indicative of each pattern in the database that matches a circuit design element is output.
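
    A simplified sketch of the flow the abstract outlines: patterns (inputs, outputs, and output functions) are converted from a user-facing first format into a canonical second format, stored in a database, and matched against elements of an input design. The formats and matching rule shown are invented for illustration.

```python
# First format: human-written pattern descriptions.
raw_patterns = [
    {"name": "mux2", "inputs": ["a", "b", "sel"], "outputs": ["y"],
     "functions": {"y": "sel ? b : a"}},
    {"name": "and_or", "inputs": ["a", "b", "c"], "outputs": ["y"],
     "functions": {"y": "(a & b) | c"}},
]

def to_canonical(pattern):
    """Second format: normalized tuples that are cheap to compare, so matching
    does not depend on spacing or port ordering in the first format."""
    return {
        "name": pattern["name"],
        "ports": (tuple(sorted(pattern["inputs"])), tuple(sorted(pattern["outputs"]))),
        "functions": {o: f.replace(" ", "") for o, f in pattern["functions"].items()},
    }

pattern_db = [to_canonical(p) for p in raw_patterns]   # the pattern database

def find_matches(design_elements):
    """Report every pattern in the database matched by a circuit design element."""
    matches = []
    for elem in design_elements:
        key = (tuple(sorted(elem["inputs"])), tuple(sorted(elem["outputs"])))
        funcs = {o: f.replace(" ", "") for o, f in elem["functions"].items()}
        for pat in pattern_db:
            if pat["ports"] == key and pat["functions"] == funcs:
                matches.append((elem["id"], pat["name"]))
    return matches

design = [{"id": "u42", "inputs": ["sel", "a", "b"], "outputs": ["y"],
           "functions": {"y": "sel ? b : a"}}]
print(find_matches(design))   # [('u42', 'mux2')]
```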
