Neural network controller
    Granted Patent

    Publication Number: US11429851B1

    Publication Date: 2022-08-30

    Application Number: US16219303

    Filing Date: 2018-12-13

    Applicant: Xilinx, Inc.

    Abstract: Disclosed circuits and methods involve a first register configured to store a first convolutional neural network (CNN) instruction during processing of the first CNN instruction and a second register configured to store a second CNN instruction during processing of the second CNN instruction. Each of a plurality of address generation circuits is configured to generate one or more addresses in response to an input CNN instruction. Control circuitry is configured to select one of the first CNN instruction or the second CNN instruction as input to the address generation circuits.
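    As a rough software analogy (not the disclosed circuitry), the double-buffered scheme in this abstract can be modeled as two instruction slots plus a selector feeding the address generators; the class name, instruction fields, and address formula below are all illustrative assumptions.

```python
# Hypothetical sketch of the two-register CNN instruction scheme: one slot
# can be loaded while the other is being processed, and control logic
# selects which instruction drives the address generators.

class CnnInstructionController:
    def __init__(self):
        self.registers = [None, None]   # first and second instruction registers
        self.active = 0                 # index of the instruction being processed

    def load(self, slot, instruction):
        """Store an instruction while the other slot may still be processing."""
        self.registers[slot] = instruction

    def generate_addresses(self, num_generators):
        """Each address generator derives an address from the selected instruction."""
        instr = self.registers[self.active]   # control circuitry selects the input
        return [instr["base_address"] + g * instr["stride"]
                for g in range(num_generators)]

    def swap(self):
        """Switch processing to the other register (ping-pong)."""
        self.active ^= 1

ctrl = CnnInstructionController()
ctrl.load(0, {"base_address": 0x1000, "stride": 4})
ctrl.load(1, {"base_address": 0x2000, "stride": 8})
print(ctrl.generate_addresses(4))   # addresses for the first instruction
ctrl.swap()
print(ctrl.generate_addresses(4))   # addresses for the second instruction
```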

    Performing consecutive MAC operations on a set of data using different kernels in a MAC circuit

    Publication Number: US11429850B2

    Publication Date: 2022-08-30

    Application Number: US16040357

    Filing Date: 2018-07-19

    Applicant: Xilinx, Inc.

    Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an input feature map (IFM) at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first output feature map (OFM) depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
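    The timing idea above (holding one set of input data while consecutive MAC cycles apply different cached kernels) can be illustrated with a minimal functional model; this is an assumption-laden sketch, not the patented circuit, and the function and variable names are made up.

```python
# Illustrative model: one set of IFM data elements is reused across
# consecutive MAC cycles, each cycle applying the kernel for a different
# OFM depth index.

def consecutive_mac(ifm_elements, kernels):
    """Apply each cached kernel to the same input set on successive MAC cycles."""
    ofm = []
    for kernel in kernels:                       # one MAC cycle per OFM depth index
        acc = sum(x * w for x, w in zip(ifm_elements, kernel))
        ofm.append(acc)
    return ofm

ifm = [1, 2, 3]                                  # first set of IFM data elements
kernels = [[1, 0, 1], [2, 1, 0]]                 # kernels for OFM depth 0 and 1
print(consecutive_mac(ifm, kernels))             # [4, 4]
```

    Because the input set stays fixed while only the kernel changes, the MAC cycles can run faster than the rate at which new input data arrives, which is the rate relationship the abstract describes.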

    Memory arrangement for tensor data

    Publication Number: US10346093B1

    Publication Date: 2019-07-09

    Application Number: US15923950

    Filing Date: 2018-03-16

    Applicant: Xilinx, Inc.

    Abstract: Disclosed circuitry includes RAM circuits, a memory controller, and an array of processing circuits. Each RAM circuit includes a read port and a write port. The memory controller accesses tensor data arranged in banks of tensor buffers in the RAM circuits. The memory controller is coupled to each read port by shared read control signal lines and to each write port by shared write control signal lines. The memory controller generates read control and write control signals for accessing different ones of the tensor buffers at different times. The array of processing circuits is coupled to one of the RAM circuits. The array includes multiple rows and multiple columns of processing circuits for performing tensor operations on the tensor data. The processing circuits in each row in each array of processing circuits are coupled to input the same tensor data.

    STACKED COLUMNAR INTEGRATED CIRCUITS
    Patent Application

    Publication Number: US20180083635A1

    Publication Date: 2018-03-22

    Application Number: US15272242

    Filing Date: 2016-09-21

    Applicant: Xilinx, Inc.

    Inventor: Ephrem C. Wu

    Abstract: An example semiconductor device includes a first integrated circuit (IC) die including a first column of cascade-coupled resource blocks; a second IC die including a second column of cascade-coupled resource blocks, where an active side of the second IC die is mounted to an active side of the first IC die; and a plurality of electrical connections between the active side of the first IC and the active side of the second IC, the plurality of electrical connections including at least one electrical connection between the first column of cascade-coupled resource blocks and the second column of cascade-coupled resource blocks.

    Tensor operations and acceleration

    Publication Number: US09779786B1

    Publication Date: 2017-10-03

    Application Number: US15334746

    Filing Date: 2016-10-26

    Applicant: Xilinx, Inc.

    Abstract: A system includes global memory circuitry configured to store input tensors and output tensors. Row data paths are each connected to an output port of the memory circuitry. Column data paths are connected to an input port of the memory circuitry. Processing elements are arranged in rows and columns along the row data paths and column data paths, respectively. The processing elements include local memory circuitry configured to store multiple masks, and processing circuitry. The processing circuitry is configured to receive portions of the input tensors from one of the row data paths; receive masks from the local memory circuitry; perform multiple tensor operations on a same received portion of the input tensors by applying a different retrieved mask for each tensor operation; and generate, using results of the multiple tensor operations, an output for a corresponding column data path.
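    A rough functional sketch of the processing-element behavior in this abstract: one received input portion is reused across several tensor operations, each applying a different mask from local memory. The function name and mask representation are illustrative assumptions.

```python
# Toy model of one processing element: the same input portion is combined
# with each locally stored mask, yielding one result per tensor operation.

def process_element(input_portion, masks):
    """Perform one masked multiply-accumulate per stored mask."""
    results = []
    for mask in masks:                           # different mask per tensor operation
        results.append(sum(x * m for x, m in zip(input_portion, mask)))
    return results

portion = [1.0, 2.0, 3.0]                        # portion of an input tensor (row path)
masks = [[1, 1, 1], [0, 1, 0]]                   # masks held in local memory
print(process_element(portion, masks))           # [6.0, 2.0]
```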

    Coding using a combinatorial number system
    Granted Patent (In Force)

    Publication Number: US09378170B1

    Publication Date: 2016-06-28

    Application Number: US13829871

    Filing Date: 2013-03-14

    Applicant: Xilinx, Inc.

    Inventor: Ephrem C. Wu

    CPC classification number: G06F13/4022 G06F13/42

    Abstract: An apparatus relating generally to encoding is disclosed. This apparatus includes a bus interface for communicating information from a first die including the bus interface to a second die. A first portion of a bus associated with the bus interface is associated with data bits. A second portion of the bus associated with the bus interface is associated with encoding bits. The bus interface is configured to encode a data word to provide an encoded word. The encoded word is associated with a combinatorial number system.
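    The combinatorial number system the abstract refers to is a standard construction: every integer N has a unique representation N = C(c_k, k) + ... + C(c_1, 1) with c_k > ... > c_1 >= 0, which maps data words onto fixed-size coefficient sets (and hence fixed-weight codewords). The following is a textbook sketch of that encoding, not the patented bus-interface circuit.

```python
from math import comb

def cns_encode(n, k):
    """Greedy combinadic encoding: return k strictly decreasing coefficients."""
    coeffs = []
    for i in range(k, 0, -1):
        c = i - 1
        while comb(c + 1, i) <= n:   # largest c with C(c, i) <= remaining n
            c += 1
        coeffs.append(c)
        n -= comb(c, i)
    return coeffs

def cns_decode(coeffs):
    """Inverse: sum C(c_i, i) over the decreasing coefficient list."""
    k = len(coeffs)
    return sum(comb(c, k - i) for i, c in enumerate(coeffs))

word = 21
code = cns_encode(word, 3)
print(code, cns_decode(code))        # [6, 2, 0] 21
```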


    HARDWARE ACCELERATION OF MACHINE LEARNING DESIGNS

    Publication Number: US20230401480A1

    Publication Date: 2023-12-14

    Application Number: US17806906

    Filing Date: 2022-06-14

    Applicant: Xilinx, Inc.

    CPC classification number: G06N20/00

    Abstract: Hardware acceleration of machine learning (ML) designs includes translating an ML primitive into an intermediate representation. The intermediate representation is subdivided to specify a functional compute block. The functional compute block is sized according to a compute node primitive adapted for implementing the ML primitive on target hardware. An overlay is generated for the ML primitive, at least in part, by mapping the functional compute block to the compute node primitive. The overlay is synthesizable to implement the ML primitive on the target hardware. The overlay can be scheduled for operation within the target hardware as part of an ML design including the ML primitive.

    Data transfers between a memory and a distributed compute array

    Publication Number: US11127442B2

    Publication Date: 2021-09-21

    Application Number: US16706437

    Filing Date: 2019-12-06

    Applicant: Xilinx, Inc.

    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channels and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
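    The controller behavior described above (broadcast a read enable only once every remote buffer holds data, so transfers to the compute array start in lockstep) can be modeled in a few lines; the class and method names below are illustrative, not from the patent.

```python
# Toy model of the broadcast controller: transfers are released only when
# all remote buffers are non-empty, keeping the per-die streams aligned.

class BroadcastController:
    def __init__(self, num_buffers):
        self.buffers = [[] for _ in range(num_buffers)]   # one per die/channel

    def write(self, idx, data):
        """A memory channel deposits data into its remote buffer."""
        self.buffers[idx].append(data)

    def try_broadcast(self):
        """Assert the read enable only when every buffer has data stored."""
        if all(self.buffers):
            return [buf.pop(0) for buf in self.buffers]   # lockstep transfer
        return None

ctrl = BroadcastController(3)
ctrl.write(0, "a0")
print(ctrl.try_broadcast())          # None: buffers 1 and 2 are still empty
ctrl.write(1, "b0")
ctrl.write(2, "c0")
print(ctrl.try_broadcast())          # ['a0', 'b0', 'c0']
```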

    Digital signal processing block
    Granted Patent

    Publication Number: US10673438B1

    Publication Date: 2020-06-02

    Application Number: US16373524

    Filing Date: 2019-04-02

    Applicant: Xilinx, Inc.

    Abstract: A digital signal processor (DSP) slice is disclosed. The DSP slice includes an input stage to receive a plurality of input signals, a pre-adder coupled to the input stage and configured to perform one or more operations on one or more of the plurality of input signals, and a multiplier coupled to the input stage and the pre-adder and configured to perform one or more multiplication operations on one or more of the plurality of input signals or the output of the pre-adder. The DSP slice further includes an arithmetic logic unit (ALU) coupled to the input stage, the pre-adder, and the multiplier. The ALU is configured to perform one or more mathematical or logical operations on one or more of the plurality of input signals, the output of the pre-adder, or the output of the multiplier. The DSP slice also includes an output stage coupled to the ALU, the output stage configured to generate one or more output signals based at least in part on one or more of the outputs of the ALU, or at least one of the plurality of input signals.
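    One common configuration of the datapath described above is P = ALU((a + d) * b, c); the behavioral sketch below assumes that configuration and invented operand names, and models only one of the many routings the slice supports.

```python
# Behavioral model of a single pass through the slice:
# pre-adder -> multiplier -> ALU, with c as the ALU's second operand.

def dsp_slice(a, d, b, c, alu_op="add"):
    """Compute ALU((a + d) * b, c) for a simple add/subtract ALU."""
    pre = a + d                  # pre-adder on two input operands
    prod = pre * b               # multiplier on pre-adder output and input b
    if alu_op == "add":          # ALU combines the product with input c
        return prod + c
    if alu_op == "sub":
        return prod - c
    raise ValueError(alu_op)

print(dsp_slice(a=3, d=1, b=2, c=5))          # (3 + 1) * 2 + 5 = 13
print(dsp_slice(a=3, d=1, b=2, c=5, alu_op="sub"))   # (3 + 1) * 2 - 5 = 3
```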
