Neural network controller
    Granted Patent

    Publication Number: US11429851B1

    Publication Date: 2022-08-30

    Application Number: US16219303

    Filing Date: 2018-12-13

    Applicant: Xilinx, Inc.

    Abstract: Disclosed circuits and methods involve a first register configured to store a first convolutional neural network (CNN) instruction during processing of the first CNN instruction and a second register configured to store a second CNN instruction during processing of the second CNN instruction. Each of a plurality of address generation circuits is configured to generate one or more addresses in response to an input CNN instruction. Control circuitry is configured to select one of the first CNN instruction or the second CNN instruction as input to the address generation circuits.
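    As a rough software analogy (not the disclosed circuitry), the double-buffered scheme in this abstract can be modeled as two instruction slots plus a selector feeding the address generators; the class name, instruction fields, and address formula below are all illustrative assumptions.

```python
# Hypothetical sketch of the two-register CNN instruction scheme: one slot
# can be loaded while the other is being processed, and control logic
# selects which instruction drives the address generators.

class CnnInstructionController:
    def __init__(self):
        self.registers = [None, None]   # first and second instruction registers
        self.active = 0                 # index of the instruction being processed

    def load(self, slot, instruction):
        """Store an instruction while the other slot may still be processing."""
        self.registers[slot] = instruction

    def generate_addresses(self, num_generators):
        """Each address generator derives an address from the selected instruction."""
        instr = self.registers[self.active]   # control circuitry selects the input
        return [instr["base_address"] + g * instr["stride"]
                for g in range(num_generators)]

    def swap(self):
        """Switch processing to the other register (ping-pong)."""
        self.active ^= 1

ctrl = CnnInstructionController()
ctrl.load(0, {"base_address": 0x1000, "stride": 4})
ctrl.load(1, {"base_address": 0x2000, "stride": 8})
print(ctrl.generate_addresses(4))   # addresses for the first instruction
ctrl.swap()
print(ctrl.generate_addresses(4))   # addresses for the second instruction
```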

    Performing consecutive MAC operations on a set of data using different kernels in a MAC circuit

    Publication Number: US11429850B2

    Publication Date: 2022-08-30

    Application Number: US16040357

    Filing Date: 2018-07-19

    Applicant: Xilinx, Inc.

    Abstract: A circuit arrangement includes an array of MAC circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an input feature map (IFM) at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first output feature map (OFM) depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
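    The timing idea above (holding one set of input data while consecutive MAC cycles apply different cached kernels) can be illustrated with a minimal functional model; this is an assumption-laden sketch, not the patented circuit, and the function and variable names are made up.

```python
# Illustrative model: one set of IFM data elements is reused across
# consecutive MAC cycles, each cycle applying the kernel for a different
# OFM depth index.

def consecutive_mac(ifm_elements, kernels):
    """Apply each cached kernel to the same input set on successive MAC cycles."""
    ofm = []
    for kernel in kernels:                       # one MAC cycle per OFM depth index
        acc = sum(x * w for x, w in zip(ifm_elements, kernel))
        ofm.append(acc)
    return ofm

ifm = [1, 2, 3]                                  # first set of IFM data elements
kernels = [[1, 0, 1], [2, 1, 0]]                 # kernels for OFM depth 0 and 1
print(consecutive_mac(ifm, kernels))             # [4, 4]
```

    Because the input set stays fixed while only the kernel changes, the MAC cycles can run faster than the rate at which new input data arrives, which is the rate relationship the abstract describes.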

    Memory arrangement for tensor data

    Publication Number: US10346093B1

    Publication Date: 2019-07-09

    Application Number: US15923950

    Filing Date: 2018-03-16

    Applicant: Xilinx, Inc.

    Abstract: Disclosed circuitry includes RAM circuits, a memory controller, and an array of processing circuits. Each RAM circuit includes a read port and a write port. The memory controller accesses tensor data arranged in banks of tensor buffers in the RAM circuits. The memory controller is coupled to each read port by shared read control signal lines and to each write port by shared write control signal lines. The memory controller generates read control and write control signals for accessing different ones of the tensor buffers at different times. The array of processing circuits is coupled to one of the RAM circuits. The array includes multiple rows and multiple columns of processing circuits for performing tensor operations on the tensor data. The processing circuits in each row in each array of processing circuits are coupled to input the same tensor data.

    STACKED COLUMNAR INTEGRATED CIRCUITS
    Patent Application

    Publication Number: US20180083635A1

    Publication Date: 2018-03-22

    Application Number: US15272242

    Filing Date: 2016-09-21

    Applicant: Xilinx, Inc.

    Inventor: Ephrem C. Wu

    Abstract: An example semiconductor device includes a first integrated circuit (IC) die including a first column of cascade-coupled resource blocks; a second IC die including a second column of cascade-coupled resource blocks, where an active side of the second IC die is mounted to an active side of the first IC die; and a plurality of electrical connections between the active side of the first IC and the active side of the second IC, the plurality of electrical connections including at least one electrical connection between the first column of cascade-coupled resource blocks and the second column of cascade-coupled resource blocks.

    Tensor operations and acceleration

    Publication Number: US09779786B1

    Publication Date: 2017-10-03

    Application Number: US15334746

    Filing Date: 2016-10-26

    Applicant: Xilinx, Inc.

    Abstract: A system includes global memory circuitry configured to store input tensors and output tensors. Row data paths are each connected to an output port of the memory circuitry. Column data paths are connected to an input port of the memory circuitry. Processing elements are arranged in rows and columns along the row data paths and column data paths, respectively. The processing elements include local memory circuitry configured to store multiple masks, and processing circuitry. The processing circuitry is configured to receive portions of the input tensors from one of the row data paths; receive masks from the local memory circuitry; perform multiple tensor operations on a same received portion of the input tensors by applying a different retrieved mask for each tensor operation; and generate, using results of the multiple tensor operations, an output for a corresponding column data path.
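    A rough functional sketch of the processing-element behavior in this abstract: one received input portion is reused across several tensor operations, each applying a different mask from local memory. The function name and mask representation are illustrative assumptions.

```python
# Toy model of one processing element: the same input portion is combined
# with each locally stored mask, yielding one result per tensor operation.

def process_element(input_portion, masks):
    """Perform one masked multiply-accumulate per stored mask."""
    results = []
    for mask in masks:                           # different mask per tensor operation
        results.append(sum(x * m for x, m in zip(input_portion, mask)))
    return results

portion = [1.0, 2.0, 3.0]                        # portion of an input tensor (row path)
masks = [[1, 1, 1], [0, 1, 0]]                   # masks held in local memory
print(process_element(portion, masks))           # [6.0, 2.0]
```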

    Coding using a combinatorial number system
    Granted Patent (In Force)

    Publication Number: US09378170B1

    Publication Date: 2016-06-28

    Application Number: US13829871

    Filing Date: 2013-03-14

    Applicant: Xilinx, Inc.

    Inventor: Ephrem C. Wu

    CPC classification number: G06F13/4022 G06F13/42

    Abstract: An apparatus relating generally to encoding is disclosed. This apparatus includes a bus interface for communicating information from a first die including the bus interface to a second die. A first portion of a bus associated with the bus interface is associated with data bits. A second portion of the bus associated with the bus interface is associated with encoding bits. The bus interface is configured to encode a data word to provide an encoded word. The encoded word is associated with a combinatorial number system.
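    The combinatorial number system the abstract refers to is a standard construction: every integer N has a unique representation N = C(c_k, k) + ... + C(c_1, 1) with c_k > ... > c_1 >= 0, which maps data words onto fixed-size coefficient sets (and hence fixed-weight codewords). The following is a textbook sketch of that encoding, not the patented bus-interface circuit.

```python
from math import comb

def cns_encode(n, k):
    """Greedy combinadic encoding: return k strictly decreasing coefficients."""
    coeffs = []
    for i in range(k, 0, -1):
        c = i - 1
        while comb(c + 1, i) <= n:   # largest c with C(c, i) <= remaining n
            c += 1
        coeffs.append(c)
        n -= comb(c, i)
    return coeffs

def cns_decode(coeffs):
    """Inverse: sum C(c_i, i) over the decreasing coefficient list."""
    k = len(coeffs)
    return sum(comb(c, k - i) for i, c in enumerate(coeffs))

word = 21
code = cns_encode(word, 3)
print(code, cns_decode(code))        # [6, 2, 0] 21
```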


    HARDWARE ACCELERATION OF MACHINE LEARNING DESIGNS

    Publication Number: US20230401480A1

    Publication Date: 2023-12-14

    Application Number: US17806906

    Filing Date: 2022-06-14

    Applicant: Xilinx, Inc.

    CPC classification number: G06N20/00

    Abstract: Hardware acceleration of machine learning (ML) designs includes translating an ML primitive into an intermediate representation. The intermediate representation is subdivided to specify a functional compute block. The functional compute block is sized according to a compute node primitive adapted for implementing the ML primitive on target hardware. An overlay is generated for the ML primitive, at least in part, by mapping the functional compute block to the compute node primitive. The overlay is synthesizable to implement the ML primitive on the target hardware. The overlay can be scheduled for operation within the target hardware as part of an ML design including the ML primitive.

    Data transfers between a memory and a distributed compute array

    Publication Number: US11127442B2

    Publication Date: 2021-09-21

    Application Number: US16706437

    Filing Date: 2019-12-06

    Applicant: Xilinx, Inc.

    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channels and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
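    The controller behavior described above (broadcast a read enable only once every remote buffer holds data, so transfers to the compute array start in lockstep) can be modeled in a few lines; the class and method names below are illustrative, not from the patent.

```python
# Toy model of the broadcast controller: transfers are released only when
# all remote buffers are non-empty, keeping the per-die streams aligned.

class BroadcastController:
    def __init__(self, num_buffers):
        self.buffers = [[] for _ in range(num_buffers)]   # one per die/channel

    def write(self, idx, data):
        """A memory channel deposits data into its remote buffer."""
        self.buffers[idx].append(data)

    def try_broadcast(self):
        """Assert the read enable only when every buffer has data stored."""
        if all(self.buffers):
            return [buf.pop(0) for buf in self.buffers]   # lockstep transfer
        return None

ctrl = BroadcastController(3)
ctrl.write(0, "a0")
print(ctrl.try_broadcast())          # None: buffers 1 and 2 are still empty
ctrl.write(1, "b0")
ctrl.write(2, "c0")
print(ctrl.try_broadcast())          # ['a0', 'b0', 'c0']
```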

    Digital signal processing block
    Granted Patent

    Publication Number: US10673438B1

    Publication Date: 2020-06-02

    Application Number: US16373524

    Filing Date: 2019-04-02

    Applicant: Xilinx, Inc.

    Abstract: A digital signal processor (DSP) slice is disclosed. The DSP slice includes an input stage to receive a plurality of input signals, a pre-adder coupled to the input stage and configured to perform one or more operations on one or more of the plurality of input signals, and a multiplier coupled to the input stage and the pre-adder and configured to perform one or more multiplication operations on one or more of the plurality of input signals or the output of the pre-adder. The DSP slice further includes an arithmetic logic unit (ALU) coupled to the input stage, the pre-adder, and the multiplier. The ALU is configured to perform one or more mathematical or logical operations on one or more of the plurality of input signals, the output of the pre-adder, or the output of the multiplier. The DSP slice also includes an output stage coupled to the ALU, the output stage configured to generate one or more output signals based at least in part on one or more of the outputs of the ALU, or at least one of the plurality of input signals.
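    One common configuration of the datapath described above is P = ALU((a + d) * b, c); the behavioral sketch below assumes that configuration and invented operand names, and models only one of the many routings the slice supports.

```python
# Behavioral model of a single pass through the slice:
# pre-adder -> multiplier -> ALU, with c as the ALU's second operand.

def dsp_slice(a, d, b, c, alu_op="add"):
    """Compute ALU((a + d) * b, c) for a simple add/subtract ALU."""
    pre = a + d                  # pre-adder on two input operands
    prod = pre * b               # multiplier on pre-adder output and input b
    if alu_op == "add":          # ALU combines the product with input c
        return prod + c
    if alu_op == "sub":
        return prod - c
    raise ValueError(alu_op)

print(dsp_slice(a=3, d=1, b=2, c=5))          # (3 + 1) * 2 + 5 = 13
print(dsp_slice(a=3, d=1, b=2, c=5, alu_op="sub"))   # (3 + 1) * 2 - 5 = 3
```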
