Abstract:
An example method of optimizing a neural network having a plurality of layers includes: obtaining an architecture constraint for circuitry of an inference platform that implements the neural network; training the neural network on a training platform to generate network parameters and feature maps for the plurality of layers; and constraining the network parameters, the feature maps, or both based on the architecture constraint.
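The constraining step can be pictured as quantization: trained weights and feature maps are snapped onto a numeric grid the inference circuitry can actually implement. Below is a minimal NumPy sketch assuming the architecture constraint is a fixed-point bit width; the quantize helper and the 8-bit choice are illustrative assumptions, not details from the abstract.

```python
import numpy as np

def quantize(x: np.ndarray, bit_width: int) -> np.ndarray:
    """Constrain values to a symmetric signed fixed-point grid (assumed constraint)."""
    scale = np.max(np.abs(x)) or 1.0
    levels = 2 ** (bit_width - 1) - 1        # e.g. 127 levels for 8 bits
    return np.round(x / scale * levels) / levels * scale

# Network parameters and a feature map produced on the training platform.
weights = np.random.randn(64, 64).astype(np.float32)
feature_map = np.random.randn(1, 64).astype(np.float32)

# Architecture constraint obtained from the inference platform: 8-bit MACs.
BIT_WIDTH = 8
weights_q = quantize(weights, BIT_WIDTH)
feature_map_q = quantize(feature_map, BIT_WIDTH)
```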
Abstract:
An example method of training a neural network includes defining hardware building blocks (HBBs), neuron equivalents (NEQs), and conversion procedures from NEQs to HBBs; defining the neural network using the NEQs in a machine learning framework; training the neural network on a training platform; and converting the neural network as trained into a netlist of HBBs using the conversion procedures to convert the NEQs in the neural network to the HBBs of the netlist.
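One way to picture the conversion procedure: if a trained neuron equivalent (NEQ) has a small quantized fan-in, its entire input-output behavior can be enumerated into a truth table, which maps directly onto a lookup-table style hardware building block (HBB). The sketch below assumes 1-bit inputs and a threshold activation; the class and function names are hypothetical.

```python
import itertools

class NEQ:
    """Neuron equivalent: a few 1-bit inputs, one 1-bit thresholded output."""
    def __init__(self, weights, threshold):
        self.weights, self.threshold = weights, threshold

    def forward(self, bits):
        acc = sum(w * b for w, b in zip(self.weights, bits))
        return int(acc >= self.threshold)

def to_hbb(neq, fan_in):
    """Conversion procedure: enumerate every input pattern into a LUT."""
    return {bits: neq.forward(bits)
            for bits in itertools.product((0, 1), repeat=fan_in)}

neq = NEQ(weights=[0.7, -0.3, 0.5], threshold=0.5)  # trained values (made up)
lut = to_hbb(neq, fan_in=3)                         # one HBB for the netlist
print(lut[(1, 0, 1)])                               # evaluate via the table: 1
```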
Abstract:
A circular buffer architecture includes a memory coupled to a producer circuit and a consumer circuit. The memory is configured to store objects. The memory can include memory banks. The number of memory banks is less than the number of objects. The circular buffer can include hardware locks configured to reserve selected ones of the memory banks for use by the producer circuit or the consumer circuit. The circular buffer can include a buffer controller coupled to the memory and configured to track a plurality of positions. The positions can include a consumer bank position, a consumer object position, a producer bank position, and a producer object position. The buffer controller is configured to allocate selected ones of the objects from the memory banks to the producer circuit and to the consumer circuit according to the tracked positions and using the hardware locks.
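As a software model, the buffer controller reduces to four tracked positions plus a lock per bank. The sketch below assumes two banks of four objects each and uses threading locks as stand-ins for the hardware locks; it omits full/empty handshaking for brevity, and all names and sizes are illustrative.

```python
import threading

NUM_BANKS, OBJECTS_PER_BANK = 2, 4        # fewer banks than objects in flight

class CircularBufferController:
    def __init__(self):
        self.banks = [[None] * OBJECTS_PER_BANK for _ in range(NUM_BANKS)]
        self.locks = [threading.Lock() for _ in range(NUM_BANKS)]
        # Tracked positions: (bank, object) pairs for producer and consumer.
        self.prod_bank = self.prod_obj = 0
        self.cons_bank = self.cons_obj = 0

    def produce(self, obj):
        with self.locks[self.prod_bank]:      # reserve the bank for the producer
            self.banks[self.prod_bank][self.prod_obj] = obj
        self.prod_obj = (self.prod_obj + 1) % OBJECTS_PER_BANK
        if self.prod_obj == 0:                # bank filled: advance to next bank
            self.prod_bank = (self.prod_bank + 1) % NUM_BANKS

    def consume(self):
        with self.locks[self.cons_bank]:      # reserve the bank for the consumer
            obj = self.banks[self.cons_bank][self.cons_obj]
        self.cons_obj = (self.cons_obj + 1) % OBJECTS_PER_BANK
        if self.cons_obj == 0:
            self.cons_bank = (self.cons_bank + 1) % NUM_BANKS
        return obj

cb = CircularBufferController()
cb.produce("obj0")
cb.produce("obj1")
print(cb.consume())    # obj0
```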
Abstract:
A device may include a plurality of data processing engines. Each of the data processing engines may include a core and a memory module. The plurality of data processing engines may be organized in a plurality of rows. Each core may be configured to communicate with other neighboring data processing engines of the plurality of data processing engines by shared access to the memory modules of the neighboring data processing engines.
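A toy model of that neighbor communication, assuming a row-major grid in which each core may read and write the memory module of any directly adjacent engine. The grid dimensions and key naming are illustrative assumptions.

```python
class DataProcessingEngine:
    def __init__(self, row, col):
        self.row, self.col = row, col
        self.memory = {}                      # the engine's memory module

ROWS, COLS = 2, 3
grid = {(r, c): DataProcessingEngine(r, c)
        for r in range(ROWS) for c in range(COLS)}

def neighbors(r, c):
    """Engines adjacent to (r, c); their memory modules are shared with it."""
    return [grid[(r + dr, c + dc)]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if (r + dr, c + dc) in grid]

# A core passes data to a neighbor by writing into that neighbor's memory.
sender = grid[(0, 0)]
for nbr in neighbors(0, 0):
    nbr.memory[f"from_{sender.row}_{sender.col}"] = [1, 2, 3]
print(grid[(0, 1)].memory)    # {'from_0_0': [1, 2, 3]}
```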
Abstract:
A circuit for controlling the operation of a memory system having different types of memory is described. The circuit comprises a first memory having a first type of memory element and having a first access time; a second memory having a second type of memory element and having a second access time, wherein the second type of memory element is different than the first type of memory element; a memory control circuit enabling access to the first memory and the second memory; a delay buffer coupled to the second memory to compensate for a difference in the first access time and the second access time; and a circuit for merging outputs of the first memory and delayed outputs of the second memory to generate ordered output data. A method of controlling the operation of a memory system is also disclosed.
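Behaviorally, the delay buffer is a short FIFO sized to the latency gap, so responses from both memories emerge in request order. The sketch below assumes the first memory answers in 3 cycles and the second in 1, so a 2-entry buffer delays the second memory's outputs; the latencies are illustrative, not taken from the abstract.

```python
from collections import deque

MEM1_LATENCY, MEM2_LATENCY = 3, 1
delay_buffer = deque([None] * (MEM1_LATENCY - MEM2_LATENCY))

def merge_cycle(mem1_out, mem2_out):
    """Delay the second memory's output, then merge to keep data ordered."""
    delay_buffer.append(mem2_out)
    delayed = delay_buffer.popleft()
    return [x for x in (mem1_out, delayed) if x is not None]

# Both memories were read at cycle 0; the fast one would answer two cycles
# early, but the delay buffer lines it up with the slow one.
mem1_stream = [None, None, "a1", "b1"]    # first memory: valid from cycle 2
mem2_stream = ["a2", "b2", None, None]    # second memory: valid immediately
for m1, m2 in zip(mem1_stream, mem2_stream):
    print(merge_cycle(m1, m2))            # [], [], ['a1', 'a2'], ['b1', 'b2']
```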
Abstract:
A programmable IC includes a plurality of programmable resources, a plurality of shareable logic circuits coupled to the plurality of programmable resources, and a virtualization circuit. The plurality of programmable resources includes programmable logic circuits and programmable routing resources. The virtualization circuit is configured to manage sharing of the plurality of shareable logic circuits between a plurality of user designs implemented in the plurality of programmable resources. The user designs are communicatively isolated from one another on the programmable IC.
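One way to model the virtualization circuit is as a tagging arbiter: each user design submits requests to the shared block through its own queue, and results are returned only to the requesting design, preserving isolation. The round-robin policy, the shared multiplier, and all names below are assumptions for illustration.

```python
from collections import deque

class VirtualizationCircuit:
    def __init__(self, shared_fn):
        self.shared_fn = shared_fn            # the shareable logic circuit
        self.queues = {}                      # one request queue per user design

    def request(self, design_id, operand):
        self.queues.setdefault(design_id, deque()).append(operand)

    def step(self):
        """One arbitration round: serve each design at most once, in isolation."""
        return {design_id: self.shared_fn(q.popleft())
                for design_id, q in self.queues.items() if q}

virt = VirtualizationCircuit(shared_fn=lambda x: x * x)  # shared multiplier
virt.request("design_a", 3)
virt.request("design_b", 5)
print(virt.step())    # {'design_a': 9, 'design_b': 25}
```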
Abstract:
A circuit for processing data is described. The circuit comprises an input for receiving a request for implementing a key-value store data transaction; a plurality of memory interfaces associated with different memory types enabling access to a plurality of memory devices associated with a key-value store; and a memory management circuit controlling the routing of data by way of the plurality of memory interfaces based upon a data transfer criterion.
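As a concrete (and purely hypothetical) data transfer criterion, the memory management circuit might route small values through a DRAM-backed interface and large ones through a flash-backed interface. The threshold, store names, and put/get helpers below are illustrative assumptions.

```python
SIZE_THRESHOLD = 4096                    # bytes; an assumed routing criterion

dram_store, flash_store = {}, {}         # stand-ins for two memory devices

def put(key, value):
    """Key-value store write, routed by the data transfer criterion."""
    target = dram_store if len(value) <= SIZE_THRESHOLD else flash_store
    target[key] = value

def get(key):
    """Key-value store read; checks the fast interface first."""
    if key in dram_store:
        return dram_store[key]
    return flash_store.get(key)

put("session:42", b"x" * 100)            # small value -> DRAM interface
put("blob:7", b"x" * 100_000)            # large value -> flash interface
print(len(get("blob:7")))                # 100000
```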
Abstract:
A device includes a data processing engine array having a plurality of data processing engines organized in a grid having a plurality of rows and a plurality of columns. Each data processing engine includes a core and a memory module that includes a memory and a direct memory access engine. Each data processing engine includes a stream switch connected to the core, the direct memory access engine, and the stream switch of one or more adjacent data processing engines. Each memory module includes a first memory interface directly coupled to the core in the same data processing engine and one or more second memory interfaces directly coupled to the core of each of the one or more adjacent data processing engines.
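The stream-switch fabric can be sketched as dimension-order routing between adjacent switches. The model below is a toy: a 2x2 grid, one word, and a route that hops row-first, then column-wise; the topology size and routing policy are assumptions.

```python
class StreamSwitch:
    def __init__(self, row, col):
        self.row, self.col = row, col
        self.inbox = []                       # words delivered to this engine

def route(grid, word, src, dst):
    """Hop through adjacent stream switches toward the destination."""
    r, c = src
    path = [(r, c)]
    while (r, c) != dst:
        if r != dst[0]:
            r += 1 if dst[0] > r else -1      # vertical hops first
        else:
            c += 1 if dst[1] > c else -1      # then horizontal hops
        path.append((r, c))
    grid[dst].inbox.append(word)
    return path

grid = {(r, c): StreamSwitch(r, c) for r in range(2) for c in range(2)}
print(route(grid, "payload", src=(0, 0), dst=(1, 1)))  # [(0,0), (1,0), (1,1)]
print(grid[(1, 1)].inbox)                              # ['payload']
```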
Abstract:
A device may include a plurality of data processing engines. Each of the data processing engines may include a memory pool having a plurality of memory banks, a plurality of cores each coupled to the memory pool and configured to access the plurality of memory banks, a memory mapped switch coupled to the memory pool and a memory mapped switch of at least one neighboring data processing engine, and a stream switch coupled to each of the plurality of cores and to a stream switch of the at least one neighboring data processing engine.
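The shared memory pool can be modeled as banks behind a common address map, with every core in the engine addressing the same pool. The bank count, bank size, and contiguous bank mapping below are illustrative assumptions; real bank arbitration and the memory mapped and stream switches are omitted.

```python
NUM_BANKS, BANK_WORDS = 4, 8

class MemoryPool:
    def __init__(self):
        self.banks = [[0] * BANK_WORDS for _ in range(NUM_BANKS)]

    def write(self, addr, value):
        bank, offset = divmod(addr, BANK_WORDS)   # split into bank and offset
        self.banks[bank][offset] = value

    def read(self, addr):
        bank, offset = divmod(addr, BANK_WORDS)
        return self.banks[bank][offset]

class Core:
    def __init__(self, pool):
        self.pool = pool                  # all cores share the engine's pool

pool = MemoryPool()
cores = [Core(pool) for _ in range(2)]
cores[0].pool.write(5, 123)               # core 0 writes a word
print(cores[1].pool.read(5))              # core 1 reads it: 123
```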