摘要:
Implementing an asymmetric memory having random port ratios using memory primitives can include detecting, using computer hardware, a hardware description language (HDL) random access memory (RAM) within a circuit design. The HDL RAM is asymmetric. Using computer hardware, a number of a plurality of memory primitives needed to implement the HDL RAM as a RAM circuit are determined based on a maximum port width ratio of the memory primitives defined as 1:N and a port width ratio of the HDL RAM defined as 1:M, wherein each of M and N is an integer and a power of two and M exceeds N. The RAM circuit is asymmetric. Using the computer hardware, a write circuit and/or a read circuit can be generated for a first port of the RAM circuit. Further, using the computer hardware, a write circuit and/or a read circuit can be generated for a second port of the RAM circuit.
摘要:
Integrated circuits and methods relating to hardware acceleration include independent, programmable, and parallel processing units (PU) custom-adapted to process a data stream and aggregate the results to respond to a query. In an illustrative example, a data stream from a database may be divided into data blocks and allocated to a corresponding PU. Each data block may be processed by one of the PUs to generate results according to a predetermined instruction set. A concatenate unit may merge and concatenate a result of each data block together to generate an output result for the query. In some embodiments, very large database SQL queries, for example, may be accelerated by hardware PU/concatenate engines implemented in fixed ASIC or reconfigurable FPGA hardware circuitry.
摘要:
Circuits and method for multiplying floating point operands. An exponent adder circuit sums a first exponent and a second exponent and generates an output exponent. A mantissa multiplier circuit multiplies a first mantissa and a second mantissa and generates an output mantissa. A first conversion circuit converts the output exponent and output mantissa into a fixed point number. An accumulator circuit sums contents of an accumulation register and the fixed point number into an accumulated value and stores the accumulated value in the accumulation register.
摘要:
Compiling a circuit design includes receiving the circuit design specified in a hardware description language, detecting, using a processor, a slice of a vector within the circuit design, and determining that the slice is defined by a left slice boundary variable and a right slice boundary variable. A hardware description is generated from the circuit design using the processor by including a first shifter circuit receiving the left slice boundary variable as an input signal, a second shifter circuit receiving the right slice boundary variable as an input signal, a control signal generator coupled to the first and second shifter circuits, and an output stage. The output stage, responsive to a control signal dependent upon an output from the first shifter circuit and an output from second shifter circuit, generates an output signal including newly received values from a data signal only for bit locations of the output signal corresponding to the slice.
摘要:
Local retiming for a circuit design includes determining, using computer hardware, a load of a synchronous circuit element within the circuit design tagged for forward retiming, traversing, using the computer hardware, each input of the load backward through the circuit design until a sequential circuit element or a primary input is reached, and adding, using the computer hardware, each synchronous circuit element encountered in the traversing to a forward retiming list. In response to determining that forward retiming criteria is met for the forward retiming list, the computer hardware modifies the circuit design by creating a new synchronous circuit element at an output of the load.
摘要:
In an example, a method of processing a circuit design includes: determining a first partition in a description of the circuit design having a hierarchy of design objects, the first partition including at least one design object in the hierarchy of design objects; generating a signature for the first partition; querying a database with the signature of the first partition to identify a plurality of predefined implementations of the first partition; and generating an implementation of the circuit design for a target integrated circuit (IC) based on a selected predefined implementation of the plurality of predefined implementations for the first partition.
摘要:
A memory optimization method includes identifying, within a circuit design, a memory having an arithmetic operator at an output side and/or an input side of the memory. The memory may include a read-only memory (ROM). In some examples, an input of the arithmetic operator includes a constant value. In some embodiments, the memory optimization method further includes absorbing a function of the arithmetic operator into the memory. By way of example, the absorbing the function includes modifying contents of the memory based on the function of the arithmetic operator to provide an updated memory and removing the arithmetic operator from the circuit design.
摘要:
Integrated circuits and methods relating to hardware acceleration include independent, programmable, and parallel processing units (PU) custom-adapted to process a data stream and aggregate the results to respond to a query. In an illustrative example, a data stream from a database may be divided into data blocks and allocated to a corresponding PU. Each data block may be processed by one of the PUs to generate results according to a predetermined instruction set. A concatenate unit may merge and concatenate a result of each data block together to generate an output result for the query. In some embodiments, very large database SQL queries, for example, may be accelerated by hardware PU/concatenate engines implemented in fixed ASIC or reconfigurable FPGA hardware circuitry.
摘要:
Disclosed approaches of pipelining cascaded memory blocks include determining memory blocks combined to implement a memory in a netlist of a circuit design. A model of the memory blocks arranged in a matrix is generated and a total number of delay registers that can be inserted between an input and an output of the memory is determined based on an input latency constraint. For each column, positions of delay registers are determined between an input of the column and the output of the memory. The circuit design is modified to include the delay registers at the determined positions.
摘要:
Verifying a circuit design may include, in response to modification of the circuit design involving at least one of inserting or removing a flip-flop, determining, using computer hardware, latency change values for pins of components of the circuit design, determining, using the computer hardware, total latency for the pins of the components of the circuit design based, at least in part, upon the latency change values, and comparing, using the computer hardware, total latency of the pins of the components of the circuit design. Verifying the circuit design may also include detecting, using the computer hardware, a latency error within the circuit design based upon the comparing and generating, using the computer hardware, a notification of the latency error in the circuit design, wherein the notification specifies a type of the latency error detected.