Abstract:
A compute-in-memory device may include a Booth encoder configured to receive at least one input of first bits, a Booth decoder configured to receive at least one weight of second bits and to output a plurality of partial products of the at least one input and the at least one weight, an adder configured to add a first partial product of the plurality of the partial products and a second partial product of the plurality of partial products before the Booth decoder generates a third partial product of the plurality of the partial products and to generate a plurality of sums of partial products, and a carry-lookahead adder configured to add the plurality of sums of partial products and to generate a final sum.
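The flow in this abstract can be illustrated with a minimal software sketch. It assumes radix-4 Booth recoding and signed two's-complement operands, neither of which the abstract specifies, and the names `booth_encode` and `booth_multiply` are hypothetical. Ordinary integer addition stands in for both the intermediate adder and the final carry-lookahead adder.

```python
def booth_encode(x: int, bits: int) -> list[int]:
    """Recode a signed two's-complement input into radix-4 Booth digits in {-2, ..., 2}."""
    if bits % 2:
        bits += 1  # use an even width; the mask below sign-extends x
    u = x & ((1 << bits) - 1)
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x: int, weight: int, bits: int = 8) -> int:
    """Multiply by selecting a multiple of the weight per Booth digit.

    Each partial product is added to the running sum as soon as it is
    produced, before the next one is generated.
    """
    acc = 0
    for i, digit in enumerate(booth_encode(x, bits)):
        partial = (digit * weight) << (2 * i)  # select and shift one multiple
        acc += partial
    return acc
```

Under these assumptions an 8-bit input yields four partial products instead of eight, which is the usual motivation for Booth recoding.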
Abstract:
A memory device and an operation method thereof are provided. The operation method includes: encoding input data, sending the encoded input data to at least one page buffer, and reading out the encoded input data in parallel; encoding a first part and a second part of weight data into an encoded first part and an encoded second part of the weight data, respectively, writing the encoded first part and the encoded second part of the weight data into a plurality of memory cells of the memory device, and reading out the encoded first part and the encoded second part of the weight data in parallel; multiplying the encoded input data by the encoded first part and the encoded second part of the weight data, respectively, to generate a plurality of partial products in parallel; and accumulating the partial products to generate an operation result.
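The arithmetic behind the split-weight scheme can be sketched as follows. The abstract does not say how the two parts are formed or encoded, so this sketch assumes unsigned weights split into a high part and a low part of `part_bits` bits each, and it runs sequentially rather than in parallel. The names `split_weight` and `multiply_accumulate` are hypothetical.

```python
def split_weight(weight: int, part_bits: int = 4) -> tuple[int, int]:
    """Split an unsigned weight into (high part, low part)."""
    low = weight & ((1 << part_bits) - 1)
    high = weight >> part_bits
    return high, low


def multiply_accumulate(inputs: list[int], weights: list[int], part_bits: int = 4) -> int:
    """Form one partial product per weight part, then accumulate them all."""
    total = 0
    for x, w in zip(inputs, weights):
        high, low = split_weight(w, part_bits)
        partial_high = (x * high) << part_bits  # high part carries weight 2**part_bits
        partial_low = x * low
        total += partial_high + partial_low
    return total
```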
Abstract:
The invention is notably directed to a computing system configured to perform linear algebraic operations. The computing system comprises a co-processing module comprising a co-processing unit. The co-processing unit comprises a parallel array of bit-serial processing units. The bit-serial processing units are adapted to perform the linear algebraic operations with variable precision. The invention further concerns a related computer-implemented method and a related computer program product.
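One way to picture bit-serial processing with variable precision is a dot product that consumes one bit plane of the weights per step. This is only a sketch under assumptions: unsigned weights, a sequential loop in place of the parallel array, and a hypothetical function name `bit_serial_dot`.

```python
def bit_serial_dot(xs: list[int], ws: list[int], precision: int) -> int:
    """Dot product that processes the weights one bit plane at a time.

    `precision` is the number of low-order weight bits that are consumed,
    so fewer steps give a lower-precision result.
    """
    acc = 0
    for b in range(precision):
        # Sum the inputs whose weight has bit b set, then scale by 2**b.
        plane = sum(x for x, w in zip(xs, ws) if (w >> b) & 1)
        acc += plane << b
    return acc
```

With `precision=3` the weights 5, 6 and 7 are used in full; with `precision=2` only their two low bits contribute, so the step count scales with the chosen precision.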
Abstract:
An improved apparatus and method for modular multiplication and exponentiation to achieve efficient computation involved in Montgomery multiplication is provided. Currently employed conventional iteration methods involve carry look-ahead additions. To overcome the time taken by carry look-ahead additions, there is thus provided, in accordance with a preferred embodiment of the present invention, an apparatus and method for separately storing and tracking the sum and the carry of the addition involved in Montgomery multiplication. In such a manner, the present invention achieves fast addition times since they are not dependent on the time to compute the carries. As a result, the iterations are carried out much faster than previously possible. By representing the value A in the Montgomery multiplication algorithm with a redundant notation, the sum and the carry of the addition are separately stored and tracked, thereby avoiding the delays involved in the computation of the carries. In such a manner, by separately storing and tracking the sum and the carry of the addition, this carry-save addition enables a much faster computation involved in Montgomery multiplication.
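The idea of keeping the sum and the carry separate can be sketched with a textbook radix-2 Montgomery loop whose accumulator is held in carry-save form. This is not taken from the patent: the function names, the radix-2 formulation and the operand conditions (odd modulus `m`, `y < m`, `x < 2**n`) are assumptions made for illustration.

```python
def carry_save_add(s: int, c: int, b: int) -> tuple[int, int]:
    """Add b to the redundant pair (s, c) without propagating carries."""
    new_s = s ^ c ^ b
    new_c = ((s & c) | (s & b) | (c & b)) << 1
    return new_s, new_c


def montgomery_multiply_csa(x: int, y: int, m: int, n: int) -> int:
    """Return x * y * 2**(-n) mod m, with the accumulator A kept as s + c."""
    s, c = 0, 0
    for i in range(n):
        xi = (x >> i) & 1
        s, c = carry_save_add(s, c, xi * y)
        # c is even after a carry-save add, so the parity of A is the low bit of s.
        q = s & 1
        s, c = carry_save_add(s, c, q * m)
        # Both low bits are now zero, so each half can be shifted on its own.
        s >>= 1
        c >>= 1
    a = s + c  # the only carry-propagating addition
    return a - m if a >= m else a
```

Only the final line resolves the carries; every loop iteration uses bitwise operations whose delay does not grow with a carry chain.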
Abstract:
Herein disclosed is a microcomputer MCU adopting the general purpose register method. The microcomputer achieves a small program capacity or a high program-memory use efficiency and a low system cost, while retaining the simplified instruction decoding of prior-art RISC machines that have a fixed-length instruction format. It does so by adopting a fixed-length instruction format whose bit number is a power of 2 but smaller than the maximum data word length fed to the instruction execution means. In addition, the control of the coded division is executed by noting the code bits.
Abstract:
A one-bit adder includes a carry stage and an adding stage and is constructed in a fast CMOS complementary pass transistor logic with complementary analog CMOS switches in the adding stage, each of which consists of a PMOS and an NMOS transistor. The source of the PMOS transistor is connected with the drain of the NMOS transistor and the drain of the PMOS transistor is connected with the source of the NMOS transistor. The gate of the PMOS transistor receives inverted signals with respect to the gate of the NMOS transistor. Two partial output sum signals are generated by two of the switches which are connected with the input and with the output, respectively, of an inverter and the output sum signal of the adder is available at the output of the inverter. A fast multiplier includes (i) a plurality of the above fast one-bit adders, (ii) reduction of partial products by application of a Booth-McSorley process, (iii) diagonal propagation of carries from one partial product to another allowing all sums on one line to be done simultaneously, and (iv) application of a carry select approach in the final 14 bits and in the first two adders in intermediate rows.
Abstract:
A multiplier of order p and of depth n+1 is formed by a root R constituted by a carry-save adder and by a multiplier body CO(p,n) of order p and of depth n formed by a five-input connector operator C(n,q) of rank q. The connector operator C(n,1) of rank 1 is connected to the root R, and the connector operator C(n,q) of rank q comprises first and second carry-save adders (1, 2) connected in cascade. The multiplier body CO(p,n) further includes a tree A(p-1,n-2) of order p-1 and of depth n-2 formed by an arrangement of carry-save adders and connected to the first carry-save adder (1), and a multiplier body CO(p,n-1) of order p and of lesser depth n-1 formed analogously to the multiplier body CO(p,n) of greater depth n by recurrence, the multiplier body CO(p,n-1) of lesser depth being connected to the connector operator C(n,q). The multiplier is applicable to performing calculations and to implementing digital filters.
Abstract:
A method for generating a hardware description of a multiplier/multiplier-adder for integration into a signal processing circuit includes the steps of acquiring input parameters such as a word length of a multiplier factor, generating a first hardware description of a first add circuit for adding partial products and an input addend, determining a redundancy index r by using the input parameters, generating a second hardware description of a second add circuit for performing a carry-add of every r bits of the output of the first add circuit, and removing useless circuits from the hardware descriptions.
Abstract:
A dynamic mousetrap logic gate implements a self-timed monotonic logic progression via a novel vector logic method. In the vector logic method, a vector logic variable is defined by a plurality of vector components situated on respective logic paths. Boolean as well as non-Boolean variables can be represented. Further, timing information is encoded in the vector logic variable itself by defining the vector logic variable as invalid when all the vector components currently exhibit a logic low and by defining the vector logic variable as valid when a subset of the vector components exhibits a logic high. With a plurality of valid vector logic states, subsets defining valid vector logic states are nonoverlapping. The mousetrap logic gate comprises a plurality of gate components in parallel, corresponding with each output vector component. Each gate component has an arming mechanism, a ladder logic, and an inverting buffer mechanism. The ladder logic performs logic functions on one or more input vectors and provides the result to the inverting buffer mechanism. The arming mechanism periodically precharges the inverting buffer input to drive the gate component output to a logic low until the inverting buffer mechanism is triggered by the ladder logic.
Abstract:
A floating point processing system uses a multiplier unit and an adder unit to perform floating point division and square root operations using both a conventional and a modified form of the Newton-Raphson method. The modified form of the Newton-Raphson method is used in place of the final iteration of the conventional Newton-Raphson method so as to compute high precision approximated results with a substantial improvement in speed. The invention computes approximated results faster and simplifies hardware requirements because no multiplications of numbers of the precision of the result are required.
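For reference, the conventional Newton-Raphson division that this abstract builds on can be sketched using only multiplications and additions. The modified final iteration is not described in the abstract and is not reproduced here. The function names, the linear initial estimate `48/17 - 32/17 * m` (a common textbook choice) and the restriction to positive divisors are assumptions of this sketch.

```python
import math


def nr_reciprocal(d: float, iterations: int = 5) -> float:
    """Approximate 1/d for d > 0 with the iteration x <- x * (2 - m * x)."""
    if d <= 0:
        raise ValueError("this sketch only handles positive divisors")
    m, e = math.frexp(d)  # d = m * 2**e with 0.5 <= m < 1
    x = 48 / 17 - 32 / 17 * m  # linear initial estimate of 1/m
    for _ in range(iterations):
        x = x * (2 - m * x)  # each step roughly squares the relative error
    return math.ldexp(x, -e)  # undo the scaling: 1/d = (1/m) * 2**(-e)


def nr_divide(a: float, d: float) -> float:
    """Divide by multiplying with the approximated reciprocal."""
    return a * nr_reciprocal(d)
```

Because the error is roughly squared at every step, a handful of iterations is enough to reach double precision from the initial estimate.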