QUANTIZED NEURAL NETWORK ARCHITECTURE
    Invention Publication

    Publication number: US20240104356A1

    Publication date: 2024-03-28

    Application number: US17934476

    Application date: 2022-09-22

    CPC classification number: G06N3/0481

    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for quantized machine learning. A quantized input matrix is accessed at a layer of a neural network, and a first interim value is generated in an accumulator by performing matrix multiplication, using the accumulator, of the quantized input matrix and a quantized weight matrix associated with the layer of the neural network. The first interim value is normalized based at least in part on one or more leading sign bits of the first interim value, and the normalized first interim value is dequantized. A second interim value is generated by applying a rounded right-shift operation to the dequantized normalized first interim value, and activation data is generated by applying an activation function to the second interim value.
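    The abstract's pipeline (integer matmul in an accumulator, normalization by leading sign bits, a rounded right-shift, then activation) can be sketched as below. This is an illustrative model only, not the patented implementation; the helper names, the 32-bit width, the ReLU activation, and the omitted dequantization scale are all assumptions.

    ```python
    # Illustrative sketch of the claimed pipeline: accumulate -> count leading
    # sign bits -> normalize -> rounded right-shift -> activation.

    def leading_sign_bits(value: int, width: int = 32) -> int:
        """Count redundant sign bits (beyond the sign bit itself) in a
        width-bit two's-complement value."""
        if value < 0:
            value = ~value  # examine the complement for negative values
        n = 0
        for bit in range(width - 2, -1, -1):  # skip the sign bit itself
            if value & (1 << bit):
                break
            n += 1
        return n

    def rounded_right_shift(value: int, shift: int) -> int:
        """Right shift with round-to-nearest: add half an LSB, then shift."""
        if shift == 0:
            return value
        return (value + (1 << (shift - 1))) >> shift

    def quantized_layer(x_q, w_q, act=lambda v: max(v, 0)):
        """x_q: quantized input row; w_q: list of quantized weight columns."""
        out = []
        for col in w_q:
            acc = sum(a * b for a, b in zip(x_q, col))  # accumulator matmul
            norm = leading_sign_bits(acc)               # normalization amount
            acc <<= norm                                # shift into high bits
            # a dequantization scale would be applied here (omitted for brevity)
            out.append(act(rounded_right_shift(acc, norm)))
        return out
    ```

    With no dequantization scale the normalize/shift pair cancels, so the sketch reduces to a ReLU over the integer dot products; the point is only to show where each claimed step sits in the dataflow.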

    INSTRUCTION APPLICABLE TO RADIX-3 BUTTERFLY COMPUTATION

    Publication number: US20230102798A1

    Publication date: 2023-03-30

    Application number: US17448828

    Application date: 2021-09-24

    Abstract: A device includes a processor and a memory configured to store instructions. The processor is configured to receive a particular instruction from among the instructions and to execute the particular instruction to generate first output data corresponding to a sum of first input data and second input data. The processor is also configured to execute the particular instruction to perform a divide operation on the second input data and to generate second output data corresponding to a difference of the first input data and a result of the divide operation.
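    The instruction's semantics can be modeled as below. The abstract does not specify the divisor; a halving is assumed here because the radix-3 twiddle factor cos(2π/3) = −1/2 makes a "subtract half" step natural. The function name is illustrative.

    ```python
    # Sketch of the described instruction, assuming the divide is a halving.
    def butterfly_step(a: float, b: float) -> tuple[float, float]:
        out1 = a + b   # first output: sum of the two inputs
        half = b / 2   # divide operation applied to the second input
        out2 = a - half  # second output: first input minus the divided value
        return out1, out2
    ```

    Fusing the add, divide, and subtract into one instruction saves issue slots in the inner loop of a radix-3 FFT stage, where this pattern recurs for every butterfly.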

    Instruction Set Architecture for Neural Network Quantization and Packing

    Publication number: US20230350678A1

    Publication date: 2023-11-02

    Application number: US17732361

    Application date: 2022-04-28

    CPC classification number: G06F9/30101 G06N3/04

    Abstract: This application is directed to using a single instruction to initiate a sequence of computational operations related to a neural network. An electronic device receives a single instruction to apply a neural network operation to a set of M-bit elements stored in one or more input vector registers. In response to the single instruction, the electronic device implements the neural network operation on the set of M-bit elements to generate a set of P-bit elements by obtaining the set of M-bit elements from the one or more input vector registers, quantizing each of the set of M-bit elements from M bits to P bits, and packing the set of P-bit elements into an output vector register. P is smaller than M. In some embodiments, the neural network operation is a quantization operation including at least a multiplication with a quantization factor and an addition with a zero point.
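    A minimal functional model of the quantize-and-pack operation is sketched below: each element is multiplied by a quantization factor, offset by a zero point, saturated to P bits, and packed into one output word. The little-endian packing order, the saturation behavior, and all parameter names are assumptions, not details from the claims.

    ```python
    # Hypothetical model of the single quantize-and-pack instruction.
    def quantize_and_pack(elements, scale, zero_point, p_bits):
        """Quantize each element to p_bits and pack them, little-endian,
        into a single integer (modeling the output vector register)."""
        packed = 0
        mask = (1 << p_bits) - 1
        for i, e in enumerate(elements):
            q = int(round(e * scale)) + zero_point  # multiply + zero-point add
            q = max(0, min(mask, q))                # saturate to the P-bit range
            packed |= (q & mask) << (i * p_bits)    # pack into the output word
        return packed
    ```

    Collapsing the load, quantize, and pack steps into one instruction avoids materializing the intermediate P-bit values one at a time, which is the throughput benefit the abstract is aiming at.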
