SELF-TUNING MODEL COMPRESSION METHODOLOGY FOR RECONFIGURING DEEP NEURAL NETWORK AND ELECTRONIC DEVICE

    Publication No.: US20240078432A1

    Publication date: 2024-03-07

    Application No.: US18508248

    Application date: 2023-11-14

    Applicant: Kneron Inc.

    CPC classification number: G06N3/082 G06N3/04

    Abstract: A self-tuning model compression methodology for reconfiguring a Deep Neural Network (DNN) includes: receiving a pre-trained DNN model and a data set; performing an inter-layer sparsity analysis to generate a first sparsity result; and performing an intra-layer sparsity analysis to generate a second sparsity result, including: defining a plurality of sparsity metrics for the network; performing forward and backward passes to collect data corresponding to the sparsity metrics; using the collected data to calculate values for the defined sparsity metrics; and visualizing the calculated values using at least a histogram. The methodology further includes: according to the first and second sparsity results, performing low-rank approximation on the pre-trained DNN; pruning the represented DNN model according to the first and second sparsity results; performing quantization on the pruned DNN model according to the first and second sparsity results; and executing the reconfigured model on a user terminal for an end-user application.
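
As a rough illustration of the analysis, pruning and quantization steps named in this abstract, the sketch below computes one possible sparsity metric for a layer, bins the weights into a histogram, prunes by magnitude, and quantizes uniformly. Everything here is an assumption made for illustration: the function names, the near-zero threshold `eps`, magnitude-based pruning, and symmetric uniform quantization are not taken from the patent, and the low-rank approximation step is left out.

```python
def sparsity(weights, eps=1e-3):
    """Fraction of weights with magnitude below eps (hypothetical metric)."""
    return sum(abs(w) < eps for w in weights) / len(weights)

def histogram(weights, bins=4):
    """Count weights per equal-width bin; a stand-in for the visualization step."""
    lo, hi = min(weights), max(weights)
    width = (hi - lo) / bins or 1.0          # avoid a zero width when all weights match
    counts = [0] * bins
    for w in weights:
        idx = min(int((w - lo) / width), bins - 1)   # clamp the maximum into the last bin
        counts[idx] += 1
    return counts

def prune(weights, threshold):
    """Magnitude pruning (assumed technique): zero out small weights."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize(weights, bits=8):
    """Symmetric uniform quantization (assumed technique) to signed integers."""
    scale = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1) or 1.0
    return [round(w / scale) for w in weights], scale

layer = [0.0, 0.0005, -0.5, 1.0]
print(sparsity(layer), histogram(layer, bins=3))
pruned = prune(layer, threshold=0.01)
codes, scale = quantize(pruned)
```

In a self-tuning flow, the measured sparsity would presumably drive the choice of `threshold` and `bits` per layer; the sketch fixes them by hand.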

    Buffer device and convolution operation device and method

    Publication No.: US10162799B2

    Publication date: 2018-12-25

    Application No.: US15459675

    Application date: 2017-03-15

    Applicant: Kneron, Inc.

    Abstract: A buffer device includes input lines, an input buffer unit and a remapping unit. The input lines are coupled to a memory and configured to be inputted with data from the memory in a current clock. The input buffer unit is coupled to the input lines and configured to buffer one part of the inputted data and output the part of the inputted data in a later clock. The remapping unit is coupled to the input lines and the input buffer unit, and configured to generate remap data for a convolution operation according to the data on the input lines and the output of the input buffer unit in the current clock. A convolution operation method for a data stream is also disclosed.
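
The buffering and remapping idea can be sketched in software as follows. This is a loose model, not the patented circuit: each list in `chunks` stands for the data on the input lines in one clock, the last `k - 1` values are held over to the next clock, and "remap data" is taken to mean the sliding windows of width `k` that a convolution would consume. The name `stream_windows` and the assumption that every chunk holds at least `k - 1` values are illustrative.

```python
def stream_windows(chunks, k):
    """Yield, per clock, the width-k windows built from buffered and current data."""
    buffered = []                            # part of the previous clock's input
    for current in chunks:                   # data on the input lines this clock
        joined = buffered + current          # remap: held-over data plus new data
        yield [joined[i:i + k] for i in range(len(joined) - k + 1)]
        # Keep only the tail that the next clock's first windows still need.
        buffered = current[-(k - 1):] if k > 1 else []

for clock, windows in enumerate(stream_windows([[1, 2, 3, 4], [5, 6, 7, 8]], k=3)):
    print(clock, windows)
```

Because the tail is held over, windows that straddle two clocks (such as `[3, 4, 5]` here) are produced without reading the same values from memory twice.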

    POOLING OPERATION DEVICE AND METHOD FOR CONVOLUTIONAL NEURAL NETWORK

    Publication No.: US20180232629A1

    Publication date: 2018-08-16

    Application No.: US15802092

    Application date: 2017-11-02

    Applicant: Kneron, Inc.

    CPC classification number: G06N3/04 G06N3/0454 G06N3/063

    Abstract: A pooling operation method for a convolutional neural network includes the following steps of: reading multiple new data in at least one column of a pooling window; performing a first pooling operation with the new data to generate at least a pooling result column; storing the pooling result column in a buffer; and performing a second pooling operation with the pooling result column and at least a preceding pooling result column stored in the buffer to generate a pooling result of the pooling window. The first pooling operation and the second pooling operation are max pooling operations.
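
The two-stage pooling described in this abstract can be sketched as below, assuming a square window and a stride of 1 in both directions. The first pooling reduces each newly read column to a single maximum, the result is stored in a small buffer, and the second pooling takes the maximum over the buffered column results. The function names and the use of a `deque` as the buffer are illustrative choices, not details from the patent.

```python
from collections import deque

def pool_band(band, window):
    """Max-pool one band of `window` rows, moving one column at a time."""
    buffer = deque(maxlen=window)            # most recent column pooling results
    results = []
    for col in range(len(band[0])):
        new_data = [row[col] for row in band]    # read one new column
        buffer.append(max(new_data))             # first pooling: column maximum
        if len(buffer) == window:
            results.append(max(buffer))          # second pooling: across columns
    return results

def max_pool_2d(fmap, window):
    """Apply pool_band to every vertical position of the window."""
    return [pool_band(fmap[r:r + window], window)
            for r in range(len(fmap) - window + 1)]

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 1, 2, 3]]
print(max_pool_2d(fmap, window=2))
```

Each column maximum is computed once and reused by every window that overlaps that column, which is the saving the buffer provides.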

    Three-dimensional Integrated Circuit

    Publication No.: US20230223402A1

    Publication date: 2023-07-13

    Application No.: US17573648

    Application date: 2022-01-12

    Applicant: Kneron Inc.

    CPC classification number: H01L27/0688 H01L27/0248 H01L27/0207

    Abstract: A 3D integrated circuit includes a substrate, a first layer on top of the substrate, and a second layer on top of the first layer. The first layer includes a first chip, and a first network bridge formed at a first side of the first chip. The second layer includes a second chip, and a second network bridge formed at a first side of the second chip. The first chip and the first network bridge are coupled to the substrate through bumps. The second chip is coupled to the first chip and the first network bridge through bumps. The second network bridge is coupled to the first network bridge through bumps. The first network bridge and the second network bridge each include a network switch for controlling data transfer and/or power distribution.

    Pooling operation device and method for convolutional neural network

    Publication No.: US10943166B2

    Publication date: 2021-03-09

    Application No.: US15802092

    Application date: 2017-11-02

    Applicant: Kneron, Inc.

    Abstract: A pooling operation method for a convolutional neural network includes the following steps of: reading multiple new data in at least one current column of a pooling window; performing a first pooling operation with the new data to generate at least a current column pooling result; storing the current column pooling result in a buffer; and performing a second pooling operation with the current column pooling result and at least a preceding column pooling result stored in the buffer to generate a pooling result of the pooling window. The first pooling operation and the second pooling operation are forward max pooling operations.

    METHOD OF COMPRESSING CONVOLUTION PARAMETERS, CONVOLUTION OPERATION CHIP AND SYSTEM

    Publication No.: US20190253071A1

    Publication date: 2019-08-15

    Application No.: US15893294

    Application date: 2018-02-09

    Applicant: Kneron, Inc.

    Abstract: A method for compressing multiple original convolution parameters into a convolution operation chip includes steps of: determining a range of the original convolution parameters; setting an effective bit number for the range; setting a representative value, wherein the representative value is within the range; calculating differential values between the original convolution parameters and the representative value; quantifying the differential values to a minimum effective bit to obtain a plurality of compressed convolution parameters; and transmitting the effective bit number, the representative value and the compressed convolution parameters to the convolution operation chip.
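
One possible reading of these steps, for integer (fixed-point) parameters, is sketched below. The representative value is assumed to be the midpoint of the range, and the effective bit number is assumed to be the smallest signed width that holds every differential value; the abstract does not fix either choice, and the names `compress` and `decompress` are hypothetical.

```python
def compress(params):
    """Return (effective bit number, representative value, differential values)."""
    lo, hi = min(params), max(params)        # range of the original parameters
    rep = (lo + hi) // 2                     # representative value inside the range (assumed midpoint)
    diffs = [p - rep for p in params]        # differential values
    span = max(abs(d) for d in diffs)
    bits = span.bit_length() + 1             # magnitude bits plus one sign bit
    return bits, rep, diffs

def decompress(bits, rep, diffs):
    """Rebuild the parameters on the receiving side; bits is the storage width per value."""
    return [rep + d for d in diffs]

bits, rep, diffs = compress([100, 103, 98, 101])
print(bits, rep, diffs)
print(decompress(bits, rep, diffs))
```

When the parameters cluster tightly, the differentials need far fewer bits than the originals, so sending one representative value plus narrow differentials reduces the data transferred to the chip.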

    Convolution operation device and convolution operation method

    Publication No.: US10936937B2

    Publication date: 2021-03-02

    Application No.: US15801623

    Application date: 2017-11-02

    Applicant: Kneron, Inc.

    Abstract: A convolution operation device includes a convolution calculation module, a memory and a buffer device. The convolution calculation module has a plurality of convolution units, and each convolution unit performs a convolution operation according to a filter and a plurality of current data, and leaves a part of the current data after the convolution operation. The buffer device is coupled to the memory and the convolution calculation module for retrieving a plurality of new data from the memory and inputting the new data to each of the convolution units. The new data are not a duplicate of the current data. A convolution operation method is also disclosed.
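
The data-reuse behaviour of a single convolution unit might look like the following one-dimensional sketch. The class name `ConvUnit`, the 1D filter, and the `stride` parameter are assumptions made for illustration; the caller is assumed to push exactly the values the unit is missing, so new data never duplicates what the unit already holds.

```python
class ConvUnit:
    """One convolution unit that keeps part of its data between operations."""

    def __init__(self, kernel, stride=1):
        self.kernel = list(kernel)
        self.stride = stride
        self.current = []                    # data retained inside the unit

    def push(self, new_data):
        """Accept new data; return a result once a full window is available."""
        self.current.extend(new_data)
        if len(self.current) < len(self.kernel):
            return None                      # window not yet full
        out = sum(w * x for w, x in zip(self.kernel, self.current))
        # Leave the part of the current data that the next window reuses.
        self.current = self.current[self.stride:]
        return out

unit = ConvUnit([1, 0, -1])
print(unit.push([1, 2, 3]))   # first full window
print(unit.push([4]))         # only one new value is needed
```

After the first window, each result costs a single new value from the buffer device rather than a full window read from memory.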

    Method of compressing convolution parameters, convolution operation chip and system

    Publication No.: US10516415B2

    Publication date: 2019-12-24

    Application No.: US15893294

    Application date: 2018-02-09

    Applicant: Kneron, Inc.

    Abstract: A method for compressing multiple original convolution parameters into a convolution operation chip includes steps of: determining a range of the original convolution parameters; setting an effective bit number for the range; setting a representative value, wherein the representative value is within the range; calculating differential values between the original convolution parameters and the representative value; quantifying the differential values to a minimum effective bit to obtain a plurality of compressed convolution parameters; and transmitting the effective bit number, the representative value and the compressed convolution parameters to the convolution operation chip.
