LOSS-ERROR-AWARE QUANTIZATION OF A LOW-BIT NEURAL NETWORK

    Publication Number: US20250117639A1

    Publication Date: 2025-04-10

    Application Number: US18886625

    Application Date: 2024-09-16

    Abstract: Methods, apparatus, systems and articles of manufacture for loss-error-aware quantization of a low-bit neural network are disclosed. An example apparatus includes a network weight partitioner to partition unquantized network weights of a first network model into a first group to be quantized and a second group to be retrained. The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss. The example apparatus includes a weight updater to update the second group of network weights based on the difference. The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights.
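
    A minimal sketch of the partition/quantize/retrain step described above, in Python with NumPy. The partitioning rule, the quantizer, and the way the loss difference is folded into the update of the retained weights are assumptions made for illustration, not the patented method; quantize_low_bit, loss_error_aware_step, loss_fn and grad_fn are hypothetical names.

```python
import numpy as np

def quantize_low_bit(w, bits=2):
    # Symmetric uniform low-bit quantizer (illustrative stand-in for the weight quantizer).
    q_max = 2 ** (bits - 1) - 1 if bits > 1 else 1
    scale = np.max(np.abs(w)) / q_max + 1e-12
    return np.clip(np.round(w / scale), -q_max, q_max) * scale

def loss_error_aware_step(weights, loss_fn, grad_fn, quant_fraction=0.5, lr=0.01, bits=2):
    # Partition the unquantized weights: the largest-magnitude fraction is quantized,
    # the remainder stays full precision and is retrained (assumed partitioning rule).
    flat = weights.ravel()
    order = np.argsort(-np.abs(flat))
    n_quant = int(quant_fraction * flat.size)
    quant_idx, retrain_idx = order[:n_quant], order[n_quant:]

    first_loss = loss_fn(weights)                     # loss of the full-precision model

    low_bit = weights.copy().ravel()
    low_bit[quant_idx] = quantize_low_bit(low_bit[quant_idx], bits)  # low-bit second weights
    low_bit = low_bit.reshape(weights.shape)

    second_loss = loss_fn(low_bit)                    # loss after quantizing the first group
    loss_diff = second_loss - first_loss              # quantization-induced loss error

    # Update only the retained (second) group, scaling the step by the loss error;
    # this concrete update rule "based on the difference" is an assumption.
    grads = grad_fn(low_bit).ravel()
    updated = low_bit.ravel().copy()
    updated[retrain_idx] -= lr * loss_diff * grads[retrain_idx]
    return updated.reshape(weights.shape), loss_diff
```

    In practice, loss_fn and grad_fn would evaluate the network on a calibration batch, and the step could be iterated with a growing quantized fraction until the whole model is low-bit.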

    DYNAMIC NEURAL NETWORK SURGERY
    Invention Application

    Publication Number: US20250045582A1

    Publication Date: 2025-02-06

    Application Number: US18804720

    Application Date: 2024-08-14

    Abstract: Techniques related to compressing a pre-trained dense deep neural network to a sparsely connected deep neural network for efficient implementation are discussed. Such techniques may include iteratively pruning and splicing available connections between adjacent layers of the deep neural network and updating weights corresponding to both currently disconnected and currently connected connections between the adjacent layers.
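
    The pruning-and-splicing loop lends itself to a short sketch. The version below (plain NumPy; surgery_step and effective_weights are hypothetical names, and magnitude thresholds are the assumed prune/splice criterion) illustrates the key point that weights of both connected and disconnected connections keep receiving updates.

```python
import numpy as np

def effective_weights(weights, mask):
    # Weights actually used in the forward pass of the sparse network.
    return weights * mask

def surgery_step(weights, grads, mask, prune_thresh, splice_thresh, lr):
    # One prune-and-splice iteration on the weight matrix between two adjacent layers.
    # mask[i, j] == 1 means connection (i, j) is currently present; 0 means pruned.
    mag = np.abs(weights)
    mask = np.where(mag < prune_thresh, 0.0, mask)    # prune connections that became weak
    mask = np.where(mag > splice_thresh, 1.0, mask)   # splice back connections that regrew

    # Update *all* underlying weights, connected and disconnected alike, so a pruned
    # connection can regain magnitude and be spliced back in a later iteration.
    weights = weights - lr * grads
    return weights, mask
```

    Here grads would be the gradient of the training loss computed with effective_weights(weights, mask) used in the forward pass.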

    Efficient neural networks with elaborate matrix structures in machine learning environments

    Publication Number: US12165065B2

    Publication Date: 2024-12-10

    Application Number: US16632145

    Application Date: 2017-08-18

    Abstract: A mechanism is described for facilitating slimming of neural networks in machine learning environments. A method includes learning a first neural network associated with machine learning processes to be performed by a processor of a computing device, where learning includes analyzing a plurality of channels associated with one or more layers of the first neural network. The method may further include computing a plurality of scaling factors to be associated with the plurality of channels such that each channel is assigned a scaling factor, wherein each scaling factor is to indicate the relevance of a corresponding channel within the first neural network. The method may further include pruning the first neural network into a second neural network by removing one or more channels of the plurality of channels having low relevance as indicated by one or more scaling factors of the plurality of scaling factors assigned to the one or more channels.
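
    As a rough illustration of channel pruning driven by per-channel scaling factors, the snippet below removes the output channels with the smallest factors. prune_channels_by_scaling_factor and slim_layer are hypothetical names, and treating the magnitude of a learned factor (for example, a BatchNorm gamma) as channel relevance is an assumption, not necessarily the patented criterion.

```python
import numpy as np

def prune_channels_by_scaling_factor(scaling_factors, prune_ratio=0.3):
    # Rank channels by the magnitude of their learned scaling factor and mark the
    # lowest-relevance fraction for removal.
    relevance = np.abs(scaling_factors)
    n_remove = int(prune_ratio * relevance.size)
    order = np.argsort(relevance)              # least relevant channels first
    remove = order[:n_remove]
    keep = np.sort(order[n_remove:])
    return keep, remove

def slim_layer(conv_weights, scaling_factors, prune_ratio=0.3):
    # conv_weights shape: (out_channels, in_channels, kH, kW).
    # Returns the slimmed ("second") layer with low-relevance output channels dropped.
    keep, _ = prune_channels_by_scaling_factor(scaling_factors, prune_ratio)
    return conv_weights[keep], scaling_factors[keep]
```

    The input channels of the following layer would typically be pruned to match, and the slimmed network fine-tuned afterwards.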

    SAMPLE-ADAPTIVE CROSS-LAYER NORM CALIBRATION AND RELAY NEURAL NETWORK

    Publication Number: US20240296668A1

    Publication Date: 2024-09-05

    Application Number: US18572510

    Application Date: 2021-09-10

    CPC classification number: G06V10/82 G06V10/955

    Abstract: Technology to conduct image sequence/video analysis can include a processor, and a memory coupled to the processor, the memory storing a neural network, the neural network comprising a plurality of convolution layers, and a plurality of normalization layers arranged as a relay structure, wherein each normalization layer is coupled to and follows a respective one of the plurality of convolution layers. The plurality of normalization layers can be arranged as a relay structure where a normalization layer for a layer (k) is coupled to and follows a normalization layer for a preceding layer (k−1). The normalization layer for the layer (k) is coupled to the normalization layer for the preceding layer (k−1) via a hidden state signal and a cell state signal, each signal generated by the normalization layer for the preceding layer (k−1). Each normalization layer (k) can include a meta-gating unit (MGU) structure.
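
    A compact sketch of the relay idea: each normalization layer consumes the hidden and cell state emitted by the normalization layer of the preceding convolution layer and produces sample-adaptive calibration parameters through a small gated unit. The gate parameterization below is a generic minimal-gated-unit stand-in, not the patent's exact MGU; RelayNormLayer and all of its parameters are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RelayNormLayer:
    # Normalization layer (k): receives (h, c) from the normalization layer (k-1),
    # emits calibrated features plus a new (h, c) relayed to layer (k+1).
    def __init__(self, channels, state_dim, rng=np.random.default_rng(0)):
        self.Wf = rng.normal(0, 0.1, (state_dim, state_dim + channels))   # single (forget) gate
        self.Wh = rng.normal(0, 0.1, (state_dim, state_dim + channels))   # candidate state
        self.out = rng.normal(0, 0.1, (2 * channels, state_dim))          # -> (gamma, beta)

    def __call__(self, x, h_prev, c_prev):
        # x: (N, C, H, W) output of convolution layer k.
        mean = x.mean(axis=(0, 2, 3))
        std = x.std(axis=(0, 2, 3)) + 1e-5
        stats = mean / std                                    # sample-adaptive summary of x
        z = np.concatenate([h_prev, stats])

        f = sigmoid(self.Wf @ z)                              # MGU-style single gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([f * h_prev, stats]))
        c = f * c_prev + (1.0 - f) * h_tilde                  # cell state signal
        h = np.tanh(c)                                        # hidden state signal

        gamma, beta = np.split(self.out @ h, 2)               # per-channel calibration
        x_norm = (x - mean[None, :, None, None]) / std[None, :, None, None]
        return gamma[None, :, None, None] * x_norm + beta[None, :, None, None], h, c
```

    Stacking one such layer after every convolution layer and passing (h, c) forward from layer to layer yields the relay structure the abstract describes.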

    Composite binary decomposition network

    Publication Number: US11934949B2

    Publication Date: 2024-03-19

    Application Number: US16973608

    Application Date: 2018-09-27

    CPC classification number: G06N3/08 G06N3/044 G06N3/045 G06N3/063 G06N3/084

    Abstract: Embodiments are directed to a composite binary decomposition network. An embodiment of a computer-readable storage medium includes executable computer program instructions for transforming a pre-trained first neural network into a binary neural network by processing layers of the first neural network in a composite binary decomposition process, where the first neural network has floating point values representing weights of various layers of the first neural network. The composite binary decomposition process includes a composite operation to expand real matrices or tensors into a plurality of binary matrices or tensors, and a decompose operation to decompose one or more binary matrices or tensors of the plurality of binary matrices or tensors into multiple lower rank binary matrices or tensors.
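
    The composite operation can be illustrated with a greedy bit-plane expansion (an assumed concrete form, not necessarily the patent's): a real matrix is rewritten as a sign matrix times a weighted sum of {0, 1} matrices. The subsequent decompose operation, which would further factor each binary matrix into lower-rank binary factors, is not sketched here. composite_expand and reconstruct are hypothetical names.

```python
import numpy as np

def composite_expand(W, num_bits=4):
    # Composite operation (sketch): W ~= sign(W) * s * sum_j 2**(-j) * B_j, with B_j in {0, 1}.
    s = np.max(np.abs(W)) + 1e-12
    sign = np.where(W >= 0, 1.0, -1.0)        # itself a {-1, +1} binary matrix
    residual = np.abs(W) / s                  # magnitudes scaled into [0, 1]
    planes, coeffs = [], []
    for j in range(1, num_bits + 1):
        c = 2.0 ** (-j)
        B = (residual >= c).astype(np.float32)
        planes.append(B)
        coeffs.append(s * c)
        residual = residual - c * B
    return sign, planes, coeffs

def reconstruct(sign, planes, coeffs):
    # Reassemble the low-precision approximation of W from its binary expansion.
    return sign * sum(c * B for c, B in zip(coeffs, planes))
```

    With num_bits planes, the per-entry reconstruction error of this expansion is at most s * 2**(-num_bits).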
