-
Publication No.: US20250005452A1
Publication Date: 2025-01-02
Application No.: US18708948
Application Date: 2023-01-24
Applicant: QUALCOMM Incorporated
Inventor: Markus NAGEL , Marios FOURNARAKIS , Tijmen Pieter Frederik BLANKEVOORT , Yelysei BONDARENKO
IPC: G06N20/00
Abstract: Certain aspects of the present disclosure provide techniques and apparatus for mitigating weight oscillation during quantization-aware training. In one example, a method includes identifying oscillation of a parameter of a machine learning model during quantization-aware training of the machine learning model, and applying an oscillation mitigation procedure during the quantization-aware training of the machine learning model in response to identifying the oscillation, the oscillation mitigation procedure comprising at least one of oscillation dampening or parameter freezing.
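The oscillation-mitigation idea in this abstract can be sketched minimally. Assuming oscillation is detected by counting how often a weight's quantized value flips between training steps (the flip-count `threshold` and the per-weight history representation are illustrative, not taken from the patent), a parameter-freezing step might look like:

```python
def detect_oscillation(history, threshold=3):
    """Count sign flips in a weight's quantized-value history.
    A weight 'oscillates' if its rounded value flips back and
    forth across a quantization boundary at least `threshold`
    times over the recorded training steps."""
    flips = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return flips >= threshold

def freeze_oscillating(histories):
    """Parameter freezing: any weight whose quantized value has
    oscillated is pinned to its most recent quantized value for
    the remainder of quantization-aware training."""
    return {i: h[-1] for i, h in enumerate(histories)
            if detect_oscillation(h)}
```

The alternative mentioned in the abstract, oscillation dampening, would instead add a penalty pulling oscillating weights toward a quantization grid point rather than freezing them outright.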
-
Publication No.: US20230419087A1
Publication Date: 2023-12-28
Application No.: US18330990
Application Date: 2023-06-07
Applicant: QUALCOMM Incorporated
Inventor: Minseop PARK , Jaeseong YOU , Simyung CHANG , Markus NAGEL , Chirag Sureshbhai PATEL
IPC: G06N3/0495 , G06N3/044 , G06N3/08
CPC classification number: G06N3/0495 , G06N3/044 , G06N3/08
Abstract: A processor-implemented method for adaptive quantization in an artificial neural network (ANN) includes receiving an ANN model. The ANN model has multiple channels of target activations. A quantization module is incorporated between a first linear layer of the ANN and a second linear layer of the ANN to generate an adapted ANN. The quantization module scales a first set of weights and biases of the first linear layer based on a learnable quantization module parameter and scales a second set of weights of the second linear layer based on an inverse of the learnable quantization module parameter.
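The scaling relationship in this abstract is easy to verify numerically. A minimal sketch, assuming per-channel scaling of two dense layers with a linear path between them (the function name and shapes are illustrative):

```python
import numpy as np

def adapt_layers(W1, b1, W2, s):
    """Scale the first linear layer's weights and biases per output
    channel by the learnable parameter s, and the second layer's
    weights per matching input channel by 1/s. With a linear (or
    positively homogeneous) connection the composed function is
    unchanged, while s can be learned to make both layers easier
    to quantize."""
    W1_s = W1 * s[:, None]   # scale each output channel of layer 1
    b1_s = b1 * s
    W2_s = W2 / s[None, :]   # inverse scale on matching input channels
    return W1_s, b1_s, W2_s
```

Because each factor of `s[j]` introduced in layer 1 is cancelled by `1/s[j]` in layer 2, the end-to-end output is preserved exactly in this linear setting.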
-
Publication No.: US20240144017A1
Publication Date: 2024-05-02
Application No.: US18548557
Application Date: 2022-04-18
Applicant: QUALCOMM Incorporated
Inventor: Marios FOURNARAKIS , Markus NAGEL
IPC: G06N3/084 , G06N3/0495
CPC classification number: G06N3/084 , G06N3/0495
Abstract: Certain aspects of the present disclosure provide techniques for efficient quantized learning. A tensor is received at a layer of a neural network, and a current tensor is generated at a first bitwidth based on the received tensor. One or more quantization parameter values are determined based on the current tensor, for use in a subsequent training step. The current tensor itself is quantized to a lower bitwidth based on quantization parameter values that were determined from a previous tensor generated during training of the neural network.
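The one-step-delayed use of quantization parameters can be sketched as a small stateful quantizer. This is a minimal illustration, assuming symmetric uniform quantization with a max-abs scale estimate (the class name and first-step bootstrap are assumptions, not from the patent):

```python
import numpy as np

def quantize(x, scale, bits=8):
    """Symmetric uniform quantize-dequantize at the given scale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

class DelayedQuantizer:
    """Quantizes each incoming tensor with the scale estimated from
    the *previous* tensor, so each step needs only a single pass
    over the current data."""
    def __init__(self, bits=8):
        self.bits = bits
        self.scale = None

    def __call__(self, x):
        qmax = 2 ** (self.bits - 1) - 1
        if self.scale is None:        # first step: bootstrap from x itself
            self.scale = np.abs(x).max() / qmax
        y = quantize(x, self.scale, self.bits)
        # record this tensor's statistics for the next call
        self.scale = np.abs(x).max() / qmax
        return y
```

The trade-off is that values exceeding the previous tensor's range are clipped, which is usually acceptable when tensor statistics change slowly across training steps.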
-
Publication No.: US20230058159A1
Publication Date: 2023-02-23
Application No.: US17759725
Application Date: 2021-04-29
Applicant: QUALCOMM Incorporated
Inventor: Marinus Willem VAN BAALEN , Christos LOUIZOS , Markus NAGEL , Tijmen Pieter Frederik BLANKEVOORT , Rana Ali AMJAD
IPC: G06N3/08
Abstract: Various embodiments include methods and devices for joint mixed-precision quantization and structured pruning. Embodiments may include determining whether a plurality of gates of quantization and pruning gates are selected for combination. In response to determining that the plurality of gates are selected for combination, the following is performed iteratively for each successive gate selected for combination: a residual error of a quantized tensor is quantized to the scale of a next bit-width, producing a residual-error quantized tensor, in which the next bit-width increases for each successive iteration; and the quantized tensor and the residual-error quantized tensor are added, producing a next quantized tensor that has the next bit-width and serves as the quantized tensor for the successive iteration.
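The iterative residual-error step reads more clearly as code. A minimal sketch, assuming symmetric max-abs uniform quantization and an increasing sequence of bit-widths (both assumptions; the patent's gate-selection logic is omitted):

```python
import numpy as np

def quantize_to_bits(x, bits):
    """Symmetric uniform quantize-dequantize at the given bit-width."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    if scale == 0:
        scale = 1.0
    return np.round(x / scale) * scale

def residual_quantize(tensor, bitwidths=(2, 4, 8)):
    """Quantize at the coarsest bit-width, then repeatedly quantize
    the remaining residual error at the next (higher) bit-width and
    add it back. The running sum after each iteration is the
    quantized tensor at that iteration's effective precision."""
    q = quantize_to_bits(tensor, bitwidths[0])
    for bits in bitwidths[1:]:
        residual = tensor - q
        q = q + quantize_to_bits(residual, bits)
    return q
```

Each added residual term can only reduce (never increase) the maximum approximation error, which is what makes the intermediate sums usable as progressively higher-precision versions of the same tensor.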
-
Publication No.: US20240386239A1
Publication Date: 2024-11-21
Application No.: US18482196
Application Date: 2023-10-06
Applicant: QUALCOMM Incorporated
Inventor: Yelysei BONDARENKO , Markus NAGEL , Tijmen Pieter Frederik BLANKEVOORT
IPC: G06N3/04
Abstract: Certain aspects of the present disclosure provide techniques and apparatus for processing data using a transformer neural network. The method generally includes receiving an input for processing using a transformer neural network. An attention output is generated in the transformer neural network. Generally, the attention output may be generated such that outlier values for the attention output are attenuated in the transformer neural network. An output of the transformer neural network is generated based on the generated attention output.
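One way attention outliers can be attenuated is by letting attention probabilities reach exactly 0 or 1 without the extreme logits that standard softmax would require. The sketch below shows a clipped-softmax variant in that spirit; the stretch parameters `zeta` and `gamma` and their defaults are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def clipped_softmax(logits, zeta=1.03, gamma=-0.03):
    """Softmax whose outputs are affinely stretched to the range
    [gamma, zeta] and then clipped back to [0, 1]. Probabilities
    near 0 or 1 saturate exactly, so the network no longer needs
    extreme pre-softmax values (a source of outlier activations)
    to express near-hard attention."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p = e / e.sum(axis=-1, keepdims=True)
    return np.clip(p * (zeta - gamma) + gamma, 0.0, 1.0)
```

Because the outputs stay in [0, 1], this drops in wherever an ordinary softmax would be used inside the attention block.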
-
Publication No.: US20220385907A1
Publication Date: 2022-12-01
Application No.: US17645018
Application Date: 2021-12-17
Applicant: QUALCOMM Incorporated
Inventor: Yunfan ZHANG , Ties Jehan VAN ROZENDAAL , Taco Sebastiaan COHEN , Markus NAGEL , Johann Hinrich BREHMER
IPC: H04N19/126 , G06T9/00 , G06N3/08 , G06N3/04 , H04N19/147 , H04N19/91
Abstract: Techniques are described for compressing and decompressing data using machine learning systems. An example process can include receiving a plurality of images for compression by a neural network compression system. The process can include determining, based on a first image from the plurality of images, a first plurality of weight values associated with a first model of the neural network compression system. The process can include generating a first bitstream comprising a compressed version of the first plurality of weight values. The process can include outputting the first bitstream for transmission to a receiver.
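The weight-to-bitstream step can be illustrated with a toy round trip. This sketch quantizes a weight vector and entropy-codes it, with `zlib` standing in for whatever learned entropy coder the system actually uses (that substitution, and the 8-bit symmetric quantizer, are assumptions):

```python
import zlib
import numpy as np

def compress_weights(weights, bits=8):
    """Quantize a weight vector symmetrically and entropy-code the
    integer codes into a bitstream. Returns the bitstream plus the
    scale needed by the receiver to dequantize."""
    scale = np.abs(weights).max() / (2 ** (bits - 1) - 1)
    q = np.round(weights / scale).astype(np.int8)
    return zlib.compress(q.tobytes()), scale

def decompress_weights(bitstream, scale):
    """Receiver side: decode the integer codes and rescale."""
    q = np.frombuffer(zlib.decompress(bitstream), dtype=np.int8)
    return q.astype(np.float32) * scale
```

In the described system the weights being sent are themselves fitted to a specific image, so the bitstream effectively carries a compressed representation of that image.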
-
Publication No.: US20200293864A1
Publication Date: 2020-09-17
Application No.: US16299375
Application Date: 2019-03-12
Applicant: QUALCOMM Incorporated
Inventor: Markus NAGEL , Tijmen Pieter Frederik BLANKEVOORT
Abstract: Certain aspects of the present disclosure are directed to methods and apparatus for operating an artificial neural network using data-aware layer decomposition. One exemplary method generally includes receiving a first input signal at a first layer of the artificial neural network; generating a first output signal of the first layer based, at least in part, on a weight matrix of the first layer and the first input signal; decomposing the weight matrix; generating an approximate output signal of the first layer based, at least in part, on the decomposed weight matrix and the first input signal; generating an updated decomposed weight matrix by minimizing a difference between the generated first output signal of the first layer and the approximate output signal of the first layer; and operating the first layer of the artificial neural network using the updated decomposed weight matrix.
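The data-aware refinement step can be sketched with a truncated SVD followed by a least-squares re-fit against actual layer inputs. A minimal illustration; the closed-form re-fit of only the left factor is one simple choice, not necessarily the patent's exact update:

```python
import numpy as np

def decompose_layer(W, X, rank):
    """Low-rank decomposition of a weight matrix W, refined on input
    data X: initialize A, B from the truncated SVD of W, then re-fit
    A by least squares so that A @ (B @ X) best matches the original
    outputs W @ X in the Frobenius norm."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]    # left factor (initial)
    B = Vt[:rank]                 # right factor
    # data-aware refinement: minimize ||A @ Z - Y||_F over A
    Y, Z = W @ X, B @ X
    A = Y @ Z.T @ np.linalg.pinv(Z @ Z.T)
    return A, B
```

Because the re-fit is optimal for the given data, the refined factors can never approximate the true outputs worse than the plain truncated SVD does on that data.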
-
Publication No.: US20240169708A1
Publication Date: 2024-05-23
Application No.: US18338184
Application Date: 2023-06-20
Applicant: QUALCOMM Incorporated
Inventor: Davide ABATI , Amirhossein HABIBIAN , Markus NAGEL
IPC: G06V10/776 , G06V10/77 , G06V20/40 , G06V10/82
CPC classification number: G06V10/776 , G06V10/7715 , G06V20/46 , G06V10/82
Abstract: Certain aspects of the present disclosure provide techniques and apparatus for delta quantization for video processing and other data streams with temporal content. An example method generally includes receiving image data including at least a first frame and a second frame, generating a first convolutional output based on the first frame using a machine learning model, generating a second convolutional output based on a difference between the first frame and the second frame using one or more quantizers of the machine learning model, generating a third convolutional output associated with the second frame as a combination of the first convolutional output and the second convolutional output, and performing image processing based on the first convolutional output and the third convolutional output.
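The combination step relies on the linearity of convolution: convolving only the quantized frame-to-frame difference and adding it to the first frame's output approximates the second frame's output. A 1-D sketch (real video uses 2-D convolutions; the function names and 8-bit quantizer are illustrative):

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1-D convolution standing in for a conv layer."""
    return np.convolve(x, kernel, mode="valid")

def quantize(x, bits=8):
    """Symmetric uniform quantize-dequantize."""
    scale = max(np.abs(x).max(), 1e-12) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def delta_conv(frame1, frame2, kernel, bits=8):
    """Delta quantization: convolve the first frame at full
    precision, convolve only the quantized frame *difference*,
    and combine by linearity. The sum approximates the
    convolutional output for the second frame."""
    out1 = conv1d(frame1, kernel)
    delta = quantize(frame2 - frame1, bits)
    out_delta = conv1d(delta, kernel)
    return out1, out1 + out_delta   # first and third convolutional outputs
```

Because inter-frame differences are typically much smaller than the frames themselves, the quantizer sees a narrow value range, so few bits suffice for the delta path.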
-
Publication No.: US20230306233A1
Publication Date: 2023-09-28
Application No.: US18103428
Application Date: 2023-01-30
Applicant: QUALCOMM Incorporated
Inventor: Marinus Willem VAN BAALEN , Brian KAHNE , Eric Wayne MAHURIN , Tijmen Pieter Frederik BLANKEVOORT , Andrey KUZMIN , Andrii SKLIAR , Markus NAGEL
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: A processor-implemented method includes bit shifting a binary representation of a neural network parameter. The neural network parameter has fewer bits, b, than the number of hardware bits, B, supported by hardware that processes the neural network parameter. The bit shifting effectively multiplies the neural network parameter by 2^(B-b). The method also includes dividing a quantization scale by 2^(B-b) to obtain an updated quantization scale. The method further includes quantizing the bit shifted binary representation with the updated quantization scale to obtain a value for the neural network parameter.
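The arithmetic here is just a matched shift: the integer grows by 2^(B-b) while the scale shrinks by the same factor, so the represented real value (integer times scale) is unchanged. A minimal sketch (the function name and example bit-widths are illustrative):

```python
def rescale_parameter(value, scale, b=4, B=8):
    """Left-shift a b-bit integer parameter into B hardware bits
    (multiplying it by 2**(B-b)) and divide the quantization scale
    by the same factor, leaving value * scale unchanged."""
    shift = B - b
    shifted = value << shift            # == value * 2**(B-b)
    new_scale = scale / (1 << shift)
    return shifted, new_scale
```

For example, a 4-bit code 5 with scale 0.25 becomes the 8-bit code 80 with scale 0.015625; both represent 1.25.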
-
Publication No.: US20230076290A1
Publication Date: 2023-03-09
Application No.: US17792975
Application Date: 2021-02-04
Applicant: QUALCOMM Incorporated
Inventor: Rana Ali AMJAD , Markus NAGEL , Tijmen Pieter Frederik BLANKEVOORT , Marinus Willem VAN BAALEN , Christos LOUIZOS
IPC: G06N3/04
Abstract: A method for quantizing a pre-trained neural network includes computing a loss on a training set of candidate weights of the neural network. A rounding parameter is assigned to each candidate weight. The rounding parameter is a binary random value or a multinomial value. A quantized weight value is computed based on the loss and the rounding parameter.
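The binary rounding parameter can be made concrete as a per-weight choice between floor and ceiling, selected by task loss rather than by nearest-value rounding. The sketch below uses random search over rounding masks as a simple stand-in for the gradient-based optimization such methods typically use (the function names, sampling strategy, and sample count are assumptions):

```python
import numpy as np

def quantize_with_rounding(w, scale, r):
    """Quantize weights with an explicit per-weight rounding
    parameter r in {0, 1}: r selects floor (0) vs. ceil (1)
    instead of rounding to nearest."""
    return (np.floor(w / scale) + r) * scale

def fit_rounding(w, scale, loss_fn, n_samples=64, seed=0):
    """Pick, per weight, the binary rounding that minimizes the
    given loss by sampling random rounding masks and keeping the
    best one."""
    rng = np.random.default_rng(seed)
    best_r, best_loss = None, np.inf
    for _ in range(n_samples):
        r = rng.integers(0, 2, size=w.shape)
        loss = loss_fn(quantize_with_rounding(w, scale, r))
        if loss < best_loss:
            best_r, best_loss = r, loss
    return best_r
```

The point of optimizing the rounding rather than fixing it is that rounding each weight to its nearest grid point is not, in general, optimal for the network's loss.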