-
Publication No.: US20220374766A1
Publication Date: 2022-11-24
Application No.: US17578435
Filing Date: 2022-01-18
Inventor: David Philip Lloyd THORSLEY , Sheng SHEN , Se Hoon KIM , Amir GHOLAMINEJAD , Woosuk KWON , Joseph HASSOUN , Kurt KEUTZER
IPC: G06N20/00
Abstract: An architecture and method are disclosed to reduce computation in a self-attention model. The self-attention model is trained using multiple sub-models. Each sub-model receives an input sequence of tokens, scores each token in the input sequence to provide a token score, and has a predetermined threshold score. Each sub-model prunes tokens from the input sequence whose scores are below the predetermined threshold score for that sub-model, and the pruned sequence of each sub-model is used as the input sequence for the next sub-model. The predetermined threshold scores differ between sub-models.
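A minimal sketch of the cascaded threshold-based pruning idea, assuming random illustrative token scores and threshold values (in the disclosed model, scores would be derived within each sub-model's self-attention layers and each sub-model has its own predetermined threshold):

```python
import torch

def prune_tokens(tokens, scores, threshold):
    """Keep only the tokens whose importance score meets or exceeds the threshold."""
    keep = scores >= threshold
    return tokens[keep], scores[keep]

# Hypothetical 10-token sequence with illustrative importance scores.
tokens = torch.arange(10)
scores = torch.rand(10)

# Each sub-model applies a progressively higher (illustrative) threshold,
# so later sub-models operate on shorter, already-pruned sequences.
# In the actual model each sub-model would recompute scores; the same
# scores are reused here for simplicity.
for threshold in (0.1, 0.3, 0.5):
    tokens, scores = prune_tokens(tokens, scores, threshold)
    print(f"threshold={threshold}: {tokens.numel()} tokens remain")
```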
-
Publication No.: US20210133278A1
Publication Date: 2021-05-06
Application No.: US16816247
Filing Date: 2020-03-11
Applicant: Samsung Electronics Co., Ltd.
Inventor: Jun FANG , Joseph H. HASSOUN , Ali SHAFIEE ARDESTANI , Hamzah Ahmed Ali ABDELAZIZ , Georgios GEORGIADIS , Hui CHEN , David Philip Lloyd THORSLEY
Abstract: A method of quantizing an artificial neural network may include dividing a quantization range for a tensor of the artificial neural network into a first region and a second region, and quantizing values of the tensor in the first region separately from values of the tensor in the second region. Linear or nonlinear quantization may be applied to values of the tensor in the first region and the second region. The method may include locating a breakpoint between the first region and the second region by substantially minimizing an expected quantization error over at least a portion of the quantization range. The expected quantization error may be minimized by solving analytically and/or searching numerically.
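A minimal sketch of two-region quantization with a numerically searched breakpoint. Both regions use uniform (linear) grids here, and the candidate breakpoints, level counts, and mean-squared-error criterion are illustrative assumptions rather than the analytically minimized expected error described in the abstract:

```python
import torch

def quantize_uniform(x, lo, hi, n_levels):
    """Uniform (linear) quantization of x onto n_levels points spanning [lo, hi]."""
    x = x.clamp(lo, hi)
    step = (hi - lo) / (n_levels - 1)
    return lo + torch.round((x - lo) / step) * step

def two_region_quantize(x, breakpoint, n_levels=16):
    """Quantize values below and above the breakpoint with separate uniform grids."""
    out = torch.empty_like(x)
    low = x < breakpoint
    out[low] = quantize_uniform(x[low], x.min().item(), breakpoint, n_levels)
    out[~low] = quantize_uniform(x[~low], breakpoint, x.max().item(), n_levels)
    return out

def search_breakpoint(x, candidates, n_levels=16):
    """Numerically search candidate breakpoints for the lowest mean squared error."""
    errors = torch.stack(
        [torch.mean((x - two_region_quantize(x, b, n_levels)) ** 2) for b in candidates]
    )
    return candidates[int(errors.argmin())]

# Illustrative long-tailed tensor of activation magnitudes.
x = torch.randn(10_000).abs() ** 2
candidates = torch.quantile(x, torch.linspace(0.05, 0.95, 19)).tolist()
print("chosen breakpoint:", search_breakpoint(x, candidates))
```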
-
Publication No.: US20230038891A1
Publication Date: 2023-02-09
Application No.: US17469853
Filing Date: 2021-09-08
Applicant: Samsung Electronics Co., Ltd.
Inventor: Jun FANG , David Philip Lloyd THORSLEY , Chengyao SHEN , Joseph H. HASSOUN
Abstract: A method is disclosed for reducing computation of a differentiable architecture search. An output node is formed having a channel dimension that is one-fourth of a channel dimension of a normal cell of a neural network architecture by averaging channel outputs of intermediate nodes of the normal cell. The output node is preprocessed using a 1×1 convolution to form channels of input nodes for a next layer of the cells in the neural network architecture. Forming the output node includes forming s groups of channel outputs of the intermediate nodes by dividing the channel outputs of the intermediate nodes by a splitting parameter s. An average channel output for each group of channel outputs is formed, and the output node is formed by concatenating the average channel output for each group of channels with channel outputs of the intermediate nodes of the normal cell.
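A minimal sketch of the channel-grouping idea behind the reduced-width output node, assuming each intermediate node's channel count divides evenly by the splitting parameter s. The node count, the value of s, and the exact concatenation described in the abstract are not reproduced here; this only illustrates splitting channels into s groups, averaging within each group, concatenating across intermediate nodes, and preprocessing with a 1×1 convolution for the next cell:

```python
import torch

def reduced_output_node(intermediate_outputs, s=2):
    """Split each node's C channels into s groups, average each group to a single
    channel map, and concatenate across nodes, giving num_nodes * s channels
    instead of num_nodes * C."""
    reduced = []
    for out in intermediate_outputs:            # each: (batch, C, H, W)
        b, c, h, w = out.shape
        groups = out.view(b, s, c // s, h, w)   # form s groups of channels
        reduced.append(groups.mean(dim=2))      # average within each group
    return torch.cat(reduced, dim=1)            # concatenate across nodes

# Illustrative example: 4 intermediate nodes, 16 channels each, s=2.
nodes = [torch.randn(1, 16, 8, 8) for _ in range(4)]
out = reduced_output_node(nodes, s=2)
print(out.shape)  # torch.Size([1, 8, 8, 8]): 8 channels vs. 64 without reduction

# A 1x1 convolution then forms the channels expected by the next layer of cells.
proj = torch.nn.Conv2d(out.shape[1], 64, kernel_size=1)
print(proj(out).shape)  # torch.Size([1, 64, 8, 8])
```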
-
Publication No.: US20230028226A1
Publication Date: 2023-01-26
Application No.: US17475330
Filing Date: 2021-09-14
Applicant: Samsung Electronics Co., Ltd.
Inventor: David Philip Lloyd THORSLEY , Joseph H. HASSOUN , Jun FANG , Chengyao SHEN
Abstract: A method is disclosed to reduce computation in a self-attention deep-learning model. A feature-map regularization term is added to a loss function while training the self-attention model. At least one low-magnitude feature is removed from at least one feature map of the self-attention model during inference. Weights of the self-attention model are quantized after the self-attention model has been trained. Adding the feature-map regularization term reduces activation values of feature maps, and removing the at least one low-magnitude feature from at least one feature map may be performed by setting the low-magnitude feature to be equal to zero based on the low-magnitude feature having a value that is less than a predetermined threshold. Feature maps of the self-attention model are quantized and compressed.
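A minimal sketch of feature-map regularization during training and low-magnitude feature removal at inference. The L1-style penalty, its weight, and the pruning threshold are illustrative assumptions; the abstract does not specify the form of the regularizer or the threshold value:

```python
import torch

def feature_map_regularizer(feature_maps, weight=1e-4):
    """Illustrative L1-style penalty on feature-map activations, added to the
    task loss during training to push low-importance activations toward zero."""
    return weight * sum(fm.abs().mean() for fm in feature_maps)

def prune_low_magnitude(feature_map, threshold=0.05):
    """Zero out activations whose magnitude falls below the threshold at inference."""
    return torch.where(feature_map.abs() < threshold,
                       torch.zeros_like(feature_map), feature_map)

# Illustrative feature map from one self-attention layer.
fm = torch.randn(1, 12, 64, 64) * 0.1
print("regularization term:", feature_map_regularizer([fm]).item())

pruned = prune_low_magnitude(fm)
sparsity = (pruned == 0).float().mean().item()
print(f"fraction of activations zeroed: {sparsity:.2%}")
```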