Patent search ap:("Samsung Electronics Co. Page Ltd.") AND inv:"Ali SHAFIEE ARDESTANI"

1.

发明公开
EXTREME SPARSE DEEP LEARNING EDGE INFERENCE ACCELERATOR 审中-公开

公开(公告)号：US20240095519A1

公开(公告)日：2024-03-21

申请号：US17989675

申请日：2022-11-17

Applicant: Samsung Electronics Co., Ltd.

Inventor： Ardavan PEDRAM , Ali SHAFIEE ARDESTANI , Jong Hoon SHIN , Joseph H. HASSOUN

IPC: G06N3/08 , H03M7/30

CPC classification number: G06N3/08 , H03M7/3066

Abstract: A neural network inference accelerator includes first and second neural processing units (NPUs) and a sparsity management unit. The first NPU receives activation and weight tensors based on an activation sparsity density and a weight sparsity density both being greater than a predetermined sparsity density. The second NPU receives activation and weight tensors based on at least one of the activation sparsity density and the weight sparsity density being less than or equal to the predetermined sparsity density. The sparsity management unit controls transfer of the activation tensor and the weight tensor based on the activation sparsity density and the weight sparsity density with respect to the predetermined sparsity density.

2.

发明申请
SRAM-SHARING FOR RECONFIGURABLE NEURAL PROCESSING UNITS 有权

公开(公告)号：US20220405557A1

公开(公告)日：2022-12-22

申请号：US17400094

申请日：2021-08-11

Applicant: Samsung Electronics Co., Ltd.

Inventor： Jong Hoon SHIN , Ali SHAFIEE ARDESTANI , Joseph H. HASSOUN

IPC: G06N3/063

Abstract: A system and a method is disclosed for processing input feature map (IFM) data of a current layer of a neural network model using an array of reconfigurable neural processing units (NPUs) and storing output feature map (OFM) data of the next layer of the neural network model at a location that does not involve a data transfer between memories of the NPUs according to the subject matter disclosed herein. The reconfigurable NPUs may be used to improve NPU utilization of NPUs of a neural processing system.

3.

发明公开
EFFICIENCY OF VISION TRANSFORMERS WITH ADAPTIVE TOKEN PRUNING 审中-公开

公开(公告)号：US20230368494A1

公开(公告)日：2023-11-16

申请号：US17978959

申请日：2022-11-01

Applicant: Samsung Electronics Co., Ltd.

Inventor： Ling LI , Ali SHAFIEE ARDESTANI

IPC: G06V10/771 , G06V10/764 , G06V10/776 , G06V10/26 , G06V10/82

CPC classification number: G06V10/771 , G06V10/764 , G06V10/776 , G06V10/273 , G06V10/82

Abstract: A system and a method are disclosed for training a vision transformer. A token distillation loss of an input image based on a teacher network classification token and a token importance score of a student network (the vision transformer during training) are determined at a pruning layer of the vision transformer. When a current epoch number is odd, sparsification of tokens of the input image is skipped and the dense input image is processed by layers that are subsequent to the pruning layer. When the current epoch number is even, tokens of the input image are pruned at the pruning layer and processed by layers that are subsequent to the pruning layer. A label loss and a total loss for the input image are determined by the subsequent layers and the student network is updated.

4.

发明公开
SIGNED MULTIPLICATION USING UNSIGNED MULTIPLIER WITH DYNAMIC FINE-GRAINED OPERAND ISOLATION 审中-公开

公开(公告)号：US20230153065A1

公开(公告)日：2023-05-18

申请号：US18096559

申请日：2023-01-12

Applicant: Samsung Electronics Co., Ltd.

Inventor： Ilia OVSIANNIKOV , Ali SHAFIEE ARDESTANI , Joseph H. HASSOUN , Lei WANG

IPC: G06F7/487 , G06F9/30 , G06F7/523

CPC classification number: G06F7/4876 , G06F9/3001 , G06F9/30021 , G06F7/523

Abstract: An N×N multiplier may include a N/2×N first multiplier, a N/2×N/2 second multiplier, and a N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2N/2, the second or the third multiplier are used to multiply the operands. If one operand is less than 2N/2 and the other operand is equal to or greater than 2N/2, the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2N/2, the first, second and third multipliers are used to multiply the operands.

5.

发明申请
WEIGHT-SPARSE NEURAL PROCESSING UNIT WITH MULTI-DIMENSIONAL ROUTING OF NON-ZERO VALUES 有权

公开(公告)号：US20220156569A1

公开(公告)日：2022-05-19

申请号：US17521846

申请日：2021-11-08

Applicant: Samsung Electronics Co., Ltd.

Inventor： Jong Hoon SHIN , Ali SHAFIEE ARDESTANI , Joseph H. HASSOUN

IPC: G06N3/063 , G06F7/544

Abstract: A general matrix-matrix (GEMM) accelerator core includes first and second buffers, and a processing element (PE). The first buffer receives a elements of a matrix A of activation values. The second buffer receives b elements of a matrix B of weight values. The matrix B is preprocessed with a nonzero-valued b element replacing a zero-valued b element in a first row of the second buffer based on the zero-valued b element being in the first row of the second buffer. Metadata is generated that includes movement information of the nonzero-valued b element to replace the zero-valued b element. The PE receives b elements from a first row of the second buffer and a elements from the first buffer from locations in the first buffer that correspond to locations in the second buffer from where the b elements have been received by the PE as indicated by the metadata.

6.

发明公开
SYSTEM AND METHOD FOR HANDLING PROCESSING WITH SPARSE WEIGHTS AND OUTLIERS 审中-公开

公开(公告)号：US20240192922A1

公开(公告)日：2024-06-13

申请号：US18171300

申请日：2023-02-17

Applicant: Samsung Electronics Co., Ltd.

Inventor： Ali SHAFIEE ARDESTANI , Hamzah Ahmed Ali ABDELAZIZ , Ardavan PEDRAM , Joseph H. HASSOUN

IPC: G06F7/544

CPC classification number: G06F7/5443 , G06N3/0464

Abstract: Systems and methods for handling processing with sparse weights and outliers. In some embodiments, the method includes reading a first activation from a first row of an array of activations; multiplying a first weight by the first activation to form a first product; directing, by a first demultiplexer, the first product to a first adder tree, of a plurality of adder trees; reading a second activation from a second row of the array of activations; and multiplying a second weight by the second activation.

7.

发明申请
LOW OVERHEAD IMPLEMENTATION OF WINOGRAD FOR CNN WITH 3x3, 1x3 AND 3x1 FILTERS ON WEIGHT STATION DOT-PRODUCT BASED CNN ACCELERATORS 有权

公开(公告)号：US20210294873A1

公开(公告)日：2021-09-23

申请号：US16898422

申请日：2020-06-10

Applicant: Samsung Electronics Co., Ltd.

Inventor： Ali SHAFIEE ARDESTANI , Joseph HASSOUN

IPC: G06F17/14 , G06N3/04 , G06F17/15

Abstract: A system and a method are disclosed for forming an output feature map (OFM). Activation values in an input feature map (IFM) are selected and transformed on-the-fly into the Winograd domain. Elements in a Winograd filter is selected that respectively correspond to the transformed activation values. A transformed activation value is multiplied by a corresponding element of the Winograd filter to form a corresponding product value in the Winograd domain. Activation values are repeatedly selected, transformed and multiplied by a corresponding element in the Winograd filter to form corresponding product values in the Winograd domain until all activation values in the IFM have been transformed and multiplied by the corresponding element. The product values are summed in the Winograd domain to form elements of a feature map in the Winograd domain. The elements of the feature map in the Winograd domain are inverse-Winograd transformed on-the-fly to form the OFM.

8.

发明申请
PIECEWISE QUANTIZATION FOR NEURAL NETWORKS 有权

公开(公告)号：US20210133278A1

公开(公告)日：2021-05-06

申请号：US16816247

申请日：2020-03-11

Applicant: Samsung Electronics Co., Ltd.

Inventor： Jun FANG , Joseph H. HASSOUN , Ali SHAFIEE ARDESTANI , Hamzah Ahmed Ali ABDELAZIZ , Georgios GEORGIADIS , Hui CHEN , David Philip Lloyd THORSLEY

IPC: G06F17/18 , G06N3/08 , G06N3/04

Abstract: A method of quantizing an artificial neural network may include dividing a quantization range for a tensor of the artificial neural network into a first region and a second region, and quantizing values of the tensor in the first region separately from values of the tensor in the second region. Linear or nonlinear quantization may be applied to values of the tensor in the first region and the second region. The method may include locating a breakpoint between the first region and the second region by substantially minimizing an expected quantization error over at least a portion of the quantization range. The expected quantization error may be minimized by solving analytically and/or searching numerically.

9.

发明公开
EFFICIENT CIRCUIT FOR NEURAL NETWORK PROCESSING 审中-公开

公开(公告)号：US20230205488A1

公开(公告)日：2023-06-29

申请号：US17570326

申请日：2022-01-06

Applicant: Samsung Electronics Co., Ltd.

Inventor： Ling LI , Ali SHAFIEE ARDESTANI , Hamzah ABDELAZIZ , Joseph H. HASSOUN

IPC: G06F7/487

CPC classification number: G06F7/4876

Abstract: A system and method for efficient processing for neural network inference operations. In some embodiments, the system includes: a circuit configured to multiply a first number by a second number, the first number being represented as: a sign bit five exponent bits, and seven mantissa bits, representing an eight-bit full mantissa.

10.

发明申请
DEPTHWISE-CONVOLUTION IMPLEMENTATION ON A NEURAL PROCESSING CORE 有权

公开(公告)号：US20220405558A1

公开(公告)日：2022-12-22

申请号：US17401298

申请日：2021-08-12

Applicant: Samsung Electronics Co., Ltd.

Inventor： Jong Hoon SHIN , Ali SHAFIEE ARDESTANI , Joseph H. HASSOUN

IPC: G06N3/063 , G06F15/80

Abstract: A core of neural processing units is configured to efficiently process a depthwise convolution by maximizing spatial feature-map locality using adder trees. Data paths of activations and weights are inverted, and 2-to-1 multiplexers are every 2/9 multipliers along a row of multipliers. During a depthwise convolution operation, the core is operated using a RS×HW dataflow to maximize the locality of feature maps. For a normal convolution operation, the data paths of activations and weights may be configured for a normal convolution configuration and in which multiplexers are idle.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification