Patent search ap:("Samsung Electronics Co., Ltd.") AND inv:"Joseph HASSOUN"

1.

Invention publication
WEIGHT-SPARSE NPU WITH FINE-GRAINED STRUCTURED SPARSITY [Under examination, published]

Publication (announcement) number: US20240119270A1

Publication (announcement) date: 2024-04-11

Application number: US17980544

Application date: 2022-11-03

Applicant: Samsung Electronics Co., Ltd.

Inventor: Jong Hoon SHIN, Ardavan PEDRAM, Joseph HASSOUN

IPC: G06N3/063, G06N3/08

CPC classification number: G06N3/063, G06N3/08

Abstract: A neural processing unit is reconfigurable to process a fine-grain structured sparsity weight arrangement selected from N:M=1:4, 2:4, 2:8 and 4:8 fine-grain structured weight sparsity arrangements. A weight buffer stores weight values and a weight multiplexer array outputs one or more weight values stored in the weight buffer as first operand values based on a selected fine-grain structured sparsity weight arrangement. An activation buffer stores activation values and an activation multiplexer array outputs one or more activation values stored in the activation buffer as second operand values based on the selected fine-grain structured weight sparsity, in which each respective second operand value and a corresponding first operand value form an operand value pair. A multiplier array outputs a product value for each operand value pair.
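
To illustrate the N:M arrangement named in this abstract, the following Python sketch keeps the N largest-magnitude weights in each group of M and uses the stored positions to pick the matching activations. The function names (compress_nm, sparse_dot) and the magnitude-based pruning rule are assumptions made for illustration; the abstract does not say how weights are pruned or how the hardware encodes the multiplexer selects.

```python
from typing import List, Tuple


def compress_nm(weights: List[float], n: int, m: int) -> Tuple[List[float], List[int]]:
    """Keep the n largest-magnitude weights in every group of m.

    Returns the kept values and, for each kept value, its position inside
    its group (hypothetical metadata that a multiplexer could use as a select).
    """
    if len(weights) % m != 0:
        raise ValueError("weight count must be a multiple of m")
    values, selects = [], []
    for base in range(0, len(weights), m):
        group = weights[base:base + m]
        # Positions of the n largest magnitudes, restored to ascending order.
        keep = sorted(sorted(range(m), key=lambda i: -abs(group[i]))[:n])
        for i in keep:
            values.append(group[i])
            selects.append(i)
    return values, selects


def sparse_dot(values, selects, activations, n, m):
    """Dot product that forms operand pairs only for the kept weights."""
    total = 0.0
    for k, (w, sel) in enumerate(zip(values, selects)):
        group = k // n                      # group of m the weight came from
        a = activations[group * m + sel]    # select the matching activation
        total += w * a                      # one product per operand pair
    return total


if __name__ == "__main__":
    w = [0.0, 3.0, 0.0, -1.0, 2.0, 0.0, 0.0, 5.0]   # already 2:4 sparse
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    vals, sels = compress_nm(w, 2, 4)
    print(vals, sels, sparse_dot(vals, sels, x, 2, 4))
```

With a weight vector that is already 2:4 sparse, the compressed dot product matches the dense one while performing half as many multiplications.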

2.

Invention application
LOW OVERHEAD IMPLEMENTATION OF WINOGRAD FOR CNN WITH 3x3, 1x3 AND 3x1 FILTERS ON WEIGHT STATION DOT-PRODUCT BASED CNN ACCELERATORS [Granted]

Publication (announcement) number: US20210294873A1

Publication (announcement) date: 2021-09-23

Application number: US16898422

Application date: 2020-06-10

Applicant: Samsung Electronics Co., Ltd.

Inventor: Ali SHAFIEE ARDESTANI, Joseph HASSOUN

IPC: G06F17/14, G06N3/04, G06F17/15

Abstract: A system and a method are disclosed for forming an output feature map (OFM). Activation values in an input feature map (IFM) are selected and transformed on-the-fly into the Winograd domain. Elements in a Winograd filter are selected that respectively correspond to the transformed activation values. A transformed activation value is multiplied by a corresponding element of the Winograd filter to form a corresponding product value in the Winograd domain. Activation values are repeatedly selected, transformed and multiplied by a corresponding element in the Winograd filter to form corresponding product values in the Winograd domain until all activation values in the IFM have been transformed and multiplied by the corresponding element. The product values are summed in the Winograd domain to form elements of a feature map in the Winograd domain. The elements of the feature map in the Winograd domain are inverse-Winograd transformed on-the-fly to form the OFM.
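
The transform, multiply, and inverse-transform sequence in this abstract can be sketched with the standard one-dimensional Winograd F(2,3) algorithm, which produces two outputs of a 3-tap filter from four inputs using four multiplications. This is a generic textbook form for a 1x3 filter, not the accelerator data path the application claims; the function names are illustrative.

```python
def direct_correlation(d, g):
    """Reference result: slide the 3-tap filter g over d."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(len(d) - 2)]


def winograd_f23(d, g):
    """Same result via Winograd F(2,3): 4 multiplications per 2 outputs."""
    if len(d) < 4 or len(d) % 2 != 0:
        raise ValueError("this sketch needs an even input length of at least 4")
    # Filter transform into the Winograd domain (done once per filter).
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    out = []
    for t in range(0, len(d) - 3, 2):           # tiles overlap by two inputs
        d0, d1, d2, d3 = d[t:t + 4]
        # Activation transform into the Winograd domain.
        v = [d0 - d2, d1 + d2, d2 - d1, d1 - d3]
        # Element-wise products in the Winograd domain.
        m = [vi * ui for vi, ui in zip(v, u)]
        # Inverse transform back to two output values.
        out.append(m[0] + m[1] + m[2])
        out.append(m[1] - m[2] - m[3])
    return out


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    g = [1.0, 2.0, 3.0]
    print(winograd_f23(d, g))
    print(direct_correlation(d, g))
```

In a multi-channel layer the element-wise products would be accumulated across input channels before the inverse transform, which is the "summed in the Winograd domain" step; the sketch covers a single channel only.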

3.

Invention publication
RUNTIME RECONFIGURABLE COMPRESSION FORMAT CONVERSION WITH BIT-PLANE GRANULARITY [Under examination, published]

Publication (announcement) number: US20240162917A1

Publication (announcement) date: 2024-05-16

Application number: US18096557

Application date: 2023-01-12

Applicant: Samsung Electronics Co., Ltd.

Inventor: Jong Hoon SHIN, Ardavan PEDRAM, Joseph HASSOUN

IPC: H03M7/30, H03M7/42

CPC classification number: H03M7/6088, H03M7/42

Abstract: A runtime bit-plane data-format optimizer for a processing element includes a sparsity-detector and a compression-converter. The sparsity-detector selects a bit-plane compression-conversion format during a runtime of the processing element using a performance model that is based on a first sparsity pattern of first bit-plane data stored in a memory exterior to the processing element and a second sparsity pattern of second bit-plane data that is to be stored in a memory within the processing element. The second sparsity pattern is based on a runtime configuration of the processing element. The first bit-plane data is stored using a first bit-plane compression format and the second bit-plane data is to be stored using a second bit-plane compression format. The compression-conversion circuit converts the first bit-plane compression format of the first data to be the second bit-plane compression format of the second data.
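
As a rough illustration of bit-plane granularity, the Python sketch below splits unsigned integers into bit planes and picks a storage format per plane from that plane's sparsity. The cost model in choose_format (1 bit per element for a bitmap, index_bits per nonzero for an index list) and the two format names are hypothetical stand-ins; the abstract does not disclose the actual performance model or the compression formats.

```python
def to_bit_planes(values, bits=8):
    """Plane b holds bit b of every value (unsigned integers assumed)."""
    return [[(v >> b) & 1 for v in values] for b in range(bits)]


def from_bit_planes(planes):
    """Rebuild the original values from their bit planes."""
    return [sum(bit << b for b, bit in enumerate(column)) for column in zip(*planes)]


def choose_format(plane, index_bits=4):
    """Toy cost model (an assumption, not the patented performance model)."""
    bitmap_cost = len(plane)                 # one bit per element
    index_cost = sum(plane) * index_bits     # one index per nonzero bit
    return "index-list" if index_cost < bitmap_cost else "bitmap"


if __name__ == "__main__":
    values = [1, 0, 3, 0, 0, 0, 128, 0]
    planes = to_bit_planes(values)
    for b, plane in enumerate(planes):
        print(b, plane, choose_format(plane))
    print(from_bit_planes(planes) == values)
```

High-order planes of small-magnitude data tend to be mostly zero, so a per-plane choice can differ from a single format chosen for the whole tensor.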

4.

Invention application
SIGNED MULTIPLICATION USING UNSIGNED MULTIPLIER WITH DYNAMIC FINE-GRAINED OPERAND ISOLATION [Granted]

Publication (announcement) number: US20210141603A1

Publication (announcement) date: 2021-05-13

Application number: US17151115

Application date: 2021-01-15

Applicant: Samsung Electronics Co., Ltd.

Inventor: Ilia OVSIANNIKOV, Ali SHAFIEE ARDESTANI, Joseph HASSOUN, Lei WANG

IPC: G06F7/487, G06F9/30, G06F7/523

Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.
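
One way to read the case analysis in this abstract is sketched below in Python for unsigned operands. Writing a = a_hi·2^(N/2) + a_lo and b = b_hi·2^(N/2) + b_lo gives a·b = (a_hi·b)·2^(N/2) + a_lo·b_lo + (a_lo·b_hi)·2^(N/2), which maps onto one N/2×N product and two N/2×N/2 products. The assignment of terms to the "first", "second" and "third" multipliers, the operand swap, and the function name are assumptions for illustration; sign handling is not modeled.

```python
def split_multiply(a: int, b: int, n: int = 8):
    """Multiply two unsigned n-bit operands.

    Returns (product, names of the sub-multipliers that were enabled).
    """
    half = n // 2
    limit = 1 << half                  # 2^(n/2)
    mask = limit - 1
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must fit in n bits")
    if a == 0 or b == 0:
        return 0, set()                # every sub-multiplier stays disabled
    if a > b:
        a, b = b, a                    # keep the smaller operand in a
    if b < limit:
        # Both operands are small: a single N/2 x N/2 multiplier suffices.
        return a * b, {"second"}
    if a < limit:
        # One small, one large operand: the N/2 x N multiplier suffices.
        return a * b, {"first"}
    # Both operands are large: combine all three partial products.
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    first = a_hi * b                   # N/2 x N
    second = a_lo * b_lo               # N/2 x N/2
    third = a_lo * b_hi                # N/2 x N/2
    product = (first << half) + second + (third << half)
    return product, {"first", "second", "third"}


if __name__ == "__main__":
    for pair in [(0, 5), (3, 2), (3, 200), (200, 100)]:
        print(pair, split_multiply(*pair))
```

Disabling the unused sub-multipliers is where the power saving would come from; the sketch only reports which ones a given operand pair needs.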

5.

Invention publication
RUNTIME RECONFIGURABLE COMPRESSION FORMAT CONVERSION [Under examination, published]

Publication (announcement) number: US20240162916A1

Publication (announcement) date: 2024-05-16

Application number: US18096551

Application date: 2023-01-12

Applicant: Samsung Electronics Co., Ltd.

Inventor: Jong Hoon SHIN, Ardavan PEDRAM, Joseph HASSOUN

IPC: H03M7/30

CPC classification number: H03M7/3059, H03M7/6011, H03M7/6094

Abstract: A runtime data-format optimizer for a processing element includes a sparsity-detector and a compression-converter. The sparsity-detector selects a first compression-conversion format during a runtime of the processing element based on a performance model that is based on a first sparsity pattern of first data stored in a first memory that is exterior to the processing element and a second sparsity pattern of second data that is to be stored in a second memory within the processing element. The second sparsity pattern is based on a runtime configuration of the processing element. The first data is stored in the first memory using a first compression format and the second data is to be stored in the second memory using a second compression format. The compression-conversion circuit converts the first compression format of the first data to be the second compression format of the second data based on the first compression-conversion format.

6.

Invention publication
DNNS ACCELERATION WITH BLOCK-WISE N:M STRUCTURED WEIGHT SPARSITY [Under examination, published]

Publication (announcement) number: US20240160483A1

Publication (announcement) date: 2024-05-16

Application number: US18097200

Application date: 2023-01-13

Applicant: Samsung Electronics Co., Ltd.

Inventor: Hamzah ABDELAZIZ, Joseph HASSOUN

IPC: G06F9/50, G06F9/54

CPC classification number: G06F9/5027, G06F9/544

Abstract: An accelerator core includes first and second buffers and at least one group of k processing elements (PEs). The first buffer receives at least one group of block-wise sparsified first elements. A block size (k,c) of each group of block-wise sparsified first elements includes k rows and c columns in which k is greater than or equal to 2, k times p equals K, and c times q equals C in which K is an output channel dimension of a tensor of first elements, C is a number of input channels of the tensor of first elements, p is an integer and q is an integer. The second buffer receives second elements. Each respective group of PEs receives k rows of first elements from a block of first elements corresponding to the group of PEs, and receives second elements that correspond to first elements received from the first buffer.

7.

Invention application
LEARNED THRESHOLD TOKEN PRUNING FOR TRANSFORMER NEURAL NETWORKS [Granted]

Publication (announcement) number: US20220374766A1

Publication (announcement) date: 2022-11-24

Application number: US17578435

Application date: 2022-01-18

Applicant: Samsung Electronics Co., Ltd., The Regents of The University of California

Inventor: David Philip Lloyd THORSLEY, Sheng SHEN, Se Hoon KIM, Amir GHOLAMINEJAD, Woosuk KWON, Joseph HASSOUN, Kurt KEUTZER

IPC: G06N20/00

Abstract: An architecture and method are disclosed to reduce computation in a self-attention model. The self-attention model is trained using multiple sub-models, each of which receives an input sequence of tokens and has a predetermined threshold score. Each input sequence of tokens is scored within each sub-model to provide a token score for each sub-model. Each sub-model prunes tokens from the input sequence with a score below the predetermined threshold score for the sub-model. The pruned sequences of each sub-model are used as the input sequences for the next sub-model. The predetermined threshold scores differ between sub-models.
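
The cascade described in this abstract, in which each sub-model drops tokens scoring below its own threshold and passes the survivors on, can be sketched in a few lines of Python. The score function here is a hypothetical placeholder that reads a precomputed importance value from each token; the abstract does not specify how scores are computed or how the thresholds are learned.

```python
def prune_through_submodels(tokens, score, thresholds):
    """Apply one pruning step per sub-model.

    tokens:     initial input sequence
    score:      hypothetical callable returning a token's score
    thresholds: one threshold per sub-model, in order
    Returns the surviving sequence after each sub-model.
    """
    kept = list(tokens)
    history = []
    for threshold in thresholds:
        # Drop tokens whose score is below this sub-model's threshold.
        kept = [t for t in kept if score(t) >= threshold]
        history.append(list(kept))     # input sequence for the next sub-model
    return history


if __name__ == "__main__":
    # (text, importance) pairs; the importance values are made up.
    tokens = [("the", 0.05), ("cat", 0.6), ("sat", 0.2), ("mat", 0.4)]
    for step in prune_through_submodels(tokens, lambda t: t[1], [0.1, 0.3]):
        print([text for text, _ in step])
```

Because later sub-models see shorter sequences, the cost of their self-attention falls with each pruning step.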

8.

Invention publication
HYBRID-SPARSE NPU WITH FINE-GRAINED STRUCTURED SPARSITY [Under examination, published]

Publication (announcement) number: US20240095505A1

Publication (announcement) date: 2024-03-21

Application number: US17980541

Application date: 2022-11-03

Applicant: Samsung Electronics Co., Ltd.

Inventor: Jong Hoon SHIN, Ardavan PEDRAM, Joseph HASSOUN

IPC: G06N3/063, G06N3/08

CPC classification number: G06N3/063, G06N3/08

Abstract: A neural processing unit is disclosed that supports dual-sparsity modes. A weight buffer is configured to store weight values in an arrangement selected from a structured weight sparsity arrangement or a random weight sparsity arrangement. A weight multiplexer array is configured to output one or more weight values stored in the weight buffer as first operand values based on the selected weight sparsity arrangement. An activation buffer is configured to store activation values. An activation multiplexer array includes inputs to the activation multiplexer array that are coupled to the activation buffer, and is configured to output one or more activation values stored in the activation buffer as second operand values in which each respective second operand value and a corresponding first operand value form an operand value pair. A multiplier array is configured to output a product value for each operand value pair.

9.

Invention application
MIXED-PRECISION NEURAL PROCESSING UNIT (NPU) USING SPATIAL FUSION WITH LOAD BALANCING [Granted]

Publication (announcement) number: US20210312325A1

Publication (announcement) date: 2021-10-07

Application number: US16898433

Application date: 2020-06-10

Applicant: Samsung Electronics Co., Ltd.

Inventor: Hamzah ABDELAZIZ, Joseph HASSOUN, Ali SHAFIEE ARDESTANI

IPC: G06N20/00, H04L29/08

Abstract: According to one general aspect, an apparatus may include a machine learning system. The machine learning system may include a precision determination circuit configured to: determine a precision level of data, and divide the data into a data subdivision. The machine learning system may exploit sparsity during the computation of each subdivision. The machine learning system may include a load balancing circuit configured to select a load balancing technique, wherein the load balancing technique includes alternately loading the computation circuit with at least a first data/weight subdivision combination and a second data/weight subdivision combination. The load balancing circuit may be configured to load a computation circuit with a selected data subdivision and a selected weight subdivision based, at least in part, upon the load balancing technique. The machine learning system may include a computation circuit configured to compute a partial computation result based, at least in part, upon the selected data subdivision and the weight subdivision.

10.

Invention application
SIGNED MULTIPLICATION USING UNSIGNED MULTIPLIER WITH DYNAMIC FINE-GRAINED OPERAND ISOLATION [Under examination, published]

Publication (announcement) number: US20200150924A1

Publication (announcement) date: 2020-05-14

Application number: US16276582

Application date: 2019-02-14

Applicant: Samsung Electronics Co., Ltd.

Inventor: Ilia OVSIANNIKOV, Ali SHAFIEE ARDESTANI, Joseph HASSOUN, Lei WANG

IPC: G06F7/487, G06F9/30

Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.
