Patent search: ap:("Samsung Electronics Co., Ltd.") AND inv:"Krishna MALLADI"

1.

Patent application
DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING [Legal status: valid]

Publication No.: US20210374210A1

Publication date: 2021-12-02

Application No.: US17374988

Application date: 2021-07-13

Applicant: Samsung Electronics Co., Ltd.

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU

IPC: G06F17/16, G06F12/0877, G06F12/0802, G06N3/063, G06N3/00, G06N3/04

Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
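
The lookup-based product described above can be sketched in software. This is a minimal illustration, not the patented circuit: it assumes 4-bit unsigned operands and a fully precomputed product table, and the names `LUT`, `lut_dot` and `lut_gemm` are hypothetical. Elements of the first vector act as row addresses, elements of the second as column addresses, and the adders only sum the fetched entries, so no multiply happens on the lookup path.

```python
from typing import List, Sequence

BITS = 4  # assumed operand width; the abstract does not specify one
SIZE = 1 << BITS

# Precomputed product table: LUT[r][c] == r * c. In the described hardware
# this table would live in the DRAM bank; here it is a nested list.
LUT: List[List[int]] = [[r * c for c in range(SIZE)] for r in range(SIZE)]


def lut_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Dot product using table lookups and additions only."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    acc = 0
    for x, y in zip(a, b):
        if not (0 <= x < SIZE and 0 <= y < SIZE):
            raise ValueError("operand outside table range")
        acc += LUT[x][y]  # row address from a, column address from b
    return acc


def lut_gemm(A: Sequence[Sequence[int]], B: Sequence[Sequence[int]]) -> List[List[int]]:
    """GEMM built from lut_dot: C[i][j] = row i of A dotted with column j of B."""
    cols = list(zip(*B))
    return [[lut_dot(row, col) for col in cols] for row in A]
```

For example, `lut_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. Table size grows as the square of the operand range, which is why such schemes target low-precision operands.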

2.

Patent application
SMART IN-MODULE REFRESH FOR DRAM [Legal status: valid]

Publication No.: US20160307619A1

Publication date: 2016-10-20

Application No.: US14850938

Application date: 2015-09-10

Applicant: Samsung Electronics Co., Ltd.

Inventors: Mu-Tien CHANG, Krishna MALLADI, Dimin NIU, Hongzhong ZHENG

IPC: G11C11/406, G11C11/4076

CPC classification number: G11C11/40615, G11C5/04, G11C5/14, G11C11/40618, G11C11/4076

Abstract: A dynamic Random Access Memory (DRAM) module (105) is disclosed. The DRAM module (105) can include a plurality of banks (205-1, 205-2, 205-3, 205-4) to store data and a refresh engine (115) that can be used to refresh one of the plurality of banks (205-1, 205-2, 205-3, 205-4). The DRAM module (105) can also include a Smart Refresh Component (305) that can advise the refresh engine (115) which bank to refresh using an out-of-order per-bank refresh. The Smart Refresh Component (305) can use a logic (415) to identify a farthest bank in the pending transactions in the transaction queue (430) at the time of refresh.
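
The bank-selection step can be illustrated with a small sketch. The abstract does not define "farthest", so this assumes it means the bank whose first pending transaction sits furthest back in the queue, with banks that have no pending transaction treated as farthest of all. The function name and the `needs_refresh` input are hypothetical.

```python
from typing import Dict, Optional, Sequence


def pick_refresh_bank(num_banks: int, queue: Sequence[int],
                      needs_refresh: Sequence[bool]) -> Optional[int]:
    """Choose a bank for out-of-order per-bank refresh.

    queue lists the target bank of each pending transaction, oldest first.
    Assumption: a bank's distance is the queue position of its first pending
    transaction; banks with no pending transaction get distance len(queue).
    """
    first_use: Dict[int, int] = {}
    for pos, bank in enumerate(queue):
        first_use.setdefault(bank, pos)
    candidates = [b for b in range(num_banks) if needs_refresh[b]]
    if not candidates:
        return None
    # Largest distance wins; ties go to the lowest bank index.
    return max(candidates, key=lambda b: (first_use.get(b, len(queue)), -b))
```

With four banks and a queue of `[0, 1, 0, 2]`, bank 3 is chosen because nothing is waiting on it, so refreshing it should stall the fewest queued requests.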

3.

Patent application
HBM RAS CACHE ARCHITECTURE [Legal status: valid]

Publication No.: US20250077370A1

Publication date: 2025-03-06

Application No.: US18953042

Application date: 2024-11-19

Applicant: Samsung Electronics Co., Ltd.

Inventors: Dimin NIU, Krishna MALLADI, Hongzhong ZHENG

IPC: G06F11/20, G06F3/06

Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies and configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine if the memory access is associated with an error, and if so completing the memory access with the spare memory.
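
The address-table redirection in the reliability circuit can be modeled in a few lines. This is a toy sketch under stated assumptions: memory is a flat list of words, error detection is out of scope (the hypothetical `mark_faulty` call stands in for it), and the last known value is copied into the spare slot when an address is retired.

```python
from typing import Dict, List


class SpareRemap:
    """Toy reliability circuit: an address table redirects accesses to
    addresses associated with errors into a small spare memory."""

    def __init__(self, main_size: int, spare_size: int) -> None:
        self.main: List[int] = [0] * main_size
        self.spare: List[int] = [0] * spare_size
        self.table: Dict[int, int] = {}  # faulty address -> spare slot

    def mark_faulty(self, addr: int) -> None:
        """Hypothetical hook called once an error is detected at addr."""
        if addr in self.table:
            return
        if len(self.table) >= len(self.spare):
            raise RuntimeError("spare memory exhausted")
        slot = len(self.table)
        self.spare[slot] = self.main[addr]  # assumed: data still recoverable
        self.table[addr] = slot

    def read(self, addr: int) -> int:
        slot = self.table.get(addr)
        return self.spare[slot] if slot is not None else self.main[addr]

    def write(self, addr: int, value: int) -> None:
        slot = self.table.get(addr)
        if slot is not None:
            self.spare[slot] = value  # complete the access in spare memory
        else:
            self.main[addr] = value
```

Once an address is in the table, later corruption of the original cell no longer affects reads, because every access to that address completes in the spare memory.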

4.

Patent application
HBM SILICON PHOTONIC TSV ARCHITECTURE FOR LOOKUP COMPUTING AI ACCELERATOR [Legal status: pending, published]

Publication No.: US20190214365A1

Publication date: 2019-07-11

Application No.: US15911063

Application date: 2018-03-02

Applicant: Samsung Electronics Co., Ltd.

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG

IPC: H01L25/065, H01L31/12, H01L31/02, H01L31/0232, H01L25/18, G02F1/01, H04B10/80, H04Q11/00

Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.
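
A lookup table that "converts first data to second data" can be pictured as a precomputed function table. The sketch below assumes, purely for illustration, an 8-bit sigmoid activation where input code `x` represents the real value `(x - 128) / 16`; the abstract names no particular function. The optical via and the downstream combinatorial logic are not modeled.

```python
import math
from typing import List, Sequence


def build_sigmoid_lut() -> List[int]:
    """Hypothetical 256-entry table mapping 8-bit input codes to 8-bit outputs."""
    lut = []
    for code in range(256):
        v = (code - 128) / 16.0  # assumed fixed-point interpretation
        s = 1.0 / (1.0 + math.exp(-v))
        lut.append(min(255, round(s * 255)))
    return lut


def convert(lut: Sequence[int], data: Sequence[int]) -> List[int]:
    """First data -> second data by table lookup; no arithmetic at run time."""
    return [lut[x] for x in data]
```

The table is filled once (in the described design, stored in the memory die), after which each conversion is a single read.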

5.

Patent application
SYSTEM AND METHOD FOR PROVIDING EXPANDABLE AND CONTRACTIBLE MEMORY OVERPROVISIONING [Legal status: pending, published]

Publication No.: US20170351453A1

Publication date: 2017-12-07

Application No.: US15230322

Application date: 2016-08-05

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Krishna MALLADI, Hongzhong ZHENG

IPC: G06F3/06

CPC classification number: G06F3/0631, G06F3/0604, G06F3/0608, G06F3/0644, G06F3/0661, G06F3/0665, G06F3/0683, G06F11/00, G06F12/023

Abstract: A memory module includes one or more memory devices, a memory interface to a host computer, and a memory overprovisioning logic. The memory overprovisioning logic is configured to monitor memory usage of the one or more memory devices and provide a compression and/or deduplication ratio of the memory module to a kernel driver module of the host computer. The kernel driver module of the host computer is configured to update a virtual memory capacity of the memory module based on the compression and/or deduplication ratio.
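
The capacity update can be reduced to simple arithmetic: virtual capacity is physical capacity multiplied by the reported compression/deduplication ratio. A minimal sketch, assuming the ratio is measured as logical bytes stored per physical byte used and clamped to at least 1.0; the `max_ratio` safety cap and the `KernelDriverModel` class are hypothetical additions, not taken from the abstract.

```python
GIB = 1 << 30


def observed_ratio(logical_bytes: int, physical_bytes_used: int) -> float:
    """Ratio the module reports; clamped so capacity never drops below physical."""
    if physical_bytes_used <= 0:
        return 1.0
    return max(1.0, logical_bytes / physical_bytes_used)


class KernelDriverModel:
    """Hypothetical host-side driver state tracking the exposed capacity."""

    def __init__(self, physical_capacity: int, max_ratio: float = 4.0) -> None:
        self.physical_capacity = physical_capacity
        self.max_ratio = max_ratio  # assumed cap, not from the abstract
        self.virtual_capacity = physical_capacity

    def update(self, ratio: float) -> int:
        """Expand or contract the exposed capacity from the reported ratio."""
        self.virtual_capacity = int(self.physical_capacity * min(ratio, self.max_ratio))
        return self.virtual_capacity
```

For a 16 GiB module holding 8 GiB of logical data in 4 GiB of physical space, the ratio is 2.0 and the driver would expose 32 GiB; if the data later compresses only 1.5:1, the exposed capacity contracts to 24 GiB.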

6.

Patent application
DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING [Legal status: pending, published]

Publication No.: US20200183837A1

Publication date: 2020-06-11

Application No.: US16388863

Application date: 2019-04-18

Applicant: Samsung Electronics Co., Ltd.

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU

IPC: G06F12/0802, G06F17/16

Abstract: A tensor computation dataflow accelerator semiconductor circuit is disclosed. The data flow accelerator includes a DRAM bank and a peripheral array of multiply-and-add units disposed adjacent to the DRAM bank. The peripheral array of multiply-and-add units are configured to form a pipelined dataflow chain in which partial output data from one multiply-and-add unit from among the array of multiply-and-add units is fed into another multiply-and-add unit from among the array of multiply-and-add units for data accumulation. Near-DRAM-processing dataflow (NDP-DF) accelerator unit dies may be stacked atop a base die. The base die may be disposed on a passive silicon interposer adjacent to a processor or a controller. The NDP-DF accelerator units may process partial matrix output data in parallel. The partial matrix output data may be propagated in a forward or backward direction. The tensor computation dataflow accelerator may perform a partial matrix transposition.
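
The pipelined chain of multiply-and-add units can be illustrated with a cycle-level sketch. This assumes one weight held per unit and a stream of input vectors entering the chain one cycle apart; each unit adds its product to the partial sum handed over by the previous unit. Forward/backward propagation, die stacking and partial transposition are not modeled, and `systolic_dot_stream` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def systolic_dot_stream(w: Sequence[int], xs: Sequence[Sequence[int]]) -> List[int]:
    """Dot each input vector with w using a pipelined chain of MAC units."""
    n = len(w)
    regs: List[Optional[int]] = [None] * n  # pipeline register after each unit
    outputs: List[int] = []
    for t in range(len(xs) + n - 1):
        new_regs: List[Optional[int]] = [None] * n
        for k in range(n):
            m = t - k  # index of the input vector unit k handles this cycle
            if 0 <= m < len(xs):
                incoming = 0 if k == 0 else regs[k - 1]  # partial sum from unit k-1
                assert incoming is not None
                new_regs[k] = incoming + w[k] * xs[m][k]
        regs = new_regs
        if regs[n - 1] is not None:  # last unit emits a finished result
            outputs.append(regs[n - 1])
    return outputs
```

After a fill latency of `n - 1` cycles the chain emits one finished dot product per cycle, which is the throughput benefit of passing partial outputs along the chain rather than accumulating them centrally.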

7.

Patent application
SOFTWARE STACK AND PROGRAMMING FOR DPU OPERATIONS [Legal status: pending, published]

Publication No.: US20180121130A1

Publication date: 2018-05-03

Application No.: US15426015

Application date: 2017-02-06

Applicant: Samsung Electronics Co., Ltd.

Inventors: Shaungchen LI, Dimin NIU, Krishna MALLADI, Hongzhong ZHENG

IPC: G06F3/06, G11C11/4094

CPC classification number: G06F3/0647, G06F3/061, G06F3/0683, G11C7/1006, G11C11/405, G11C11/4091, G11C11/4094, G11C11/4097

Abstract: A system includes a library, a compiler, a driver and at least one dynamic random access memory (DRAM) processing unit (DPU). The library may determine at least one DPU operation corresponding to a received command. The compiler may form at least one DPU instruction for the DPU operation. The driver may send the at least one DPU instruction to at least one DPU. The DPU may include at least one computing cell array that includes a plurality of DRAM-based computing cells arranged in an array having at least one column in which the at least one column may include at least three rows of DRAM-based computing cells configured to provide a logic function that operates on a first row and a second row of the at least three rows and configured to store a result of the logic function in a third row of the at least three rows.
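
The library → compiler → driver → DPU flow and the three-row compute pattern can be sketched as follows. Everything here is hypothetical and simplified: rows are 8-bit integers (real DRAM rows are far wider), the set of logic functions is assumed since the abstract only says "a logic function", and the command names, `compile_command` and `driver_send` are illustrative stand-ins.

```python
from typing import Callable, Dict, List, Tuple

WIDTH = 8  # assumed row width in bits
MASK = (1 << WIDTH) - 1

# Assumed two-input logic functions available in the computing cells.
OPS: Dict[str, Callable[[int, int], int]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NOR": lambda a, b: ~(a | b),
}

# Hypothetical library table: received command -> DPU operation.
LIBRARY: Dict[str, str] = {"bitwise_and": "AND", "bitwise_or": "OR",
                           "bitwise_xor": "XOR", "bitwise_nor": "NOR"}

Instruction = Tuple[str, int, int, int]  # (op, src_row_a, src_row_b, dst_row)


class ComputingCellArray:
    """Rows of DRAM-based computing cells; each row is a WIDTH-bit vector."""

    def __init__(self, num_rows: int = 3) -> None:
        if num_rows < 3:
            raise ValueError("at least three rows are required")
        self.rows: List[int] = [0] * num_rows

    def execute(self, instr: Instruction) -> None:
        op, a, b, dst = instr
        # Logic function on two rows; result stored in a third row.
        self.rows[dst] = OPS[op](self.rows[a], self.rows[b]) & MASK


def compile_command(command: str, a: int = 0, b: int = 1, dst: int = 2) -> Instruction:
    """Hypothetical compiler step: library lookup plus row assignment."""
    return (LIBRARY[command], a, b, dst)


def driver_send(array: ComputingCellArray, instr: Instruction) -> None:
    """Hypothetical driver step: deliver the instruction to the DPU."""
    array.execute(instr)
```

The masking in `execute` keeps results within the row width, which matters for NOR, since Python's `~` yields a negative integer.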

8.

Patent application
SYSTEM AND METHOD FOR CONTROLLING A PROGRAMMABLE DEDUPLICATION RATIO FOR A MEMORY SYSTEM [Legal status: pending, published]

Publication No.: US20180039443A1

Publication date: 2018-02-08

Application No.: US15285437

Application date: 2016-10-04

Applicant: SAMSUNG ELECTRONICS CO., LTD

Inventors: Hongzhong ZHENG, Krishna MALLADI, Dimin NIU

IPC: G06F3/06, G06F13/42, G06F12/1009

CPC classification number: G06F3/0641, G06F3/0608, G06F3/0619, G06F3/065, G06F3/0683, G06F12/1009, G06F13/4282, G06F2212/1044

Abstract: A memory module has a logic including a programming register, a deduplication ratio control logic, and a deduplication engine. The programming register stores a maximum deduplication ratio of the memory module. The control logic is configured to control a deduplication ratio of the memory module according to the maximum deduplication ratio. The deduplication ratio is programmable by the host computer.
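
A ratio-capped deduplication engine can be sketched as follows. Assumptions: blocks are matched by SHA-256 content hash, the ratio is logical blocks per physical block, overwrites are not modeled, and a duplicate write is shared only if doing so keeps the ratio within the host-programmed maximum; otherwise a fresh physical copy is stored. Class and method names are hypothetical.

```python
import hashlib
from typing import Dict


class DedupMemoryModule:
    """Toy module whose deduplication ratio never exceeds a programmed maximum."""

    def __init__(self, max_ratio: float) -> None:
        self.set_max_ratio(max_ratio)
        self.physical: Dict[int, bytes] = {}  # physical slot -> stored block
        self.index: Dict[str, int] = {}       # content hash -> newest physical copy
        self.logical: Dict[int, int] = {}     # logical address -> physical slot

    def set_max_ratio(self, max_ratio: float) -> None:
        """Host write to the (modeled) programming register."""
        if max_ratio < 1.0:
            raise ValueError("max_ratio must be at least 1.0")
        self.max_ratio = max_ratio

    def ratio(self) -> float:
        """Current deduplication ratio: logical blocks per physical block."""
        return len(self.logical) / len(self.physical) if self.physical else 1.0

    def write(self, addr: int, data: bytes) -> None:
        if addr in self.logical:
            raise ValueError("overwrites are not modeled in this sketch")
        key = hashlib.sha256(data).hexdigest()
        slot = self.index.get(key)
        # Share an existing copy only if the resulting ratio stays within the cap.
        if slot is not None and (len(self.logical) + 1) / len(self.physical) <= self.max_ratio:
            self.logical[addr] = slot
            return
        # Otherwise store a new copy; with max_ratio >= 1 this cannot exceed the cap.
        slot = len(self.physical)
        self.physical[slot] = data
        self.index[key] = slot
        self.logical[addr] = slot

    def read(self, addr: int) -> bytes:
        return self.physical[self.logical[addr]]
```

Writing four identical blocks with a maximum ratio of 2.0 leaves two physical copies (ratio 2.0), while a maximum of 4.0 leaves one. Lowering the maximum later affects only subsequent writes in this sketch.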

9.

Patent application
HBM SILICON PHOTONIC TSV ARCHITECTURE FOR LOOKUP COMPUTING AI ACCELERATOR [Legal status: valid]

Publication No.: US20220367412A1

Publication date: 2022-11-17

Application No.: US17873120

Application date: 2022-07-25

Applicant: Samsung Electronics Co., Ltd.

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG

IPC: H01L25/065, H01L31/12, H01L31/02, H01L31/0232, H01L25/18, H04B10/80, H04Q11/00, G02F1/01

Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logical circuit die and configured to transfer second data between the memory circuit die and the logic circuit die.

10.

Patent application
DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING [Legal status: pending, published]

Publication No.: US20200184001A1

Publication date: 2020-06-11

Application No.: US16388860

Application date: 2019-04-18

Applicant: Samsung Electronics Co., Ltd.

Inventors: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU

IPC: G06F17/16, G06F12/0877

Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
