-
Publication No.: US12248788B2
Publication date: 2025-03-11
Application No.: US17691690
Filing date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Prakash Bangalore Prabhakar , Gentaro Hirota , Ronny Krashinsky , Ze Long , Brian Pharris , Rajballav Dash , Jeff Tuckey , Jerome F. Duluk, Jr. , Lacky Shah , Luke Durant , Jack Choquette , Eric Werness , Naman Govil , Manan Patel , Shayani Deb , Sandeep Navada , John Edmondson , Greg Palmer , Wish Gandhi , Ravi Manyam , Apoorv Parle , Olivier Giroux , Shirish Gadre , Steve Heinrich
Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as backed by an L2 cache. Such distributed shared memory supports cooperative parallelism and strong scaling across multiple processing cores by permitting data sharing and communications previously possible only within the same processing core.
-
Publication No.: US12020035B2
Publication date: 2024-06-25
Application No.: US17691288
Filing date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Apoorv Parle , Ronny Krashinsky , John Edmondson , Jack Choquette , Shirish Gadre , Steve Heinrich , Manan Patel , Prakash Bangalore Prabhakar, Jr. , Ravi Manyam , Wish Gandhi , Lacky Shah , Alexander L. Minkin
IPC: G06F5/06 , G06F9/38 , G06F9/48 , G06F9/52 , G06F13/16 , G06F13/40 , G06T1/20 , G06T1/60 , H04L49/101
CPC classification number: G06F9/3887 , G06F9/522 , G06F13/1689 , G06F13/4022 , G06T1/20 , G06T1/60 , H04L49/101
Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization enabling strong scaling and smaller tile sizes.
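The bandwidth argument in this abstract can be illustrated with a toy model. The sketch below is not the patented tracking circuitry; `L2Cache`, `unicast_load`, and `multicast_load` are hypothetical names, and the only thing modeled is how many L2 reads are issued when several cores need the same data.

```python
# Toy model (assumption: one L2 read per request; no real GPU behavior is modeled).

class L2Cache:
    """Counts reads so the two access patterns can be compared."""

    def __init__(self, memory):
        self.memory = memory
        self.reads = 0

    def read(self, addr):
        self.reads += 1
        return self.memory[addr]


def unicast_load(l2, addr, cores):
    # Every core issues its own request, so L2 is read once per core.
    return {core: l2.read(addr) for core in cores}


def multicast_load(l2, addr, cores):
    # One requester reads L2 once on behalf of all cores; the single
    # response is then fanned out to each receiving core.
    value = l2.read(addr)
    return {core: value for core in cores}
```

With four receiving cores, the unicast pattern costs four L2 reads for one tile while the multicast pattern costs one, which is the saving the abstract attributes to multicast.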
-
Publication No.: US11966480B2
Publication date: 2024-04-23
Application No.: US17654355
Filing date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Adam Hendrickson , Vaishali Kulkarni , Gobikrishna Dhanuskodi , Naveen Cherukuri , Wish Gandhi , Raymond Wong
CPC classification number: G06F21/602 , G06F13/1673 , G06F13/28 , G06F21/79 , G06N3/04 , H04L9/0637 , H04L9/0643 , G06F21/107
Abstract: Apparatuses, systems, and techniques for supporting fairness of multiple context sharing cryptographic hardware. An accelerator circuit includes a copy engine (CE) with AES-GCM hardware configured to perform both encryption and authentication of data transfers for multiple applications or multiple data streams in a single application or belonging to a single user. The CE splits a data transfer of a specified size into a set of partial transfers. The CE sequentially executes the set of partial transfers using a context for a period of time (e.g., a timeslice) for an application. The CE stores in a secure memory for the application one or more data for encryption or decryption (e.g., a hash key, a block counter, etc.) computed from a last partial transfer. The one or more data for encryption or decryption are retrieved and used when data transfers for the application are resumed by the CE.
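The save-and-resume idea in this abstract can be sketched in a few lines. This is a minimal model under stated assumptions, not the copy engine's design: a SHA-256 based keystream and a chained hash stand in for AES-GCM (so nothing here is real AES-GCM), and `SavedContext`, `process_partial`, and `run_timesliced` are hypothetical names. The point shown is that carrying the block counter and running authentication state across timeslices makes an interleaved transfer match an uninterrupted one.

```python
import hashlib
from dataclasses import dataclass

BLOCK = 16  # bytes per cipher block in this toy model


@dataclass
class SavedContext:
    """Per-application state kept in (simulated) secure memory between timeslices."""
    key: bytes
    block_counter: int = 0
    running_mac: bytes = b"\x00" * 32  # stand-in for partial authentication state


def keystream_block(key, counter):
    # Assumption: SHA-256(key || counter) replaces the AES counter-mode block.
    return hashlib.sha256(key + counter.to_bytes(8, "big")).digest()[:BLOCK]


def process_partial(ctx, chunk):
    """Encrypt one partial transfer, updating the saved counter and MAC state."""
    out = bytearray()
    for off in range(0, len(chunk), BLOCK):
        block = chunk[off:off + BLOCK]
        ks = keystream_block(ctx.key, ctx.block_counter)
        ct = bytes(p ^ k for p, k in zip(block, ks))
        out += ct
        ctx.running_mac = hashlib.sha256(ctx.running_mac + ct).digest()
        ctx.block_counter += 1
    return bytes(out)


def split_transfer(data, partial_size):
    return [data[i:i + partial_size] for i in range(0, len(data), partial_size)]


def run_timesliced(transfers, partial_size):
    """Round-robin one partial transfer per application per timeslice."""
    if partial_size % BLOCK:
        raise ValueError("partial_size must be a multiple of BLOCK")
    secure_memory = {app: SavedContext(key) for app, (key, _) in transfers.items()}
    pending = {app: split_transfer(data, partial_size)
               for app, (_, data) in transfers.items()}
    outputs = {app: bytearray() for app in transfers}
    while any(pending.values()):
        for app, queue in pending.items():
            if queue:
                ctx = secure_memory[app]  # restore state saved after the last slice
                outputs[app] += process_partial(ctx, queue.pop(0))
    return {app: (bytes(outputs[app]), secure_memory[app]) for app in transfers}
```

Because each application's counter and running state live in its own saved context, switching between applications after every partial transfer does not change either application's ciphertext or final authentication state.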
-
Publication No.: US20230289453A1
Publication date: 2023-09-14
Application No.: US17654355
Filing date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Adam Hendrickson , Vaishali Kulkarni , Gobikrishna Dhanuskodi , Naveen Cherukuri , Wish Gandhi , Raymond Wong
CPC classification number: G06F21/602 , G06F21/79 , H04L9/0637 , H04L9/0643 , G06F13/1673 , G06F13/28 , G06N3/04 , G06F2221/0751
Abstract: Apparatuses, systems, and techniques for supporting fairness of multiple context sharing cryptographic hardware. An accelerator circuit includes a copy engine (CE) with AES-GCM hardware configured to perform both encryption and authentication of data transfers for multiple applications or multiple data streams in a single application or belonging to a single user. The CE splits a data transfer of a specified size into a set of partial transfers. The CE sequentially executes the set of partial transfers using a context for a period of time (e.g., a timeslice) for an application. The CE stores in a secure memory for the application one or more data for encryption or decryption (e.g., a hash key, a block counter, etc.) computed from a last partial transfer. The one or more data for encryption or decryption are retrieved and used when data transfers for the application are resumed by the CE.
-
Publication No.: US11698869B1
Publication date: 2023-07-11
Application No.: US17654359
Filing date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Vaishali Kulkarni , Naveen Cherukuri , Raymond Wong , Adam Hendrickson , Gobikrishna Dhanuskodi , Wish Gandhi
CPC classification number: G06F12/1408 , G06F12/1441 , G06F12/1458 , G06F13/1673 , G06F13/28 , G06N3/04 , H04L9/0637 , H04L9/0643
Abstract: The subject application relates to computing an authentication tag for partial transfers scheduled across multiple direct memory access (DMA) engines. Apparatuses, systems, and techniques are described for computing an authentication tag for a data transfer when the data transfer is scheduled as partial transfers across a specified number of direct memory access (DMA) engines. An orchestration circuit stores partial authentication tags, computed by the DMA engines, and corresponding adjustment exponents during one or more rounds in which the partial transfers are scheduled and processed by the specified number of DMA engines. During a last round, a combined authentication tag can be computed based on the partial authentication tags and the corresponding adjustment exponents stored by the orchestration circuit during the rounds.
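One way to read "partial authentication tags" and "adjustment exponents" is through the polynomial structure of a GHASH-style hash: a chunk's partial hash can be shifted into place by multiplying it by the hash key raised to the number of blocks that follow the chunk. The sketch below illustrates that algebra only. It is an assumption about the general approach, not the patented circuit; it uses plain (non-reflected) GF(2^128) arithmetic, so it is not bit-compatible with real AES-GCM, and it omits the length block and final tag encryption. All function names are hypothetical.

```python
# Toy model of combining per-chunk partial hashes with adjustment exponents.
# Assumption: GF(2^128) with x^128 + x^7 + x^2 + x + 1 in plain bit order.

TOP_BIT = 1 << 128
REDUCTION = TOP_BIT | 0x87


def gf_mul(a, b):
    """Carry-less multiply of two 128-bit field elements with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & TOP_BIT:
            a ^= REDUCTION
    return result


def gf_pow(base, exponent):
    """Square-and-multiply exponentiation in the field."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


def partial_hash(h, blocks):
    """Horner evaluation: sum of blocks[i] * h^(len(blocks) - i)."""
    acc = 0
    for block in blocks:
        acc = gf_mul(acc ^ block, h)
    return acc


def combine(h, stored):
    """Fold (partial hash, adjustment exponent) pairs into one value."""
    tag = 0
    for partial, exponent in stored:
        tag ^= gf_mul(partial, gf_pow(h, exponent))
    return tag


def orchestrate(h, blocks, chunk_blocks, num_engines):
    """Schedule chunks in rounds of num_engines and record each result."""
    chunks = [blocks[i:i + chunk_blocks]
              for i in range(0, len(blocks), chunk_blocks)]
    stored = []
    blocks_seen = 0
    for round_start in range(0, len(chunks), num_engines):
        for chunk in chunks[round_start:round_start + num_engines]:
            blocks_seen += len(chunk)
            # Adjustment exponent: number of blocks that follow this chunk.
            stored.append((partial_hash(h, chunk), len(blocks) - blocks_seen))
    return combine(h, stored)
```

Since each stored pair is independent of the others, the engines in a round could compute their partial hashes in any order; only the final combination step needs all of them.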