-
公开(公告)号:US12248788B2
公开(公告)日:2025-03-11
申请号:US17691690
申请日:2022-03-10
Applicant: NVIDIA Corporation
Inventor: Prakash Bangalore Prabhakar , Gentaro Hirota , Ronny Krashinsky , Ze Long , Brian Pharris , Rajballav Dash , Jeff Tuckey , Jerome F. Duluk, Jr. , Lacky Shah , Luke Durant , Jack Choquette , Eric Werness , Naman Govil , Manan Patel , Shayani Deb , Sandeep Navada , John Edmondson , Greg Palmer , Wish Gandhi , Ravi Manyam , Apoorv Parle , Olivier Giroux , Shirish Gadre , Steve Heinrich
Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as hacked by an L2 cache. Such distributed shared memory supports cooperative parallelism and strong scaling across multiple processing cores by permitting data sharing and communications previously possible only within the same processing core.
-
公开(公告)号:US11367160B2
公开(公告)日:2022-06-21
申请号:US16053341
申请日:2018-08-02
Applicant: NVIDIA Corporation
Inventor: Rajballav Dash , Gregory Palmer , Gentaro Hirota , Lacky Shah , Jack Choquette , Emmett Kilgariff , Sriharsha Niverty , Milton Lei , Shirish Gadre , Omkar Paranjape , Lei Yang , Rouslan Dimitrov
Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
-
公开(公告)号:US10725837B1
公开(公告)日:2020-07-28
申请号:US16677503
申请日:2019-11-07
Applicant: NVIDIA Corporation
Inventor: Rajballav Dash , Jack H. Choquette , Ming Liang Milton Lei , Stephen Jones , Christopher Frederick Lamb
Abstract: Techniques are disclosed for sharing of data exchange among kernels (each a set of instructions) executing on a system having multiple processing units. In an embodiment, each processing unit includes an on-chip scratchpad memory that can be accessed by the kernels executing on the processing unit. All or a portion of the scratchpad memory can be allocated and configured, for example, such that the scratchpad is accessible to multiple kernels in parallel, to one or more kernels in serial, or a combination of both.
-
-