Patent search ap:("NVIDIA Corporation") AND inv:"Peter NELSON" Page 1

1.

Invention application
THREAD-LEVEL SLEEP IN A MASSIVELY MULTITHREADED ARCHITECTURE (Status: under examination, published)

Publication No.: US20180314522A1

Publication date: 2018-11-01

Application No.: US15582549

Application date: 2017-04-28

Applicant: NVIDIA Corporation

Inventor: Olivier GIROUX , Peter NELSON , Jack CHOQUETTE , Ajay Sudarshan TIRUMALA

IPC: G06F9/30 , G06F9/38

Abstract: A streaming multiprocessor (SM) includes a nanosleep (NS) unit configured to cause individual threads executing on the SM to sleep for a programmer-specified interval of time. For a given thread, the NS unit parses a NANOSLEEP instruction and extracts a sleep time. The NS unit then maps the sleep time to a single bit of a timer and causes the thread to sleep. When the timer bit changes, the sleep time expires, and the NS unit awakens the thread. The thread may then continue executing. The SM also includes a nanotrap (NT) unit configured to issue traps using a similar timing mechanism to that described above. For a given thread, the NT unit parses a NANOTRAP instruction and extracts a trap time. The NT unit then maps the trap time to a single bit of a timer. When the timer bit changes, the NT unit issues a trap.
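Illustrative sketch (not taken from the patent text): one way to read "maps the sleep time to a single bit of a timer" is to pick the timer bit whose toggle period is the largest power of two not exceeding the requested sleep time, and to wake the thread the next time that bit changes. The free-running nanosecond counter, the rounding rule, and the function names timer_bit_for and wake_time are assumptions made for this example only.

```python
def timer_bit_for(sleep_ns: int) -> int:
    """Pick the timer bit whose toggle period (2**bit ticks) is the
    largest power of two that does not exceed sleep_ns (assumed rule)."""
    if sleep_ns < 1:
        raise ValueError("sleep_ns must be positive")
    return sleep_ns.bit_length() - 1


def wake_time(now: int, sleep_ns: int) -> int:
    """Return the first counter value after `now` at which the chosen
    timer bit changes, i.e. the moment the sleeping thread is woken."""
    bit = timer_bit_for(sleep_ns)
    period = 1 << bit
    # Bit `bit` of a free-running counter flips each time the counter
    # reaches a multiple of 2**bit.
    return (now // period + 1) * period
```

Under this assumed rule the thread only needs to watch one bit rather than compare a full timestamp, at the cost of an approximate sleep length: the actual sleep is at least one tick and at most the requested time.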

2.

Invention application
TECHNIQUES FOR EFFICIENTLY SYNCHRONIZING MULTIPLE PROGRAM THREADS (Status: granted)

Publication No.: US20220391264A1

Publication date: 2022-12-08

Application No.: US17338377

Application date: 2021-06-03

Applicant: NVIDIA CORPORATION

Inventor: Ajay Sudarshan TIRUMALA , Olivier GIROUX , Peter NELSON , Gary M. TAROLLI , Ankita UPRETI

IPC: G06F9/52 , G06F9/30

Abstract: Various embodiments include a parallel processing computer system that enables parallel instances of a program to synchronize at disparate addresses in memory. When the parallel program instances need to exchange data, the program instances synchronize based on a mask that identifies the program instances that are synchronizing. As each program instance reaches the point of synchronization, the program instance blocks and waits for all other program instances to reach the point of synchronization. When all program instances have reached the point of synchronization, at least one program instance executes a synchronous operation to exchange data. The program instances then continue execution at respective and disparate return addresses.
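Illustrative sketch (not taken from the patent text): the mask-based synchronization in the abstract can be modeled as a small single-threaded simulation. Each program instance arrives with its ID, a value to exchange, and its own return address; nothing is released until every instance named in the mask has arrived. The class name MaskedSync, the use of a sum as the "synchronous operation", and the return format are assumptions made for this example.

```python
class MaskedSync:
    """Toy model of a synchronization point shared by the program
    instances whose bits are set in `mask` (all names hypothetical)."""

    def __init__(self, mask: int):
        self.mask = mask        # bit i set => instance i participates
        self.arrived = 0        # bit i set => instance i is waiting
        self.values = {}        # instance id -> value to exchange
        self.return_addrs = {}  # instance id -> where it resumes

    def arrive(self, instance_id: int, value: int, return_addr: int):
        """Record an arrival. Returns None while the instance must keep
        waiting; once the last participant arrives, returns a dict of
        instance id -> (return address, exchanged value)."""
        bit = 1 << instance_id
        if not self.mask & bit:
            raise ValueError("instance is not named in the mask")
        self.arrived |= bit
        self.values[instance_id] = value
        self.return_addrs[instance_id] = return_addr
        if self.arrived != self.mask:
            return None  # still blocked at the synchronization point
        # Everyone is here: perform the exchange (a sum, as an assumption).
        total = sum(self.values.values())
        return {i: (self.return_addrs[i], total) for i in sorted(self.values)}
```

The point of the model is that the instances share the exchanged value but not a program counter: each one resumes at the return address it supplied.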

3.

Invention application
TECHNIQUES FOR EFFICIENTLY PERFORMING DATA REDUCTIONS IN PARALLEL PROCESSING UNITS (Status: granted)

Publication No.: US20210019198A1

Publication date: 2021-01-21

Application No.: US16513393

Application date: 2019-07-16

Applicant: NVIDIA CORPORATION

Inventor: Peter NELSON , Olivier GIROUX , Ajay Sudarshan TIRUMALA

IPC: G06F9/52

Abstract: Techniques are disclosed for reducing the latency associated with performing data reductions in a multithreaded processor. In response to a single instruction associated with a set of threads executing in the multithreaded processor, a warp reduction unit acquires register values stored in source registers, where each register value is associated with a different thread included in the set of threads. The warp reduction unit performs operation(s) on the register values to compute an aggregate value. The warp reduction unit stores the aggregate value in a destination register that is accessible to at least one of the threads in the set of threads. Because the data reduction is performed via a single instruction using hardware specialized for data reductions, the number of cycles required to perform the data reduction is decreased relative to prior-art techniques that are performed via multiple instructions using hardware that is not specialized for data reductions.
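Illustrative sketch (not taken from the patent text): the contrast drawn in the abstract can be modeled by comparing a single reduce operation over per-thread register values with a multi-step butterfly exchange of the kind a software reduction would use. The function names, the active-mask parameter, and the step counting are assumptions made for this example; the model says nothing about actual hardware cycle counts.

```python
from functools import reduce
import operator


def warp_reduce(registers, active_mask, op=operator.add):
    """Model of a single reduce instruction: combine the register value
    of every lane whose bit is set in active_mask into one aggregate."""
    lanes = [v for lane, v in enumerate(registers) if (active_mask >> lane) & 1]
    if not lanes:
        raise ValueError("no active lanes")
    return reduce(op, lanes)


def shuffle_tree_reduce(registers, op=operator.add):
    """Model of a multi-instruction reduction: log2(n) butterfly exchange
    steps over all lanes. Returns (aggregate, number of steps)."""
    vals = list(registers)
    n = len(vals)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a power of two")
    steps = 0
    offset = n // 2
    while offset:
        # Each lane combines its value with its partner lane's value.
        vals = [op(vals[i], vals[i ^ offset]) for i in range(n)]
        offset //= 2
        steps += 1
    return vals[0], steps
```

For a 32-lane warp the butterfly version needs five exchange steps, each of which would be at least one instruction, whereas the modeled reduce is a single operation.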

4.

Invention application
METHOD FOR FORWARD PROGRESS AND PROGRAMMABLE TIMEOUTS OF TREE TRAVERSAL MECHANISMS IN HARDWARE (Status: under examination, published)

Publication No.: US20200051318A1

Publication date: 2020-02-13

Application No.: US16101232

Application date: 2018-08-10

Applicant: NVIDIA Corporation

Inventor: Greg MUTHLER , Ronald Charles BABICH, JR. , William Parsons NEWHALL, JR. , Peter NELSON , James ROBERTSON , John BURGESS

IPC: G06T15/06 , G06F9/38 , G06T17/00 , G06N5/04 , G06T1/20 , G06T1/60

Abstract: In a ray tracer, to prevent any long-running query from hanging the graphics processing unit, a traversal coprocessor provides a preemption mechanism that will allow rays to stop processing or time out early. The example non-limiting implementations described herein provide such a preemption mechanism, including a forward progress guarantee, and additional programmable timeout options that can be time or cycle based. Those programmable options provide a means for quality of service timing guarantees for applications such as virtual reality (VR) that have strict timing requirements.
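Illustrative sketch (not taken from the patent text): a preemptible traversal with a forward progress guarantee can be modeled as a stack-based tree walk that stops once a step budget is spent, but always performs at least one step per call, and hands back its stack so the query can be resumed. The tree representation, the function name traverse, and the step-count budget (standing in for a cycle-based timeout) are assumptions made for this example.

```python
def traverse(tree, stack, budget):
    """Walk `tree` (dict: node -> list of children, leaves map to [])
    starting from the nodes on `stack`, which is modified in place.

    Stops after `budget` steps, but always takes at least one step so
    that repeated calls are guaranteed to finish eventually.
    Returns (leaves_reached, remaining_stack, timed_out)."""
    leaves = []
    steps = 0
    limit = max(budget, 1)  # forward progress: never less than one step
    while stack:
        if steps >= limit:
            return leaves, stack, True  # preempted; stack allows resuming
        node = stack.pop()
        steps += 1
        children = tree[node]
        if children:
            stack.extend(reversed(children))  # visit children left to right
        else:
            leaves.append(node)
    return leaves, stack, False
```

A caller that is preempted can pass the returned stack back in later. Even with a budget of zero, each call retires one node, so the query cannot hang indefinitely.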

5.

Invention publication
WATERTIGHT RAY TRIANGLE INTERSECTION (Status: under examination, published)

Publication No.: US20240355039A1

Publication date: 2024-10-24

Application No.: US18761820

Application date: 2024-07-02

Applicant: NVIDIA Corporation

Inventor: Samuli LAINE , Tero KARRAS , Timo AILA , Robert OHANNESSIAN , William Parsons NEWHALL, Jr. , Greg MUTHLER , Ian KWONG , Peter NELSON , John BURGESS

IPC: G06T15/06 , G06T15/00

CPC classification number: G06T15/06 , G06T15/005

Abstract: A hardware-based traversal coprocessor provides acceleration of tree traversal operations searching for intersections between primitives represented in a tree data structure and a ray. The primitives may include triangles used in generating a virtual scene. The hardware-based traversal coprocessor is configured to properly handle numerically challenging computations at or near edges and/or vertices of primitives and/or ensure that a single intersection is reported when a ray intersects a surface formed by primitives at or near edges and/or vertices of the primitives.

6.

Invention publication
HARDWARE ACCELERATED SYNCHRONIZATION WITH ASYNCHRONOUS TRANSACTION SUPPORT (Status: under examination, published)

Publication No.: US20230289242A1

Publication date: 2023-09-14

Application No.: US17691296

Application date: 2022-03-10

Applicant: NVIDIA Corporation

Inventor: Timothy GUO , Jack CHOQUETTE , Shirish GADRE , Olivier GIROUX , Carter EDWARDS , John EDMONDSON , Manan PATEL , Raghavan MADHAVAN, JR. , Jessie HUANG , Peter NELSON , Ronny KRASHINSKY

IPC: G06F9/52

CPC classification number: G06F9/522 , G06F2209/521

Abstract: A new transaction barrier synchronization primitive enables executing threads and asynchronous transactions to synchronize across parallel processors. The asynchronous transactions may include transactions resulting from, for example, hardware data movement units such as direct memory units, etc. A hardware synchronization circuit may provide for the synchronization primitive to be stored in a cache memory so that barrier operations may be accelerated by the circuit. A new wait mechanism reduces software overhead associated with waiting on a barrier.
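Illustrative sketch (not taken from the patent text): a barrier that synchronizes both threads and asynchronous transactions can be modeled with two counters, one for threads that have not yet arrived and one for transactions that have been announced but not yet completed. The class name TransactionBarrier, its method names, and the two-counter design are assumptions made for this example; the abstract does not specify the mechanism.

```python
class TransactionBarrier:
    """Toy model: the barrier completes only when every expected thread
    has arrived AND every announced transaction has finished."""

    def __init__(self, expected_threads: int):
        self.pending_arrivals = expected_threads
        self.pending_transactions = 0

    def expect_transactions(self, count: int) -> None:
        """Announce `count` asynchronous transactions (for example, data
        copies issued to a data movement unit) that must also finish."""
        self.pending_transactions += count

    def arrive(self) -> None:
        """Called once by each participating thread."""
        if self.pending_arrivals == 0:
            raise RuntimeError("more arrivals than expected threads")
        self.pending_arrivals -= 1

    def complete_transaction(self, count: int = 1) -> None:
        """Called (conceptually by hardware) as transactions finish."""
        if count > self.pending_transactions:
            raise RuntimeError("more completions than expected transactions")
        self.pending_transactions -= count

    def is_complete(self) -> bool:
        return self.pending_arrivals == 0 and self.pending_transactions == 0
```

In this model a waiting thread polls is_complete(), so it is released only after both the other threads and the outstanding data movement have caught up.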

7.

Invention publication
METHOD FOR FORWARD PROGRESS AND PROGRAMMABLE TIMEOUTS OF TREE TRAVERSAL MECHANISMS IN HARDWARE (Status: under examination, published)

Publication No.: US20240169655A1

Publication date: 2024-05-23

Application No.: US18420449

Application date: 2024-01-23

Applicant: NVIDIA Corporation

Inventor: Greg MUTHLER , Ronald Charles BABICH, JR. , William Parsons NEWHALL, Jr. , Peter NELSON , James ROBERTSON , John BURGESS

IPC: G06T15/06 , G06F9/38 , G06N5/046 , G06T1/20 , G06T1/60 , G06T17/00

CPC classification number: G06T15/06 , G06F9/3877 , G06N5/046 , G06T1/20 , G06T1/60 , G06T17/005

Abstract: In a ray tracer, to prevent any long-running query from hanging the graphics processing unit, a traversal coprocessor provides a preemption mechanism that will allow rays to stop processing or time out early. The example non-limiting implementations described herein provide such a preemption mechanism, including a forward progress guarantee, and additional programmable timeout options that can be time or cycle based. Those programmable options provide a means for quality of service timing guarantees for applications such as virtual reality (VR) that have strict timing requirements.

8.

Invention application
METHOD FOR FORWARD PROGRESS AND PROGRAMMABLE TIMEOUTS OF TREE TRAVERSAL MECHANISMS IN HARDWARE (Status: under examination, published)

Publication No.: US20200051317A1

Publication date: 2020-02-13

Application No.: US16101206

Application date: 2018-08-10

Applicant: NVIDIA Corporation

Inventor: Greg MUTHLER , Ronald Charles BABICH, JR. , William Parsons NEWHALL, JR. , Peter NELSON , Jim ROBERTSON , John BURGESS

IPC: G06T15/06 , G06T17/00 , G06N5/04 , G06F9/38 , G06T15/00 , G06T1/20 , G06T1/60

Abstract: In a ray tracer, to prevent any long-running query from hanging the graphics processing unit, a traversal coprocessor provides a preemption mechanism that will allow rays to stop processing or time out early. The example non-limiting implementations described herein provide such a preemption mechanism, including a forward progress guarantee, and additional programmable timeout options that can be time or cycle based. Those programmable options provide a means for quality of service timing guarantees for applications such as virtual reality (VR) that have strict timing requirements.

9.

Invention application
TECHNIQUE FOR REDUCING VOLTAGE DROOP BY THROTTLING INSTRUCTION ISSUE RATE (Status: under examination, published)

Publication No.: US20150089198A1

Publication date: 2015-03-26

Application No.: US14033378

Application date: 2013-09-20

Applicant: NVIDIA CORPORATION

Inventor: Peter SOMMERS , Peter NELSON , Aniket NAIK , John H. EDMONDSON

IPC: G06F9/38 , G06F9/30

CPC classification number: G06F9/3836

Abstract: An issue control unit is configured to control the rate at which an instruction issue unit issues instructions to an execution pipeline in order to avoid spikes in power drawn by that execution pipeline. The issue control unit maintains a history buffer that reflects, for N previous cycles, the number of instructions issued during each of those N cycles. If the total number of instructions issued during the N previous cycles exceeds a threshold value, then the issue control unit throttles the instruction issue unit from issuing instructions during a subsequent cycle. In addition, the issue control unit increases the threshold value in proportion to the number of previously issued instructions and based on a variety of configurable parameters. Accordingly, the issue control unit maintains granular control over the rate with which the instruction issue unit “ramps up” to a maximum instruction issue rate.
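Illustrative sketch (not taken from the patent text): the history-buffer throttle described in the abstract can be modeled with a fixed-length window of per-cycle issue counts. If the window total exceeds the current threshold, nothing issues in the next cycle; the threshold then grows with each issued instruction up to a cap, which produces the gradual ramp-up. The class name IssueControl, the parameter names, and the linear ramp rule are assumptions made for this example.

```python
from collections import deque


class IssueControl:
    """Toy model of an issue throttle driven by an N-cycle history."""

    def __init__(self, window, initial_threshold, max_threshold, ramp_per_instruction):
        # One entry per previous cycle: how many instructions were issued.
        self.history = deque([0] * window, maxlen=window)
        self.threshold = initial_threshold
        self.max_threshold = max_threshold
        self.ramp = ramp_per_instruction  # threshold growth per issued instruction

    def cycle(self, requested):
        """Advance one cycle; return how many instructions actually issue."""
        if sum(self.history) > self.threshold:
            issued = 0          # throttle: too much recent activity
        else:
            issued = requested  # allow the issue unit to proceed
        self.history.append(issued)  # the oldest cycle drops out of the window
        # Raise the threshold in proportion to what was just issued (assumed rule).
        self.threshold = min(self.max_threshold, self.threshold + self.ramp * issued)
        return issued
```

With a 4-cycle window, an initial threshold of 2, a cap of 8, and a ramp of 0.5 per instruction, a steady request of two instructions per cycle is stalled briefly early on and then reaches the full rate once the threshold has ramped up to the cap.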
