-
Publication No.: US11956306B1
Publication Date: 2024-04-09
Application No.: US17709111
Filing Date: 2022-03-30
Applicant: NVIDIA Corporation
Inventors: Glenn Dearth, Mark Hummel, Nan Jiang, Gregory Thorson
IPC Classification: H04L47/70, H04L47/80, H04L67/1008, H04L67/1014
CPC Classification: H04L67/1008, H04L47/806, H04L47/827, H04L67/1014
Abstract: Systems and techniques for performing multicast-reduction operations. In at least one embodiment, a network device receives first network data associated with a multicast operation to be collectively performed by at least a plurality of endpoints. The network device reserves resources to process second network data to be received from the endpoints, and sends the first network data to a plurality of additional network devices. The network device receives the second network data, and processes the second network data using the reserved resources.
-
Publication No.: US10789194B2
Publication Date: 2020-09-29
Application No.: US16364565
Filing Date: 2019-03-26
Applicant: NVIDIA Corporation
Inventors: Larry R. Dennison, Mark Hummel, Glenn Dearth
IPC Classification: G06F13/40, G06F3/06, G06F13/00, G06F9/38, G06F12/0891
Abstract: Systems and techniques for synchronizing transactions between processing devices on an interconnection network are provided. Upon receiving a stream of posted transactions followed by a flush transaction from a source processing device connected to the interconnection network, the flush transaction is trapped before it enters the interconnection network. Subsequently, based on monitoring for responses received from a destination processing device for transactions corresponding to the posted transactions, a flush response is generated and returned to the source processing device. The described techniques enable efficient synchronization of posted writes, posted atomics and the like over complex interconnection fabrics, such that a first GPU can write data to a second GPU so that a third GPU can safely consume the data written to the second GPU.
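Illustrative sketch (not taken from the patent): the flush-trapping idea in the abstract above can be modeled as a counter kept at the point where traffic enters the network. The class and method names below are hypothetical, and the sketch assumes one destination response per tracked transaction, which the abstract does not state explicitly.

```python
class FlushTracker:
    """Hypothetical ingress-side tracker; all names here are illustrative."""

    def __init__(self):
        self.outstanding = 0       # tracked transactions still awaiting a response
        self.flush_pending = False  # True while a trapped flush is waiting
        self.flush_responses = 0    # flush responses returned to the source

    def on_posted(self):
        # A posted transaction is forwarded into the network and counted.
        self.outstanding += 1

    def on_flush(self):
        # The flush is trapped here rather than forwarded into the network.
        self.flush_pending = True
        self._maybe_respond()

    def on_destination_response(self):
        # Assumption: the destination returns one response per tracked transaction.
        if self.outstanding == 0:
            raise RuntimeError("response received with nothing outstanding")
        self.outstanding -= 1
        self._maybe_respond()

    def _maybe_respond(self):
        # Generate the flush response only once every tracked transaction has landed.
        if self.flush_pending and self.outstanding == 0:
            self.flush_pending = False
            self.flush_responses += 1


tracker = FlushTracker()
tracker.on_posted()
tracker.on_posted()
tracker.on_flush()                  # trapped: two writes still in flight
tracker.on_destination_response()
tracker.on_destination_response()   # last response releases the flush
```

In this simplified model, a posted transaction that arrives while a flush is pending also delays the flush response, which is conservative but still safe for the consumer.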
-
Publication No.: US11863390B1
Publication Date: 2024-01-02
Application No.: US17888999
Filing Date: 2022-08-16
Applicant: Nvidia Corporation
Inventors: Miriam Menes, Eitan Zahavi, Gil Bloch, Ahmad Atamli, Meni Orenbach, Mark Hummel, Glenn Dearth
IPC Classification: G06F15/177, H04L41/0873, H04L45/488
CPC Classification: H04L41/0873, H04L45/488
Abstract: Apparatuses, systems, and techniques are presented to configure computing resources to perform various tasks. In at least one embodiment, an approach presented herein can be used to verify whether a network of computing nodes is properly configured based, at least in part, on one or more expected data strings generated by the network of computing nodes.
-
Publication No.: US20230224239A1
Publication Date: 2023-07-13
Application No.: US17575354
Filing Date: 2022-01-13
Applicant: NVIDIA Corporation
Inventors: Glenn Dearth, Nan Jiang, Mark Hummel, Richard Reeves
IPC Classification: H04L45/16, H04L12/18, H04L45/745
CPC Classification: H04L45/16, H04L12/18, H04L45/745
Abstract: Apparatuses, systems, and techniques to multicast a transaction to a group of targets. In at least one embodiment, a set is selected from alternate sets of directives associated with the group of targets, and the transaction is transmitted to the group of targets in accordance with the selected set.
-
Publication No.: US11038800B2
Publication Date: 2021-06-15
Application No.: US16553511
Filing Date: 2019-08-28
Applicant: Nvidia Corporation
Inventors: Glenn Dearth, Mark Hummel, Jonathan Owen, Mike Osborn, John Wortman, Rich Reeves
IPC Classification: H04L12/801, H04L7/00, H04L12/743, H04L12/947, G06F13/16, G06F13/30, G06F13/42
Abstract: An endpoint in a network may make posted or non-posted write requests to another endpoint in the network. For a non-posted write request, the target endpoint provides a response to the requesting endpoint indicating that the write request has been serviced. For a posted write request, the target endpoint does not provide such an acknowledgment. Hence, posted write requests have lower overhead, but they suffer from potential synchronization and resiliency issues. While non-posted write requests do not have those issues, they cause increased load on the network because such requests require the target endpoint to acknowledge each write request. Introduced herein is a network operation technique that uses non-posted transactions while maintaining the load overhead of the network at a manageable level. The introduced technique reduces the load overhead of the non-posted write requests by collapsing and reducing the number of responses.
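Illustrative sketch (not taken from the patent): one way to picture "collapsing" responses is a component that absorbs individual write responses and forwards a single response carrying a count. The `ResponseCollapser` class, its fixed `threshold`, and the `drain` method are assumptions made for illustration; the abstract does not say where or how the collapsing is performed.

```python
from collections import defaultdict


class ResponseCollapser:
    """Hypothetical collapser: merges write responses per requester."""

    def __init__(self, threshold):
        self.threshold = threshold        # responses to absorb before forwarding one
        self.pending = defaultdict(int)   # requester -> absorbed response count

    def on_response(self, requester):
        """Absorb one response; return (requester, count) when a collapsed
        response should be forwarded, otherwise None."""
        self.pending[requester] += 1
        if self.pending[requester] >= self.threshold:
            return (requester, self.pending.pop(requester))
        return None

    def drain(self, requester):
        """Forward whatever has been absorbed so far, e.g. on a timeout."""
        count = self.pending.pop(requester, 0)
        return (requester, count) if count else None


collapser = ResponseCollapser(threshold=4)
forwarded = [collapser.on_response("gpu0") for _ in range(4)]
# Only the fourth call forwards anything: one response standing in for four.
```

With a threshold of 4, the requester sees one response on the network per four serviced writes, while still learning that all four completed.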
-
Publication No.: US20190297018A1
Publication Date: 2019-09-26
Application No.: US16277349
Filing Date: 2019-02-15
Applicant: Nvidia Corporation
Inventors: Glenn Dearth, Nan Jiang, John Wortman, Alex Ishii, Mark Hummel, Rich Reeves
IPC Classification: H04L12/801, H04L12/825, H04L12/26
Abstract: Multiple processors are often used in computing systems to solve very large, complex problems, such as those encountered in artificial intelligence. Such processors typically exchange data among each other via an interconnect fabric (e.g., a group of network connections and switches) in solving such complex problems. The amount of data injected into the interconnect fabric by the processors can at times overwhelm the interconnect fabric, preventing some of the processors from communicating with each other. To address this problem, techniques are disclosed to enable, for example, processors that are connected to an interconnect fabric to coordinate and control the amount of data injected so that the interconnect fabric does not get overwhelmed.
-
Publication No.: US20240137410A1
Publication Date: 2024-04-25
Application No.: US18545339
Filing Date: 2023-12-19
Applicant: NVIDIA Corporation
Inventors: Glenn Dearth, Mark Hummel, Nan Jiang, Gregory Thorson
IPC Classification: H04L67/1008, H04L47/70, H04L47/80, H04L67/1014
CPC Classification: H04L67/1008, H04L47/806, H04L47/827, H04L67/1014
Abstract: Systems and techniques for performing multicast-reduction operations. In at least one embodiment, a network device receives first network data associated with a multicast operation to be collectively performed by at least a plurality of endpoints. The network device reserves resources to process second network data to be received from the endpoints, and sends the first network data to a plurality of additional network devices. The network device receives the second network data, and processes the second network data using the reserved resources.
-
Publication No.: US11822491B2
Publication Date: 2023-11-21
Application No.: US17506438
Filing Date: 2021-10-20
Applicant: NVIDIA Corporation
Inventors: John Feehrer, Denis Foley, Mark Hummel, Vyas Venkataraman, Ram Gummadi, Samuel H. Duncan, Glenn Dearth, Brian Kelleher
CPC Classification: G06F13/1652, G06F9/45558, G06F12/1027, G06F13/1668, G06F13/4022, G06F17/16, G06N20/00, G06F2009/45583
Abstract: Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as graphics processing units (GPUs), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.
-
Publication No.: US20210014156A1
Publication Date: 2021-01-14
Application No.: US16700611
Filing Date: 2019-12-02
Applicant: Nvidia Corporation
Inventors: Glenn Dearth, Mark Hummel
IPC Classification: H04L12/709, H04L12/721, H04L12/751, H04L12/707
Abstract: Introduced herein is a routing technique that, for example, routes a transaction to a destination port over a network that supports link aggregation and multi-port connection. In one embodiment, two tables that can be searched based on the target and supplemental routing IDs of the transaction are utilized to route the transaction to the proper port of the destination endpoint. In an embodiment, the first table provides a list of available ports at each hop/route point that can route the transaction to the destination endpoint, and the second table provides a supplemental routing ID that can select a specific group of ports from the first table that can correctly route the transaction to the proper port.
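Illustrative sketch (not taken from the patent): the two-table lookup in the abstract above might be pictured as follows. The table layouts, the `route` function, and the `flow_hash` tie-breaker used to pick one port within the selected group are all assumptions for illustration; the abstract does not specify them.

```python
def route(target_id, supplemental_id, port_table, group_table, flow_hash=0):
    """Pick an egress port at one hop using two lookups.

    port_table:  target ID -> list of ports that can reach that endpoint
    group_table: supplemental routing ID -> indices into that port list
    flow_hash:   hypothetical value used to spread traffic within the group
    """
    ports = port_table[target_id]                              # first table
    group = [ports[i] for i in group_table[supplemental_id]]   # second table
    return group[flow_hash % len(group)]


# Hypothetical hop with four aggregated links toward endpoint 7.
port_table = {7: [0, 1, 2, 3]}
# Supplemental ID 0 selects the first two links, ID 1 the last two.
group_table = {0: [0, 1], 1: [2, 3]}

egress = route(7, 1, port_table, group_table, flow_hash=5)
```

Here the target ID alone would allow any of the four links, and the supplemental routing ID narrows the choice to the pair that lands on the intended destination port.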
-
Publication No.: US20240098139A1
Publication Date: 2024-03-21
Application No.: US17709111
Filing Date: 2022-03-30
Applicant: NVIDIA Corporation
Inventors: Glenn Dearth, Mark Hummel, Nan Jiang, Gregory Thorson
IPC Classification: H04L67/1008, H04L67/1012, H04L67/1014
CPC Classification: H04L67/1008, H04L67/1012, H04L67/1014
Abstract: Systems and techniques for performing multicast-reduction operations. In at least one embodiment, a network device receives first network data associated with a multicast operation to be collectively performed by at least a plurality of endpoints. The network device reserves resources to process second network data to be received from the endpoints, and sends the first network data to a plurality of additional network devices. The network device receives the second network data, and processes the second network data using the reserved resources.