NETWORK-RELATED PERFORMANCE FOR GPUS
    Invention Application

    Publication Number: US20200034195A1

    Publication Date: 2020-01-30

    Application Number: US16049216

    Filing Date: 2018-07-30

    IPC Classification: G06F9/48 G06F9/50 G06F9/54

    Abstract: Techniques for improved networking performance in systems where a graphics processing unit or other highly parallel non-central-processing-unit (referred to as an accelerated processing device or “APD” herein) has the ability to directly issue commands to a networking device such as a network interface controller (“NIC”) are disclosed. According to a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, thereby reducing the latency associated with fetching that metadata at a later time. A second technique involves reducing latency by prioritizing work on an APD when it is known that certain network traffic is soon to arrive over the network via a NIC.
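
    What follows is a minimal, host-side Python sketch of the first technique only, under loose assumptions: the Nic class, its method names, the slot-eviction policy, and the latency figures are hypothetical stand-ins for the NIC hardware described in the abstract, chosen purely to illustrate why pre-fetching queue metadata into a hardware slot removes a fetch stall from the command-issue path.

    # Hypothetical latency figures (microseconds); not taken from the patent.
    FETCH_LATENCY_US = 5.0   # assumed cost of pulling queue metadata from system memory
    HIT_LATENCY_US = 0.2     # assumed cost when the metadata is already resident in a slot

    class Nic:
        """Toy model of a NIC with a limited number of hardware metadata slots."""

        def __init__(self, num_hw_slots):
            self.num_hw_slots = num_hw_slots
            self.hw_slots = {}        # queue_id -> metadata resident in NIC hardware
            self.queue_metadata = {}  # queue_id -> metadata held in system memory

        def register_queue(self, queue_id, metadata):
            """Create a network command queue; its metadata initially lives in system memory."""
            self.queue_metadata[queue_id] = metadata

        def prefetch_queue_metadata(self, queue_id):
            """Pre-fetch a queue's metadata into a hardware slot ahead of any command issue."""
            if queue_id not in self.hw_slots:
                self._evict_if_full()
                self.hw_slots[queue_id] = self.queue_metadata[queue_id]

        def issue_command(self, queue_id, command):
            """Issue a network command (e.g. on behalf of an APD); returns the modeled latency."""
            if queue_id in self.hw_slots:
                return HIT_LATENCY_US  # metadata already resident: no fetch stall
            # Miss: the NIC must first fetch the metadata, paying the extra latency.
            self._evict_if_full()
            self.hw_slots[queue_id] = self.queue_metadata[queue_id]
            return FETCH_LATENCY_US + HIT_LATENCY_US

        def _evict_if_full(self):
            if len(self.hw_slots) >= self.num_hw_slots:
                self.hw_slots.pop(next(iter(self.hw_slots)))  # simple FIFO-style eviction

    # Pre-fetching off the critical path makes the later command issue a slot hit.
    nic = Nic(num_hw_slots=4)
    nic.register_queue("q0", {"doorbell": 0x1000, "ring_base": 0xBEEF0000})
    nic.prefetch_queue_metadata("q0")
    print(nic.issue_command("q0", "send"))  # 0.2: hit, no metadata fetch on the issue path
    nic.register_queue("q1", {"doorbell": 0x2000, "ring_base": 0xBEEF1000})
    print(nic.issue_command("q1", "send"))  # 5.2: miss, metadata fetched on demand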

    Processing Element-Centric All-to-All Communication

    Publication Number: US20240220336A1

    Publication Date: 2024-07-04

    Application Number: US18147081

    Filing Date: 2022-12-28

    IPC Classification: G06F9/54 G06F9/50 G06F15/173

    Abstract: In accordance with described techniques for PE-centric all-to-all communication, a distributed computing system includes processing elements, such as graphics processing units, distributed in clusters. An all-to-all communication procedure is performed by the processing elements that are each configured to generate data packets in parallel for all-to-all data communication between the clusters. The all-to-all communication procedure includes a first stage of intra-cluster parallel data communication between respective processing elements of each of the clusters; a second stage of inter-cluster data exchange for all-to-all data communication between the clusters; and a third stage of intra-cluster data distribution to the respective processing elements of each of the clusters.
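
    The sketch below is a small, host-side Python model of the three-stage pattern the abstract describes; it is not the claimed GPU implementation. The cluster sizes, the rule that the PE whose local rank equals the target cluster index aggregates traffic for that cluster, and the stage-2 peer mapping are assumptions chosen for illustration (they require at least as many PEs per cluster as there are clusters).

    NUM_CLUSTERS = 2
    PES_PER_CLUSTER = 2
    NUM_PES = NUM_CLUSTERS * PES_PER_CLUSTER
    assert PES_PER_CLUSTER >= NUM_CLUSTERS  # required by the illustrative aggregator mapping

    # pe_data[src][dst] is the chunk that global PE `src` must deliver to global PE `dst`.
    pe_data = [[f"chunk {src}->{dst}" for dst in range(NUM_PES)] for src in range(NUM_PES)]

    def cluster_of(pe):
        return pe // PES_PER_CLUSTER

    # Stage 1: intra-cluster parallel data communication.
    # Within each cluster, PEs exchange chunks so that one PE per remote cluster ends up
    # holding all of its cluster's traffic bound for that remote cluster.
    staged = {pe: [] for pe in range(NUM_PES)}
    for src in range(NUM_PES):
        for dst in range(NUM_PES):
            target_cluster = cluster_of(dst)
            aggregator = cluster_of(src) * PES_PER_CLUSTER + target_cluster
            staged[aggregator].append((src, dst, pe_data[src][dst]))

    # Stage 2: inter-cluster data exchange.
    # Each aggregator sends its buffer to the mirror PE in the cluster it aggregates for.
    exchanged = {pe: [] for pe in range(NUM_PES)}
    for pe, chunks in staged.items():
        serves_cluster = pe % PES_PER_CLUSTER
        peer = serves_cluster * PES_PER_CLUSTER + cluster_of(pe)
        exchanged[peer].extend(chunks)

    # Stage 3: intra-cluster data distribution.
    # Each receiving PE scatters the chunks to their final destination PEs in its cluster.
    final = {pe: [] for pe in range(NUM_PES)}
    for pe, chunks in exchanged.items():
        for src, dst, chunk in chunks:
            final[dst].append((src, chunk))

    # Every PE now holds exactly one chunk from every other PE (and from itself).
    for dst in range(NUM_PES):
        assert len(final[dst]) == NUM_PES
        print(dst, sorted(final[dst]))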