Scalable in-network computation for massively-parallel shared-memory processors

    Publication No.: US11463272B2

    Publication date: 2022-10-04

    Application No.: US17495547

    Filing date: 2021-10-06

    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
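The reduction flow described in this abstract can be pictured with a small software model. The sketch below is illustrative only and is not taken from the patent: the class name `ReductionSwitch`, the `multicast_regions` table, the `push` method, and the endpoint IDs are all hypothetical. It assumes a push-style (store) primitive, a single network device, and a sum as the default reduction operator.

```python
import operator
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Dict, List, Optional


@dataclass
class ReductionSwitch:
    """Toy model of a network device that reduces pushed values in-network."""

    # Hypothetical table: multicast address -> IDs of participating endpoints.
    multicast_regions: Dict[int, List[str]]
    # Reduction operator applied to the contributions (assumed: addition).
    op: Callable[[float, float], float] = operator.add
    # Partial state: multicast address -> {endpoint ID: contributed value}.
    pending: Dict[int, Dict[str, float]] = field(default_factory=dict)

    def push(self, endpoint: str, address: int, value: float) -> Optional[Dict[str, float]]:
        """Record one endpoint's contribution to a multicast address.

        Returns None while contributions are still missing. Once every
        participant has pushed, returns the reduced value keyed by each
        participating endpoint, modelling the multicast response.
        """
        participants = self.multicast_regions[address]
        if endpoint not in participants:
            raise ValueError(f"{endpoint} does not participate in {address:#x}")

        contributions = self.pending.setdefault(address, {})
        contributions[endpoint] = value
        if len(contributions) < len(participants):
            return None  # keep waiting for the remaining endpoints

        # All contributions are in: reduce inside the "switch", then respond.
        result = reduce(self.op, (contributions[e] for e in participants))
        del self.pending[address]
        return {e: result for e in participants}


switch = ReductionSwitch(multicast_regions={0x1000: ["gpu0", "gpu1", "gpu2"]})
first = switch.push("gpu0", 0x1000, 1.0)
second = switch.push("gpu1", 0x1000, 2.0)
final = switch.push("gpu2", 0x1000, 3.0)
```

In this model the endpoints only issue stores to one address; the arithmetic and the fan-out of the result both happen in the device, which is the offloading the abstract refers to.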

    TECHNIQUES FOR REDUCING CONGESTION IN A COMPUTER NETWORK

    Publication No.: US20190297018A1

    Publication date: 2019-09-26

    Application No.: US16277349

    Filing date: 2019-02-15

    Abstract: Multiple processors are often used in computing systems to solve very large, complex problems, such as those encountered in artificial intelligence. In solving such problems, the processors typically exchange data with each other via an interconnect fabric (e.g., a group of network connections and switches). The amount of data injected into the interconnect fabric by the processors can at times overwhelm the fabric, preventing some of the processors from communicating with each other. To address this problem, techniques are disclosed that enable, for example, processors connected to an interconnect fabric to coordinate and control the amount of data they inject so that the fabric does not get overwhelmed.

    SCALABLE IN-NETWORK COMPUTATION FOR MASSIVELY-PARALLEL SHARED-MEMORY PROCESSORS

    Publication No.: US20220029845A1

    Publication date: 2022-01-27

    Application No.: US17495547

    Filing date: 2021-10-06

    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.

    INJECTION LIMITING AND WAVE SYNCHRONIZATION FOR SCALABLE IN-NETWORK COMPUTATION

    Publication No.: US20210036881A1

    Publication date: 2021-02-04

    Application No.: US16938044

    Filing date: 2020-07-24

    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. An injection policy based on issuing credits enables each endpoint to limit the number of collective communication primitives it injects into the network simultaneously, reducing the network congestion caused by the additional traffic that the multicast capability of the network devices generates.
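The credit-based injection policy mentioned in this abstract can be sketched as a simple endpoint-side limiter. This is a minimal illustration under assumptions, not the patented mechanism: the class `CreditLimitedEndpoint`, its methods, and the rule that one credit is returned per completed primitive are all hypothetical choices made for the example.

```python
from collections import deque


class CreditLimitedEndpoint:
    """Toy endpoint that needs one credit per in-flight collective primitive."""

    def __init__(self, credits: int) -> None:
        if credits < 1:
            raise ValueError("an endpoint needs at least one credit")
        self.credits = credits   # credits currently available for injection
        self.waiting = deque()   # primitives queued until a credit frees up
        self.in_flight = []      # primitives injected and awaiting a response

    def submit(self, primitive: str) -> None:
        """Queue a primitive and inject it immediately if a credit is free."""
        self.waiting.append(primitive)
        self._inject()

    def on_response(self, primitive: str) -> None:
        """Assumed rule: a completed primitive returns its credit."""
        self.in_flight.remove(primitive)
        self.credits += 1
        self._inject()

    def _inject(self) -> None:
        # Inject queued primitives in FIFO order while credits remain.
        while self.credits > 0 and self.waiting:
            self.credits -= 1
            self.in_flight.append(self.waiting.popleft())


endpoint = CreditLimitedEndpoint(credits=2)
for name in ("reduce_a", "reduce_b", "reduce_c"):
    endpoint.submit(name)
in_flight_before = list(endpoint.in_flight)   # only two fit under the limit
endpoint.on_response("reduce_a")              # a credit returns
in_flight_after = list(endpoint.in_flight)    # the queued primitive goes out
```

With two credits, the third primitive stays queued until a response frees a credit, so the endpoint never has more than two collective primitives in the network at once.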

    SCALABLE IN-NETWORK COMPUTATION FOR MASSIVELY-PARALLEL SHARED-MEMORY PROCESSORS

    Publication No.: US20210037107A1

    Publication date: 2021-02-04

    Application No.: US16938097

    Filing date: 2020-07-24

    Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.