Abstract:
A network interface controller can be programmed to directly write received data to a memory buffer via either a host-to-device fabric or an accelerator fabric. For received packets that are to be written to a memory buffer associated with an accelerator device, the network interface controller can determine an address translation of a destination memory address of the received packet and determine whether to use a secondary head. If a translated address is available and a secondary head is to be used, a direct memory access (DMA) engine is used to copy a portion of the received packet via the accelerator fabric to a destination memory buffer associated with the address translation. Accordingly, copying a portion of the received packet through the host-to-device fabric to a destination memory can be avoided, and utilization of the host-to-device fabric can be reduced for accelerator-bound traffic.
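The path selection described in this abstract can be illustrated with a minimal sketch. All names here (`Path`, `choose_write_path`, the `translations` table) are hypothetical, and the fallback to the host-to-device fabric when either condition fails is an assumption; the abstract only states what happens when both conditions hold.

```python
from enum import Enum


class Path(Enum):
    ACCELERATOR_FABRIC = "accelerator fabric"
    HOST_TO_DEVICE_FABRIC = "host-to-device fabric"


def choose_write_path(dest_addr, translations, use_secondary_head):
    """Pick the fabric for copying a received packet's payload.

    translations: hypothetical dict mapping destination addresses to
    translated accelerator buffer addresses (stands in for the NIC's
    address translation lookup).
    """
    translated = translations.get(dest_addr)
    if translated is not None and use_secondary_head:
        # Both conditions from the abstract hold: DMA over the accelerator fabric.
        return Path.ACCELERATOR_FABRIC, translated
    # Assumed fallback: use the host-to-device fabric, no translated target.
    return Path.HOST_TO_DEVICE_FABRIC, None
```

For example, with `translations = {0x1000: 0x9000}`, a packet destined for `0x1000` with the secondary head enabled would be routed over the accelerator fabric to `0x9000`.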
Abstract:
Methods, apparatus and software for implementing enhanced data center congestion management for non-TCP traffic. Non-congested transmit latencies are determined for transmission of packets or Ethernet frames along paths between source and destination end-nodes when congestion along the paths is not present or minimal. Transmit latencies are similarly measured along the same source-destination paths during ongoing operations during which traffic congestion may vary. Based on whether a difference between the transmit latency for a packet or frame and the non-congested transmit latency for the path exceeds a threshold, the path is marked as congested or not congested. A rate at which the non-TCP packets are transmitted along the path is then managed as a function of a rate at which the path is marked as congested. In one implementation, non-TCP traffic is managed by mimicking a Data Center TCP (DCTCP) technique, under which the congestion marking status of the path is substituted as an input to a DCTCP algorithm in place of the normally used ECN-Echo flag input. The congestion window output by the DCTCP algorithm is then used to manage the rate at which non-TCP packets to be forwarded via the path are transmitted from a source end-node.
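As a rough illustration of the latency-based marking feeding a DCTCP-style window update, the following sketch may help. The class and function names, the initial window, and the per-window batching of latency samples are assumptions for illustration; the update rule follows the commonly published DCTCP form (an exponentially weighted fraction of marked packets, with the window reduced by half that fraction), and the gain of 1/16 is a commonly used value rather than one taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    base_latency_us: float   # non-congested latency measured for the path
    threshold_us: float      # assumed marking threshold
    cwnd: float = 10.0       # congestion window in packets (assumed initial value)
    alpha: float = 0.0       # running estimate of the congested fraction
    g: float = 1.0 / 16      # EWMA gain (commonly used DCTCP value)


def is_congested(state, latency_us):
    """Mark a sample congested if it exceeds the baseline by more than the threshold."""
    return latency_us - state.base_latency_us > state.threshold_us


def update_window(state, latencies_us):
    """Apply one DCTCP-style update using latency marks in place of ECN-Echo flags."""
    marks = sum(is_congested(state, lat) for lat in latencies_us)
    frac = marks / len(latencies_us) if latencies_us else 0.0
    state.alpha = (1 - state.g) * state.alpha + state.g * frac
    if marks:
        # Reduce the window in proportion to the estimated congestion level.
        state.cwnd = max(1.0, state.cwnd * (1 - state.alpha / 2))
    else:
        # No marks in this window: additive increase by one packet.
        state.cwnd += 1.0
    return state.cwnd
```

A sender would then pace its non-TCP packets so that no more than `cwnd` packets are outstanding on the path at once.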
Abstract:
In an embodiment of the present invention, a method includes partitioning a plurality of remote direct memory access context objects among a plurality of virtual functions, establishing a remote direct memory access connection between a first of the plurality of virtual functions and a remote peer, and migrating the remote direct memory access connection from the first of the plurality of virtual functions to a second of the plurality of virtual functions without disconnecting from the remote peer.
Abstract:
Generally, this disclosure relates to a method of flow control. The method may include determining a server load in response to a request from a client; selecting a type of credit based at least in part on server load; and sending a credit to the client based at least in part on server load, wherein server load corresponds to a utilization level of a server and wherein the credit corresponds to an amount of data that may be transferred between the server and the client and the credit is configured to decrease over time if the credit is unused by the client.
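The load-dependent, decaying credit can be sketched as follows. The credit type labels, the load cut-offs, the byte amounts, and the exponential half-life decay are all assumptions chosen for illustration; the abstract only states that the credit type depends on server load and that unused credit decreases over time.

```python
from dataclasses import dataclass


@dataclass
class Credit:
    kind: str             # hypothetical credit type label
    bytes_allowed: int    # amount of data the client may transfer
    issued_at: float      # issue time in seconds
    half_life_s: float    # assumed decay half-life for unused credit

    def remaining(self, now):
        """Unused credit left at time `now`, halving every `half_life_s` seconds."""
        elapsed = max(0.0, now - self.issued_at)
        return int(self.bytes_allowed * 0.5 ** (elapsed / self.half_life_s))


def issue_credit(server_load, now):
    """Select a credit type from server load (0.0 = idle, 1.0 = saturated)."""
    if server_load < 0.5:
        return Credit("large", 1_000_000, now, 10.0)
    if server_load < 0.9:
        return Credit("small", 100_000, now, 5.0)
    return Credit("minimal", 10_000, now, 1.0)
```

Under these assumed values, a lightly loaded server hands out a large credit that loses half its value every ten seconds if the client leaves it unused.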
Abstract:
An embodiment may include circuitry to facilitate, at least in part, a first network interface controller (NIC) in a client to be capable of accessing, via a second NIC in a server that is remote from the client and in a manner that is independent of an operating system environment in the server, at least one command interface of another controller of the server. The command interface may include at least one controller command queue. Such accessing may include writing at least one queue element to the at least one command queue to command the other controller to perform at least one operation associated with that controller. The other controller may perform the at least one operation in response, at least in part, to the at least one queue element. Many alternatives, variations, and modifications are possible.
Abstract:
Methods and apparatus for software-controlled active-backup mode of link aggregation for RDMA and virtual functions. A Network Interface Controller (NIC) includes hardware implementing first and second physical functions (PFs) including transmit and receive resources to support data transfers via first and second ports. A bonding group is created with the first and second PFs. The first PF is implemented as an active PF and used for primary data transfers, while the second PF is implemented as a backup PF. On a link or port failure of the active PF, the bonding group is reconfigured to employ transmit and receive resources of the backup PF such that those resources are shared with the active PF. Data transfers are then performed using the shared resources of the active PF and the backup PF. Embodiments may support RDMA data transfers using PF bonding, and the solution may be implemented in virtualized environments including virtual machines (VMs) in a manner transparent to the VMs.
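A minimal sketch of the active-backup failover might look like the following. The `PF` and `BondingGroup` classes and the `resources_in_use` method are hypothetical names, and modeling the reconfiguration as a simple lookup is a simplification of whatever the hardware and driver actually do.

```python
from dataclasses import dataclass


@dataclass
class PF:
    name: str
    link_up: bool = True


@dataclass
class BondingGroup:
    active: PF
    backup: PF

    def resources_in_use(self):
        """Return the PF whose transmit/receive resources carry traffic.

        While the active PF's link is up, its own resources are used. On a
        link or port failure, the group falls back to the backup PF's
        resources, which are then shared with the active PF.
        """
        return self.active if self.active.link_up else self.backup
```

Because callers always go through the bonding group rather than a specific PF, the switch to the backup PF's resources stays invisible to them, which mirrors the transparency to VMs described above.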
Abstract:
Methods, apparatus and software for implementing enhanced data center congestion management for non-TCP traffic. Non-congested transit latencies are determined for transmission of packets or Ethernet frames along paths between source and destination end-nodes when congestion along the paths is not present or minimal. Transit latencies are similarly measured along the same source-destination paths during ongoing operations during which traffic congestion may vary. Based on whether a difference between the transit latency for a packet or frame and the non-congested transit latency for the path exceeds a threshold, the path is marked as congested or not congested. A rate at which the non-TCP packets are transmitted along the path is then managed as a function of a rate at which the path is marked as congested. In one implementation, non-TCP traffic is managed by mimicking a Data Center TCP (DCTCP) technique, under which the congestion marking status of the path is substituted as an input to a DCTCP algorithm in place of the normally used ECN-Echo flag input. The congestion window output by the DCTCP algorithm is then used to manage the rate at which non-TCP packets to be forwarded via the path are transmitted from a source end-node.
Abstract:
Apparatus, method and system for supporting Remote Direct Memory Access (RDMA) Read V2 Request and Response messages using the Internet Wide Area RDMA Protocol (iWARP). iWARP logic in an RDMA Network Interface Controller (RNIC) is configured to generate a new RDMA Read V2 Request message and generate a new RDMA Read V2 Response message in response to a received RDMA Read V2 Request message, and send the messages to an RDMA remote peer using iWARP implemented over an Ethernet network. The iWARP logic is further configured to process RDMA Read V2 Response messages received from the RDMA remote peer, and to write data contained in the messages to appropriate locations using DMA transfers from buffers on the RNIC into system memory. In addition, the new semantics removes the need for extra operations to grant and revoke remote access rights.
Abstract:
Apparatus, methods and systems for supporting Send with Immediate Data messages using Remote Direct Memory Access (RDMA) and the Internet Wide Area RDMA Protocol (iWARP). iWARP logic in an RDMA Network Interface Controller (RNIC) is configured to generate different types of Send with Immediate Data messages, each including a header with a unique RDMA opcode identifying the type of Send with Immediate Data message, and send the message to an RDMA remote peer using iWARP implemented over an Ethernet network. The iWARP logic is further configured to process the Send with Immediate Data messages received from the RDMA remote peer. The Send with Immediate Data messages include a Send with Immediate Data message, a Send with Invalidate and Immediate Data message, a Send with Solicited Event (SE) and Immediate Data message, and a Send with Invalidate and SE and Immediate Data message.