Abstract:
Method and apparatus for efficient range-based memory writeback is described herein. One embodiment of an apparatus includes a system memory, a plurality of hardware processor cores each of which includes a first cache, a decoder circuitry to decode an instruction having fields for a first memory address and a range indicator, and an execution circuitry to execute the decoded instruction. Together, the first memory address and the range indicator define a contiguous region in the system memory that includes one or more cache lines. An execution of the decoded instruction causes any instances of the one or more cache lines in the first cache to be invalidated. Additionally, any invalidated instances of the one or more cache lines that are dirty are to be stored to system memory.
Abstract:
Technologies for identifying a cache line of a network packet for eviction from an on-processor cache of a network device communicatively coupled to a network controller. The network device is configured to determine whether a cache line of the cache corresponding to the network packet is to be evicted from the cache based on a determination that the network packet is not needed subsequent to processing the network packet, and provide an indication that the cache line is to be evicted from the cache based on an eviction policy received from the network controller.
Abstract:
Technologies for distributed table lookup via a distributed router includes an ingress computing node, an intermediate computing node, and an egress computing node. Each computing node of the distributed router includes a forwarding table to store a different set of network routing entries obtained from a routing table of the distributed router. The ingress computing node generates a hash key based on the destination address included in a received network packet. The hash key identifies the intermediate computing node of the distributed router that stores the forwarding table that includes a network routing entry corresponding to the destination address. The ingress computing node forwards the received network packet to the intermediate computing node for routing. The intermediate computing node receives the forwarded network packet, determines a destination address of the network packet, and determines the egress computing node for transmission of the network packet from the distributed router.
Abstract:
Apparatus and methods implementing a hardware predictor for reducing performance inversions caused by intra-core data transfer during inter-core data transfer optimization for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache or mid-level cache (MLC) for each core and a shared L3 or last-level cache (LLC). A hardware predictor to monitor accesses to sample cache lines and, based on these accesses, adaptively control the enablement of cache line demotion instructions for proactively demoting cache lines from lower cache levels to higher cache levels, including demoting cache lines from L1 or L2 caches (MLC) to L3 cache (LLC).
Abstract:
Technologies for identifying a cache line of a network packet for eviction from an on-processor cache of a network device communicatively coupled to a network controller. The network device is configured to determine whether a cache line of the cache corresponding to the network packet is to be evicted from the cache based on a determination that the network packet is not needed subsequent to processing the network packet, and provide an indication that the cache line is to be evicted from the cache based on an eviction policy received from the network controller.
Abstract:
Techniques for modifying a transmission rate of a device having a plurality of transmission rate options are described herein. The techniques include a method comprising receiving data from a sensor indicating movement of an electronic device, the electronic device having a plurality of transmission rate options. Fail ratio metrics are gathered. The fail ratio metrics indicate a ratio of failed transmissions to successful transmissions for rate option during device movement. The method includes determining whether a given rate option has a fail ratio above a predetermined threshold; and, if so, disabling the given rate option while the device is moving.
Abstract:
Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
Abstract:
An apparatus and method for closed loop dynamic resource allocation. For example, one embodiment of a method comprises: collecting data related to usage of a plurality of resources by a plurality of workloads over one or more time periods, the workloads including priority workloads associated with one or more guaranteed performance levels and best effort workloads not associated with guaranteed performance levels; analyzing the data to identify resource reallocations from one or more of the priority workloads to one or more of the best effort workloads in one or more subsequent time periods while still maintaining the guaranteed performance levels; reallocating the resources from the priority workloads to the best effort workloads for the subsequent time periods; monitoring execution of the priority workloads with respect to the guaranteed performance level during the subsequent time periods; and preemptively reallocating resources from the best effort workloads to the priority workloads during the subsequent time periods to ensure compliance with the guaranteed performance level and responsive to detecting that the guaranteed performance level is in danger of being breached.
Abstract:
Technologies for dynamically managing a batch size of packets include a network device. The network device is to receive, into a queue, packets from a remote node to be processed by the network device, determine a throughput provided by the network device while the packets are processed, determine whether the determined throughput satisfies a predefined condition, and adjust a batch size of packets in response to a determination that the determined throughput satisfies a predefined condition. The batch size is indicative of a threshold number of queued packets required to be present in the queue before the queued packets in the queue can be processed by the network device.
Abstract:
Examples described herein relates to a network interface apparatus that includes packet processing circuitry and a bus interface. In some examples, the packet processing circuitry to: process a received packet that includes data, a request to perform a write operation to write the data to a cache, and an indicator that the data is to be durable and based at least on the received packet including the request and the indicator, cause the data to be written to the cache and non-volatile memory. In some examples, the packet processing circuitry is to issue a command to an input output (IO) controller to cause the IO controller to write the data to the cache and the non-volatile memory. In some examples, the cache comprises one or more of: a level-0 (L0), level-1 (L1), level-2 (L2), or last level cache (LLC) and the non-volatile memory comprises one or more of: volatile memory that is part of an Asynchronous DRAM Refresh (ADR) domain, persistent memory, battery-backed memory, or memory device whose state is determinate even if power is interrupted to the memory device. In some examples, based on receipt of a second received packet that includes a request to persist data, the packet processing circuitry is to request that data stored in a memory buffer be copied to the non-volatile memory.