Abstract:
Apparatus and methods implementing a hardware predictor for reducing performance inversions caused by intra-core data transfer during inter-core data transfer optimization for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache or mid-level cache (MLC) for each core and a shared L3 or last-level cache (LLC). A hardware predictor to monitor accesses to sample cache lines and, based on these accesses, adaptively control the enablement of cache line demotion instructions for proactively demoting cache lines from lower cache levels to higher cache levels, including demoting cache lines from L1 or L2 caches (MLC) to L3 cache (LLC).
Abstract:
Systems and methods for cache allocation with code and data prioritization. An example system may comprise: a cache; a processing core, operatively coupled to the cache; and a cache control logic, responsive to receiving a cache fill request comprising an identifier of a request type and an identifier of a class of service, to identify a subset of the cache corresponding to a capacity bit mask associated with the request type and the class of service.
Abstract:
In one embodiment, the present invention includes a method for receiving an interrupt from an accelerator, sending a resume signal directly to a small core responsive to the interrupt and providing a subset of an execution state of the large core to the first small core, and determining whether the small core can handle a request associated with the interrupt, and performing an operation corresponding to the request in the small core if the determination is in the affirmative, and otherwise providing the large core execution state and the resume signal to the large core. Other embodiments are described and claimed.
Abstract:
Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
Abstract:
Various approaches to efficiently allocating and utilizing hardware resources in data centers while maintaining compliance with a service level objective (SLO) specified for a computational workload is translated into a hardware-level SLO to facilitate direct enforcement by the hardware processor, e.g., using a feedback control loop or model-based mapping of the hardware-level SLO to allocations of microarchitecture resources of the processor. In some embodiments, a computational model of the hardware behavior under resource contention is used to predict the application performance (e.g., as measured in terms of the hardware-level SLO) to be expected under certain contention scenarios. Scheduling of workloads among the compute nodes within the data center may be based on such predictions. In further embodiments, configurations of microservices are optimized to minimize hardware resources while meeting a specified performance goal.
Abstract:
In accordance with some embodiments, a cloud service provider may operate a data center in a way that dynamically reallocates resources across nodes within the data center based on both utilization and service level agreements. In other words, the allocation of resources may be adjusted dynamically based on current conditions. The current conditions in the data center may be a function of the nature of all the current workloads. Instead of simply managing the workloads in a way to increase overall execution efficiency, the data center instead may manage the workload to achieve quality of service requirements for particular workloads according to service level agreements.
Abstract:
A processor of an aspect includes a decode unit to decode an aperture access instruction, and an execution unit coupled with the decode unit. The execution unit, in response to the aperture access instruction, is to read a host physical memory address, which is to be associated with an aperture that is to be in system memory, from an access protected structure, and access data within the aperture at a host physical memory address that is not to be obtained through address translation. Other processors are also disclosed, as are methods, systems, and machine-readable medium storing aperture access instructions.
Abstract:
In some examples, for process-to-process communication, such as in function linking, a virtual channel can be provisioned to provide virtual machine to virtual machine communications. In response to a transmit request from a source virtual machine, the virtual channel can cause a data copy from a source buffer associated with the source virtual machine without decryption or encryption. The virtual channel provisions a key identifier for the copied data. The destination virtual machine can receive an indication data is available and can cause the data to be decrypted using a key accessed using the key identifier and source address of the copied data. In addition, the data can be encrypted using a second, different key for storage in a destination buffer associated with the destination virtual machine. In some examples, the key identifier and source address is managed by the virtual channel and is not visible to virtual machine or hypervisor.
Abstract:
Examples include a computing system for receiving memory class of service parameters; setting performance monitoring configuration parameters, based at least in part on the memory class of service parameters, for use by a performance monitor of a memory controller to generate performance monitoring statistics by monitoring performance of one or more workloads by a plurality of processor cores based at least in part on the performance monitoring configuration parameters; receiving the performance monitoring statistics from the performance monitor; and generating, based at least in part on the performance monitoring statistics, a plurality of memory bandwidth settings to be applied by a memory bandwidth allocator to the plurality of processor cores to dynamically adjust priorities of memory bandwidth allocated for the one or more workloads to be processed by the plurality of processor cores.
Abstract:
An architecture to perform resource management among multiple network nodes and associated resources is disclosed. Example resource management techniques include those relating to: proactive reservation of edge computing resources; deadline-driven resource allocation; speculative edge QoS pre-allocation; and automatic QoS migration across edge computing nodes. In a specific example, a technique for service migration includes: identifying a service operated with computing resources in an edge computing system, involving computing capabilities for a connected edge device with an identified service level; identifying a mobility condition for the service, based on a change in network connectivity with the connected edge device; and performing a migration of the service to another edge computing system based on the identified mobility condition, to enable the service to be continued at the second edge computing apparatus to provide computing capabilities for the connected edge device with the identified service level.