Abstract:
Systems and methods for cache allocation with code and data prioritization. An example system may comprise: a cache; a processing core, operatively coupled to the cache; and a cache control logic, responsive to receiving a cache fill request comprising an identifier of a request type and an identifier of a class of service, to identify a subset of the cache corresponding to a capacity bit mask associated with the request type and the class of service.
Abstract:
Systems and methods for cache allocation with code and data prioritization. An example system may comprise: a cache; a processing core, operatively coupled to the cache; and a cache control logic, responsive to receiving a cache fill request comprising an identifier of a request type and an identifier of a class of service, to identify a subset of the cache corresponding to a capacity bit mask associated with the request type and the class of service.
Abstract:
Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN), similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash based sparse matrix multiplication, data encode, data decode, or embedding lookup.
Abstract:
Technologies for analyzing and optimizing workloads (e.g., virtual network functions) executing on edge resources are disclosed. According to one embodiment disclosed herein, a compute device launches a virtualized system including a virtual network function and a performance manager, the performance manager to monitor a current resource usage of the virtual network function as a function of a performance profile. The compute device determines, in response to a determination that one or more quality-of-service (QoS) requirements is not satisfied, whether one or more resources from the platform are available for satisfying the QoS requirements. The compute device receives, in response to a determination that the one or more resources are available for satisfying the QoS requirements, the one or more resources and updates the performance profile as a function of the received resources.
Abstract:
A holistic view of cache class of service (CLOS) to include an allocation of processor cache resources to a plurality of CLOS. The allocation of processor cache resources to include allocation of cache ways for an n-way set of associative cache. Examples include monitoring usage of the plurality of CLOS to determine processor cache resource usage and to report the processor cache resource usage.
Abstract:
A compute device includes one or more processors, one or more resources capable of being utilized by the one or more processors, and a platform interconnect to facilitate communication of messages between the one or more processors and the one or more resources. The compute device is to obtain class of service data for one or more workloads to be executed by the compute device. The class of service data is indicative of a capacity of one or more of the resources to be utilized in the execution of each corresponding workload. The compute device is also to execute the one or more workloads and manage the amount of traffic transmitted through the platform interconnect for each corresponding workload as a function of the class of service data as the one or more workloads are executed.
Abstract:
Devices and techniques for out-of-band platform tuning and configuration are described herein. A device can include a telemetry interface to a telemetry collection system and a network interface to network adapter hardware. The device can receive platform telemetry metrics from the telemetry collection system, and network adapter silicon hardware statistics over the network interface, to gather collected statistics. The device can apply a heuristic algorithm using the collected statistics to determine processing core workloads generated by operation of a plurality of software systems communicatively coupled to the device. The device can provide a reconfiguration message to instruct at least one software system to switch operations to a different processing core, responsive to detecting an overload state on at least one processing core, based on the processing core workloads. Other embodiments are also described.
Abstract:
Technologies for analyzing and optimizing workloads (e.g., virtual network functions) executing on edge resources are disclosed. According to one embodiment disclosed herein, a compute device launches a virtualized system including a virtual network function and a performance manager, the performance manager to monitor a current resource usage of the virtual network function as a function of a performance profile. The compute device determines, in response to a determination that one or more quality-of-service (QoS) requirements is not satisfied, whether one or more resources from the platform are available for satisfying the QoS requirements. The compute device receives, in response to a determination that the one or more resources are available for satisfying the QoS requirements, the one or more resources and updates the performance profile as a function of the received resources.
Abstract:
Devices and techniques for out-of-band platform tuning and configuration are described herein. A device can include a telemetry interface to a telemetry collection system and a network interface to network adapter hardware. The device can receive platform telemetry metrics from the telemetry collection system, and network adapter silicon hardware statistics over the network interface, to gather collected statistics. The device can apply a heuristic algorithm using the collected statistics to determine processing core workloads generated by operation of a plurality of software systems communicatively coupled to the device. The device can provide a reconfiguration message to instruct at least one software system to switch operations to a different processing core, responsive to detecting an overload state on at least one processing core, based on the processing core workloads. Other embodiments are also described.
Abstract:
Hardware processors and methods to perform self-monitoring diagnostics to predict and detect failure are described. In one embodiment, a hardware processor includes a plurality of cores, and a diagnostic hardware unit to isolate a core of the plurality of cores at run-time, perform a stress test on an isolated core, determine a stress factor from a result of the stress test, and store the stress factor in a data storage device.