Abstract:
A zombie server can be detected. Detection can include receiving network traffic at a server and, once every first time interval, calculating the percentage of that traffic that uses productivity software layer 7 protocols. Detection can also include marking the server as a zombie server based on the percentage once every second time interval, and processing the network traffic at the server so that the productivity software performs a number of actions.
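The traffic-based check can be illustrated with a minimal sketch. The abstract does not say which protocols count as productivity traffic or how the percentage leads to a zombie marking, so the protocol set, the byte-weighted percentage, and the threshold rule below are all assumptions made for illustration.

```python
# Hypothetical set of layer 7 protocols treated as productivity traffic.
PRODUCTIVITY_PROTOCOLS = {"http", "smtp", "sql"}


def productivity_percentage(packets):
    """Return the share (0-100) of bytes carried by productivity protocols.

    `packets` is an iterable of (protocol_name, byte_count) pairs seen
    during one first time interval. Weighting by bytes is an assumption.
    """
    total = 0
    productive = 0
    for protocol, size in packets:
        total += size
        if protocol in PRODUCTIVITY_PROTOCOLS:
            productive += size
    return 100.0 * productive / total if total else 0.0


def is_zombie(percentages, threshold=5.0):
    """Decide once per second time interval.

    Assumed rule: the server is marked as a zombie when every percentage
    sampled during the second interval stays below `threshold`.
    """
    percentages = list(percentages)
    return bool(percentages) and all(p < threshold for p in percentages)
```

In this sketch the first interval is the sampling period and the second, longer interval is the decision period, so one decision consumes several samples.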
Abstract:
A method of assessing the energy efficiency of a high-performance computing (HPC) system is shown. The method includes: selecting a plurality of HPC workloads to run on a system under test (SUT) with one or more power constraints, wherein the SUT includes a plurality of HPC nodes in the HPC system; executing the plurality of HPC workloads on the SUT; and generating a benchmark metric for the SUT based on a baseline configuration for each selected HPC workload and a plurality of measured performance-per-power values for each executed workload at each selected power constraint.
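One way to picture the benchmark metric is sketched below. The abstract does not state how the baseline and the measured values are combined; normalizing each measurement against the workload's baseline and taking a geometric mean is an assumption, and the function and parameter names are hypothetical.

```python
import math


def benchmark_metric(baseline, measured):
    """Combine performance-per-power measurements into one score.

    baseline: {workload: performance per watt in the baseline configuration}
    measured: {workload: {power_cap_watts: measured performance per watt}}

    Assumed aggregation: geometric mean of measured / baseline ratios over
    every (workload, power constraint) pair.
    """
    ratios = []
    for workload, by_cap in measured.items():
        for _cap, perf_per_watt in by_cap.items():
            ratios.append(perf_per_watt / baseline[workload])
    if not ratios:
        raise ValueError("no measurements supplied")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

A score above 1 under this assumed formula would mean the SUT delivers more performance per watt under the tested power constraints than in the baseline configuration.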
Abstract:
A zombie server can be detected. Detection can include labeling a plurality of processes as utility software, calculating the utilization attributable to the utility software processes executed on one or more processing resources during an interval of time, and calculating the server utilization of the one or more processing resources during the same interval. Detection can also include determining whether the difference between the utility software utilization and the server utilization is greater than a threshold, and identifying the server that hosts the one or more processing resources as a zombie server when the difference is smaller than the threshold.
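The utilization comparison can be sketched as follows. The process names, the use of CPU percentage as the utilization measure, and the default threshold are assumptions; the abstract only specifies the comparison between the two utilization figures.

```python
# Hypothetical names of processes labeled as utility software.
UTILITY_PROCESSES = {"backupd", "antivirus", "logrotate"}


def is_zombie_by_utility(samples, threshold=2.0):
    """Return True when nearly all utilization comes from utility software.

    `samples` is an iterable of (process_name, cpu_percent) pairs collected
    over one interval of time. The server is flagged when the gap between
    total utilization and utility software utilization is below `threshold`.
    """
    samples = list(samples)
    server_util = sum(cpu for _name, cpu in samples)
    utility_util = sum(cpu for name, cpu in samples if name in UTILITY_PROCESSES)
    return (server_util - utility_util) < threshold
```

The intuition is that a server whose load is almost entirely housekeeping work is doing nothing useful, even if its overall utilization is not zero.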
Abstract:
A non-transitory computer readable storage medium storing instructions executable by one or more processors of a distributed computer system is shown. The instructions perform operations including: determining whether the power consumed by the distributed computer system is greater than the power allocated to the distributed computer system; responsive to determining that the power consumed is greater than the power allocated, determining whether all jobs being processed by the distributed computer system are processing at a lowest power state for each job, wherein a job includes one or more calculations performed by the one or more processors of the distributed computer system; and, responsive to determining that all jobs are processing at a lowest power state for each job, suspending the job having the lowest priority among all jobs being processed by the distributed computer system.
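The decision sequence described above can be sketched as below. The `Job` fields, the convention that a larger number means higher priority, and the choice to do nothing when some job can still be throttled are assumptions; the abstract covers only the path that ends in suspending a job.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int            # assumption: larger value = higher priority
    power_state: int         # current power state of the job
    lowest_power_state: int  # lowest power state available to the job
    suspended: bool = False


def enforce_power_limit(consumed_watts, allocated_watts, jobs):
    """Suspend and return the lowest-priority job, or return None.

    A job is suspended only when consumption exceeds the allocation and
    every running job is already at its lowest power state.
    """
    if consumed_watts <= allocated_watts:
        return None
    running = [job for job in jobs if not job.suspended]
    if not running:
        return None
    if all(job.power_state == job.lowest_power_state for job in running):
        victim = min(running, key=lambda job: job.priority)
        victim.suspended = True
        return victim
    # Some job can still be lowered to a lower power state; that path is
    # not described in the abstract and is left out of this sketch.
    return None
```

Suspension is the last resort in this ordering: it is reached only after no running job has a lower power state left to move to.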
Abstract:
A non-transitory computer readable storage medium having stored thereon instructions executable by one or more processors is shown. The instructions perform operations including: receiving, by a calibration module executed by the one or more processors, a calibration request including (i) a workload type, (ii) a list of compute nodes belonging to a distributed computer system, and (iii) one or more frequencies; responsive to identifying the workload type as a clustered workload type, instructing a plurality of compute nodes on the list of compute nodes to begin processing a workload of the workload type; and responsive to identifying the workload type as a non-clustered workload type, instructing a compute node on the list of compute nodes to begin processing the workload of the workload type.
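The dispatch on workload type can be sketched as below, assuming the single-node branch applies to workloads that are not clustered. The request fields mirror items (i) to (iii) of the calibration request; the set of clustered types and the choice of the first listed node for the single-node case are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical workload types that must run across several nodes at once.
CLUSTERED_TYPES = {"mpi"}


@dataclass
class CalibrationRequest:
    workload_type: str
    nodes: list = field(default_factory=list)        # compute node names
    frequencies: list = field(default_factory=list)  # e.g. in MHz


def select_nodes(request):
    """Return the compute nodes that should start processing the workload.

    Clustered workload types go to every listed node; any other type goes
    to a single node (assumption: the first node on the list).
    """
    if not request.nodes:
        raise ValueError("calibration request lists no compute nodes")
    if request.workload_type in CLUSTERED_TYPES:
        return list(request.nodes)
    return [request.nodes[0]]
```

A fuller version would presumably repeat the run at each requested frequency, but the abstract does not describe that loop, so it is omitted here.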