• No results found

CPU Host Load Analysis

Performance estimation and optimization of computing systems has attracted many researcher’s interest [64, 101, 102, 153]. Analysing and mining CPU host load of a computing system are very important for further optimization tasks. Much research has been conducted on CPU host load.

2.2.1

Statistical Features of CPU Host Load

In [43], Dinda analysed several statistical properties of CPU host load collected from a working cluster. The analysis found several statistical properties of CPU host load as follows:

1. The traces exhibit low means but very high standard deviations and maximums. This implies that these machines have plenty of cycles to spare to execute jobs, but the execution time of these jobs will vary drastically. 2. Absolute measures of variation are positively correlated with the mean

while relative measures are negatively correlated. This suggests that it may not be unreasonable to map tasks to heavily loaded machines under some performance metrics.

3. The traces have complex, rough, and often multimodal distributions that are not well fitted by analytic distributions such as the normal or expo- nential distributions, which are particularly inept at capturing the tail of the distribution. This implies that modelling and simulation that as-

sumes convenient analytical load distributions may be flawed. Trace-driven simulation may be preferable.

4. Load is strongly correlated over time but has a broad, almost noise- like frequency spectrum. This implies that history-based load prediction schemes are feasible, but that linear methods may have difficulty. Realistic load models should capture this dependence, or trace-driven simulation should be used.

5. The traces are self-similar with relatively high Hurst parameters. This means that load smoothing will decrease variance much more slowly than expected. It may be preferable to migrate tasks in the face of adverse load conditions instead of waiting for the adversity to be ameliorated over the long term. Self-similarity also suggests certain modelling approaches such as fractional ARIMA models and non-linear models which can capture the self-similarity property.

6. The traces display epochal behaviour in that the local frequency content of the load signal remains quite stable for long periods of time and changes abruptly at the boundaries of such epochs. This suggests that the problem of predicting load may be able to be decomposed into a sequence of smaller subproblems.

In [41], Di et al. analysed some statistical properties of CPU host load data in a Google cluster [149] and compared it to another grid system. According to their research, Google hosts maximum CPU load is often close to the CPU capacity, and the maximum memory usage is about 80% of the memory capacity. The maximum load is actually controlled in the Google system for guaranteeing the service level of requests in case of unexpected load spikes. In contrast, Grid resources can be highly utilized without having a high risk of losing users or customers. Meanwhile, CPU and memory usage changes every 6 minutes, indicating again the volatility of load. CPUs are often idle, but memory usage

is relatively high. CPU usage in Grids is higher and more stable. The noise of CPU load in the Google cluster is 20 times as high as that in Grids.

Zhang et al. [160] found that host load exhibits the extremely periodical patterns in short terms, while in a long period, the pattern of host load changes dramatically.

2.2.2

Mining and Prediction of CPU Host Load

Mining and predicting CPU host load can help improving the performance of a computing system in many ways and thus attracted much interest. Much research has been done on the characterization of workload. Various models have been proposed to characterise workload. Maria et al. [23] summarized the modelling as 5 steps: 1) formulation, 2) collection of the parameters, 3) statistical analyses of the measured data, and 4) representativeness.In [33], Cirne et al. proposed a comprehensive model for supercomputer workload. Different techniques of modelling workload to estimate performance of a computer system are proposed [7, 10, 36, 48].

Khan et al. [94] identified different types of CPU work load of virtual machines (VMs) of a cloud computing system. The authors applied a novel co-clustering analysis to group VMs with different host load changing patterns. Predictions of future system performance can be used to improve resource selection and scheduling [160]. Much work has been proposed on predicting CPU host load. Linear models are most used in predicting CPU host load. Specifically, these models are AR, MA, ARMA, ARIMA, and ARFIMA which are commonly used in predicting time series [21]. Peter A. Dinda tried to predict host load by using linear models [45], which reveals that host load is strongly autocorrelated. The author further evaluated all linear models for host load prediction in [44]. Their study strongly shows that host load on a real system is predictable to a very useful degree from pat behavior by using linear models. Zhang et al. [160, 161] applied a modified polynomial fitting to CPU host load data with consideration of sudden changes of CPU host load to reduce the error rate of prediction.

Beside linear models, other prediction techniques are also applied to CPU host load. Di et al. [42] proposed a Bayesian model-based method to predict host load. Khan et al. [94] used Hidden Markov Model (HMM) and achieved higher prediction accuracy. In [152], Yang et al. proposed a novel method for CPU host load prediction. The method uses phase space reconstruction (PSR) to extract features of CPU host load, and then uses the genetic algorithm to optimize parameters of the phase space reconstruction process. In a distributed computing system, the nature of CPU host load varies with different physical machines. Bey et al. [13, 15, 16] tried to distinguish different types of CPU host load using clustering method and then apply adaptive network based fuzzy inference system to each individual cluster of host load to predict CPU host load. Cao et al. [24] also addressed the problem of different host load types and applied dynamic ensemble model to adjust prediction parameters dynamically. A tendency based prediction method is proposed in [151] for one-step-ahead CPU host load prediction. Another tendency based long-term prediction method is proposed in [103].

Related documents