CHAPTER 8
A RELEARNING VIRTUAL MACHINE
Learning and relearning for the prediction of hot methods are also implemented in the LLVM. Two models, one for predicting the frequently called hot methods (FCHM) and another for the long running hot methods (LRHM), are constructed using one set of benchmark programs. The hot methods of a new program from another benchmark suite are predicted by these models so that they can be optimized. The machine learning based model then relearns immediately, using the first set of benchmark programs together with the new one, and a new predictive model is constructed. Thus, after predicting the hot methods of each program, the system relearns with that program and keeps constructing new predictive models.
8.1 MACHINE RELEARNING
Figure 8.1 shows an overview of the learning and relearning hot method predictive system. The relearning predictive models for the frequently called and the long running hot methods are constructed using the SVM-based model. These models predict the hot methods in a program, which are optimized before execution. The feedback obtained from the execution of the program, through profiling and the ‘gprof’ tool, is used to evaluate the prediction accuracy of the models. The relearning system also feeds this feedback back to the predictive models. Thus, after the prediction, optimization and execution of every new program, the predictive models are reconstructed by the relearning system.
The predictive model is initially trained with all the programs in one benchmark suite; that is, the feature vector set is constructed from one full benchmark suite. Ten features are used in developing the FCHM predictive model, whereas twenty-nine features are used for the LRHM predictive model. These are the effective feature sets constructed in Chapter 5 using the sequential backward elimination process coupled with a ‘knock-out’ algorithm.
Once trained, both predictive models are used to predict the hot methods in a new benchmark suite. The predictions are compared with the actual hot methods observed during execution to evaluate the HMPA. The system then starts its relearning process: it constructs a new training set by appending the feature vectors constructed from the new benchmark programs to the existing training set. The new training data set is used to construct two new predictive models, one for the FCHM and another for the LRHM. In doing so, the system unlearns the old model and relearns from the complete training set. The new predictive models can then be used to predict the hot methods of any new benchmark program. Thus, the model learns from every new program that enters the system for execution.
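The relearning cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, and a simple nearest-centroid classifier stands in for the SVM-based model so that the sketch needs no external library.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (static feature vector, +1 hot / -1 cold)


def train_centroid_model(samples: List[Sample]) -> Callable[[Sequence[float]], int]:
    """Stand-in learner (NOT an SVM): predict the class with the nearer centroid.

    Assumes the training set contains at least one hot and one cold sample.
    """
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in samples if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    hot, cold = centroid(+1), centroid(-1)

    def predict(x: Sequence[float]) -> int:
        d_hot = sum((a - b) ** 2 for a, b in zip(x, hot))
        d_cold = sum((a - b) ** 2 for a, b in zip(x, cold))
        return +1 if d_hot <= d_cold else -1

    return predict


class RelearningPredictor:
    """Hypothetical wrapper: keeps the whole training set and rebuilds the model."""

    def __init__(self, initial_samples: List[Sample]):
        self.training_set = list(initial_samples)
        self.model = train_centroid_model(self.training_set)

    def predict(self, x: Sequence[float]) -> int:
        return self.model(x)

    def relearn(self, new_samples: List[Sample]) -> None:
        # Append the feature vectors of the newly executed program ...
        self.training_set.extend(new_samples)
        # ... then discard the old model and train a new one from scratch.
        self.model = train_centroid_model(self.training_set)
```

Because `relearn` rebuilds the model from the full training set rather than updating it in place, the cost of each relearning step grows with the number of programs seen so far.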
This work is the first attempt to apply relearning in virtual machines. The limitations of the system are:
i) Relearning is performed offline throughout the lifetime of the system, which makes it difficult for real-time and online systems to use it.
ii) The system unlearns and then relearns, instead of learning incrementally or updating the predictive model. Relearning after unlearning consumes time, but it can be executed as a background process in an online system.
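The second limitation mentions running the relearning step as a background process. One possible shape for this, using Python's standard `threading` module, is sketched below. The class name, the version counter and the toy single-feature trainer are all assumptions made for illustration; they are not part of the system described in this chapter.

```python
import threading


def train_threshold(samples):
    """Toy stand-in trainer (not an SVM): split one feature at the midpoint of the class means."""
    hot = [v for v, y in samples if y == +1]
    cold = [v for v, y in samples if y == -1]
    cut = (sum(hot) / len(hot) + sum(cold) / len(cold)) / 2
    return lambda v: +1 if v >= cut else -1


class BackgroundRelearner:
    """Hypothetical helper: predictions keep using the old model while a new one is trained."""

    def __init__(self, train, samples):
        self._train = train
        self._samples = list(samples)
        self._lock = threading.Lock()
        self._model = train(self._samples)
        self._requested = 0   # version number of the latest relearning request
        self._installed = 0   # version number of the model currently in use

    def predict(self, x):
        with self._lock:
            model = self._model
        return model(x)

    def relearn_async(self, new_samples):
        with self._lock:
            self._samples.extend(new_samples)
            snapshot = list(self._samples)
            self._requested += 1
            version = self._requested

        def job():
            fresh = self._train(snapshot)  # slow step, runs outside the lock
            with self._lock:
                # Never let an older training job overwrite a newer model.
                if version > self._installed:
                    self._model, self._installed = fresh, version

        worker = threading.Thread(target=job, daemon=True)
        worker.start()
        return worker
```

The model is still rebuilt from scratch each time; only the waiting is moved off the prediction path.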
8.2 RELEARNING VIRTUAL MACHINE ARCHITECTURE
The ten and twenty-nine static features used to construct the predictive models for the FCHM and the LRHM respectively are collected from each method by an offline static analysis of the LLVM bytecode. These features form the feature vectors that are accumulated in the training data set file. A feature vector is labeled ‘+1’ for a hot method and ‘-1’ for a cold method. Profiling and the ‘gprof’ tool are used to identify the FCHM and the LRHM respectively.
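The labelling step can be illustrated as follows. The sketch assumes that the training data set file uses the sparse `label index:value` line layout common to SVM tools such as LIBSVM; the text does not state the actual file format. The method names and the three-element feature vectors are hypothetical (the real vectors have ten or twenty-nine features).

```python
from typing import Dict, List, Sequence, Set, Tuple


def label_methods(features: Dict[str, Sequence[float]],
                  hot_methods: Set[str]) -> List[Tuple[int, Sequence[float]]]:
    """Attach +1 to methods reported hot by the profiler or gprof, -1 to the rest."""
    return [(+1 if name in hot_methods else -1, vec)
            for name, vec in features.items()]


def to_training_lines(labelled: List[Tuple[int, Sequence[float]]]) -> List[str]:
    """Format each labelled vector as '<label> 1:<v1> 2:<v2> ...' (assumed layout)."""
    lines = []
    for label, vec in labelled:
        cells = " ".join(f"{i}:{v:g}" for i, v in enumerate(vec, start=1))
        lines.append(f"{label:+d} {cells}")
    return lines


# Hypothetical example: two methods, three static features each.
example_features = {"decode_frame": [12, 3, 0.5], "print_usage": [1, 0, 0]}
example_lines = to_training_lines(label_methods(example_features, {"decode_frame"}))
```

Appending such lines to the existing file is all that is needed to extend the training set before the next relearning step.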
8.3 PERFORMANCE EVALUATION
The effects of learning and relearning of the virtual machine are evaluated on six different combinations of three benchmark suites, namely, the SPEC, UTDSP and Mediabench. The predictive model constructed from one benchmark suite predicts the hot methods of the other two benchmark suites. The prediction accuracies obtained are the outcome of the initial learning on the first benchmark suite.
Next, the predictive models are subjected to a relearning process, which is evaluated using various combinations of the three benchmark suites. After the initial learning of the predictive models on one benchmark suite, programs from another benchmark suite are used as test programs for hot method prediction. The predicted hot methods are optimized and then executed. The predictions are compared with the actual hot methods identified by a profiler for the FCHM and by the ‘gprof’ tool for the LRHM. The test benchmark programs are then added to the existing set of training benchmark programs and the predictive models are retrained, so that new predictive models are constructed from the old and the new benchmark programs. When a new program enters the system for execution, the existing predictive model predicts its hot methods and, after execution, the predictive models are reconstructed using the additional information obtained from the new program. The training and evaluation methodology adopted in relearning trains the predictive model on one benchmark suite and tests it on another, which is an improvement over the LOCV strategy within a single benchmark suite.
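The six benchmark combinations are the possible orderings of the three suites: one suite for initial learning, followed by the other two as the first and second relearning suites. A minimal sketch, with hypothetical function names, enumerates them together with the training set that accumulates at each step:

```python
from itertools import permutations

SUITES = ("UTDSP", "SPEC", "Mediabench")


def relearning_sequences(suites=SUITES):
    """Return every (initial, first relearning, second relearning) ordering."""
    return list(permutations(suites, len(suites)))


def cumulative_training_sets(sequence):
    """Training suites available after initial learning and after each relearning step."""
    return [sequence[:k] for k in range(1, len(sequence) + 1)]


for seq in relearning_sequences():
    print(" -> ".join(seq), cumulative_training_sets(seq))
```

Each ordering corresponds to one row of the relearning tables in this section.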
Table 8.1 HMPA and its Improvement on Relearning with Various Benchmark Combination Sequences for the FCHM Predictive Model
Initial      Without relearning    First relearning               Second relearning              Overall
learning     Benchmark   HMPA %    Benchmark   HMPA %  Improv. %  Benchmark   HMPA %  Improv. %  HMPA %  Improv. %
UTDSP        SPEC        6         SPEC        0       -6         Mediabench  0       0          0       -3
UTDSP        Mediabench  0         Mediabench  0       0          SPEC        11      5          6       3
SPEC         UTDSP       2         UTDSP       20      18         Mediabench  0       0          10      9
SPEC         Mediabench  0         Mediabench  0       0          UTDSP       25      23         13      12
Mediabench   SPEC        19        SPEC        10      -9         UTDSP       23      23         17      7
Table 8.2 HMPA and its Improvement on Relearning with Various Benchmark Combination Sequences for the LRHM Predictive Model
Initial      Without relearning    First relearning               Second relearning              Overall
learning     Benchmark   HMPA %    Benchmark   HMPA %  Improv. %  Benchmark   HMPA %  Improv. %  HMPA %  Improv. %
UTDSP        SPEC        48        SPEC        31      -17        Mediabench  31      -7         31      -12
UTDSP        Mediabench  38        Mediabench  17      -21        SPEC        1       -47        9       -34
SPEC         UTDSP       32        UTDSP       63      31         Mediabench  11      11         37      21
SPEC         Mediabench  0         Mediabench  11      11         UTDSP       0       -32        6       -11
Mediabench   SPEC        5         SPEC        31      26         UTDSP       31      28         31      27
8.4 EXPERIMENTAL RESULTS
Figures 8.2, 8.3 and 8.4 show the LRHM prediction accuracies obtained for the individual programs of the benchmark suites using the initial training, prior to relearning. For instance, Figure 8.2 shows the HMPA obtained on the SPEC benchmark using the predictive models trained on either the UTDSP or the Mediabench. These observations of the initial learning are presented as the HMPA without relearning in Table 8.2.
[Bar chart: HMPA % on the SPEC benchmark suite; trained by UTDSP: 48, trained by Mediabench: 5]
Figure 8.2 Hot Method Predictions for the LRHM on the SPEC Benchmark Suite by Models Trained by the UTDSP and Mediabench
With the FCHM predictive model, the prediction accuracies obtained are low; the observations of the initial model are recorded in Table 8.1 as the HMPA % without relearning.
[Bar chart: HMPA % on the UTDSP benchmark suite; trained by SPEC: 32, trained by Mediabench: 3]
Figure 8.3 Hot Method Predictions for the LRHM on the UTDSP Benchmark Suite by Models Trained by the SPEC and Mediabench
[Bar chart: HMPA % on the Mediabench benchmark suite for h264dec/ldecod, h264dec/lencod, h264enc/ldecod, h264enc/lencod and their average; series: trained by UTDSP, trained by SPEC; labelled values: 38 and 1]
Figure 8.4 Hot Method Predictions for the LRHM on the Mediabench Benchmark Suite by Models Trained by the SPEC and UTDSP
8.4.1 Relearning of the FCHM Predictive Model
Table 8.1 presents the performance of the relearning predictive models for the FCHM. With these models the prediction accuracy is low both before and after relearning, compared with the prediction accuracies of 68% and 16%, presented in Tables 5.2 and 5.4, that are obtained when the UTDSP and SPEC benchmark programs are evaluated using the model derived with the ‘knock-out’ strategy. However, the relearning system behaves consistently, with five out of six benchmark combinations showing an overall improvement. Considering the individual predictive models in Table 8.1, the model built on Mediabench predicts the SPEC with 19% HMPA and the UTDSP with 0%. When the system is relearnt first with the UTDSP and then with the SPEC benchmark suite, it achieves 26% and 12% respectively, an overall improvement of 10% over the prediction prior to relearning.
8.4.2 Relearning of the LRHM Predictive Model
Table 8.2 presents similar data on the relearning experience of the LRHM predictive model. The highest LRHM prediction accuracies prior to relearning, 48% and 38%, are obtained on the SPEC and Mediabench benchmark suites respectively, when the model is initially learnt on the UTDSP. Although the same model predicts the LRHM with 31% accuracy after relearning with the SPEC as the first benchmark and Mediabench as the second, this is a decrease of 17% and 7% respectively for the SPEC and Mediabench benchmarks. Prior to relearning, the LRHM predictive model achieves a maximum HMPA of 48% on the SPEC (trained on the UTDSP) and 32% on the UTDSP (trained on the SPEC), whereas 86% and 48% prediction accuracy has been achieved on the UTDSP and SPEC benchmark programs with the model derived using the ‘knock-out’ strategy. In this approach, the predictive model is trained on one benchmark suite and tested on a different benchmark suite.
The highlight of the LRHM results in Table 8.2 is the model initially built on the SPEC, which obtains a prediction accuracy of 32% on the UTDSP and 0% on the Mediabench prior to relearning. With the first relearning experience on the UTDSP it achieves an HMPA of 63%, and with subsequent relearning on the Mediabench it gives 11%, leading to an overall HMPA of 37%. This is a 21% improvement over the system without relearning. The model that is initially trained on Mediabench and later relearnt in the SPEC-UTDSP order and the UTDSP-SPEC order achieves the highest overall improvements, 27% and 29% respectively, while the SPEC-trained model relearnt in the UTDSP-Mediabench order gives the highest and the most consistent HMPA %.
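The figures quoted for the SPEC-trained model can be reproduced with simple arithmetic. The sketch below rests on an inference from the rows of Table 8.2, not on a definition stated in the text: each improvement is taken as the relearnt HMPA minus the initial HMPA on the same suite, and the overall columns as the mean of the two relearning steps, rounded half away from zero. The function names are hypothetical.

```python
def round_half_away(x: float) -> int:
    """Round to the nearest integer, ties away from zero.

    Python's built-in round() sends ties to the even neighbour, which would
    not match the tabulated values.
    """
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)


def summarise(initial_1: int, relearnt_1: int, initial_2: int, relearnt_2: int):
    """Return (gain 1, gain 2, overall HMPA %, overall improvement %)."""
    gain_1 = relearnt_1 - initial_1
    gain_2 = relearnt_2 - initial_2
    overall_hmpa = round_half_away((relearnt_1 + relearnt_2) / 2)
    overall_gain = round_half_away((gain_1 + gain_2) / 2)
    return gain_1, gain_2, overall_hmpa, overall_gain


# SPEC-trained LRHM model, relearnt with UTDSP and then Mediabench:
# initial HMPA 32 and 0, relearnt HMPA 63 and 11.
print(summarise(32, 63, 0, 11))  # (31, 11, 37, 21)
```

The same rule also matches the rows that show a decrease, for example the SPEC-trained model relearnt in the Mediabench-UTDSP order, where `summarise(0, 11, 32, 0)` gives `(11, -32, 6, -11)`.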
Unlike the LRHM predictive model, which yields the higher prediction accuracies, the FCHM predictive model gives low prediction accuracies. Nevertheless, the FCHM predictive model also benefits from relearning. It may be concluded that hot method prediction generally improves with machine relearning, although Tables 8.1 and 8.2 show that the gain depends on the benchmark sequence.
8.5 CONCLUSION
A relearning virtual machine is constructed, which learns every time a new program enters the system for execution. The HMPA percentages obtained with some of the relearnt predictive models show that the models can achieve an improvement of 10% and 21% respectively for the frequently called and the long running hot methods over systems without relearning, when relearnt using different combinations of the SPEC, UTDSP and Mediabench benchmarks. These results confirm that hot methods can be predicted effectively by machine learning based models. Online systems can perform the relearning as a background process.
The next chapter concludes the work with a summary of the contribution made by the machine learning based hot method predictions in selective compiler optimization.