Model Accuracy - Microarchitecture-independent analytical branch behavior and multi-threaded performance modeling

3.5 Results

3.5.2 Model Accuracy

We first show that linear branch entropy correlates well with branch misprediction rate, by evaluating the accuracy of the branch predictor model. Because the model is a linear function, model accuracy is a good indicator for correlation: if the model is accurate, branch entropy correlates well with misprediction rate, and vice versa. We evaluate the accuracy of the model for a few common two-level predictors: a GAg predictor, a GAp predictor, a PAp predictor and a gshare predictor. Furthermore, we also evaluate a tournament predictor, consisting of a GAp and a PAp predictor, with a metapredictor (indexed with address bits only) to choose between the two predictors.

We evaluate the model using leave-one-out cross-validation: we train the model on all but one benchmark, and evaluate the accuracy for the left-out benchmark; and we repeat this process for all benchmarks as the left-out benchmark. We report the average difference in MPKI (misses per 1,000 instructions) between prediction and simulation. Using MPKI instead of misprediction rate avoids inflating numbers when there are few branches. Furthermore, MPKI is proportional to the branch miss CPI penalty of an application [19].
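The leave-one-out procedure described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a single entropy value and a single MPKI value per benchmark, and the function names (`fit_line`, `leave_one_out_errors`, `mean_absolute_error`) and the sample numbers are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = alpha + beta * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def leave_one_out_errors(samples):
    """samples maps benchmark name -> (entropy, simulated MPKI).

    For each benchmark, fit on all other benchmarks and return the
    signed error (predicted minus simulated) for the held-out one.
    """
    errors = {}
    for held_out, (x_test, y_test) in samples.items():
        train = [v for name, v in samples.items() if name != held_out]
        alpha, beta = fit_line([x for x, _ in train], [y for _, y in train])
        errors[held_out] = (alpha + beta * x_test) - y_test
    return errors


def mean_absolute_error(errors):
    """Average of the absolute per-benchmark errors."""
    return mean(abs(e) for e in errors.values())


# Made-up illustrative data: (entropy, MPKI) per hypothetical benchmark.
samples = {
    "bench_a": (0.10, 2.1),
    "bench_b": (0.20, 3.9),
    "bench_c": (0.30, 6.2),
    "bench_d": (0.40, 7.8),
}
errors = leave_one_out_errors(samples)
print(errors)
print(mean_absolute_error(errors))
```

Reporting the signed error per benchmark alongside the mean absolute error mirrors the two quantities the text discusses: the former shows whether the model over- or under-predicts, the latter summarizes overall accuracy.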

We train the model for each of the predictors using simulation results for all training benchmarks and for history sizes between 0 and 20 bits. For GAg, gshare and GAp, the global history entropy is used to fit the model to the misprediction rates; for the PAp and tournament predictors, we use local and tournament history entropy, respectively. For GAg, we calculate a single global entropy number across all branches (by combining all per-branch tables into one table), instead of an entropy number per branch. For gshare, we find that using per-branch entropy (as in the GAp predictor) leads to relatively large errors: by XOR-ing address bits and history, we lose some of the information in the address bits. We find that fitting the misprediction rates for gshare to the global entropy with a single entropy number across all branches (as for GAg) provides the best results. This can be explained by the fact that we use the same number of history bits as for the GAg predictor to index the pattern history table (PHT), and that the XOR with the address bits partly solves the GAg problem that the same history for different branches is mapped to the same entry (history aliasing). This aliasing reduction is visible in a low (even negative) parameter α for the gshare model, which is a measure of aliasing, as we will discuss in the next section.
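The difference between GAg and gshare indexing can be illustrated with a small sketch. It assumes the common textbook formulation in which gshare XORs the global history with the low-order address bits; the function names and the 4-bit example values are hypothetical and not taken from the thesis.

```python
def gag_index(history, history_bits):
    """GAg: the PHT is indexed by the global history alone."""
    return history & ((1 << history_bits) - 1)


def gshare_index(address, history, history_bits):
    """gshare: XOR the global history with low-order address bits.

    Assumption: the address is used as-is; real designs often drop
    alignment bits first.
    """
    mask = (1 << history_bits) - 1
    return (address ^ history) & mask


BITS = 4
history = 0b0101

# Two different branches with the same history share one GAg entry
# (GAg ignores the address entirely) ...
print(gag_index(history, BITS))

# ... but map to different gshare entries.
print(gshare_index(0b0001, history, BITS), gshare_index(0b0010, history, BITS))

# The XOR also discards information: two different (address, history)
# pairs can collide on the same entry.
print(gshare_index(0b0011, 0b0101, BITS), gshare_index(0b0101, 0b0011, BITS))
```

The last line shows why a per-branch entropy value fits gshare poorly: after the XOR, the PHT entry no longer identifies a single branch.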

Figure 3.8: Prediction error as a boxplot, showing median and average absolute error. (a) Prediction error for the CBP 2011 benchmarks. (b) Prediction error for the SPEC CPU 2006 benchmarks. [Plots not reproduced. Vertical axis: Error (MPKI), from -12 to 10. One box per predictor: GAg, GAp, PAp, gshare, Tour. Legend: Median Error, Outliers, Average Absolute Error. Annotated values: 13.08, 12.87.]

The prediction error is shown as a box-and-whiskers plot³ in Figures 3.8a and 3.8b, and per benchmark in Figures 3.9a and 3.9b, for the five branch predictors on CBP and SPEC, respectively. These numbers are for one specific configuration (i.e., number of history and address bits) for each predictor. These configurations have approximately the same hardware cost (4 KB at the second level). The error of the tournament predictor is the smallest for CBP, with an average absolute error of 0.36 MPKI; for SPEC, the smallest error is observed for the GAp predictor, with an average absolute error of 0.63 MPKI. Across all modeled predictors, the average absolute error is around 0.70 MPKI and 0.89 MPKI for CBP and SPEC, respectively. The average MPKI for all predictors is 10.8, which means that the model has a relative error of less than 20%, and the errors are both negative and positive.

The highest average errors are observed for the PAp predictor (0.87 MPKI and 1.14 MPKI for CBP and SPEC, respectively) and gshare (0.69 MPKI and 1.06 MPKI). This is because these predictors suffer from PHT aliasing, which, in contrast to pure branch address aliasing, is not modeled in our entropy calculation. The PAp predictor suffers from aliasing in its history table: we use 10 bits to index the history table, so different branches may update the same history entry and pollute it. This effect is not modeled in our entropy calculation. Modeling it would require separate tables for every setting of the number of bits used to index the branch history table, which would incur too much overhead. The gshare predictor XORs history and address bits, which also leads to unpredictable aliasing effects (branches with a different history and instruction address may map to the same entry).
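The history-table aliasing described for PAp can be sketched as follows. This is an illustration under assumptions: the table is indexed with the low-order address bits, and the addresses, the 4-bit history length, and the function names are hypothetical.

```python
def history_table_index(address, index_bits=10):
    """Select a first-level history entry from low-order address bits."""
    return address & ((1 << index_bits) - 1)


def local_histories(branch_stream, index_bits, history_bits):
    """Replay (address, taken) pairs and return the history registers."""
    table = {}
    mask = (1 << history_bits) - 1
    for address, taken in branch_stream:
        idx = history_table_index(address, index_bits)
        # Shift the newest outcome into the selected history register.
        table[idx] = ((table.get(idx, 0) << 1) | int(taken)) & mask
    return table


# Branch A (always taken) and branch B (never taken) differ only in
# address bits above bit 9, so a 10-bit index maps both to entry 0.
stream = [(0x400, True), (0x800, False)] * 2

# 10 index bits: both branches pollute one shared history register.
print(local_histories(stream, index_bits=10, history_bits=4))

# 12 index bits: each branch keeps its own clean history.
print(local_histories(stream, index_bits=12, history_bits=4))
```

With the shared entry, the register holds an alternating pattern that neither branch produces on its own, which is the pollution the entropy calculation does not capture.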

Although the models for CBP and SPEC have similar average errors, the SPEC benchmarks have more outliers. Some of the SPEC benchmarks show more irregular behavior than the CBP benchmarks. In particular, gcc has many unique branches, causing a lot of aliasing effects in the first level of the branch predictors. As a result, the branch misprediction rate is often predicted too low, leading to negative errors. The extreme points on the negative side in Figure 3.8b for the PAp, gshare and tournament predictors are all for gcc with different input sets. There is also one outlier on the positive side, which is dealII for all predictors. We find that the branches in dealII show very fine-grained phase behavior: during a few thousand instructions, a particular branch is first taken multiple consecutive times, and then not taken multiple times. Our entropy metric aggregates the outcomes per one million instructions, leading to a large entropy value for this branch (because it is taken about half of the time and not taken the other half), but a predictor can accurately predict the sequences of equal outcomes, only missing when the outcome flips. Reducing the entropy measurement interval would solve this problem, but it incurs more overhead in the profiler and also makes it impossible to detect long history patterns. Because we see this behavior in only one of the 95 evaluated benchmarks, we deem the additional overhead not worth the gain in accuracy.
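The dealII effect can be reproduced with a toy experiment. The sketch below assumes that the linear entropy of a branch with taken probability p is 2·min(p, 1−p) when outcomes are aggregated over the whole interval without history, and it uses a plain 2-bit saturating counter as a stand-in predictor; the run length of 50 and the function names are hypothetical.

```python
def linear_entropy(outcomes):
    """Assumed form: 2 * min(p, 1 - p), with p the taken fraction."""
    p = sum(outcomes) / len(outcomes)
    return 2 * min(p, 1 - p)


def two_bit_counter_misses(outcomes, state=3):
    """Count mispredictions of a 2-bit saturating counter.

    States 0-1 predict not-taken, states 2-3 predict taken.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# Fine-grained phases: 50 taken, then 50 not-taken, repeated 10 times.
stream = ([True] * 50 + [False] * 50) * 10

entropy = linear_entropy(stream)
misses = two_bit_counter_misses(stream)
print(f"entropy over the whole interval: {entropy}")
print(f"miss rate of the counter: {misses / len(stream):.3f}")
```

The aggregated entropy is maximal because the branch is taken exactly half of the time, while the counter only mispredicts around each phase change, so an entropy-based model predicts far more misses than actually occur.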

³ The box covers the second and third quartiles, with a line at the median. The whiskers cover all points within 1.5 times the interquartile distance outside of the box, and the crosses are the outliers. The circles represent the average of the absolute value of the error.
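The plot elements described in this footnote can be computed as in the sketch below. It is an illustration only: the quartile convention (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption, since the thesis does not state which one its plotting tool uses, and the error values are made up.

```python
from statistics import mean, quantiles


def boxplot_summary(errors):
    """Return quartiles, outliers and mean absolute error for signed errors."""
    q1, median, q3 = quantiles(errors, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Points beyond 1.5 times the interquartile distance are outliers.
    outliers = [e for e in errors if e < low or e > high]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": outliers,
        "mean_abs_error": mean(abs(e) for e in errors),
    }


# Made-up signed prediction errors in MPKI.
summary = boxplot_summary([-0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 6.0])
print(summary)
```

Note that a single large outlier barely moves the median but raises the mean absolute error noticeably, which is why the figure reports both.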

[Figure 3.9, plots not reproduced.] (a) Prediction error for the CBP 2011 benchmarks. (b) Prediction error for the SPEC CPU 2006 benchmarks.

(a) CBP

Predictor     α         β
GAg          -1.047    51.323
GAp          -1.037    51.530
PAp           3.450    55.722
gshare       -1.246    55.358
Tournament   -0.455    56.010

(b) SPEC

Predictor     α         β
GAg          -0.305    46.344
GAp          -0.154    45.185
PAp          -0.281    58.156
gshare       -0.189    53.322
Tournament    0.140    52.522

Table 3.1: Model parameters for the different branch predictors (in % miss rate).
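As a rough illustration of how the parameters in Table 3.1 could be applied, the sketch below assumes the linear model has the form miss rate (%) = α + β · E, with E the linear branch entropy between 0 and 1. The conversion to MPKI through a branches-per-1,000-instructions count, the function names, and the example inputs are assumptions for illustration.

```python
# (alpha, beta) pairs copied from Table 3.1, in % miss rate.
PARAMS = {
    "CBP": {
        "GAg": (-1.047, 51.323),
        "GAp": (-1.037, 51.530),
        "PAp": (3.450, 55.722),
        "gshare": (-1.246, 55.358),
        "Tournament": (-0.455, 56.010),
    },
    "SPEC": {
        "GAg": (-0.305, 46.344),
        "GAp": (-0.154, 45.185),
        "PAp": (-0.281, 58.156),
        "gshare": (-0.189, 53.322),
        "Tournament": (0.140, 52.522),
    },
}


def predicted_miss_rate(suite, predictor, entropy):
    """Assumed model form: miss rate in percent = alpha + beta * entropy."""
    alpha, beta = PARAMS[suite][predictor]
    return alpha + beta * entropy


def predicted_mpki(suite, predictor, entropy, branches_per_kilo_instr):
    """Convert the percent miss rate to misses per 1,000 instructions."""
    rate = predicted_miss_rate(suite, predictor, entropy)
    return rate / 100.0 * branches_per_kilo_instr


# Hypothetical inputs: entropy 0.1 and 200 branches per 1,000 instructions.
print(predicted_miss_rate("CBP", "PAp", 0.1))
print(predicted_mpki("CBP", "PAp", 0.1, 200))
```

Under this form, a negative α yields a slightly negative predicted miss rate at very low entropy; the sketch does not clamp the result.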
