
4.3 Related Work

4.4.7 Conclusions

We demonstrated that it can sometimes be more effective to devote resources to learning the smart thing to do than to simply throw resources at a potentially suboptimal configuration. Our technique devotes half of the system resources to trying something different, enabling online adaptation to the system environment. The geometric mean speedup of SiblingRivalry was 1.8x after a migration between microarchitectures. Even in comparison to an offline-optimized version on the same microarchitecture that uses the full resources, SiblingRivalry showed a geometric mean performance increase of 1.3x when moderate load was introduced on the machine. SiblingRivalry, while performing close to twice the amount of work, consumed on average 30% less power than a well-tuned algorithm run after a migration. These results show that continuously adapting the program to the environment can provide a performance boost large enough to overcome the cost of splitting the available resources in half.

In addition, we have shown that an intelligent machine learning system can rapidly find a good solution even when the search space is extremely large. Furthermore, we demonstrated that it is important to provide many algorithmic and optimization choices to the online learner, as is done by the PetaBricks language and compiler. While these choices increase the search space, they make it possible for the autotuner to obtain the performance gains observed.

SiblingRivalry fully eliminates the offline learning step, which is the biggest impediment to the acceptance of autotuning, and thereby makes the process fully transparent to users. For example, while Feedback Directed Optimization (FDO) can provide substantial performance gains, the extra step involved in the programmer's workflow has stopped this promising technique from being widely adopted [23]. By eliminating any extra steps, we believe that SiblingRivalry can bring autotuning into mainstream program optimization. As we keep increasing the core counts of our processors, autotuning via SiblingRivalry can help exploit them in a purposeful way.

Chapter 5

Hyperparameter Tuning

The behavior and efficacy of SiblingRivalry are intimately tied to the selection rule in Equation 4.1, which in turn relies on the hyperparameters C and W. Different values of these hyperparameters are likely to have a significant impact on the quality of autotuned programs, and it is therefore important that we use appropriate hyperparameters for each tuned program. In the previous chapter, we delegated the problem of selecting appropriate hyperparameter values to the user, without investigating how difficult they are to find or what impact they have on autotuning. In this chapter, we provide that analysis, which proceeds as follows. In Section 5.1, we discuss the difficulties of selecting optimal hyperparameter values. In Section 5.2, we describe the evaluation metrics we use to formally assess the quality of a given set of hyperparameter values. Finally, Section 5.3 provides an experimental evaluation of SiblingRivalry's hyperparameter robustness.

5.1 Tuning the Tuner

The hyperparameters C (exploration/exploitation trade-off) and W (window size) can have a significant impact on the efficacy of SiblingRivalry. For example, if C is set too high, it might dominate the exploitation term and all operators will be applied approximately uniformly, regardless of their past performance. If, on the other hand, C is set too low, it will be dominated by the exploitation term AUC_{i,t}, and new, possibly better operators will rarely be applied in favor of operators which made only marginal improvements in the past.
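The effect of C can be illustrated with a minimal sketch. It assumes a generic UCB-style score of the form exploit[i] + C * sqrt(2 ln(N) / n[i]); this form, the function name select_operator, and the numbers below are illustrative assumptions and are not taken from Equation 4.1, whose exact definition is given in the previous chapter.

```python
import math

def select_operator(exploit, counts, C):
    """Pick the operator with the highest UCB-style score.

    exploit[i] is a stand-in for the exploitation term of operator i,
    counts[i] is how often operator i has been applied (must be > 0),
    and C scales the exploration bonus. All names are hypothetical.
    """
    total = sum(counts)
    scores = [
        e + C * math.sqrt(2.0 * math.log(total) / n)
        for e, n in zip(exploit, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical history: operator 0 scored well and was used often,
# operator 1 scored poorly so far but has barely been tried.
exploit = [0.6, 0.2]
counts = [40, 2]

low_c_choice = select_operator(exploit, counts, C=0.01)   # exploitation dominates
high_c_choice = select_operator(exploit, counts, C=5.0)   # exploration dominates
```

With the small C, the sketch keeps choosing the operator with the better past score. With the large C, it chooses the rarely used operator regardless of its score, which over many rounds pushes usage toward uniform.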

The problem is further complicated by the fact that the relative magnitude of the exploration and exploitation terms is highly problem-dependent [29]. For example, programs with many algorithmic choices are likely to benefit from a relatively high exploration rate. This is because algorithmic changes create discontinuities in the program's fitness, and operator scores calculated for a given set of algorithms will not be accurate when those algorithms suddenly change. When such changes occur, exploration should become the dominant behavior. For other programs, e.g. those where only a few mutators improve performance, sacrificing exploration in favor of exploitation might be optimal. This is especially true for programs with few algorithmic choices: once the optimal algorithmic choices have been made, the autotuner should focus on adjusting cutoffs and tunables using an exploitative strategy with a comparatively low C.

The optimal value of C is also closely tied to the optimal value of W, which controls the size of the history window. The autotuner looks at operator applications in the past W races, and uses the outcome of those applications to assign a quality score to each operator. This is based on the assumption that an operator's past performance is a predictor of its future performance, which may not always be true. For example, changes in algorithms can create discontinuities in the fitness landscape, making past operator performance largely irrelevant. However, if W is large, this stale performance will still be taken into account for many subsequent races. In such situations, a small W might be preferred.
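The following sketch illustrates how the window size determines how long stale outcomes keep influencing an operator's score. It uses a plain windowed success rate as a stand-in for the quality score; the actual score used by SiblingRivalry is the AUC-based term from the previous chapter, and the function name and data here are hypothetical.

```python
from collections import deque

def windowed_success_rate(outcomes, W):
    """Fraction of successes among the last W outcomes (a stand-in score)."""
    window = deque(outcomes, maxlen=W)  # keeps only the most recent W entries
    return sum(window) / len(window)

# Hypothetical operator: it helped in 20 races, then an algorithm change
# made it useless for the following 5 races.
outcomes = [1] * 20 + [0] * 5

small_w = windowed_success_rate(outcomes, W=5)    # only sees the recent failures
large_w = windowed_success_rate(outcomes, W=25)   # still credits the old successes
```

In this example the small window has already forgotten the outdated successes, while the large window still rates the operator highly.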

Furthermore, optimal values of C and W are not independent. Due to the way AUC_{i,t} is computed, the value of the exploitation term grows with W (see Section 4.2.5). Thus by changing W, which superficially controls only the size of the history window, one might accidentally alter the exploration/exploitation balance. For this reason, C and W should be tuned together.
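The coupling can be sketched with a simplified model. It assumes an unnormalized exploitation score (here, a count of successes in the window) and holds the exploration bonus fixed for simplicity; neither assumption reproduces the AUC computation of Section 4.2.5, and all names are hypothetical.

```python
import math

def exploitation(outcomes, W):
    """Unnormalized stand-in score: number of successes in the last W races."""
    return sum(outcomes[-W:])

def exploration(C, total_uses, uses):
    """UCB-style exploration bonus, assumed here for illustration only."""
    return C * math.sqrt(2.0 * math.log(total_uses) / uses)

outcomes = [1, 0] * 50  # hypothetical operator that helps in every other race
C = 2.0
bonus = exploration(C, total_uses=100, uses=10)  # held fixed in this sketch

# Ratio of exploitation to exploration for two window sizes, same C.
ratio_small = exploitation(outcomes, 10) / bonus
ratio_large = exploitation(outcomes, 100) / bonus
```

Under these assumptions, increasing W tenfold makes the exploitation term ten times larger relative to the same exploration bonus, so a value of C that balanced the two terms for the small window no longer does so for the large one.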

Finally, the task of selecting hyperparameters is complicated by the fact that different hyperparameter values might be optimal at different stages of the autotuning process. As described earlier, a larger C might be favorable following algorithm changes, and a smaller C once optimal algorithmic choices have been made. Currently, however, SiblingRivalry does not support adjusting hyperparameters dynamically during a run; they must be set statically before autotuning begins.
