William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather
Intelligent Heuristic Construction with Active Learning
THE UNIVERSITY OF EDINBURGH
Space is BIG!
• Hubble “Ultra-Deep Field”
• Tiny region of space shown
• Despite this, many galaxies
• Each galaxy, billions of stars
• Relevance to heuristics…?
Optimisation spaces are MUCH BIGGER!!!
• We can’t pick from 10^400
• Rough heuristics instead
• Traditionally hard-coded
• Can take a year to “perfect”
• As if that wasn't bad enough…
10^82 Atoms in the Universe
10^400 Combinations of GCC Optimisations
… the problem is even worse than that!
• Each architectural change requires heuristics to be re-tuned
• Heuristics are inherently tied to the underlying hardware
• Most compilers support many different platforms
• Very difficult to keep up and getting harder
Machine Learning to the rescue?
• Leverage machine learning techniques to create heuristics
• Well suited to the problem
• Lots of interesting research
• Can be better than humans
• But, it’s also incredibly slow to learn
• We demonstrate how it’s possible to accelerate training
• Create a heuristic which maps workload to processor
Quick Detour: Machine Learning 101
• Classification involves forming a correlation between
the features of an object and its label
[Figure: examples (feature values, best heuristic value) → Machine Learning]
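As a minimal sketch of classification, assuming a 1-nearest-neighbour rule (chosen here only for brevity), the code below maps feature values to the label of the closest training example. The data and the names `train` and `predict` are illustrative assumptions, not taken from the talk.

```python
import math

def train(examples):
    """'Training' a 1-nearest-neighbour model just stores the labelled examples."""
    return list(examples)

def predict(model, features):
    """Return the label of the stored example closest to `features`."""
    nearest = min(model, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]

# Hypothetical training data: (feature values, best heuristic value).
examples = [((1.0, 1.0), "CPU"), ((2.0, 1.5), "CPU"),
            ((8.0, 9.0), "GPU"), ((9.0, 7.5), "GPU")]
model = train(examples)
```

Calling `predict(model, (1.5, 1.0))` asks the model for the label of an unseen feature vector.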
Training a Heuristic
[Figure: thousands of examples, plotted by input value 1 and input value 2, go into a Machine Learning Algorithm, which produces a mathematical model that labels inputs CPU or GPU]
Using a Heuristic
[Figure: unseen features (input value 1, input value 2) go into the Mathematical Model, which returns the predicted processor, CPU or GPU]
So what’s wrong with this?
[Figure: feature 1 vs. feature 2]
Well, we actually only needed these!
[Figure: feature 1 vs. feature 2]
So this was a complete waste of time!
[Figure: feature 1 vs. feature 2]
How much time was wasted?
• Heuristic quality is tied to the correctness of the labels
• I.e. consistently wrong labels lead to a wrong model
• Sound data is essential, but very expensive
• E.g. are inputs X, Y, Z faster on CPU or GPU?
1. Run program on CPU using X, Y, Z
2. Run program on GPU using X, Y, Z
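The two runs above can be sketched as a labelling function. This is only an illustration: `run_on_cpu` and `run_on_gpu` are hypothetical callables standing in for the real OpenMP and OpenCL programs, and repeating each run and taking the median is an assumption made here to reduce timing noise.

```python
import statistics
import time

def label_example(run_on_cpu, run_on_gpu, inputs, repeats=5):
    """Label `inputs` with whichever device has the lower median run time."""
    def median_time(run):
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(*inputs)
            times.append(time.perf_counter() - start)
        return statistics.median(times)

    # Every label costs 2 * repeats program runs, which is why labels are expensive.
    return "CPU" if median_time(run_on_cpu) <= median_time(run_on_gpu) else "GPU"
```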
Compile-time Heuristics are Even Slower
• Labelling a single example requires iterative compilation
  • compile the code using different optimisation values
  • profile repeatedly to make a statistically sound determination
  • only then, associate the best optimisation with the code features
[Figure: one .c file compiled into several .exe variants; best optimisation wins]
What do we do about it?
• We cannot know where the informative examples lie
• But, we can let the algorithm make an educated guess
• You and I do not learn in a random, unstructured way
• We build up our knowledge gradually and iteratively
• Perhaps, let the algorithm do the same…?
Active Supervised Learning
[Figure: passive (random) learning: thousands of random examples → Machine Learning Algorithm → final model]
[Figure: active (iterative) learning: few random examples → ML Algorithm → intermediate model → completion reached? If no, carefully select an example and repeat; if yes, final model]
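The active (iterative) loop in the diagram can be sketched generically. This is a minimal sketch under stated assumptions: the function name `active_learn` and the callables `label`, `fit`, `select` and `is_complete` are hypothetical placeholders for the expensive labelling step, the ML algorithm, the example-selection strategy and the completion test.

```python
import random

def active_learn(label, fit, select, is_complete, pool, n_seed=2, seed=0):
    """Generic active learning loop.

    label(x)            -> expensive ground-truth label for x
    fit(examples)       -> intermediate model built from labelled examples
    select(model, pool) -> the unlabelled candidate to label next
    is_complete(model, examples, iteration) -> True when training should stop
    """
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Start from a few random labelled examples.
    examples = [(x, label(x)) for x in pool[:n_seed]]
    pool = pool[n_seed:]
    model = fit(examples)
    iteration = 0
    while pool and not is_complete(model, examples, iteration):
        x = select(model, pool)          # carefully select an example
        pool.remove(x)
        examples.append((x, label(x)))   # pay the labelling cost only here
        model = fit(examples)            # rebuild the intermediate model
        iteration += 1
    return model, examples
```

Passing the strategy in as `select` keeps the loop independent of how informative examples are chosen.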
How do we know when it’s complete?
[Figure: active learning loop: few random examples → ML Algorithm → intermediate model → completion reached? If no, carefully select an example; if yes, final model]
• Many criteria, including
  • time elapsed
  • loop iterations
  • cross-validation
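The three criteria listed above could be combined into a single completion test, as in the following sketch. All names, the `(model, examples, iteration)` signature and the accuracy threshold are assumptions for illustration; the slides do not say how the criteria were implemented.

```python
import time

def leave_one_out_accuracy(examples, fit, predict):
    """Fraction of examples predicted correctly when each is held out in turn."""
    hits = 0
    for i, (x, y) in enumerate(examples):
        model = fit(examples[:i] + examples[i + 1:])
        hits += predict(model, x) == y
    return hits / len(examples)

def make_completion_check(max_iterations=None, time_budget_s=None,
                          min_cv_accuracy=None, cv_accuracy=None):
    """Build a 'completion reached?' test from any combination of criteria."""
    start = time.monotonic()

    def is_complete(model, examples, iteration):
        if max_iterations is not None and iteration >= max_iterations:
            return True   # loop iterations
        if time_budget_s is not None and time.monotonic() - start >= time_budget_s:
            return True   # time elapsed
        if min_cv_accuracy is not None and cv_accuracy(examples) >= min_cv_accuracy:
            return True   # cross-validation
        return False

    return is_complete
```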
What about selecting examples?
[Figure: active learning loop: few random examples → ML Algorithm → intermediate model → completion reached? If no, carefully select an example; if yes, final model]
• Many algorithms available
• Used “Query by Committee”
• Easier to show than to tell
We start with a few random examples
[Figure: feature 1 vs. feature 2]
We form multiple intermediate models
[Figure: feature 1 vs. feature 2]
Each with a distinct algorithm
[Figure: feature 1 vs. feature 2]
A “committee” of different models
[Figure: feature 1 vs. feature 2]
Here the committee disagrees, but we use this to our advantage
[Figure: feature 1 vs. feature 2]
• Disagreement regions hold the greatest potential to
So what example do we learn from next?
• We ask each model to predict the label of random
unseen examples drawn from the feature space
[Figure: feature 1 vs. feature 2]
Broadly the “Committee” will agree…
[Figure: feature 1 vs. feature 2]
… but we’re interested in disagreement!
[Figure: feature 1 vs. feature 2]
We select one of these examples to label properly
[Figure: feature 1 vs. feature 2]
Then rebuild the intermediate models
[Figure: feature 1 vs. feature 2]
• Notice the region of disagreement has shrunk
• Eventually the distinct models will converge
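The selection step walked through above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the committee here is three small hand-written classifiers (1-NN, 3-NN and nearest centroid), disagreement is measured with vote entropy, and the data points are invented.

```python
import math
from collections import Counter

def knn(k):
    """Return a fit function for a k-nearest-neighbour classifier."""
    def fit(examples):
        def predict(x):
            nearest = sorted(examples, key=lambda ex: math.dist(ex[0], x))[:k]
            return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        return predict
    return fit

def nearest_centroid(examples):
    """Fit a classifier that predicts the label of the closest class mean."""
    groups = {}
    for x, lbl in examples:
        groups.setdefault(lbl, []).append(x)
    centroids = {lbl: tuple(sum(c) / len(xs) for c in zip(*xs))
                 for lbl, xs in groups.items()}
    return lambda x: min(centroids, key=lambda lbl: math.dist(centroids[lbl], x))

def vote_entropy(votes):
    """Entropy of the committee's votes: 0 when unanimous, larger when split."""
    n = len(votes)
    return -sum(c / n * math.log(c / n) for c in Counter(votes).values())

def query_by_committee(committee, examples, candidates):
    """Return the candidate whose predicted label the committee disputes most."""
    models = [fit(examples) for fit in committee]
    return max(candidates, key=lambda x: vote_entropy([m(x) for m in models]))

# Hypothetical labelled examples and unlabelled candidates.
examples = [((0, 0), "CPU"), ((1, 0), "CPU"), ((2, 0), "CPU"),
            ((10, 0), "GPU"), ((14, 0), "GPU")]
committee = [knn(1), knn(3), nearest_centroid]
candidates = [(1, 1), (6.2, 0), (13, 0)]
chosen = query_by_committee(committee, examples, candidates)
```

The chosen candidate lies between the two clusters, where the three models place the CPU/GPU boundary differently; only that candidate would then be labelled properly.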
Experimental Setup
• Demonstrate technique by creating an important heuristic
• Map workload to fastest device: CPU or GPU
• Much studied problem; choosing poorly can drastically degrade performance
• Specifically, given inputs for Rodinia HotSpot, PathFinder, SRAD and Matrix Multiplication, is it faster to use OpenMP (CPU) or OpenCL (GPU)?
• Compared number of training examples required to get
A few gory details — most in the paper
• Measured accuracy of randomly-trained vs. QBC-trained classifier using 500 test examples
• Intel Core i7 7770 @ 3.4GHz (8 HW Threads)
• NVIDIA GeForce GTX Titan (6GB)
• 12 distinct committee members
• 1 random example to begin
• 10,000 candidate examples
• 200 loop iterations
Random Training Examples
[Figure: scatter plot; both axes labelled “Program Input Parameter”, one spanning 0 to 130 and the other 0 to 120]
QBC Chosen Training Examples
[Figure: scatter plot; both axes labelled “Program Input Parameter”, one spanning 0 to 130 and the other 0 to 120; legend: CPU, GPU, Sample Points]
Lights, Camera, Action...
• Shows “ib1” algorithm refining a HotSpot model over
time, using training examples chosen by a committee
Region of Disagreement over time
Shape of Model over time
Summary
• Desperately need a fast, reliable method to generate heuristics
• Current implementations rely on learning randomly
• Randomness is problematic because of labelling costs
• We show active learning is much more efficient
• 3x faster at creating heuristics to map program inputs to