Intelligent Heuristic Construction with Active Learning

(1)

William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather

THE UNIVERSITY OF EDINBURGH

(2)

Space is BIG!

• Hubble “Ultra-Deep Field”
• Tiny region of space shown
• Despite this, many galaxies
• Each galaxy, billions of stars
• Relevance to heuristics…?

(3)

Optimisation spaces are MUCH BIGGER!!!

• We can’t search 10^400 combinations exhaustively
• Rough heuristics instead
• Traditionally hard-coded
• Can take a year to “perfect”
• As if that wasn't bad enough…

[Figure: 10^82 atoms in the Universe vs. 10^400 combinations of GCC optimisations]

(4)

… the problem is even worse than that!

• Each architectural change requires heuristics to be re-tuned
• Heuristics are inherently tied to the underlying hardware
• Most compilers support many different platforms
• Very difficult to keep up, and getting harder

(5)

Machine Learning to the rescue?

• Leverage machine learning techniques to create heuristics
• Well suited to the problem
• Lots of interesting research
• Can be better than humans
• But it’s also incredibly slow to learn
• We demonstrate how to accelerate training
• Create a heuristic which maps workload to processor

(6)

Quick Detour: Machine Learning 101

• Classification involves forming a correlation between the features of an object and its label

[Figure: examples, each with feature values and a best heuristic value, fed into Machine Learning]

(7)

Training a Heuristic

[Figure: thousands of examples plotted by input value 1 vs. input value 2]

(8)

Training a Heuristic

[Figure: thousands of examples (input value 1 vs. input value 2) fed into a Machine Learning Algorithm]

(9)

Training a Heuristic

[Figure: thousands of examples (input value 1 vs. input value 2) fed into a Machine Learning Algorithm, producing a mathematical model that separates CPU from GPU]

(10)

Using a Heuristic

[Figure: unseen features (input value 1, input value 2) given to the Mathematical Model, which outputs the predicted processor: CPU or GPU]
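The deck does not say which learner produces the "mathematical model". As a purely illustrative stand-in, here is a minimal sketch using a perceptron (a linear model) over two hypothetical input values. Training fits the weights, and the same weights are then used to map unseen features to a predicted processor. The names `train_perceptron` and `training` are hypothetical.

```python
Point = tuple[float, float]          # (input value 1, input value 2)
Example = tuple[Point, str]          # features plus measured best device

def train_perceptron(examples: list[Example], epochs: int = 100, lr: float = 0.1):
    """Learn a linear boundary w.p + b = 0 separating CPU (-1) from GPU (+1)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), label in examples:
            target = 1.0 if label == "GPU" else -1.0
            if target * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified or on the line
                w[0] += lr * target * x1
                w[1] += lr * target * x2
                b += lr * target
                mistakes += 1
        if mistakes == 0:  # every training example is now on the correct side
            break

    def predict(p: Point) -> str:
        """Use the learned weights to map unseen features to a processor."""
        return "GPU" if w[0] * p[0] + w[1] * p[1] + b > 0 else "CPU"

    return predict

# Hypothetical training data: small inputs run faster on the CPU, large on the GPU.
training = [((1.0, 1.0), "CPU"), ((2.0, 1.0), "CPU"), ((8.0, 9.0), "GPU"), ((9.0, 8.0), "GPU")]
model = train_perceptron(training)
print(model((1.5, 1.5)), model((9.0, 9.0)))  # CPU GPU
```

Any classifier with the same "train, then predict on unseen features" shape could replace the perceptron here.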

(11)

So what’s wrong with this?

[Figure: training examples plotted by feature 1 vs. feature 2]

(12)

Well, we actually only needed these!

(13)

So this was a complete waste of time!

(14)

How much time was wasted?

• Heuristic quality is tied to the correctness of labels
• I.e. consistently wrong labels lead to a wrong model
• Sound data is essential, but very expensive
• E.g. are inputs X, Y, Z faster on CPU or GPU?

1. Run program on CPU using X, Y, Z
2. Run program on GPU using X, Y, Z
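To make the labelling cost concrete, here is a minimal sketch that labels one example by timing both devices. `run_on_cpu` and `run_on_gpu` are hypothetical placeholders that only burn a little CPU time. A real setup would launch the OpenMP and OpenCL builds instead. Taking the median of several runs is an assumed way of getting a stable measurement.

```python
import statistics
import time
from typing import Callable

Inputs = tuple[int, int, int]  # the program inputs X, Y, Z

# Hypothetical stand-ins for running the CPU and GPU builds with inputs X, Y, Z.
def run_on_cpu(x: int, y: int, z: int) -> None:
    sum(i * x for i in range(y * z))

def run_on_gpu(x: int, y: int, z: int) -> None:
    sum(i * y for i in range(x * z))

def median_runtime(run: Callable[[int, int, int], None], inputs: Inputs, repeats: int = 5) -> float:
    """Run the program `repeats` times; return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run(*inputs)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def label_example(inputs: Inputs, repeats: int = 5) -> str:
    """Label one example with the faster device; this costs 2 * repeats full runs."""
    cpu = median_runtime(run_on_cpu, inputs, repeats)
    gpu = median_runtime(run_on_gpu, inputs, repeats)
    return "CPU" if cpu <= gpu else "GPU"

print(label_example((3, 100, 100)))
```

With thousands of training examples, those 2 × `repeats` runs per label are where the time goes.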

(15)

Compile-time Heuristics are Even Slower

• Labelling a single example requires iterative compilation
• compile code using different optimisation values
• repeated profiling to make a statistically sound determination
• only then, associate the best optimisation with the code features

[Figure: one .c source compiled into several .exe variants; the best optimisation wins]
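Here is a sketch of labelling one compile-time example by iterative compilation, under stated assumptions. Runtimes come from a hypothetical `simulated_profile` with made-up means and Gaussian noise, not from a real compiler. "Best" is taken to mean lowest median runtime. The returned run count shows why each label is expensive.

```python
import random
import statistics
from typing import Callable

# Made-up mean runtimes (seconds) per optimisation setting; purely illustrative.
SIMULATED_MEAN_RUNTIME = {"-O0": 3.0, "-O1": 1.6, "-O2": 1.2, "-O3": 1.1}

def simulated_profile(flags: str, rng: random.Random) -> float:
    """Hypothetical stand-in for: compile with `flags`, run once, return runtime."""
    return SIMULATED_MEAN_RUNTIME[flags] + rng.gauss(0.0, 0.01)  # measurement noise

def best_optimisation(candidates: list[str],
                      profile: Callable[[str, random.Random], float],
                      repeats: int = 10, seed: int = 0) -> tuple[str, int]:
    """Profile every candidate repeatedly; return the winner and the runs it cost."""
    rng = random.Random(seed)
    medians = {}
    for flags in candidates:
        samples = [profile(flags, rng) for _ in range(repeats)]
        medians[flags] = statistics.median(samples)
    winner = min(medians, key=lambda f: medians[f])
    return winner, len(candidates) * repeats

winner, runs = best_optimisation(list(SIMULATED_MEAN_RUNTIME), simulated_profile)
print(winner, runs)
```

Only after all `runs` compile-and-profile cycles can the winning setting be attached to the code's features as its label.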

(16)

What do we do about it?

• We cannot know where the informative examples lie
• But we can let the algorithm make an educated guess
• You and I do not learn in a random, unstructured way
• We build up our knowledge gradually and iteratively
• Perhaps let the algorithm do the same…?

(17)

Active Supervised Learning

[Figure: passive (random) learning: thousands of random examples → Machine Learning Algorithm → final model]

(18)

Active Supervised Learning

[Figure: passive (random): thousands of random examples → Machine Learning Algorithm → final model; active (iterative): few random examples → ML Algorithm → intermediate model]

(19)

Active Supervised Learning

[Figure: passive (random): thousands of random examples → Machine Learning Algorithm → final model; active (iterative): few random examples → ML Algorithm → intermediate model → completion reached? yes: final model; no: carefully select an example → ML Algorithm]
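Here is a minimal sketch of the active (iterative) loop, assuming a single numeric feature and a pluggable selection rule. The names `active_learning`, `train_1nn` and `bisect_boundary` are hypothetical. The bisection rule in the usage is a simple stand-in that only works for one threshold in one dimension. It is not Query by Committee, which is the selection method the deck uses.

```python
from typing import Callable

Example = tuple[float, str]        # (feature value, measured best device)
Model = Callable[[float], str]

def active_learning(seed: list[Example],
                    train: Callable[[list[Example]], Model],
                    select: Callable[[Model, list[Example]], float],
                    label: Callable[[float], str],
                    done: Callable[[int, Model], bool]) -> Model:
    labelled = list(seed)                # "few random examples"
    model = train(labelled)              # intermediate model
    iteration = 0
    while not done(iteration, model):    # "completion reached?"
        x = select(model, labelled)      # "carefully select an example"
        labelled.append((x, label(x)))   # the expensive step: one new label
        model = train(labelled)          # rebuild the intermediate model
        iteration += 1
    return model                         # final model

# --- Hypothetical usage: the true CPU/GPU boundary is at feature value 42.0 ---
def train_1nn(examples: list[Example]) -> Model:
    snapshot = list(examples)            # freeze the training set for this model
    return lambda x: min(snapshot, key=lambda e: abs(e[0] - x))[1]

def bisect_boundary(model: Model, labelled: list[Example]) -> float:
    """Pick the midpoint of the first adjacent pair with different labels."""
    pts = sorted(labelled)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 != y1:
            return (x0 + x1) / 2
    raise ValueError("need at least one example of each label")

final = active_learning(seed=[(0.0, "CPU"), (100.0, "GPU")],
                        train=train_1nn,
                        select=bisect_boundary,
                        label=lambda x: "GPU" if x >= 42.0 else "CPU",
                        done=lambda i, m: i >= 20)
print(final(41.9), final(42.1))  # CPU GPU
```

Here 20 labels, all placed near the boundary, are enough to pin it down. Random sampling would spend most of its labels far from it.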

(20)

How do we know when it’s complete?

• Many criteria, including
  • time elapsed
  • loop iterations
  • cross-validation
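The three criteria can be combined into one completion check. This is a minimal sketch under stated assumptions. The one-hour budget and the 0.95 accuracy target are hypothetical values. The 200-iteration cap mirrors the loop count reported later in the deck. Leave-one-out is used as one simple form of cross-validation. `train_1nn`, `loo_accuracy` and `completion_reached` are hypothetical names.

```python
import time
from typing import Callable

Example = tuple[float, str]
Model = Callable[[float], str]

def train_1nn(examples: list[Example]) -> Model:
    snapshot = list(examples)
    return lambda x: min(snapshot, key=lambda e: abs(e[0] - x))[1]

def loo_accuracy(examples: list[Example], train: Callable[[list[Example]], Model]) -> float:
    """Leave-one-out cross-validation: hold each example out once and predict it."""
    hits = sum(train(examples[:i] + examples[i + 1:])(x) == y
               for i, (x, y) in enumerate(examples))
    return hits / len(examples)

def completion_reached(started: float, iteration: int, examples: list[Example],
                       train: Callable[[list[Example]], Model], *,
                       max_seconds: float = 3600.0, max_iterations: int = 200,
                       target_accuracy: float = 0.95) -> bool:
    if time.monotonic() - started > max_seconds:   # time elapsed
        return True
    if iteration >= max_iterations:                # loop iterations
        return True
    # cross-validation (needs at least two examples to hold one out)
    return len(examples) >= 2 and loo_accuracy(examples, train) >= target_accuracy

started = time.monotonic()
labelled = [(0.0, "CPU"), (1.0, "CPU"), (10.0, "GPU"), (11.0, "GPU")]
print(completion_reached(started, 3, labelled, train_1nn))  # True: LOO accuracy is 1.0
```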

(21)

What about selecting examples?

• Many algorithms available
• We used “Query by Committee” (QBC)
• Easier to show than to tell

(22)

We start with a few random examples

(23)

We form multiple intermediate models

(24)

Each with a distinct algorithm

(25)

A “committee” of different models
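Here is a sketch of building a committee in which every member runs a distinct algorithm on the same labelled examples. The deck reports 12 members and names one of them, "ib1". The four learners below are hypothetical pure-Python stand-ins: 1-NN, 3-NN, nearest centroid and a one-level decision stump. They are chosen only so that the example runs without external libraries.

```python
import math
from collections import Counter
from typing import Callable

Point = tuple[float, float]
Example = tuple[Point, str]
Model = Callable[[Point], str]

def knn(k: int) -> Callable[[list[Example]], Model]:
    """k-nearest-neighbour learner: majority label among the k closest examples."""
    def train(examples: list[Example]) -> Model:
        data = list(examples)
        def predict(p: Point) -> str:
            nearest = sorted(data, key=lambda e: math.dist(e[0], p))[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]
        return predict
    return train

def nearest_centroid(examples: list[Example]) -> Model:
    """Predict the label whose mean feature vector is closest."""
    groups: dict[str, list[Point]] = {}
    for p, label in examples:
        groups.setdefault(label, []).append(p)
    centroids = {label: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
                 for label, pts in groups.items()}
    return lambda p: min(centroids, key=lambda label: math.dist(centroids[label], p))

def stump(examples: list[Example]) -> Model:
    """One-level decision tree: the single-feature threshold with most training hits."""
    labels = sorted({label for _, label in examples})
    best = (-1, 0, 0.0, labels[0], labels[-1])  # (hits, feature, threshold, below, above)
    for f in (0, 1):
        for p, _ in examples:
            t = p[f]
            for below, above in ((labels[0], labels[-1]), (labels[-1], labels[0])):
                hits = sum((below if q[f] < t else above) == y for q, y in examples)
                if hits > best[0]:
                    best = (hits, f, t, below, above)
    _, f, t, below, above = best
    return lambda p: below if p[f] < t else above

COMMITTEE_ALGORITHMS = [knn(1), knn(3), nearest_centroid, stump]

def build_committee(examples: list[Example]) -> list[Model]:
    """Train every distinct algorithm on the same labelled examples."""
    return [train(examples) for train in COMMITTEE_ALGORITHMS]

examples = [((0.0, 0.0), "CPU"), ((1.0, 0.0), "CPU"), ((10.0, 10.0), "GPU"), ((9.0, 10.0), "GPU")]
committee = build_committee(examples)
print([m((9.5, 1.0)) for m in committee])  # ['CPU', 'GPU', 'GPU', 'GPU']
```

All four members fit the same four points, yet they already disagree at (9.5, 1.0). That disagreement is what QBC exploits.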

(26)

Here the committee disagrees, but we use this to our advantage

[Figure: committee members' decision regions over feature 1 vs. feature 2]

• Disagreement regions hold the greatest potential to

(27)

So what example do we learn from next?

• We ask each model to predict the label of random unseen examples drawn from the feature space
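Here is a sketch of this step: draw random unlabelled candidates from the feature space and record every member's prediction for each one. Predicting is cheap, and only the examples chosen for labelling pay the profiling cost. The feature bounds, the candidate count and the three threshold "members" are hypothetical.

```python
import random
from typing import Callable

Point = tuple[float, float]
Model = Callable[[Point], str]

def draw_candidates(n: int, bounds: tuple[tuple[float, float], tuple[float, float]],
                    rng: random.Random) -> list[Point]:
    """Sample n unseen points uniformly from a rectangular feature space."""
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    return [(rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)) for _ in range(n)]

def committee_votes(committee: list[Model], candidates: list[Point]) -> list[list[str]]:
    """For each candidate, the label predicted by every committee member."""
    return [[model(c) for model in committee] for c in candidates]

# Hypothetical members that split feature 1 at different thresholds.
committee = [lambda p, t=t: "GPU" if p[0] >= t else "CPU" for t in (40.0, 50.0, 60.0)]
candidates = draw_candidates(10_000, ((0.0, 100.0), (0.0, 100.0)), random.Random(42))
votes = committee_votes(committee, candidates)
disputed = sum(len(set(v)) > 1 for v in votes)
print(f"{disputed} of {len(votes)} candidates are disputed")
```

With these members, roughly a fifth of the candidates (feature 1 between 40 and 60) are disputed. The remaining candidates are where the committee broadly agrees.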


(28)

Broadly the “Committee” will agree…

(29)

… but we’re interested in disagreement!

(30)

We select one of these examples to label properly
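One common way to turn committee votes into a choice is vote entropy. The deck does not say which disagreement measure the authors used, so treat this as an assumed measure. It is zero when the committee is unanimous and maximal when the votes split evenly. The names `vote_entropy` and `select_most_disputed` are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(votes: list[str]) -> float:
    """Entropy (bits) of one candidate's committee votes; 0.0 when unanimous."""
    n = len(votes)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(votes).values())

def select_most_disputed(candidates: list, votes_per_candidate: list[list[str]]):
    """Return the candidate whose votes are most evenly split (first one on ties)."""
    best = max(range(len(candidates)), key=lambda i: vote_entropy(votes_per_candidate[i]))
    return candidates[best]

candidates = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
votes = [["CPU"] * 4, ["CPU", "GPU", "CPU", "GPU"], ["GPU", "GPU", "GPU", "CPU"]]
print(select_most_disputed(candidates, votes))  # (2.0, 2.0)
```

The returned candidate is the one sent for expensive labelling, for example by profiling it on both CPU and GPU.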

(31)

Then rebuild the intermediate models

• Notice the region of disagreement has shrunk
• Eventually the distinct models will converge
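Here is a small, deterministic sketch of the shrinking disagreement region. It assumes a one-dimensional problem where CPU examples lie below GPU ones, and a hypothetical committee of three threshold rules that are each consistent with the labels seen so far. The new label (50.0 → GPU) is also hypothetical. Rebuilding the committee after it arrives halves the disputed fraction of a fixed candidate grid.

```python
from typing import Callable

Model = Callable[[float], str]

def build_committee(labelled: list[tuple[float, str]]) -> list[Model]:
    """Three threshold rules, each consistent with every label seen so far."""
    max_cpu = max(x for x, y in labelled if y == "CPU")
    min_gpu = min(x for x, y in labelled if y == "GPU")
    mid = (max_cpu + min_gpu) / 2
    return [
        lambda x: "GPU" if x > max_cpu else "CPU",   # switches to GPU as early as possible
        lambda x: "GPU" if x >= min_gpu else "CPU",  # switches to GPU as late as possible
        lambda x: "GPU" if x >= mid else "CPU",      # splits the difference
    ]

def disagreement_fraction(committee: list[Model], candidates: list[float]) -> float:
    """Share of candidates on which the committee is not unanimous."""
    disputed = sum(len({m(x) for m in committee}) > 1 for x in candidates)
    return disputed / len(candidates)

candidates = [i + 0.5 for i in range(100)]
labelled = [(10.0, "CPU"), (90.0, "GPU")]
before = disagreement_fraction(build_committee(labelled), candidates)
labelled.append((50.0, "GPU"))  # a disputed example, now labelled properly
after = disagreement_fraction(build_committee(labelled), candidates)
print(before, after)  # 0.8 0.4
```

Repeating this select, label and rebuild cycle keeps narrowing the disputed region until the members effectively agree.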

(32)

Experimental Setup

• Demonstrate the technique by creating an important heuristic
• Map workload to the fastest device: CPU or GPU
• Much-studied problem; choosing poorly can drastically degrade performance
• Specifically, given inputs for Rodinia HotSpot, PathFinder, SRAD and Matrix Multiplication, is it faster to use OpenMP (CPU) or OpenCL (GPU)?

• Compared number of training examples required to get

(33)

A few gory details — most in the paper

• Measured accuracy of randomly trained vs. QBC-trained classifiers using 500 test examples

• Intel Core i7 7770 @ 3.4GHz (8 HW Threads)
• NVIDIA GeForce GTX Titan (6GB)
• 12 distinct committee members
• 1 random example to begin
• 10,000 candidate examples
• 200 loop iterations
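The settings above can be collected into a single configuration object, together with the accuracy measure used to compare classifiers on held-out examples. This is a minimal sketch. The field names and the `accuracy` helper are hypothetical. The default values are the ones listed on this slide.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class ExperimentConfig:
    committee_size: int = 12          # distinct committee members
    initial_random_examples: int = 1  # random examples to begin
    candidate_pool_size: int = 10_000 # unlabelled candidates per selection
    loop_iterations: int = 200        # active-learning iterations
    test_examples: int = 500          # held-out examples for accuracy

def accuracy(model: Callable[[tuple[float, ...]], str],
             test_set: Sequence[tuple[tuple[float, ...], str]]) -> float:
    """Fraction of held-out examples whose predicted device matches the measured one."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    return sum(model(x) == y for x, y in test_set) / len(test_set)

cfg = ExperimentConfig()
print(cfg)
```

Using a frozen dataclass keeps the random and QBC runs from accidentally using different settings.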

(34)

Random Training Examples

[Plot: randomly chosen training examples; both axes: program input parameter]

(35)

QBC Chosen Training Examples

[Plot: QBC-chosen training examples (legend: CPU, GPU, Sample Points); both axes: program input parameter]

(36)

Lights, Camera, Action...

• Shows the “ib1” algorithm refining a HotSpot model over time, using training examples chosen by a committee

[Animation panels: Region of Disagreement over time; Shape of Model over time]

(37)

(38)

Summary

• Desperately need a fast, reliable method to generate heuristics
• Current implementations learn from randomly chosen examples
• Randomness is problematic because of labelling costs
• We show active learning is much more efficient

• 3x faster at creating heuristics to map program inputs to