This Webinar is provided to you by IEEE Computational Intelligence Society https://cis.ieee.org
Active Learning by Learning
About Me
• Professor, National Taiwan University
• Chief Data Science Consultant (former Chief Data Scientist), Appier Inc.
• Co-author of Learning from Data
• Instructor
Apple Recognition Problem
Note: slide taken from my “ML Techniques” MOOC
• need an apple classifier: is this a picture of an apple?
• gather photos under the CC-BY-2.0 license on Flickr (thanks to the authors below!) and label them as apple/other for learning
(APAL stands for Apple and Pear Australia Ltd)
Photo credits: Dan Foy (flic.kr/p/jNQ55), APAL (flic.kr/p/jzP1VB), adrianbartel (flic.kr/p/bdy2hZ), ANdrzej cH. (flic.kr/p/51DKA8), Stuart Webster (flic.kr/p/9C3Ybd); nachans, APAL, Jo Jakeman, APAL, APAL
Photo credits (cont.): Mr. Roboto. (flic.kr/p/i5BN85), Richard North (flic.kr/p/bHhPkB), Richard North (flic.kr/p/d8tGou), Emilian Robert Vicol (flic.kr/p/bpmGXW), Nathaniel McQueen (flic.kr/p/pZv1Mf); Crystal, jfh686, skyseeker, Janet Hudson, Rennett Stowe
Active Learning: Learning by ‘Asking’
but labeling is expensive

Protocol ⇔ Learning Philosophy
• batch: ‘duck feeding’
• active: ‘question asking’ (iteratively) — query $y_n$ of a chosen $\mathbf{x}_n$

[Figure: the learning setup — unknown target function $f: \mathcal{X} \to \mathcal{Y}$; labeled training examples (apples +1, others -1); unlabeled training examples; learning algorithm $\mathcal{A}$; final hypothesis $g \approx f$]
Pool-Based Active Learning Problem
Given
• labeled pool $D_l = \{(\text{feature } \mathbf{x}_n, \text{label } y_n\ (\text{e.g. IsApple?}))\}_{n=1}^{N}$
• unlabeled pool $D_u = \{\tilde{\mathbf{x}}_s\}_{s=1}^{S}$
Goal: design an algorithm that iteratively
1 strategically query some $\tilde{\mathbf{x}}_s$ to get the associated $\tilde{y}_s$
2 move $(\tilde{\mathbf{x}}_s, \tilde{y}_s)$ from $D_u$ to $D_l$
3 learn classifier $g^{(t)}$ from $D_l$
and improve the test accuracy of $g^{(t)}$ w.r.t. the number of queries
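A minimal sketch of this loop in Python, assuming an sklearn-style classifier and a generic query_strategy function (both names are illustrative, not part of the talk):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool_based_active_learning(X_l, y_l, X_u, oracle, query_strategy, T):
    """Run T rounds of pool-based active learning.

    X_l, y_l : labeled pool D_l
    X_u      : unlabeled pool D_u
    oracle   : oracle(x) returns the true label (the expensive step)
    query_strategy(model, X_u) returns the index of the instance to query
    """
    model = LogisticRegression().fit(X_l, y_l)
    for t in range(T):
        s = query_strategy(model, X_u)        # 1. strategically query some x~_s
        y_s = oracle(X_u[s])                  #    ... to get the associated y~_s
        X_l = np.vstack([X_l, X_u[s]])        # 2. move (x~_s, y~_s) from D_u to D_l
        y_l = np.append(y_l, y_s)
        X_u = np.delete(X_u, s, axis=0)
        model = LogisticRegression().fit(X_l, y_l)  # 3. learn g(t) from D_l
    return model
```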
How to Query Strategically?
by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons
Strategy 1: ask the most confused question
Strategy 2: ask the most frequent question
Strategy 3: ask the most helpful question
Choice of Strategy
Strategy 1 (uncertainty): ask the most confused question
Strategy 2 (representative): ask the most frequent question
Strategy 3 (expected error reduction): ask the most helpful question
• choosing one single strategy is non-trivial:
[Figure: accuracy vs. % of unlabeled data queried on three data sets; RAND, UNCERTAIN, PSDS, and QUIRE curves, with a different winner on each set]
• human-designed strategies are heuristic and confine the machine’s ability — can we free the machine?
Our Contributions
a philosophical and algorithmic study of active learning, which ...
• allows the machine to make intelligent choices of strategies, just like my cute kids
• studies a sound feedback scheme to tell the machine about the goodness of its choices, just like what I do
• results in promising active learning performance, just like (hopefully) the bright future of my kids
will describe the key philosophical ideas
Idea: Trial-and-Reward, Like Humans
by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons
K strategies: A1, A2, · · · , AK
try one strategy
“goodness” of strategy as reward
Reduction to Bandit
when do humans do trial-and-reward?
gambling
K strategies: A1, A2, · · · , AK ↔ K bandit machines: B1, B2, · · · , BK
try one strategy ↔ try one bandit machine
“goodness” of strategy as reward ↔ “luckiness” of machine as reward
— will take one well-known probabilistic bandit learner (EXP4.P)
Active Learning by Learning
K strategies: A1, A2, · · · , AK
try one strategy
“goodness” of strategy as reward
Given: K existing active learning strategies; for t = 1, 2, . . . , T
1 let EXP4.P decide the strategy Ak to try
2 query the $\tilde{\mathbf{x}}_s$ suggested by Ak, and compute $g^{(t)}$
3 evaluate the goodness of $g^{(t)}$ as the reward of the trial to update EXP4.P
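To make step 1 concrete, here is a simplified EXP3-style probabilistic bandit in Python as a stand-in for EXP4.P (the actual EXP4.P additionally mixes expert advice and carries high-probability guarantees; this sketch is illustrative only):

```python
import numpy as np

class Exp3Bandit:
    """Simplified EXP3-style bandit over K arms (strategies)."""

    def __init__(self, K, eta=0.1):
        self.K, self.eta = K, eta
        self.weights = np.ones(K)

    def choose(self, rng):
        # probabilistic choice: exponential weights mixed with uniform exploration
        p = (1 - self.eta) * self.weights / self.weights.sum() + self.eta / self.K
        k = rng.choice(self.K, p=p)
        return k, p[k]

    def update(self, k, p_k, reward):
        # importance-weight the reward (assumed in [0, 1]) so the update stays
        # unbiased over all arms, then boost the chosen arm exponentially
        self.weights[k] *= np.exp(self.eta * reward / (p_k * self.K))
```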
Ideal Reward
ideal reward after updating classifier $g^{(t)}$ by the query $(\mathbf{x}_{n_t}, y_{n_t})$:
accuracy $\frac{1}{M} \sum_{m=1}^{M} \llbracket y_m = g^{(t)}(\mathbf{x}_m) \rrbracket$ on test set $\{(\mathbf{x}_m, y_m)\}_{m=1}^{M}$
• test accuracy as reward: area under the query-accuracy curve ≡ cumulative reward
[Figure: query-accuracy curves (accuracy vs. % of unlabeled data queried) for RAND, UNCERTAIN, PSDS, QUIRE]
• test accuracy is infeasible in practice: labeling is expensive, remember?
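In code, the ideal reward would just be held-out accuracy; a sketch continuing the hypothetical helpers above:

```python
def ideal_reward(model, X_test, y_test):
    """Ideal reward after round t: test accuracy of g(t).

    Infeasible in practice: it needs test labels, i.e. exactly the
    expensive resource active learning tries to save.
    """
    return float(np.mean(model.predict(X_test) == y_test))
```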
Training Accuracy as Reward
test accuracy $\frac{1}{M} \sum_{m=1}^{M} \llbracket y_m = g^{(t)}(\mathbf{x}_m) \rrbracket$ is infeasible; the naïve replacement:
training accuracy $\frac{1}{t} \sum_{\tau=1}^{t} \llbracket y_{n_\tau} = g^{(t)}(\mathbf{x}_{n_\tau}) \rrbracket$ on the labeled pool $\{(\mathbf{x}_{n_\tau}, y_{n_\tau})\}_{\tau=1}^{t}$
• training accuracy as reward: training accuracy ≈ test accuracy?
• not necessarily!
— for an active learning strategy that asks the easiest questions:
• training accuracy high: the queried $\mathbf{x}_{n_\tau}$ are easy to label
• test accuracy low: not enough information about the harder instances
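The naïve reward in code, with the bias spelled out (a hypothetical helper, mirroring the sketches above):

```python
def naive_training_accuracy(model, X_queried, y_queried):
    """Naïve reward: accuracy of g(t) on the examples queried so far.

    Biased: a strategy that only queries easy instances scores high here
    while g(t) may still generalize poorly to the harder, unqueried ones.
    """
    return float(np.mean(model.predict(X_queried) == y_queried))
```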
Weighted Training Accuracy as Reward
training accuracy $\frac{1}{t} \sum_{\tau=1}^{t} \llbracket y_{n_\tau} = g^{(t)}(\mathbf{x}_{n_\tau}) \rrbracket$ is biased;
want a less-biased estimator
• non-uniform sampling theorem: if $(\mathbf{x}_{n_\tau}, y_{n_\tau})$ is sampled with probability $p_\tau > 0$ from the data set $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$ in iteration τ, then in expectation
weighted training accuracy $\frac{1}{t} \sum_{\tau=1}^{t} \frac{1}{p_\tau} \llbracket y_{n_\tau} = g(\mathbf{x}_{n_\tau}) \rrbracket \approx \frac{1}{N} \sum_{n=1}^{N} \llbracket y_n = g(\mathbf{x}_n) \rrbracket$
• with a probabilistic query like EXP4.P:
weighted training accuracy ≈ test accuracy
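A direct transcription of the estimator (names hypothetical; p_queried[τ] is the probability with which example τ was queried):

```python
def weighted_training_accuracy(model, X_queried, y_queried, p_queried):
    """Importance-weighted training accuracy of g(t) on the queried examples.

    Dividing each correctness indicator by its query probability corrects
    the sampling bias, so in expectation the estimate tracks accuracy over
    the whole pool rather than just the (easy-to-query) labeled part.
    """
    correct = (model.predict(X_queried) == y_queried).astype(float)
    return float(np.mean(correct / np.asarray(p_queried)))
```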
Active Learning by Learning (Hsu and Lin, 2015)
K strategies: A1, A2, · · · , AK
try one strategy
“goodness” of strategy as reward
Given: K existing active learning strategies; for t = 1, 2, . . . , T
1 let EXP4.P decide the strategy Ak to try
2 query the $\tilde{\mathbf{x}}_s$ suggested by Ak, and compute $g^{(t)}$
3 evaluate the weighted training accuracy of $g^{(t)}$ as the reward of the trial to update EXP4.P
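Putting the pieces together, a schematic end-to-end loop reusing the hypothetical Exp3Bandit stand-in and weighted_training_accuracy from above (the real ALBL of Hsu and Lin (2015) uses EXP4.P, tracks instance-level query probabilities, and rescales the reward, so treat this purely as a sketch):

```python
def albl(strategies, X_l, y_l, X_u, oracle, T, seed=0):
    """Schematic ALBL: a bandit picks one of K strategies per round and is
    rewarded with the importance-weighted training accuracy of g(t)."""
    rng = np.random.default_rng(seed)
    bandit = Exp3Bandit(K=len(strategies))
    X_q, y_q, p_q = [], [], []
    model = LogisticRegression().fit(X_l, y_l)
    for t in range(T):
        k, p_k = bandit.choose(rng)           # 1. bandit decides strategy A_k
        s = strategies[k](model, X_u)         # 2. query the x~_s suggested by A_k
        y_s = oracle(X_u[s])
        X_q.append(X_u[s]); y_q.append(y_s)
        p_q.append(p_k)                       # crude proxy for the query probability
        X_l = np.vstack([X_l, X_u[s]]); y_l = np.append(y_l, y_s)
        X_u = np.delete(X_u, s, axis=0)
        model = LogisticRegression().fit(X_l, y_l)   # compute g(t)
        reward = min(1.0, weighted_training_accuracy(
            model, np.array(X_q), np.array(y_q), p_q))  # keep reward in [0, 1]
        bandit.update(k, p_k, reward)         # 3. reward the trial
    return model
```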
Human-Designed Criterion as Reward
COMB approach (Baram et al., 2004):
bandit + balancedness of $g^{(t)}$ on unlabeled data as reward
• why? a human-designed criterion that matches the classifier to a domain assumption
• but many active learning applications are on unbalanced data!
— the assumption may be unrealistic
Comparison with Single Strategies
[Figure: accuracy vs. % of unlabeled data queried for ALBL, RAND, UNCERTAIN, PSDS, QUIRE; UNCERTAIN is best on vehicle, PSDS is best on sonar, QUIRE is best on diabetes]
• no single best strategy for every data set: choosing is needed
• ALBL consistently matches the best: similar findings across other data sets
Comparison with Other Bandit-Driven Algorithms
[Figure: accuracy vs. % of unlabeled data queried for ALBL, COMB, ALBL-Train; ALBL ≈ COMB on diabetes, ALBL > COMB on sonar]
• ALBL > ALBL-Train generally
— the importance-weighted mechanism is needed to correct the biased training accuracy
• ALBL is consistently comparable to or better than COMB
— learning performance is more useful than a human-designed criterion
Summary
Active Learning by Learning
• based on bandit learning + a less-biased performance estimator as reward
• effective in making intelligent choices
— comparable or superior to the best of the existing strategies
• effective in utilizing learning performance
— superior to the human-criterion-based approach
open-source tool developed: https://github.com/ntucllab/libact
Discussion for Theoreticians
weighted training accuracy $\frac{1}{t} \sum_{\tau=1}^{t} \frac{1}{p_\tau} \llbracket y_{n_\tau} = g^{(t)}(\mathbf{x}_{n_\tau}) \rrbracket$ as reward
• is the reward an unbiased estimator of test performance?
no for the learned $g^{(t)}$ (yes for a fixed g)
• is the reward fixed before playing?
no, because $g^{(t)}$ is learned from $(\mathbf{x}_{n_t}, y_{n_t})$
• are the rewards independent of each other?
no, because the entire past history enters each reward
ALBL: uses tools from theoreticians beyond their proven assumptions
Have We Made Active Learning More Realistic? (1/2)
Yes!
open-source tool libact developed (Yang, 2017)
https://github.com/ntucllab/libact
• including uncertainty sampling, QUIRE, PSDS, . . ., and ALBL
• received >500 stars and a continuous stream of issues
“libact is a Python package designed to make active learning easier for real-world users”
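A minimal usage sketch with libact's ALBL strategy, adapted from the patterns in the libact documentation (exact signatures may differ across versions; X, y, and oracle_label are placeholders):

```python
import numpy as np
from libact.base.dataset import Dataset
from libact.models import LogisticRegression
from libact.query_strategies import (ActiveLearningByLearning,
                                     UncertaintySampling, QUIRE, RandomSampling)

# unlabeled entries carry None as their label
n_labeled = 10
ds = Dataset(X, np.concatenate([y[:n_labeled],
                                [None] * (len(y) - n_labeled)]))

qs = ActiveLearningByLearning(
    ds,
    query_strategies=[
        UncertaintySampling(ds, model=LogisticRegression()),
        QUIRE(ds),
        RandomSampling(ds),
    ],
    T=100,                       # total number of queries (the horizon)
    uniform_sampler=True,
    model=LogisticRegression(),
)

ask_id = qs.make_query()         # index of the instance ALBL wants labeled
ds.update(ask_id, oracle_label)  # feed the queried label back into the pool
```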
Have We Made Active Learning More Realistic? (2/2)
No! (Not Completely!)
• single most-raised issue: hard to install on Windows/Mac
— because several strategies require some C packages
• performance in a recent industry project:
— uncertainty sampling often suffices; ALBL can be dragged down by a bad strategy in its candidate set
Other Attempts for Realistic Active Learning
“learn” a strategy beforehand rather than on-the-fly?
• transfer active learning experience (Chu and Lin, 2016)
— not easy to realize in an open-source package
other active learning tasks?
• NLP (Yuan et al., 2020)
• reinforcement learning (Chen et al., 2020)
• annotation cost-sensitive (Tsou and Lin, 2019)
• classification cost-sensitive (Huang and Lin, 2016)
Lessons Learned from Research on Active Learning by Learning
by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons
1 scalability bottleneck of artificial intelligence:
the choice of methods/models/parameters/. . .
2 think outside of the math box:
‘non-rigorous’ usage may be good enough
3 easy-to-use in design ≠ easy-to-use in reality