This Webinar is provided to you by IEEE Computational Intelligence Society https://cis.ieee.org
Active Learning by Learning
About Me
• Professor, National Taiwan University
• Chief Data Science Consultant (former Chief Data Scientist), Appier Inc.
• Co-author of Learning from Data
• Instructor
Apple Recognition Problem
Note: slide taken from my “ML Techniques” MOOC
• need an apple classifier: is this a picture of an apple?
• gather photos under the CC-BY-2.0 license on Flickr (thanks to the authors below!) and label them as apple/other for learning
(APAL stands for Apple and Pear Australia Ltd)
Photo credits: Dan Foy (flic.kr/p/jNQ55), APAL (flic.kr/p/jzP1VB), adrianbartel (flic.kr/p/bdy2hZ), ANdrzej cH. (flic.kr/p/51DKA8), Stuart Webster (flic.kr/p/9C3Ybd); nachans, APAL, Jo Jakeman, APAL, APAL
Photo credits (cont.): Mr. Roboto. (flic.kr/p/i5BN85), Richard North (flic.kr/p/bHhPkB), Richard North (flic.kr/p/d8tGou), Emilian Robert Vicol (flic.kr/p/bpmGXW), Nathaniel McQueen (flic.kr/p/pZv1Mf); Crystal, jfh686, skyseeker, Janet Hudson, Rennett Stowe
Active Learning: Learning by ‘Asking’
but labeling is expensive

Protocol ⇔ Learning Philosophy
• batch: ‘duck feeding’
• active: ‘question asking’ (iteratively) — query $y_n$ of a chosen $\mathbf{x}_n$

[Figure: the learning setup — unknown target function $f: \mathcal{X} \to \mathcal{Y}$; labeled training examples (apples +1, others -1); unlabeled training examples; learning algorithm $\mathcal{A}$; final hypothesis $g \approx f$]
Pool-Based Active Learning Problem
Given
• labeled pool $D_l = \{(\text{feature } \mathbf{x}_n, \text{label } y_n\ (\text{e.g. IsApple?}))\}_{n=1}^{N}$
• unlabeled pool $D_u = \{\tilde{\mathbf{x}}_s\}_{s=1}^{S}$
Goal: design an algorithm that iteratively
1 strategically query some $\tilde{\mathbf{x}}_s$ to get the associated $\tilde{y}_s$
2 move $(\tilde{\mathbf{x}}_s, \tilde{y}_s)$ from $D_u$ to $D_l$
3 learn classifier $g^{(t)}$ from $D_l$
and improve the test accuracy of $g^{(t)}$ w.r.t. the number of queries
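A minimal sketch of this loop in Python, assuming an sklearn-style classifier and a generic query_strategy function (both names are illustrative, not part of the talk):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool_based_active_learning(X_l, y_l, X_u, oracle, query_strategy, T):
    """Run T rounds of pool-based active learning.

    X_l, y_l : labeled pool D_l
    X_u      : unlabeled pool D_u
    oracle   : oracle(x) returns the true label (the expensive step)
    query_strategy(model, X_u) returns the index of the instance to query
    """
    model = LogisticRegression().fit(X_l, y_l)
    for t in range(T):
        s = query_strategy(model, X_u)        # 1. strategically query some x~_s
        y_s = oracle(X_u[s])                  #    ... to get the associated y~_s
        X_l = np.vstack([X_l, X_u[s]])        # 2. move (x~_s, y~_s) from D_u to D_l
        y_l = np.append(y_l, y_s)
        X_u = np.delete(X_u, s, axis=0)
        model = LogisticRegression().fit(X_l, y_l)  # 3. learn g(t) from D_l
    return model
```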
How to Query Strategically?
by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons
Strategy 1: ask the most confused question
Strategy 2: ask the most frequent question
Strategy 3: ask the most helpful question
Choice of Strategy
Strategy 1 (uncertainty): ask the most confused question
Strategy 2 (representative): ask the most frequent question
Strategy 3 (expected error reduction): ask the most helpful question
• choosing one single strategy is non-trivial:
[Figure: accuracy vs. % of unlabeled data queried on three data sets; RAND, UNCERTAIN, PSDS, and QUIRE curves, with a different winner on each set]
• human-designed strategies are heuristic and confine the machine’s ability — can we free the machine?
Our Contributions
a philosophical and algorithmic study of active learning, which ...
• allows the machine to make intelligent choices of strategies, just like my cute kids
• studies a sound feedback scheme to tell the machine about the goodness of its choices, just like what I do
• results in promising active learning performance, just like (hopefully) the bright future of my kids
will describe the key philosophical ideas
Idea: Trial-and-Reward, Like Humans
by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons
K strategies: A1, A2, · · · , AK
try one strategy
“goodness” of strategy as reward
Reduction to Bandit
when do humans do trial-and-reward?
gambling
K strategies: A1, A2, · · · , AK ↔ K bandit machines: B1, B2, · · · , BK
try one strategy ↔ try one bandit machine
“goodness” of strategy as reward ↔ “luckiness” of machine as reward
— will take one well-known probabilistic bandit learner (EXP4.P)
Active Learning by Learning
K strategies: A1, A2, · · · , AK
try one strategy
“goodness” of strategy as reward
Given: K existing active learning strategies; for t = 1, 2, . . . , T
1 let EXP4.P decide the strategy Ak to try
2 query the $\tilde{\mathbf{x}}_s$ suggested by Ak, and compute $g^{(t)}$
3 evaluate the goodness of $g^{(t)}$ as the reward of the trial to update EXP4.P
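To make step 1 concrete, here is a simplified EXP3-style probabilistic bandit in Python as a stand-in for EXP4.P (the actual EXP4.P additionally mixes expert advice and carries high-probability guarantees; this sketch is illustrative only):

```python
import numpy as np

class Exp3Bandit:
    """Simplified EXP3-style bandit over K arms (strategies)."""

    def __init__(self, K, eta=0.1):
        self.K, self.eta = K, eta
        self.weights = np.ones(K)

    def choose(self, rng):
        # probabilistic choice: exponential weights mixed with uniform exploration
        p = (1 - self.eta) * self.weights / self.weights.sum() + self.eta / self.K
        k = rng.choice(self.K, p=p)
        return k, p[k]

    def update(self, k, p_k, reward):
        # importance-weight the reward (assumed in [0, 1]) so the update stays
        # unbiased over all arms, then boost the chosen arm exponentially
        self.weights[k] *= np.exp(self.eta * reward / (p_k * self.K))
```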
Ideal Reward
ideal reward after updating classifier $g^{(t)}$ by the query $(\mathbf{x}_{n_t}, y_{n_t})$:
accuracy $\frac{1}{M} \sum_{m=1}^{M} \llbracket y_m = g^{(t)}(\mathbf{x}_m) \rrbracket$ on test set $\{(\mathbf{x}_m, y_m)\}_{m=1}^{M}$
• test accuracy as reward: area under the query-accuracy curve ≡ cumulative reward
[Figure: query-accuracy curves (accuracy vs. % of unlabeled data queried) for RAND, UNCERTAIN, PSDS, QUIRE]
• test accuracy is infeasible in practice: labeling is expensive, remember?
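In code, the ideal reward would just be held-out accuracy; a sketch continuing the hypothetical helpers above:

```python
def ideal_reward(model, X_test, y_test):
    """Ideal reward after round t: test accuracy of g(t).

    Infeasible in practice: it needs test labels, i.e. exactly the
    expensive resource active learning tries to save.
    """
    return float(np.mean(model.predict(X_test) == y_test))
```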
Training Accuracy as Reward
test accuracy $\frac{1}{M} \sum_{m=1}^{M} \llbracket y_m = g^{(t)}(\mathbf{x}_m) \rrbracket$ is infeasible; the naïve replacement:
training accuracy $\frac{1}{t} \sum_{\tau=1}^{t} \llbracket y_{n_\tau} = g^{(t)}(\mathbf{x}_{n_\tau}) \rrbracket$ on the labeled pool $\{(\mathbf{x}_{n_\tau}, y_{n_\tau})\}_{\tau=1}^{t}$
• training accuracy as reward: training accuracy ≈ test accuracy?
• not necessarily!
— for an active learning strategy that asks the easiest questions:
• training accuracy high: the queried $\mathbf{x}_{n_\tau}$ are easy to label
• test accuracy low: not enough information about the harder instances
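The naïve reward in code, with the bias spelled out (a hypothetical helper, mirroring the sketches above):

```python
def naive_training_accuracy(model, X_queried, y_queried):
    """Naïve reward: accuracy of g(t) on the examples queried so far.

    Biased: a strategy that only queries easy instances scores high here
    while g(t) may still generalize poorly to the harder, unqueried ones.
    """
    return float(np.mean(model.predict(X_queried) == y_queried))
```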
Weighted Training Accuracy as Reward
training accuracy $\frac{1}{t} \sum_{\tau=1}^{t} \llbracket y_{n_\tau} = g^{(t)}(\mathbf{x}_{n_\tau}) \rrbracket$ is biased;
want a less-biased estimator
• non-uniform sampling theorem: if $(\mathbf{x}_{n_\tau}, y_{n_\tau})$ is sampled with probability $p_\tau > 0$ from the data set $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$ in iteration τ, then in expectation
weighted training accuracy $\frac{1}{t} \sum_{\tau=1}^{t} \frac{1}{p_\tau} \llbracket y_{n_\tau} = g(\mathbf{x}_{n_\tau}) \rrbracket \approx \frac{1}{N} \sum_{n=1}^{N} \llbracket y_n = g(\mathbf{x}_n) \rrbracket$
• with a probabilistic query like EXP4.P:
weighted training accuracy ≈ test accuracy
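A direct transcription of the estimator (names hypothetical; p_queried[τ] is the probability with which example τ was queried):

```python
def weighted_training_accuracy(model, X_queried, y_queried, p_queried):
    """Importance-weighted training accuracy of g(t) on the queried examples.

    Dividing each correctness indicator by its query probability corrects
    the sampling bias, so in expectation the estimate tracks accuracy over
    the whole pool rather than just the (easy-to-query) labeled part.
    """
    correct = (model.predict(X_queried) == y_queried).astype(float)
    return float(np.mean(correct / np.asarray(p_queried)))
```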
Active Learning by Learning (Hsu and Lin, 2015)
K strategies: A1, A2, · · · , AK
try one strategy
“goodness” of strategy as reward
Given: K existing active learning strategies; for t = 1, 2, . . . , T
1 let EXP4.P decide the strategy Ak to try
2 query the $\tilde{\mathbf{x}}_s$ suggested by Ak, and compute $g^{(t)}$
3 evaluate the weighted training accuracy of $g^{(t)}$ as the reward of the trial to update EXP4.P
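Putting the pieces together, a schematic end-to-end loop reusing the hypothetical Exp3Bandit stand-in and weighted_training_accuracy from above (the real ALBL of Hsu and Lin (2015) uses EXP4.P, tracks instance-level query probabilities, and rescales the reward, so treat this purely as a sketch):

```python
def albl(strategies, X_l, y_l, X_u, oracle, T, seed=0):
    """Schematic ALBL: a bandit picks one of K strategies per round and is
    rewarded with the importance-weighted training accuracy of g(t)."""
    rng = np.random.default_rng(seed)
    bandit = Exp3Bandit(K=len(strategies))
    X_q, y_q, p_q = [], [], []
    model = LogisticRegression().fit(X_l, y_l)
    for t in range(T):
        k, p_k = bandit.choose(rng)           # 1. bandit decides strategy A_k
        s = strategies[k](model, X_u)         # 2. query the x~_s suggested by A_k
        y_s = oracle(X_u[s])
        X_q.append(X_u[s]); y_q.append(y_s)
        p_q.append(p_k)                       # crude proxy for the query probability
        X_l = np.vstack([X_l, X_u[s]]); y_l = np.append(y_l, y_s)
        X_u = np.delete(X_u, s, axis=0)
        model = LogisticRegression().fit(X_l, y_l)   # compute g(t)
        reward = min(1.0, weighted_training_accuracy(
            model, np.array(X_q), np.array(y_q), p_q))  # keep reward in [0, 1]
        bandit.update(k, p_k, reward)         # 3. reward the trial
    return model
```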
Human-Designed Criterion as Reward
COMB approach (Baram et al., 2004):
bandit + balancedness of $g^{(t)}$ on unlabeled data as reward
• why? a human-designed criterion that matches the classifier to a domain assumption
• but many active learning applications are on unbalanced data!
— the assumption may be unrealistic
Comparison with Single Strategies
[Figure: accuracy vs. % of unlabeled data queried for ALBL, RAND, UNCERTAIN, PSDS, QUIRE; UNCERTAIN is best on vehicle, PSDS is best on sonar, QUIRE is best on diabetes]
• no single best strategy for every data set: choosing is needed
• ALBL consistently matches the best: similar findings across other data sets
Comparison with Other Bandit-Driven Algorithms
[Figure: accuracy vs. % of unlabeled data queried for ALBL, COMB, ALBL-Train; ALBL ≈ COMB on diabetes, ALBL > COMB on sonar]
• ALBL > ALBL-Train generally
— the importance-weighted mechanism is needed to correct the biased training accuracy
• ALBL is consistently comparable to or better than COMB
— learning performance is more useful than a human-designed criterion
Summary
Active Learning by Learning
• based on bandit learning + a less-biased performance estimator as reward
• effective in making intelligent choices
— comparable or superior to the best of the existing strategies
• effective in utilizing learning performance
— superior to the human-criterion-based approach
open-source tool developed: https://github.com/ntucllab/libact
Discussion for Theoreticians
weighted training accuracy $\frac{1}{t} \sum_{\tau=1}^{t} \frac{1}{p_\tau} \llbracket y_{n_\tau} = g^{(t)}(\mathbf{x}_{n_\tau}) \rrbracket$ as reward
• is the reward an unbiased estimator of test performance?
no for the learned $g^{(t)}$ (yes for a fixed g)
• is the reward fixed before playing?
no, because $g^{(t)}$ is learned from $(\mathbf{x}_{n_t}, y_{n_t})$
• are the rewards independent of each other?
no, because the entire past history enters each reward
ALBL: uses tools from theoreticians beyond their proven assumptions
Have We Made Active Learning More Realistic? (1/2)
Yes!
open-source tool libact developed (Yang, 2017)
https://github.com/ntucllab/libact
• including uncertainty sampling, QUIRE, PSDS, . . ., and ALBL
• received >500 stars and a continuous stream of issues
“libact is a Python package designed to make active learning easier for real-world users”
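A minimal usage sketch with libact's ALBL strategy, adapted from the patterns in the libact documentation (exact signatures may differ across versions; X, y, and oracle_label are placeholders):

```python
import numpy as np
from libact.base.dataset import Dataset
from libact.models import LogisticRegression
from libact.query_strategies import (ActiveLearningByLearning,
                                     UncertaintySampling, QUIRE, RandomSampling)

# unlabeled entries carry None as their label
n_labeled = 10
ds = Dataset(X, np.concatenate([y[:n_labeled],
                                [None] * (len(y) - n_labeled)]))

qs = ActiveLearningByLearning(
    ds,
    query_strategies=[
        UncertaintySampling(ds, model=LogisticRegression()),
        QUIRE(ds),
        RandomSampling(ds),
    ],
    T=100,                       # total number of queries (the horizon)
    uniform_sampler=True,
    model=LogisticRegression(),
)

ask_id = qs.make_query()         # index of the instance ALBL wants labeled
ds.update(ask_id, oracle_label)  # feed the queried label back into the pool
```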
Have We Made Active Learning More Realistic? (2/2)
No! (Not Completely!)
• single most-raised issue: hard to install on Windows/Mac
— because several strategies require some C packages
• performance in a recent industry project:
— uncertainty sampling often suffices; ALBL can be dragged down by a bad strategy in its candidate set
Other Attempts for Realistic Active Learning
“learn” a strategy beforehand rather than on-the-fly?
• transfer active learning experience (Chu and Lin, 2016)
— not easy to realize in an open-source package
other active learning tasks?
• NLP (Yuan et al., 2020)
• reinforcement learning (Chen et al., 2020)
• annotation cost-sensitive (Tsou and Lin, 2019)
• classification cost-sensitive (Huang and Lin, 2016)
Lessons Learned from Research on Active Learning by Learning
by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons
1 scalability bottleneck of artificial intelligence:
the choice of methods/models/parameters/. . .
2 think outside of the math box:
‘non-rigorous’ usage may be good enough
3 easy-to-use in design ≠ easy-to-use in reality