Active Learning by Learning
(1)

This Webinar is provided to you by IEEE Computational Intelligence Society https://cis.ieee.org

Active Learning by Learning

(2)

About Me

Professor

National Taiwan University

Chief Data Science Consultant (former Chief Data Scientist)

Appier Inc.

Co-author, Learning from Data

Instructor

(3)

Apple Recognition Problem

Note: Slide Taken from my “ML Techniques” MOOC

• need an apple classifier: is this a picture of an apple?

• gather photos under CC-BY-2.0 license on Flickr (thanks to the authors below!) and label them as apple/other for learning

(APAL stands for Apple and Pear Australia Ltd)

Dan Foy APAL adrianbartel ANdrzej cH. Stuart Webster

flic.kr/p/jNQ55 flic.kr/p/jzP1VB flic.kr/p/bdy2hZ flic.kr/p/51DKA8 flic.kr/p/9C3Ybd

nachans APAL Jo Jakeman APAL APAL

(4)

Apple Recognition Problem


Mr. Roboto. Richard North Richard North Emilian Robert Vicol Nathaniel McQueen flic.kr/p/i5BN85 flic.kr/p/bHhPkB flic.kr/p/d8tGou flic.kr/p/bpmGXW flic.kr/p/pZv1Mf

Crystal jfh686 skyseeker Janet Hudson Rennett Stowe

(5)

Active Learning: Learning by ‘Asking’

but labeling is expensive

Protocol ⇔ Learning Philosophy
• batch: ‘duck feeding’
• active: ‘question asking’ (iteratively)
—query y_n of chosen x_n

[learning-flow diagram: unknown target function f : X → Y; labeled training examples (x_n, +1/-1); unlabeled training examples; learning algorithm A; final hypothesis g ≈ f]

(6)

Pool-Based Active Learning Problem

Given

• labeled pool $\mathcal{D}_l = \{(\text{feature } \mathbf{x}_n,\ \text{label } y_n\ (\text{e.g. IsApple?}))\}_{n=1}^{N}$

• unlabeled pool $\mathcal{D}_u = \{\tilde{\mathbf{x}}_s\}_{s=1}^{S}$

Goal: design an algorithm that iteratively

1 strategically query some $\tilde{\mathbf{x}}_s$ to get associated $\tilde{y}_s$

2 move $(\tilde{\mathbf{x}}_s, \tilde{y}_s)$ from $\mathcal{D}_u$ to $\mathcal{D}_l$

3 learn classifier $g^{(t)}$ from $\mathcal{D}_l$

and improve test accuracy of $g^{(t)}$ w.r.t. #queries
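As a minimal sketch (not part of the original slides), this loop can be written with a generic query strategy and a scikit-learn-style classifier; `query_strategy` and `oracle` are hypothetical placeholders for a strategy's scoring rule and the expensive labeling step.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool_based_active_learning(X_l, y_l, X_u, query_strategy, oracle, T):
    """Run T rounds of pool-based active learning.

    query_strategy(g, X_u) -> index s of the unlabeled instance to query
    oracle(x)              -> true label of x (the expensive call)
    """
    X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
    g = LogisticRegression().fit(np.array(X_l), np.array(y_l))
    history = []
    for t in range(T):
        # 1. strategically query some x~_s to get its label y~_s
        s = query_strategy(g, np.array(X_u))
        x_s = X_u.pop(s)
        y_s = oracle(x_s)
        # 2. move (x~_s, y~_s) from D_u to D_l
        X_l.append(x_s)
        y_l.append(y_s)
        # 3. learn classifier g^(t) from D_l
        g = LogisticRegression().fit(np.array(X_l), np.array(y_l))
        history.append(g)
    return history
```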

(7)

How to Query Strategically?

by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons

Strategy 1: ask the most confused question

Strategy 2: ask the most frequent question

Strategy 3: ask the most helpful question

(8)

Choice of Strategy

Strategy 1 (uncertainty): ask the most confused question

Strategy 2 (representative): ask the most frequent question

Strategy 3 (expected error reduction): ask the most helpful question

choosing one single strategy is non-trivial:

[figures: accuracy vs. % of unlabelled data for RAND, UNCERTAIN, PSDS, QUIRE on three data sets]

• human-designed strategies are heuristic and confine the machine's ability: can we free the machine?
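For illustration, Strategy 1 (uncertainty sampling) fits the `query_strategy` interface sketched earlier in a few lines; this is a generic least-margin variant, assumed for illustration rather than taken from the talk.

```python
import numpy as np

def uncertainty_query(g, X_u):
    """Strategy 1 sketch: ask the 'most confused' question.

    Picks the unlabeled instance whose predicted class probability is
    closest to 0.5 (binary case), i.e. the smallest-margin example.
    """
    proba = g.predict_proba(X_u)          # shape (S, 2)
    margin = np.abs(proba[:, 1] - 0.5)    # small margin = most confused
    return int(np.argmin(margin))
```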

(9)

Our Contributions

a philosophical and algorithmic study of active learning, which ...

• allows machine to make intelligent choice of strategies, just like my cute kids

• studies sound feedback scheme to tell machine about goodness of choice, just like what I do

• results in promising active learning performance, just like (hopefully) the bright future of my kids

will describe key philosophical ideas

(10)

Idea: Trial-and-Reward Like Human

by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons

K strategies: A1, A2, · · · , AK

try one strategy

“goodness” of strategy as reward

(11)

Reduction to Bandit

when do humans do trial-and-reward?

gambling

K strategies: A1, A2, · · · , AK

try one strategy

“goodness” of strategy as reward

K bandit machines: B1, B2, · · · , BK

try one bandit machine

“luckiness” of machine as reward

—will take one well-known probabilistic bandit learner (EXP4.P)
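To make the trial-and-reward analogy concrete, here is a simplified exponential-weighting bandit in the spirit of EXP3; ALBL itself uses EXP4.P, which also handles expert (strategy) advice and high-probability guarantees, so treat this only as an illustrative stand-in.

```python
import numpy as np

class Exp3Bandit:
    """Simplified EXP3-style bandit over K arms (illustrative, not EXP4.P)."""

    def __init__(self, K, gamma=0.1, rng=None):
        self.K, self.gamma = K, gamma
        self.w = np.ones(K)
        self.rng = rng or np.random.default_rng(0)

    def select(self):
        """Return (arm index, probability with which it was chosen)."""
        p = (1 - self.gamma) * self.w / self.w.sum() + self.gamma / self.K
        k = int(self.rng.choice(self.K, p=p))
        return k, p[k]

    def update(self, k, reward, p_k):
        """Exponential-weight update with an importance-weighted reward estimate."""
        est = reward / p_k                      # unbiased estimate of arm k's reward
        self.w[k] *= np.exp(self.gamma * est / self.K)
```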

(12)

Active Learning by Learning

K strategies: A1, A2, · · · , AK

try one strategy

“goodness” of strategy as reward

Given: K existing active learning strategies

for t = 1, 2, . . . , T

1 let EXP4.P decide strategy A_k to try

2 query the x̃_s suggested by A_k, and compute g^(t)

3 evaluate goodness of g^(t) as reward of trial to update EXP4.P

(13)

Ideal Reward

ideal reward after updating classifier $g^{(t)}$ by the query $(\mathbf{x}_{n_t}, y_{n_t})$: accuracy $\frac{1}{M}\sum_{m=1}^{M} \llbracket y_m = g^{(t)}(\mathbf{x}_m) \rrbracket$ on test set $\{(\mathbf{x}_m, y_m)\}_{m=1}^{M}$

test accuracy as reward: area under query-accuracy curve ≡ cumulative reward

[figure: query-accuracy curves (accuracy vs. % of unlabelled data) for RAND, UNCERTAIN, PSDS, QUIRE]

test accuracy infeasible in practice: labeling expensive, remember?
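A small clarifying note (not on the original slide): if each round's reward is the test accuracy of $g^{(t)}$, then summing the rewards over the $T$ queries gives, up to the scaling of the horizontal axis, the area under the query-accuracy curve:

```latex
\text{cumulative reward}
\;=\; \sum_{t=1}^{T}
\underbrace{\frac{1}{M}\sum_{m=1}^{M} \llbracket y_m = g^{(t)}(\mathbf{x}_m) \rrbracket}_{\text{test accuracy after query } t}
\;\propto\; \text{area under the query-accuracy curve}
```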

(14)

Training Accuracy as Reward

test accuracy $\frac{1}{M}\sum_{m=1}^{M} \llbracket y_m = g^{(t)}(\mathbf{x}_m) \rrbracket$ infeasible; naïve replacement:

accuracy $\frac{1}{t}\sum_{\tau=1}^{t} \llbracket y_{n_\tau} = g^{(t)}(\mathbf{x}_{n_\tau}) \rrbracket$ on labeled pool $\{(\mathbf{x}_{n_\tau}, y_{n_\tau})\}_{\tau=1}^{t}$

training accuracy as reward:

training accuracy ≈ test accuracy?

not necessarily!!

—for an active learning strategy that asks the easiest questions:

• training accuracy high: $\mathbf{x}_{n_\tau}$ easy to label

• test accuracy low: not enough information about harder instances

training accuracy: a biased reward

(15)

Weighted Training Accuracy as Reward

training accuracy $\frac{1}{t}\sum_{\tau=1}^{t} \llbracket y_{n_\tau} = g^{(t)}(\mathbf{x}_{n_\tau}) \rrbracket$ biased,

want a less-biased estimator

non-uniform sampling theorem: if $(\mathbf{x}_{n_\tau}, y_{n_\tau})$ sampled with probability $p_\tau > 0$ from data set $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$ in iteration $\tau$, in expectation

weighted training accuracy $\frac{1}{t}\sum_{\tau=1}^{t} \frac{1}{p_\tau} \llbracket y_{n_\tau} = g(\mathbf{x}_{n_\tau}) \rrbracket \;\approx\; \frac{1}{N}\sum_{n=1}^{N} \llbracket y_n = g(\mathbf{x}_n) \rrbracket$

• with probabilistic query like EXP4.P:

weighted training accuracy ≈ test accuracy

weighted training accuracy: a less-biased reward
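A direct translation of this estimator (a sketch, assuming the sampling probability $p_\tau$ of each query was stored; the variable names are hypothetical):

```python
import numpy as np

def weighted_training_accuracy(g, X_queried, y_queried, p_queried):
    """Importance-weighted training accuracy over the t queried examples.

    p_queried[tau] is the probability with which example tau was queried;
    dividing by it corrects the bias introduced by non-uniform querying.
    """
    X = np.array(X_queried)
    y = np.array(y_queried)
    p = np.array(p_queried)
    correct = (g.predict(X) == y).astype(float)
    return float(np.mean(correct / p))
```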

(16)

Active Learning by Learning (Hsu and Lin, 2015)

K strategies: A1, A2, · · · , AK

try one strategy

“goodness” of strategy as reward

Given: K existing active learning strategies

for t = 1, 2, . . . , T

1 let EXP4.P decide strategy A_k to try

2 query the x̃_s suggested by A_k, and compute g^(t)

3 evaluate weighted training accuracy of g^(t) as reward of trial to update EXP4.P
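Putting the pieces together, a rough ALBL-style loop might look as follows; it reuses the simplified bandit interface sketched earlier and treats the strategy-selection probability as the query probability, which is a simplification of the published algorithm's reward bookkeeping.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def albl_sketch(X_l, y_l, X_u, strategies, oracle, bandit, T):
    """Illustrative ALBL-style loop (not the exact published algorithm)."""
    X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
    Xq, yq, pq = [], [], []            # queried examples and their query probabilities
    g = LogisticRegression().fit(np.array(X_l), np.array(y_l))
    for t in range(T):
        # 1. let the bandit decide which strategy A_k to try
        k, p_k = bandit.select()
        # 2. query the x~_s suggested by A_k and move it into the labeled pool
        s = strategies[k](g, np.array(X_u))
        x_s = X_u.pop(s)
        y_s = oracle(x_s)
        X_l.append(x_s); y_l.append(y_s)
        Xq.append(x_s); yq.append(y_s); pq.append(p_k)   # p_k used as query probability (simplification)
        # 3. retrain g^(t); weighted training accuracy is the bandit's reward
        g = LogisticRegression().fit(np.array(X_l), np.array(y_l))
        correct = (g.predict(np.array(Xq)) == np.array(yq)).astype(float)
        reward = float(np.mean(correct / np.array(pq)))  # the paper rescales this into [0, 1]
        bandit.update(k, reward, p_k)
    return g
```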

(17)

Human-Designed Criterion as Reward

(Baram et al., 2004) COMB approach:

bandit + balancedness of g^(t) on unlabeled data as reward

why? a human criterion that matches the classifier to a domain assumption; but many active learning applications are on unbalanced data!

—assumption may be unrealistic

(18)

Comparison with Single Strategies

[figures: accuracy vs. % of unlabelled data for ALBL, RAND, UNCERTAIN, PSDS, QUIRE; UNCERTAIN best on vehicle, PSDS best on sonar, QUIRE best on diabetes]

no single best strategy for every data set: choosing is needed

ALBL consistently matches the best: similar findings across other data sets

(19)

Comparison with Other Bandit-Driven Algorithms

[figures: accuracy vs. % of unlabelled data for ALBL, COMB, ALBL-Train; ALBL ≈ COMB on diabetes, ALBL > COMB on sonar]

• ALBL > ALBL-Train generally

—importance-weighted mechanism needed for correcting biased training accuracy

• ALBL consistently comparable to or better than COMB

—learning performance more useful than human criterion

(20)

Summary

Active Learning by Learning

• based on bandit learning + less-biased performance estimator as reward

• effective in making intelligent choices

—comparable or superior to the best of existing strategies

• effective in utilizing learning performance

—superior to human-criterion-based approach

open-source tool developed: https://github.com/ntucllab/libact

(21)

Discussion for Theoreticians

weighted training accuracy $\frac{1}{t}\sum_{\tau=1}^{t} \frac{1}{p_\tau} \llbracket y_{n_\tau} = g^{(t)}(\mathbf{x}_{n_\tau}) \rrbracket$ as reward

• is reward an unbiased estimator of test performance?

no for learned g^(t) (yes for fixed g)

• is reward fixed before playing?

no because g^(t) learned from $(\mathbf{x}_{n_t}, y_{n_t})$

• are rewards independent of each other?

no because past history all in reward

ALBL: tools from theoreticians

(22)

Have We Made Active Learning More Realistic? (1/2)

Yes!

open-source tool libact developed (Yang, 2017)

https://github.com/ntucllab/libact

• including uncertainty, QUIRE, PSDS, . . ., and ALBL

• received >500 stars and continuous issues

“libact is a Python package designed to make ...”
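For reference, a usage sketch of libact's ALBL strategy, reconstructed from memory of the project's README; the exact class and parameter names are assumptions that should be checked against the installed version.

```python
# Hypothetical usage sketch of libact's ALBL strategy; verify names against the installed version.
import numpy as np
from sklearn.datasets import make_classification

from libact.base.dataset import Dataset
from libact.labelers import IdealLabeler
from libact.models import LogisticRegression
from libact.query_strategies import ActiveLearningByLearning, UncertaintySampling, QUIRE

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
n_labeled, quota = 10, 50

# only the first n_labeled labels are revealed; None marks still-unlabeled entries
trn_ds = Dataset(X, np.concatenate([y[:n_labeled], [None] * (len(y) - n_labeled)]))
labeler = IdealLabeler(Dataset(X, y))        # simulated oracle that knows all labels

qs = ActiveLearningByLearning(
    trn_ds,
    query_strategies=[
        UncertaintySampling(trn_ds, model=LogisticRegression()),
        QUIRE(trn_ds),
    ],
    T=quota,                                 # total number of queries
    uniform_sampler=True,
    model=LogisticRegression(),
)

model = LogisticRegression()
for _ in range(quota):
    ask_id = qs.make_query()                 # ALBL's bandit picks a strategy; strategy picks an example
    X_full, _ = zip(*trn_ds.data)
    trn_ds.update(ask_id, labeler.label(X_full[ask_id]))
    model.train(trn_ds)
```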

(23)

Have We Made Active Learning More Realistic? (2/2)

No! (Not Completely!)

• single most-raised issue: hard to install on Windows/Mac

—because several strategies require some C packages

• performance in a recent industry project:

uncertainty sampling often suffices; ALBL dragged down by a bad strategy

“libact is a Python package designed to make ...”

(24)

Other Attempts for Realistic Active Learning

“learn” a strategy beforehand rather than on-the-fly?

• transfer active learning experience (Chu and Lin, 2016)

—not easy to realize in an open-source package

other active learning tasks?

• NLP (Yuan et al., 2020)

• reinforcement learning (Chen et al., 2020)

• annotation cost-sensitive (Tsou and Lin, 2019)

• classification cost-sensitive (Huang and Lin, 2016)

(25)

Lessons Learned from Research on Active Learning by Learning

by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons

1 scalability bottleneck of artificial intelligence:

choice of methods/models/parameters/. . .

2 think outside of the math box:

‘non-rigorous’ usage may be good enough

3 easy-to-use in design ≠ easy-to-use in reality
