• No results found

Stages of (Batch) Machine Learning

N/A
N/A
Protected

Academic year: 2022

Share "Stages of (Batch) Machine Learning"

Copied!
23
0
0

Loading.... (view fulltext now)

Full text

(1)

Evaluation

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper

attribution.

(2)

Stages of (Batch) Machine Learning

Given: labeled training data

• Assumes each with

Train the model:

model ß classifier.train(X, Y )

Apply the model to new data:

• Given: new unlabeled instance

model

learner

X , Y

x y

prediction

X, Y = {hx

i

, y

i

i}

ni=1

x

i

⇠ D(X ) y

i

= f

target

(x

i

)

x ⇠ D(X )

(3)

Classification Metrics

accuracy = # correct predictions

# test instances

error = 1 accuracy = # incorrect predictions

# test instances

3

(4)

recall = T P

T P + F N precision = T P

T P + F P

Confusion Matrix

• Given a dataset of P positive instances and N negative instances:

• Imagine using classifier to identify positive cases (there is a cat in an image)

Yes No

Yes No

Actual Class

Predicted Class

TP FN

FP TN

accuracy = T P + T N P + N

Probability that classifier Probability that actual class is

(5)

Training Data and Test Data

• Training data:

data used to build the model

• Test data:

new data, not used in the training process

• Training performance is often a poor indicator of generalization performance

– Generalization is what we really care about in ML – Easy to overfit the training data

– Performance on test data is a good indicator of generalization performance

– i.e., test accuracy is more important than training accuracy

5

(6)

Simple Decision Boundary

2 3 4 5 6 7 8 9 10

-1 0 1 2 3 4 5 6

Feature 1

Feature 2

TWO-CLASS DATA IN A TWO-DIMENSIONAL FEATURE SPACE

Decision Region 2 Decision

Region 1

Decision Boundary

(7)

More Complex Decision Boundary

2 3 4 5 6 7 8 9 10

-1 0 1 2 3 4 5 6

Feature 1

Feature 2

TWO-CLASS DATA IN A TWO-DIMENSIONAL FEATURE SPACE

Decision Region 2 Decision

Region 1

Decision Boundary

Slide by Padhraic Smyth, UCIrvine 7

(8)

Example: The Overfitting Phenomenon

X Y

(9)

A Complex Model

X Y

Y = high-order polynomial in X

Slide by Padhraic Smyth, UCIrvine 9

(10)

The True (simpler) Model

X Y

Y = a X + b + noise

(11)

Example: The Overfitting Phenomenon

X Y

Slide by Padhraic Smyth, UCIrvine 11

(12)

A Complex Model

X Y

Y = high-order polynomial in X

(13)

The True (simpler) Model

X Y

Y = a X + b + noise

Slide by Padhraic Smyth, UCIrvine 13

(14)

How Overfitting Affects Prediction

Predictive Error

Model Complexity Error on Training Data

(15)

How Overfitting Affects Prediction

Predictive Error

Model Complexity Error on Training Data

Error on Test Data

Slide by Padhraic Smyth, UCIrvine 15

(16)

How Overfitting Affects Prediction

Predictive Error

Model Complexity Error on Training Data

Error on Test Data

Ideal Range

for Model Complexity

Overfitting Underfitting

(17)

Comparing Classifiers

Say we have two classifiers, C1 and C2, and want to choose the best one to use for future predictions

Can we use training accuracy to choose between them?

• No!

– e.g., C1 = pruned decision tree, C2 = 1-NN

training_accuracy(1-NN) = 100%, but may not be best

Instead, choose based on test accuracy...

Based on Slide by Padhraic Smyth, UCIrvine 17

(18)

Training and Test Data

Full Data Set Training Data

Simulated Test Data (Validation)

Idea:

Train each

model on the

“ training data”...

...and then test each model’s accuracy on

the “test” data

(19)

k -Fold Cross-Validation

• Why just choose one particular “split” of the data?

– In principle, we should do this multiple times since performance may be different for each split

• k -Fold Cross-Validation (e.g., k=10)

– randomly partition full data set of n instances into k disjoint subsets (each roughly of size n/k)

– Choose each fold in turn as the test set; train model on the other folds and evaluate

– Compute statistics over k test performances, or choose best of the k models

– Can also do “leave-one-out CV” where k = n

19

(20)

Example 3-Fold CV

Test Data

Training Data 1st Partition Full Data Set

Training Data 2nd Partition

Test Data Training

Data

Test Data Training

Data kth Partition

...

PerformanceTest Test

Performance TestPerformance

Summary statistics

(21)

More on Cross-Validation

• Cross-validation generates an approximate estimate of how well the classifier will do on “unseen” data

– As k à n, the model becomes more accurate (more training data)

– ...but, CV becomes more computationally expensive – Choosing k < n is a compromise

• Averaging over different partitions is more robust than just a single train/validate partition of the data

21

(22)

Learning Curve

• Shows performance versus the # training examples

– Compute over a single training/testing split – Then, average across multiple trials of CV

(23)

Building Learning Curves

Test Data

Training Data 1stPartition Full Data Set

Training Data 2ndPartition

Test Data Training

Data

Test Data Training

Data kthPartition

...

a.) Randomize

Data Set Shuffle

Full Data Set

b.) Perform k-fold CV

Curve

C1 Curve

C2 Curve

C1 Curve

C2 Curve

C1 Curve C2

23

References

Related documents

• For moderate acne vulgaris (nodular), treatment with an oral antibiotic and a topical retinoid and benzoyl peroxide is considered first line; treatment with oral isotretinoin

For the material property identification, inverse modeling method is employed for the identification of hyperelastic and hyper-viscoelastic (HV) materials by use of the

The 1985 DAGMAR reform transferred liability for costs of ambulatory health care provided by public providers and private practitioners linked to regional social insurance offices

Espinosa is the current President of the Asia Pacific Federation of Human Resource Management (APFHRM) and was previously President of World Federation of People Management

The dynamics of business has crossed its boundaries set decades back and have introduced strong motives of societal well-beings in dispensing business and

Tanriverdi and Uysal (2010) study the role of cross-business integration in the merger context, defining cross-business integration as “the extent to which a multibusiness firm uses

HBCUs must find other major donors (e.g., corporations, foundations, and wealthy benefactors) who are willing to make significant contributions to enhance institutional