• No results found

Bias-Variance Tradeoff

N/A
N/A
Protected

Academic year: 2021

Share "Bias-Variance Tradeoff"

Copied!
41
0
0

Loading.... (view fulltext now)

Full text

(1)Bias-Variance Tradeoff CS229: Machine Learning Carlos Guestrin Stanford University. Slides include content developed by and co-developed with Emily Fox ©2021 Carlos Guestrin. CS229: Machine Learning.

(2) 2. ©2021 Carlos Guestrin. CS229: Machine Learning.

(3) Fit data with a line or … ?. price ($). y. Dude, it’s not a linear relationship! square feet (sq.ft.). 3. ©2021 Carlos Guestrin. x CS229: Machine Learning.

(4) What about a quadratic function?. price ($). y. square feet (sq.ft.) 4. ©2021 Carlos Guestrin. x CS229: Machine Learning.

(5) Even higher order polynomial. price ($). y. I can minimize your RSS square feet (sq.ft.). 5. ©2021 Carlos Guestrin. x CS229: Machine Learning.

(6) Do you believe this fit?. price ($). y My house isn’t worth so little square feet (sq.ft.) 6. ©2021 Carlos Guestrin. x CS229: Machine Learning.

(7) Do you believe this fit?. price ($). y. Minimizes RSS, but bad predictions. square feet (sq.ft.) 7. ©2021 Carlos Guestrin. x CS229: Machine Learning.

(8) “Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.” George Box, 1987.. ©2021 Carlos Guestrin. CS229: Machine Learning.

(9) Assessing the loss. ©2021 Carlos Guestrin. CS229: Machine Learning.

(10) Assessing the loss Part 1: Training error. ©2021 Carlos Guestrin. CS229: Machine Learning.

(11) Define training data. price ($). y. square feet (sq.ft.) 11. ©2021 Carlos Guestrin. x CS229: Machine Learning.

(12) Define training data. price ($). y. square feet (sq.ft.) 12. ©2021 Carlos Guestrin. x CS229: Machine Learning.

(13) Example: Fit quadratic to minimize RSS. y. Training error (ŵ) = price ($). N 1 X (yi-fŵ(xi))2 N i=1. ŵ minimizes loss on training data square feet (sq.ft.) 13. ©2021 Carlos Guestrin. x. CS229: Machine Learning.

(14) Example: Use squared error loss (y-fŵ(x))2. price ($). y. square feet (sq.ft.) 14. ©2021 Carlos Guestrin. Training error (ŵ) = 1/N * [($train 1-fŵ(sq.ft.train 1))2 + ($train 2-fŵ(sq.ft.train 2))2 + ($train 3-fŵ(sq.ft.train 3))2 + … include all training houses] x CS229: Machine Learning.

(15) Error. Training error vs. model complexity. y 15. Model complexity. y. x. x. ©2021 Carlos Guestrin. y x. CS229: Machine Learning.

(16) Is training error a good measure of predictive performance? Issue: Training error is overly optimistic…ŵ was fit to training data. price ($). y. xt 16. square feet (sq.ft.). Small training error ≠ good predictions (unless training data includes everything you might ever see). x ©2021 Carlos Guestrin. CS229: Machine Learning.

(17) Assessing the loss Part 2: Generalization (true) error. ©2021 Carlos Guestrin. CS229: Machine Learning.

(18) Generalization error Really want estimate of loss over all possible (. ,$) pairs. Lots of houses in neighborhood, but not in dataset. 18. ©2021 Carlos Guestrin. CS229: Machine Learning.

(19) Generalization error definition Really want estimate of loss over all possible (. Formally:. ,$) pairs. average over all possible (x,y) pairs weighted by how likely each is. generalization error = Ex,y[L(y,fŵ(x))] fit using training data. 19. ©2021 Carlos Guestrin. CS229: Machine Learning.

(20) Generalization error vs. model complexity. x. price ($). price ($). fŵ. square feet (sq.ft.). 20. y. y price ($). y. square feet (sq.ft.). ©2021 Carlos Guestrin. x. square feet (sq.ft.). x. CS229: Machine Learning.

(21) Error. True error vs. model complexity. y 21. Model complexity. y. x. x. ©2021 Carlos Guestrin. y x. CS229: Machine Learning.

(22) Assessing the loss Part 3: Test error. ©2021 Carlos Guestrin. CS229: Machine Learning.

(23) Training, true, test error vs. model complexity. Error. Overfitting if:. y 23. Model complexity. y. x. x. ©2021 Carlos Guestrin. y x. CS229: Machine Learning.

(24) 3 sources of error + the bias-variance tradeoff. ©2021 Carlos Guestrin. CS229: Machine Learning.

(25) 3 sources of error In forming predictions, there are 3 sources of error: 1. Noise 2. Bias 3. Variance. 25. ©2021 Carlos Guestrin. CS229: Machine Learning.

(26) Data inherently noisy yi = fw(true)(xi)+εi. price ($). y. square feet (sq.ft.). 26. x ©2021 Carlos Guestrin. CS229: Machine Learning.

(27) Bias contribution Suppose we fit a constant function. fŵ(train1). square feet (sq.ft.). 27. y price ($). price ($). y. N other house sales ( ,$). N house sales ( ,$). x. fŵ(train2) square feet (sq.ft.). ©2021 Carlos Guestrin. x CS229: Machine Learning.

(28) Bias contribution Over all possible size N training sets, what do I expect my fit to be?. price ($). y. fw(true). fŵ(train3) f ŵ(train1). f!". fŵ(train2) square feet (sq.ft.) 28. x ©2021 Carlos Guestrin. CS229: Machine Learning.

(29) Bias contribution Bias(x) = fw(true)(x) - f!"(x). Is our approach flexible enough to capture fw(true)? If not, error in predicJons.. y price ($). fw(true). low complexity à. f!" square feet (sq.ft.) 29. x ©2021 Carlos Guestrin. CS229: Machine Learning.

(30) Variance contribution How much do specific fits vary from the expected fit?. price ($). y. fŵ(train3) f ŵ(train1) f!". fŵ(train2) square feet (sq.ft.) 30. x ©2021 Carlos Guestrin. CS229: Machine Learning.

(31) Variance contribuGon How much do specific fits vary from the expected fit?. price ($). y. fŵ(train3) f ŵ(train1) f!". fŵ(train2) square feet (sq.ft.) 31. x ©2021 Carlos Guestrin. CS229: Machine Learning.

(32) Variance contribuGon How much do specific fits vary from the expected fit?. low complexity à. price ($). y. Can specific fits vary widely? If so, erraJc predicJons. square feet (sq.3.) 32. x ©2021 Carlos Guestrin. CS229: Machine Learning.

(33) Variance of high-complexity models Assume we fit a high-order polynomial. square feet (sq.ft.). 33. y. fŵ(train1). price ($). price ($). y. x. fŵ(train2) square feet (sq.3.). ©2021 Carlos Guestrin. x CS229: Machine Learning.

(34) Variance of high-complexity models Suppose we fit a high-order polynomial. fŵ(train2). price ($). y. fŵ(train1) f!" fŵ(train3) square feet (sq.3.). 34. x ©2021 Carlos Guestrin. CS229: Machine Learning.

(35) Variance of high-complexity models high complexity à. price ($). y. f!" square feet (sq.3.). 35. x ©2021 Carlos Guestrin. CS229: Machine Learning.

(36) Bias of high-complexity models. y price ($). fw(true) f!" square feet (sq.3.). 36. high complexity à. x ©2021 Carlos Guestrin. CS229: Machine Learning.

(37) Sum of 3 sources of error Average squared error at xt = σ2 + [bias(fŵ(xt))]2 + var(fŵ(xt)). fŵ(training set). price ($). y. xt 37. square feet (sq.ft.). x. ©2021 Carlos Guestrin. CS229: Machine Learning.

(38) Bias-variance tradeoff. y 38. Model complexity x. y x. ©2021 Carlos Guestrin. CS229: Machine Learning.

(39) Error. Error vs. amount of data. # data points in training set 39. ©2021 Carlos Guestrin. CS229: Machine Learning.

(40) Summary of bias-variance tradeoff. ©2021 Carlos Guestrin. CS229: Machine Learning.

(41) What you can do now… • Contrast relaJonship between model complexity and train, true and test loss • Compute training and test error given a loss funcJon for different model complexiJes • List and interpret the 3 sources of avg. predicJon error - Irreducible error, bias, and variance. 41. ©2021 Carlos Guestrin. CS229: Machine Learning.

(42)

References

Related documents

The Servizio dashboards, grids & reports provide clear data display and ease interrogation of data, which gives you certainty in your information so you can make

[r]

Economic and Social Research Institute Department of National Accounts 3-1-1, Kasumigaseki, Chiyoda-ku 100-8970

Proper Mass of the Day (Third Sunday of Lent) or from the Ritual Masses: “The Scrutinies.” On this Sunday is celebrated the first Scrutiny in preparation of the Baptism of the

aryloxide arms and a central arene ring which provides additional electronic stabilization via a  interaction with the uranium center.. have also reported a

In this research, we survey the literature on the current planning horizon for the railway passenger service and we identify a gap in the planning horizon – demand based

“ … though I've never met Franca [Carella], I have never spoken to Franca, I know all kinds of people who have gone through that program and I know that she helps people out who

• If card is lost/stolen – call the 800 number to vendor to have it replaced and still have access to your money via the Money Network™ Check • No waiting for deposited checks