
Part 1 Building your Own Binary Classification Model.txt


Academic year: 2021


First Binary Classification Model

Data_Final Project.xlsx

You work for a bank as a business data analyst in the credit card risk-modeling department. Your bank conducted a bold experiment three years ago: for a single day it quietly issued credit cards to everyone who applied, regardless of their credit risk, until the bank had issued 600 cards without screening applicants.

After three years, 150, or 25%, of those card recipients defaulted: they failed to pay back at least some of the money they owed. However, the bank collected very valuable proprietary data that it can now use to optimize its future card-issuing process.

The bank initially collected six pieces of data about each person:

• Age
• Years at current employer
• Years at current address
• Income over the past year
• Current credit card debt, and
• Current automobile debt

In addition, the bank now has a binary outcome: default = 1, and no default = 0. Your first assignment is to analyze the data and create a binary classification model to forecast future defaults.

You will cobll cobine daine data frota fro the  the abo%e abo%e six insix in"uts to "uts to out"ut out"ut a sin!a sin!le le score. � � score. 3se the�� 3se the 4oldier Perforance s"readsheet for a si"le exa"le of cobinin! ulti"le in"uts. 4oldier Perforance s"readsheet for a si"le exa"le of cobinin! ulti"le in"uts. Forecastin! 4oldier Perforance.xlsx

Forecastin! 4oldier Perforance.xlsx 0he relati

The relative rank-ordering of scores will determine the model's effectiveness. For convenience (in particular, so that you can use the AUC Calculator Spreadsheet), you are asked to use a scale for your score that has a maximum < 3.5 and a minimum > -3.5.

At first you are not told what your bank's own best estimate is for its cost per False Negative (accepted applicant who becomes a defaulting customer) and False Positive (rejected customer who would not have defaulted) classification.

Therefore, the best you can do is to design your model to maximize the Area Under the ROC Curve, or AUC.

You are told that if your model is effective ("high enough" AUC, not defined further) and robust (again not defined, but in general this means relatively little decrease in AUC across multiple sets of new data), then it may be adopted by the bank as its predictive model for default, to determine which future applicants will be issued credit cards.

You are first given a Training Set of 200 out of the 600 people in the experiment. The Data_For_Final_Project (below) has both the training set and test set you will need.

Desi!n your odel usin! the 0rainin! 4et. 4tandardi/ed %ersions of the in"ut data Desi!n your odel usin! the 0rainin! 4et. 4tandardi/ed %ersions of the in"ut data

(2)

also "ro%ided for your con%enience. You ay cobine the six in"uts by addin! the to& or subtractin! the fro& each other& takin! si"le ratios& etc. =xclude in"uts that are not hel"ful and then ex"erient with how to cobine the ost inforati%e in"uts.

Note that you will need some of your quiz answers again later, so please write them down and keep track of them as you go along.

>uestion# ?hat is your odel@ Ai%e it as a function of the two or ore of the six in"uts. For exa"le# 9)!e  Years at Current )ddress:1ncoe not a !reat odelE. Your odel should ha%e at least two in"uts.


What is your model's AUC on the Training Set? Use two digits to the right of the decimal place.

(Less than .5 is not correct: you need to make the highest value the lowest by dividing by -1. An AUC of .5 has no predictive value. An AUC of .9 or higher is too good to be true!)

1nitial )ssessent for <%er-fittin! 9testin! your odel on new data:

8ext test your odel& without chan!in! any "araeters& on the 0est 4et of ,(( additional a""licants. 4ee the 0est 4et s"readsheet. 1t is "art of the

Data_For_Final_Project 9below: and has both the trainin! and test set. Data_Final Project.xlsx

Hint: Make and use a second copy of the AUC Calculator Spreadsheet so that you can compare Test Set and Training Set results easily.

AUC_Calculator and Review of AUC Curve.xlsx

What is your model's new AUC on the Test Set? Give two digits to the right of the decimal place.

(Less than .5 is not valid: multiply by -1. An AUC of .5 means no predictive value. Greater than .90 is too good to be true!)


Now that you have, hopefully, developed your model to the point where it is relatively robust across the training set and test set, your boss at the bank finally gives you its current rough estimate of the bank's average costs for each type of classification error.

Note that all bank models here include only profits and losses within three years of when a card is issued, so the impact of out-years (years beyond 3) can be ignored.

Cost Per False Negative: $5,000
Cost Per False Positive: $2,500

For the 600 individuals who were automatically given cards without being classified, the total cost of the experiment turned out to be 25% × $5,000 × 600, or $750,000. This is $1,250 per event.

<nly odels with lower cost "er e%ent than K*&,+( should ha%e any %alue.

Question: What is the threshold score on the Training Set data for your model that minimizes Cost per Event? You will need this number to answer later questions.

Hint: Using the AUC Calculator Spreadsheet, identify which column displays the same cost-per-event (row 17) as the overall minimum cost-per-event shown in Cell …2. The threshold is shown in row 10 of that column. The threshold means that everything at and above this number is classified as a "default."
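Outside the spreadsheet, the same search is a sweep over candidate thresholds. This sketch follows the hint's convention (a score at or above the threshold is classified as a default), tries every distinct training score plus infinity (which classifies nobody as a default), and breaks cost ties in favor of the lower threshold. The function names are hypothetical, and because the AUC Calculator's threshold grid may differ, its answer can differ slightly.

```python
import math

COST_FN = 5000  # dollars: default that was accepted (false negative)
COST_FP = 2500  # dollars: good customer who was rejected (false positive)


def cost_at(threshold, scores, outcomes):
    """Cost per event when every score >= threshold is classified as a default."""
    fn = sum(1 for s, y in zip(scores, outcomes) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and y == 0)
    return (fn * COST_FN + fp * COST_FP) / len(scores)


def best_threshold(scores, outcomes):
    """Return (minimum cost per event, threshold) over all candidate thresholds."""
    candidates = sorted(set(scores)) + [math.inf]
    return min((cost_at(t, scores, outcomes), t) for t in candidates)
```

The first element of the returned pair corresponds to the overall minimum cost per event asked for next; the second is the threshold to record for later questions.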


(Thresholds greater than 2.5 may not be utilizing the full range for analysis. Thresholds less than -2.5 may not be utilizing the full range for analysis.)

Finding the Minimum Cost Per Event

Question: Again referring only to the Training Set data, what is the overall minimum cost-per-event?

Hint: You will need this number to answer later questions. If you used the AUC Calculator, the overall minimum cost per event will be displayed in Cell …2.

Note: For Coursera to interpret your answer correctly, you must give your answer as an integer, with no decimals or dollar sign.

For example, enter $800.00 as "800".

Comparing the New Minimum Cost Per Event on Test Set Data

When you compared AUC for the Training and Test Sets, all that was necessary was to look up the two different values in Cell G8. But to get an accurate measure of the cost savings from using the original model on new data, you cannot automatically use the new threshold that results in the overall lowest cost-per-event on the Test Set.


;eeber that your odel is bein! tested for its ability to forecast - but the new o"tial threshold will be known only after the outcoes for the entire 0est 4et are known.

All you can use is the model you developed on the Training Set data and the threshold from the Training Set that you should have recorded when answering Question 4.

>uestion# )t that sae threshold score 98<0 the threshold score that would inii/e costs for the new 0est 4et& but the old threshold score that inii/ed costs on� �

the 0rainin! 4et: what is the cost "er e%ent on the test set@

Hint: Using the AUC Calculator Spreadsheet previously provided, locate the column on the Training Set data that has the lowest cost per event. That same column and threshold in the Test Set copy of the AUC Calculator will have a new cost per event, displayed in row 17. This is almost always higher than the minimum cost-per-event on the Training Set, and also higher than what the minimum cost-per-event would be on the Test Set if one could know the new optimal threshold in advance. This number is the actual cost per event when applying the model-and-threshold developed with the Training Set to the new Test Set data.
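In code, the point of this step is that the threshold is an input fixed from training, never re-optimized on the test outcomes. A sketch under the same hypothetical conventions used earlier (score at or above the threshold = default; $5,000 per false negative, $2,500 per false positive). The hindsight minimum is computed only to show the gap the hint describes; it is not the answer to this question.

```python
import math


def cost_at(threshold, scores, outcomes, cost_fn=5000, cost_fp=2500):
    """Cost per event when every score >= threshold is classified as a default."""
    fn = sum(1 for s, y in zip(scores, outcomes) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and y == 0)
    return (fn * cost_fn + fp * cost_fp) / len(scores)


def hindsight_minimum(scores, outcomes):
    """Best achievable cost on this set; only knowable after outcomes are in."""
    candidates = sorted(set(scores)) + [math.inf]
    return min(cost_at(t, scores, outcomes) for t in candidates)


def realized_test_cost(train_threshold, test_scores, test_outcomes):
    """The training-set threshold applied, unchanged, to the test data."""
    return cost_at(train_threshold, test_scores, test_outcomes)
```

With a made-up training threshold of 0.6 and test scores [0.2, 0.5, 0.7, 0.65] with outcomes [0, 1, 0, 1], the realized cost is $1,875 per event, while the hindsight minimum on the same data would be $625.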

Note: For Coursera to interpret your answer correctly, you must give your answer as an integer, with no decimals or dollar sign.

For example, enter $800.00 as "800".

(If you find that your costs per event on the test set are much higher than your costs per event on the training set, consider making your model simpler, probably by using fewer input variables, as it is probably still over-fitting the training set data. Problems with over-fitting that were not obvious at the ROC-curve stage may emerge when minimizing costs.)

Putting a Dollar Value on Your Model Plus the Data

)ssue your 0est 4et cost-"er-e%ent results fro >uestion ' are sustainable lon! ter.

>uestion# ow uch oney does the bank sa%e& "er e%ent& usin! your odel and its data-in"uts& instead of issuin! credit cards to e%eryone who asks@

Hint: The cost of issuing credit cards to everyone (no model, no forecast) has been determined to be 25% × $5,000 = $1,250 per event. The dollar value of the model-plus-data is the difference between $1,250 and your number.
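The savings calculation is a single subtraction from the no-screening baseline. A minimal sketch; the names `BASELINE` and `savings_per_event` are hypothetical, and the input is whatever test-set cost per event you obtained in the previous question.

```python
BASELINE = 0.25 * 5000  # 1,250 dollars per event: issue a card to everyone


def savings_per_event(test_cost_per_event):
    """Dollar value of model-plus-data per applicant, relative to no screening."""
    return BASELINE - test_cost_per_event
```

A model whose test-set cost exceeds $1,250 produces negative savings, which is why only models below that cost have any value.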

Note: For Coursera to interpret your answer correctly, you must give your answer as an integer, with no decimals or dollar sign.

For example, enter $800.00 as "800".

(Savings of $150 or less indicate a weak model; more than $150 up to $250, an OK model; more than $250 up to $450, a very good model; more than $450, an excellent model.)

Payback Period for Your Model

Question: Given that it apparently cost the bank $750,000 to conduct the three-year experiment, if the bank processes 1,000 credit card applicants per day on average, how many days will it take to ensure future savings will pay back the bank's initial investment?

Ai%e nuber rounded to the nearest day 9inte!er %alue:.

int# ulti"ly your answer to >uestion G - the cost sa%in!s "er a""licant - by *((( to !et the sa%in!s "er day.


(More than a week: poor. 4-7 days: very good. 2-3 days: excellent. 1 day: too good to be true!)

)ny odel that is reducin! uncertainty will ha%e a 0rue Positi%e ;ate... ...=$ual to the 0est 1ncidence 9 of outcoes classified as NdefaultN: x ...Hess than the 0est 1ncidence 9 of outcoes classified as NdefaultN: x ...Areater than the 0est 1ncidence 9 of outcoes classified as NdefaultN: Ai%en that the base rate of default in the "o"ulation is ,+& any test that is reducin! uncertainty will ha%e a Positi%e Predicti%e alue 9PP:...

...Equal to .25
...Less than .25
...Greater than .25

Given that the base rate of default in the population is 25%, any test that is reducing uncertainty will have a Negative Predictive Value (NPV)...

...Equal to .75
...Less than .75
...Greater than .75

Confusion Matrix Metrics. To determine all performance metrics for a binary classification, it is sufficient to have three values:

• The Condition Incidence (here the default rate of 25%)
• The probability of True Positives (the True Positive Rate multiplied by the Condition Incidence)
• The Test Incidence (also called "classification incidence": the sum of the probabilities of True Positives and False Positives)

These three values can all be obtained from the AUC Calculator Spreadsheet and then used as inputs to the Information Gain Calculator Spreadsheet to determine all other performance metrics.

AUC_Calculator and Review of AUC Curve.xlsx
Information Gain Calculator.xlsx

Question: What is your model's True Positive Rate?

Save this answer, as it will be needed again for Part 3 (Quiz 3).

(A value of .25 or less is incorrect.)

Question: What is your model's "test incidence"?

Save this answer, as it will be needed again for Part 3 (Quiz 3).

Test Incidences cannot be so small that they force a high false negative rate, nor so large that they force a high false positive rate. A perfect test will of course have a Test Incidence equal to the Condition Incidence, but most classification systems are focused on avoiding false negatives and have a higher Test Incidence than Condition Incidence.
