
Stepwise logistic regression Assessing the fit of the Model Logistic function


Academic year: 2021


Stepwise logistic regression Assessing the fit of the Model

Assistant Professor Nikom Thanomsieng, Department of Biostatistics and Demography, Faculty of Public Health, Khon Kaen University

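Logistic function

$$f(z) = \frac{1}{1 + e^{-z}}, \qquad -\infty < z < +\infty$$

$$f(-\infty) = \frac{1}{1 + e^{\infty}} = 0, \qquad f(0) = \frac{1}{2}, \qquad f(+\infty) = \frac{1}{1 + e^{-\infty}} = 1$$

[Figure: logistic curve; horizontal axis Z, vertical axis f(Z) rising from 0 to 1 and passing through 1/2 at Z = 0]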
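A minimal Python sketch of the logistic function above, included only to make the three limiting values concrete. The function name `logistic` is an illustrative choice, not something from the slides.

```python
import math

def logistic(z: float) -> float:
    """Logistic function f(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the algebraically equivalent form e^z / (1 + e^z),
    # which avoids overflow in exp(-z) when z is very negative.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# f approaches 0 for very negative z, equals 1/2 at z = 0, approaches 1 for large z.
for z in (-10.0, 0.0, 10.0):
    print(z, round(logistic(z), 5))
```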

 

Model Building in Logistic Regression Analysis: Stepwise logistic regression

Variables can be selected in several ways, for example:
1. Examine the p-values of the candidate variables.
2. Compare a reduced model with the full model.

The full model contains every variable (at the current step of the analysis); the reduced model drops one variable. For example:

coro = b0 + b1(SYSBP) + b2(DM) + b3(LDL)   -> full
coro = b0 + b1(SYSBP) + b2(DM)             -> reduced

Stepwise logistic regression selects variables by the p-value of a statistical test:
1. Set the p-value for entering a variable into the equation (Pe) and the p-value for removing a variable from the equation (Pr).
2. Enter significant variables first, smallest p-value first, provided p-value < Pe.
3. Recompute the statistics and remove any variable with p-value > Pr.
4. Repeat steps 2-3 until no variable enters or leaves the model.

Hosmer & Lemeshow (2000) suggest a p-value for entry (Pe) of .15-.20, with the p-value for removal (Pr) > Pe.
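The four steps above can be sketched in Python as a loop over entry and removal decisions. This is only an illustration of the decision logic, not how Stata's `sw` is implemented: `forward_stepwise` and the callback `p_value(model, term)` are hypothetical names, and the fitting itself is replaced by a lookup table. The table holds p-values reported later in this document for the low birth weight data (Pe = .20, Pr = .25), with one labelled assumption: the joint test for race in the empty model is not shown in the slides, so the smaller single-dummy p-value (.181) is used as a stand-in.

```python
from typing import Callable, FrozenSet, Iterable, List

def forward_stepwise(
    candidates: Iterable[str],
    p_value: Callable[[FrozenSet[str], str], float],
    pe: float = 0.20,
    pr: float = 0.25,
    max_steps: int = 100,
) -> List[str]:
    """Forward stepwise selection driven by entry (pe) and removal (pr) p-values.

    p_value(model, term) is a hypothetical callback returning the p-value of
    `term` when it is tested in a model that already contains `model`.
    """
    model: List[str] = []
    remaining = list(candidates)
    for _ in range(max_steps):  # guard against cycling
        changed = False
        # Step 2: enter the remaining term with the smallest p-value if p < Pe.
        if remaining:
            best_p, best = min((p_value(frozenset(model), t), t) for t in remaining)
            if best_p < pe:
                model.append(best)
                remaining.remove(best)
                changed = True
        # Step 3: remove the weakest term in the model if its p-value > Pr.
        if model:
            worst_p, worst = max(
                (p_value(frozenset(model) - {t}, t), t) for t in model
            )
            if worst_p > pr:
                model.remove(worst)
                remaining.append(worst)
                changed = True
        # Step 4: stop when nothing entered or left.
        if not changed:
            break
    return model

# p-values taken from the Stata output in this document, keyed by
# (terms already in the model, term being tested).
E = frozenset()
P = {
    (E, "lwt"): 0.023, (E, "age"): 0.105, (E, "ftv"): 0.389,
    (E, "race"): 0.181,  # ASSUMPTION: stand-in, joint test not reported
    (frozenset({"lwt"}), "age"): 0.218,
    (frozenset({"lwt"}), "ftv"): 0.550,
    (frozenset({"lwt"}), "race"): 0.0671,
    (frozenset({"race"}), "lwt"): 0.018,
    (frozenset({"lwt", "race"}), "age"): 0.443,
    (frozenset({"lwt", "race"}), "ftv"): 0.673,
}

selected = forward_stepwise(["lwt", "age", "race", "ftv"], lambda m, t: P[(m, t)])
print(selected)
```

With these inputs the loop enters lwt, then race, and stops because age and ftv both exceed Pe, which agrees with the model reported in the slides.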

Code Sheet for the Variables in the Low Birth Weight Study

Variable | Description                                            | Codes/Values                      | Name
1        | Identification Code                                    | ID Number                         | ID
2        | Low Birth Weight                                       | 1 = BWT<=2500g, 0 = BWT>2500g     | LOW
3        | Age of Mother                                          | Years                             | AGE
4        | Weight of Mother at Last Menstrual Period              | Pounds                            | LWT
5        | Race                                                   | 1 = White, 2 = Black, 3 = Other   | RACE
6        | Smoking Status During Pregnancy                        | 0 = No, 1 = Yes                   | SMOKE
7        | History of Premature Labor                             | 0 = None, 1 = One, 2 = Two, etc.  | PTL
8        | History of Hypertension                                | 0 = No, 1 = Yes                   | HT
9        | Presence of Uterine Irritability                       | 0 = No, 1 = Yes                   | UI
10       | Number of Physician Visits During the First Trimester  | 0 = None, 1 = One, 2 = Two, etc.  | FTV
11       | Birth Weight                                           | Grams                             | BWT

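1. Set the p-value for entering a variable into the equation (Pe) = .20 and the p-value for removing a variable from the equation (Pr) = .25.
2. Consider the variable with the smallest p-value first, requiring p-value < Pe. Each candidate is first fitted on its own: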

. logit low lwt

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -114.37209
Iteration 2:   log likelihood = -114.34534
Iteration 3:   log likelihood = -114.34533

Logistic regression                               Number of obs   =        189
                                                  LR chi2(1)      =       5.98
                                                  Prob > chi2     =     0.0145
Log likelihood = -114.34533                       Pseudo R2       =     0.0255

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0140583   .0061696    -2.28   0.023    -.0261504   -.0019661
       _cons |   .9983135   .7852908     1.27   0.204    -.5408283    2.537455
------------------------------------------------------------------------------

. logit low age

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -115.96259
Iteration 2:   log likelihood = -115.95598
Iteration 3:   log likelihood = -115.95598

Logistic regression                               Number of obs   =        189
                                                  LR chi2(1)      =       2.76
                                                  Prob > chi2     =     0.0966
Log likelihood = -115.95598                       Pseudo R2       =     0.0118

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0511529   .0315138    -1.62   0.105    -.1129188    .0106129
       _cons |   .3845819   .7321251     0.53   0.599    -1.050357    1.819521
------------------------------------------------------------------------------

. logit low Irace_D2

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -116.51366
Iteration 2:   log likelihood = -116.50935
Iteration 3:   log likelihood = -116.50935

Logistic regression                               Number of obs   =        189
                                                  LR chi2(1)      =       1.65
                                                  Prob > chi2     =     0.1985
Log likelihood = -116.50935                       Pseudo R2       =     0.0070

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Irace_D2 |   .5635762   .4325561     1.30   0.193    -.2842181     1.41137
       _cons |  -.8737311     .17184    -5.08   0.000    -1.210531   -.5369309
------------------------------------------------------------------------------


. logit low Irace_D3

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -116.45064
Iteration 2:   log likelihood = -116.44906
Iteration 3:   log likelihood = -116.44906

Logistic regression                               Number of obs   =        189
                                                  LR chi2(1)      =       1.77
                                                  Prob > chi2     =     0.1829
Log likelihood = -116.44906                       Pseudo R2       =     0.0076

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Irace_D3 |   .4321825   .3233959     1.34   0.181    -.2016619    1.066027
       _cons |  -.9509763   .2019292    -4.71   0.000     -1.34675   -.5552023
------------------------------------------------------------------------------

. logit low ftv

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -116.95056
Iteration 2:   log likelihood = -116.94943
Iteration 3:   log likelihood = -116.94943

Logistic regression                               Number of obs   =        189
                                                  LR chi2(1)      =       0.77
                                                  Prob > chi2     =     0.3792
Log likelihood = -116.94943                       Pseudo R2       =     0.0033

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ftv |  -.1351199   .1566986    -0.86   0.389    -.4422435    .1720037
       _cons |  -.6867585   .1948119    -3.53   0.000    -1.068583   -.3049343
------------------------------------------------------------------------------

3. Compute the statistics and check whether any variable should be removed (p-value > Pr).

- Variable lwt has p-value = .023 < Pr, so it stays in the model.

. xi: logit low lwt

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -114.41626
Iteration 2:   log likelihood = -114.34546
Iteration 3:   log likelihood = -114.34533

Logit estimates                                   Number of obs   =        189
                                                  LR chi2(1)      =       5.98
                                                  Prob > chi2     =     0.0145
Log likelihood = -114.34533                       Pseudo R2       =     0.0255

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0140583   .0061696    -2.28   0.023    -.0261504   -.0019661
       _cons |   .9983143   .7852889     1.27   0.204    -.5408235    2.537452
------------------------------------------------------------------------------

4. Consider the variable with the smallest p-value first, requiring p-value < Pe.
- Variable age: p-value = .218 > Pe, so it does not enter.

. xi: logit low lwt age

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -113.60317
Iteration 2:   log likelihood =  -113.5617
Iteration 3:   log likelihood = -113.56169

Logistic regression                               Number of obs   =        189
                                                  LR chi2(2)      =       7.55
                                                  Prob > chi2     =     0.0230
Log likelihood = -113.56169                       Pseudo R2       =     0.0322

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0127754   .0062112    -2.06   0.040    -.0249492   -.0006016
         age |  -.0397879   .0322873    -1.23   0.218    -.1030699     .023494
       _cons |   1.748772   .9970965     1.75   0.079    -.2055009    3.703046
------------------------------------------------------------------------------

. xi: logit low lwt ftv

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood =  -114.1942
Iteration 2:   log likelihood = -114.16288
Iteration 3:   log likelihood = -114.16287

Logistic regression                               Number of obs   =        189
                                                  LR chi2(2)      =       6.35
                                                  Prob > chi2     =     0.0419
Log likelihood = -114.16287                       Pseudo R2       =     0.0270

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0137064   .0062012    -2.21   0.027    -.0258605   -.0015523
         ftv |  -.0977257   .1635011    -0.60   0.550    -.4181819    .2227306
       _cons |    1.02784   .7888371     1.30   0.193    -.5182523    2.573932
------------------------------------------------------------------------------

4. Consider the variable with the smallest p-value first, requiring p-value < Pe.
- Variable ftv: p-value = .550 > Pe, so it does not enter.

. xi: logit low lwt i.race
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -111.73378
Iteration 2:   log likelihood = -111.62959
Iteration 3:   log likelihood = -111.62955
Iteration 4:   log likelihood = -111.62955

Logistic regression                               Number of obs   =        189
                                                  LR chi2(3)      =      11.41
                                                  Prob > chi2     =     0.0097
Log likelihood = -111.62955                       Pseudo R2       =     0.0486

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0152231   .0064394    -2.36   0.018     -.027844   -.0026022
    _Irace_2 |   1.081066   .4880522     2.22   0.027     .1245015    2.037631
    _Irace_3 |   .4806033   .3566737     1.35   0.178    -.2184644    1.179671
       _cons |   .8057535   .8451667     0.95   0.340    -.8507428     2.46225
------------------------------------------------------------------------------

. test _Irace_2 _Irace_3

 ( 1)  [low]_Irace_2 = 0
 ( 2)  [low]_Irace_3 = 0

           chi2(  2) =    5.40
         Prob > chi2 =    0.0671

4. Compute the statistics and check which variables may enter (p-value < Pe).
- Variable _Irace_2 has p-value = .027 and _Irace_3 has p-value = .178.
- The joint test of the 2 dummy variables gives p-value = 0.0671 < Pe, so race enters the model.
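The joint p-value reported by `test` can be checked by hand, because the upper tail of a chi-square distribution with 2 degrees of freedom has the closed form exp(-x/2). The short Python sketch below does this for the rounded statistic 5.40 shown above; the function name `chi2_sf_df2` is an illustrative choice.

```python
import math

def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)

pe = 0.20                      # p-value for entry used in the slides
p_joint = chi2_sf_df2(5.40)    # Wald chi2(2) for _Irace_2 and _Irace_3
print(round(p_joint, 3), p_joint < pe)
```

The result is about 0.067, matching the reported 0.0671 up to the rounding of the test statistic, and it is below Pe = .20.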


5. Compute the statistics and check whether any variable should be removed (p-value > Pr).

- Variable _Irace_2 has p-value = .027 < Pr, so it stays in the model.

- Variable _Irace_3 has p-value = .178 < Pr, so it stays in the model.


6. Consider the variable with the smallest p-value first, requiring p-value < Pe.
- Variable age has p-value = .443 > Pe, so it does not enter.

. xi: logit low lwt i.race age
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 4:   log likelihood = -111.33032

Logistic regression                               Number of obs   =        189
                                                  LR chi2(4)      =      12.01
                                                  Prob > chi2     =     0.0173
Log likelihood = -111.33032                       Pseudo R2       =     0.0512

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0143532   .0065228    -2.20   0.028    -.0271378   -.0015687
    _Irace_2 |   1.003822   .4980145     2.02   0.044     .0277315    1.979912
    _Irace_3 |   .4434608   .3602574     1.23   0.218    -.2626307    1.149552
         age |  -.0255238   .0332521    -0.77   0.443    -.0906967    .0396492
       _cons |   1.306741   1.069786     1.22   0.222     -.790001    3.403483
------------------------------------------------------------------------------

. xi: logit low lwt i.race ftv
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood =  -111.6474
Iteration 2:   log likelihood = -111.53946
Iteration 3:   log likelihood = -111.53941
Iteration 4:   log likelihood = -111.53941

Logistic regression                               Number of obs   =        189
                                                  LR chi2(4)      =      11.59
                                                  Prob > chi2     =     0.0206
Log likelihood = -111.53941                       Pseudo R2       =     0.0494

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0149921   .0064719    -2.32   0.021    -.0276767   -.0023075
    _Irace_2 |   1.072486   .4887825     2.19   0.028     .1144895    2.030482
    _Irace_3 |   .4620372    .359807     1.28   0.199    -.2431715    1.167246
         ftv |  -.0695769   .1650308    -0.42   0.673    -.3930312    .2538775
       _cons |   .8378754   .8518404     0.98   0.325    -.8317011    2.507452
------------------------------------------------------------------------------

6. Consider the variable with the smallest p-value first, requiring p-value < Pe.
- Variable ftv: p-value = .673 > Pe, so it does not enter.

7. Compute the statistics and check which variables may enter the model (p-value < Pe).

- Variable age has p-value = .443 and ftv has p-value = .673, both > Pe, so no further variables enter the model.


The model built therefore consists of lwt and race.


. xi: sw logit low lwt age (i.race) ftv, pr(.25) pe(.20) forward
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
                      begin with empty model
p = 0.0227 <  0.2000  adding  lwt
p = 0.0671 <  0.2000  adding  _Irace_2 _Irace_3

Logistic regression                               Number of obs   =        189
                                                  LR chi2(3)      =      11.41
                                                  Prob > chi2     =     0.0097
Log likelihood = -111.62955                       Pseudo R2       =     0.0486

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0152231   .0064394    -2.36   0.018     -.027844   -.0026022
    _Irace_2 |   1.081066   .4880522     2.22   0.027     .1245015    2.037631
    _Irace_3 |   .4806033   .3566737     1.35   0.178    -.2184644    1.179671
       _cons |   .8057535   .8451667     0.95   0.340    -.8507428     2.46225
------------------------------------------------------------------------------

Stepwise logistic regression in Stata (output above)
- p-value for entry (Pe) = .20
- p-value for removal (Pr) = .25

Setting the p-value for entry too high or too low:

- Using a more traditional level (.05) often fails to identify variables known to be important.

- A higher level has the disadvantage of including variables that are of questionable importance at the model building stage.

(Original: Mickey & Greenland, 1977: p125-137; cited in Hosmer & Lemeshow (2000): p95)


. xi: sw logit low age lwt i.race ftv, pr(.10) pe(.05) forward
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
                      begin with empty model
p = 0.0227 <  0.0500  adding  lwt

Logit estimates                                   Number of obs   =        189
                                                  LR chi2(1)      =       5.98
                                                  Prob > chi2     =     0.0145
Log likelihood = -114.34533                       Pseudo R2       =     0.0255

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0140583   .0061696    -2.28   0.023    -.0261504   -.0019661
       _cons |   .9983143   .7852889     1.27   0.204    -.5408235    2.537452
------------------------------------------------------------------------------

The output above is an example with p-value for entry = .05 and p-value for removal = .10: only lwt enters.

Stepwise logistic regression

- Refitting by maximum likelihood at every step of a stepwise logistic regression takes a long time on large data sets.

- Packages therefore apply test statistics computed for each variable, e.g.
  SAS   - Score test (Pe), Wald test (Pr)
  STATA - Wald test (Pe, Pr)
  SPSS  - Score test (Pe), LR test (Pr)

Stepwise logistic regression: comparing the reduced model with the full model

- Reduce the model until only significant terms remain,
- except for discrete variables, or variables involved in a significant higher-order interaction.

- Compare the reduced model with the full model.

- If the likelihood ratio test (G) shows no difference between the reduced model and the full model, the reduced model is as good as the full model. (Left for students to work through.)

. xi: logit low age lwt i.race ftv
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -111.41656
Iteration 2:   log likelihood = -111.28677
Iteration 3:   log likelihood = -111.28645

Logit estimates                                   Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
         lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
    _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
    _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
         ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
       _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
------------------------------------------------------------------------------

. xi: logit low age lwt i.race ftv, or
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -111.41656
Iteration 2:   log likelihood = -111.28677
Iteration 3:   log likelihood = -111.28645

Logit estimates                                   Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .9764586   .0329355    -0.71   0.480     .9139936    1.043193
         lwt |   .9858564   .0064482    -2.18   0.029     .9732989    .9985759
    _Irace_2 |   2.728898   1.358603     2.02   0.044     1.028513    7.240436
    _Irace_3 |   1.542043   .5585894     1.20   0.232     .7581543     3.13643
         ftv |   .9518876   .1591923    -0.29   0.768     .6858544    1.321111
------------------------------------------------------------------------------
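As one possible way to carry out the reduced-versus-full comparison, the likelihood ratio statistic G can be computed from the two log likelihoods reported in this document: -111.62955 for the reduced model (lwt, race) and -111.28645 for the full model (age, lwt, race, ftv). The Python sketch below is illustrative only; it relies on the fact that the models differ by 2 parameters (age and ftv), so the chi-square tail has the closed form exp(-G/2).

```python
import math

ll_reduced = -111.62955   # log likelihood: low ~ lwt + race
ll_full = -111.28645      # log likelihood: low ~ age + lwt + race + ftv
df = 2                    # parameters dropped from the full model: age, ftv

# G = -2 * (ll_reduced - ll_full)
g = -2.0 * (ll_reduced - ll_full)

# Upper tail of chi-square with 2 df is exp(-x/2); this shortcut is valid
# only because df == 2 here.
p = math.exp(-g / 2.0)

print(round(g, 4), round(p, 4))
```

G is about 0.69 with a p-value near 0.71, so under these numbers the reduced model fits as well as the full model.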

Assessing the Fit of the Model

1. Computation and evaluation of overall measures of fit
   - Pearson Chi-Square
   - Hosmer-Lemeshow Test
   - Classification Table
   - Area Under the Receiver Operating Characteristic (ROC) Curve
   - Examination of other measures (R^2)

3. Logistic Regression Diagnostics

4. Assessment of fit via external validation


. quietly xi: sw logit low age lwt i.race ftv, pr(.25) pe(.20) forward
. lfit

Logistic model for low, goodness-of-fit test

       number of observations =       189
 number of covariate patterns =       109
            Pearson chi2(105) =    111.22
                  Prob > chi2 =    0.3204

  

n

i i ( i ) i ) (y i χ

Pearson

1 ˆ 1 ˆ ˆ 2

2

-computation and evaluation of overall measures of fit Pearson Chi-Square

j M

j j j j

j j j

Pearson

if x x

m m

y

  

) ; 1 ˆ ˆ (

ˆ ) (

1

2 2

 

df = j-p-1

j=number of covariance patterns; p=parameter

. do "G:\cat2011\pearson_chisquare.do"

. clear

. input id y x1

            id          y         x1
  1.  1 1 5
  2.  2 1 7
  3.  3 1 9
  4.  4 1 11
  5.  5 1 11
  6.  6 0 2
  7.  7 0 2
  8.  8 0 4
  9.  9 0 6
 10. 10 0 8
 11. end

. logit y x1, nolog

Logistic regression                               Number of obs   =         10
                                                  LR chi2(1)      =       5.37
                                                  Prob > chi2     =     0.0205
Log likelihood = -4.2462367                       Pseudo R2       =     0.3874

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .6470744   .3861188     1.68   0.094    -.1097046    1.403853
       _cons |  -4.205984    2.65246    -1.59   0.113     -9.40471     .992742
------------------------------------------------------------------------------

. lfit

Logistic model for y, goodness-of-fit test

       number of observations =        10
 number of covariate patterns =         8
              Pearson chi2(6) =      7.34
                  Prob > chi2 =    0.2905

. predict phat
(option pr assumed; Pr(y))

. list y x1 phat

     +---------------------+
     | y   x1       phat   |
     |---------------------|
  1. | 1    5   .2747586   |
  2. | 1    7   .5801861   |
  3. | 1    9   .8344758   |
  4. | 1   11   .9484284   |
  5. | 1   11   .9484284   |
  6. | 0    2   .0515716   |
  7. | 0    2   .0515716   |
  8. | 0    4   .1655242   |
  9. | 0    6   .4198139   |
 10. | 0    8   .7252414   |
     +---------------------+

. gen r2 = ((y-phat)/(sqrt(phat*(1-phat))))^2

. qui sum r2, de

. di "Pearson Chi-Square =" r(sum)
Pearson Chi-Square =7.3405042
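The same Pearson chi-square can be reproduced outside Stata from the listed `y` and `phat` values. The Python sketch below simply sums the squared Pearson residuals over the 10 observations; the variable names are illustrative, and the data are copied from the listing above.

```python
# Outcome and fitted probability for each of the 10 observations,
# copied from the `list y x1 phat` output above.
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
phat = [.2747586, .5801861, .8344758, .9484284, .9484284,
        .0515716, .0515716, .1655242, .4198139, .7252414]

# Pearson chi-square: sum of (y - phat)^2 / (phat * (1 - phat)).
pearson = sum((yi - pi) ** 2 / (pi * (1.0 - pi)) for yi, pi in zip(y, phat))

# Degrees of freedom: J - p - 1 with J = 8 covariate patterns and p = 1 covariate.
df = 8 - 1 - 1

print(round(pearson, 4), df)
```

The sum agrees with the 7.34 on 6 degrees of freedom reported by `lfit`.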

Computation and evaluation of overall measures of fit: Hosmer-Lemeshow Test

$$\hat{C} = \sum_{k=1}^{g} \frac{(o_k - n'_k\bar{\pi}_k)^2}{n'_k\bar{\pi}_k(1 - \bar{\pi}_k)}$$

where

- $n'_k$ = total number of subjects in the $k$th group
- $c_k$ = number of covariate patterns in the $k$th decile
- $o_k = \sum_{j=1}^{c_k} y_j$ = number of responses among the $c_k$ covariate patterns
- $\bar{\pi}_k = \sum_{j=1}^{c_k} \frac{m_j\hat{\pi}_j}{n'_k}$ = average estimated probability

$H_0$: the model fits well

. lfit, group(3) table

Logistic model for y, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)
  +--------------------------------------------------------+
  | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
  |-------+--------+-------+-------+-------+-------+-------|
  |     1 | 0.2748 |     1 |   0.5 |     3 |   3.5 |     4 |
  |     2 | 0.7252 |     1 |   1.7 |     2 |   1.3 |     3 |
  |     3 | 0.9484 |     3 |   2.7 |     0 |   0.3 |     3 |
  +--------------------------------------------------------+

       number of observations =        10
             number of groups =         3
      Hosmer-Lemeshow chi2(1) =      1.46
                  Prob > chi2 =    0.2275

. sort phat

. list phat y phat_gr

     +--------------------------+
     |     phat   y   phat_gr   |
     |--------------------------|
  1. | .0515716   0         1   |
  2. | .0515716   0         1   |
  3. | .1655242   0         1   |
  4. | .2747586   1         1   |
  5. | .4198139   0         2   |
  6. | .5801861   1         2   |
  7. | .7252414   0         2   |
  8. | .8344758   1         3   |
  9. | .9484284   1         3   |
 10. | .9484284   1         3   |
     +--------------------------+

   

 

ck

j k

j j k

k k

n m g

k n k k

k ) n k - C (o

1

ˆ

1 ( 1 )

2 ˆ

 

.1358565

4

) 2747586 . 1655242 . ) 0515716 .

* 2 ) ( 4

1

(

x

3.456574 543426 4 .543426

4

) 2747586 . 1655242 . ) 0515716 .

* 2 ) ( 4 (

1 0

 

n

x n
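The group-level quantities in the 3-group example can be recomputed from the listed `phat`, `y`, and `phat_gr` values. The Python sketch below applies the Hosmer-Lemeshow formula with the group assignment taken as given from the listing (it does not re-derive Stata's quantile grouping); the function name is illustrative.

```python
from collections import defaultdict

# (phat, y, group) copied from the sorted listing above.
rows = [
    (.0515716, 0, 1), (.0515716, 0, 1), (.1655242, 0, 1), (.2747586, 1, 1),
    (.4198139, 0, 2), (.5801861, 1, 2), (.7252414, 0, 2),
    (.8344758, 1, 3), (.9484284, 1, 3), (.9484284, 1, 3),
]

def hosmer_lemeshow(rows):
    """Return (C-hat, per-group summary) for pre-grouped observations."""
    groups = defaultdict(list)
    for phat, y, g in rows:
        groups[g].append((phat, y))
    c_hat = 0.0
    summary = {}
    for g, members in sorted(groups.items()):
        n = len(members)                      # n'_k: subjects in the group
        obs = sum(y for _, y in members)      # o_k: observed responses
        exp1 = sum(p for p, _ in members)     # n'_k * pi-bar_k: expected y = 1
        pbar = exp1 / n                       # average estimated probability
        c_hat += (obs - exp1) ** 2 / (n * pbar * (1.0 - pbar))
        summary[g] = (n, obs, exp1, n - exp1)
    return c_hat, summary

c_hat, summary = hosmer_lemeshow(rows)
print(round(c_hat, 2), summary[1])
```

For group 1 this gives an expected count of about 0.543 for y = 1 and 3.457 for y = 0, and the total statistic rounds to the 1.46 shown by `lfit, group(3) table`.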

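Computation and evaluation of overall measures of fit: the Hosmer-Lemeshow Test in the form H*

For H*, the data are divided into 10 parts of equal size (ordered by estimated probability).

$$H^* = \sum_{k=1}^{g} \frac{(o_k - e_k)^2}{e_k}, \qquad df = g - 2$$

Each group contributes one term for the outcome y = 1 and one term for the outcome y = 0.

$e_k$ = expected number of events in each cell defined by the outcome (1, 0) and by the grouping of the estimated probability (phat)

$o_k$ = observed number in each cell defined by the outcome (1, 0) and by the grouping of the estimated probability (phat)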


Hosmer-Lemeshow Test / H0: the model fits

- For H*, the data are divided into 10 parts of equal size (ordered by estimated probability).

. quietly xi: sw logit low age lwt i.race ftv, pr(.25) pe(.20) forward
. lfit, group(10) table

Logistic model for low, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)

  _Group    _Prob   _Obs_1   _Exp_1   _Obs_0   _Exp_0   _Total
       1   0.1681        2      2.4       17     16.6       19
       2   0.2228        4      4.2       17     16.8       21
       3   0.2531        5      4.0       12     13.0       17
       4   0.2708        4      5.0       15     14.0       19
       5   0.2955        8      5.4       11     13.6       19
       6   0.3334        6      6.1       13     12.9       19
       7   0.3681        6      8.2       17     14.8       23
       8   0.4078        3      5.8       12      9.2       15
       9   0.4770       12      8.9        8     11.1       20
      10   0.5975        9      8.9        8      8.1       17

       number of observations =       189
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =      7.61
                  Prob > chi2 =    0.4728

 

g

k e k

k ) e - o ( H *

k

1 2

. use "H:\516701_2556\lowbwt_update.dta", clear
. xi: logit low lwt i.race smoke
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
...

Logistic regression                               Number of obs   =        189
                                                  LR chi2(4)      =      19.66
                                                  Prob > chi2     =     0.0006
Log likelihood = -107.50733                       Pseudo R2       =     0.0838

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0132595   .0063102    -2.10   0.036    -.0256272   -.0008917
    _Irace_2 |   1.290094   .5108751     2.53   0.012     .2887976    2.291391
    _Irace_3 |   .9705149    .412235     2.35   0.019     .1625492    1.778481
       smoke |   1.060006    .378323     2.80   0.005     .3185065    1.801505
       _cons |  -.1092208   .8821091    -0.12   0.901    -1.838123    1.619681
------------------------------------------------------------------------------

Example of computing the Hosmer-Lemeshow test (model with the variables lwt, race, and smoke only)

. lfit, group(10) table

Logistic model for low, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)
  +--------------------------------------------------------+
  | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
  |-------+--------+-------+-------+-------+-------+-------|
  |     1 | 0.1229 |     0 |   1.8 |    19 |  17.2 |    19 |
  |     2 | 0.1579 |     2 |   2.7 |    17 |  16.3 |    19 |
  |     3 | 0.2258 |     5 |   3.5 |    14 |  15.5 |    19 |
  |     4 | 0.2934 |     7 |   5.1 |    12 |  13.9 |    19 |
  |     5 | 0.3252 |     6 |   6.9 |    16 |  15.1 |    22 |
  |-------+--------+-------+-------+-------+-------+-------|
  |     6 | 0.3452 |     5 |   5.4 |    11 |  10.6 |    16 |
  |     7 | 0.3757 |    10 |   7.3 |    10 |  12.7 |    20 |
  |     8 | 0.4017 |     8 |   7.0 |    10 |  11.0 |    18 |
  |     9 | 0.4704 |     7 |   8.2 |    12 |  10.8 |    19 |
  |    10 | 0.7028 |     9 |  11.1 |     9 |   6.9 |    18 |
  +--------------------------------------------------------+

       number of observations =       189
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =      7.35
                  Prob > chi2 =    0.4996

  

g

k e k

k ) e - o ( H *

k

1 2

. predict phat
. sort phat
. xtile phat_gr = phat, nq(10)
. tab phat_gr low

        10 |
 quantiles |   Low Birth Weight
   of phat |         0          1 |     Total
-----------+----------------------+----------
         1 |        19          0 |        19
         2 |        17          2 |        19
         3 |        14          5 |        19
         4 |        12          7 |        19
         5 |        16          6 |        22
         6 |        11          5 |        16
         7 |        10         10 |        20
         8 |        10          8 |        18
         9 |        12          7 |        19
        10 |         9          9 |        18
-----------+----------------------+----------
     Total |       130         59 |       189

  

g

k e k

k ) e - o ( H *

k

1 2

+---+

| Group | Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |

|---+---+---+---+---+---+---|

| 1 | 0.1229 | 0 | 1.8 | 19 | 17.2 | 19 |

| 2 | 0.1579 | 2 | 2.7 | 17 | 16.3 | 19 | . . .

| 9 | 0.4704 | 7 | 8.2 | 12 | 10.8 | 19 |

| 10 | 0.7028 | 9 | 11.1 | 9 | 6.9 | 18 | +---+

. sort phat
. list phat low phat_gr if phat_gr==1

     +----------------------------+
     |     phat   low   phat_gr   |
     |----------------------------|
  1. | .0579963     0         1   |
  2. | .0673255     0         1   |
  3. | .0681629     0         1   |
  4. | .0707333     0         1   |
  5. | .0809414     0         1   |
     |----------------------------|
  6. | .0860122     0         1   |
  7. | .0860122     0         1   |
  8. | .0870603     0         1   |
  9. | .0891913     0         1   |
 10. | .0970244     0         1   |
     |----------------------------|
 11. | .0993727     0         1   |
 12. | .1029206     0         1   |
 13. | .1029899     0         1   |
 14. | .1042213     0         1   |
 15. | .1092779     0         1   |
     |----------------------------|
 16. | .1176729     0         1   |
 17. | .1214464     0         1   |
 18. | .1228683     0         1   |
 19. | .1228683     0         1   |
     +----------------------------+

Expected counts per group are the sum of phat within the group (Exp_1) and the group size minus that sum (Exp_0). The label suffixes 11, 10, 101, 100 denote group and outcome.

. qui su phat if phat_gr==1
. local q1=r(sum)
. local n1=r(N)
. di "Exp_1 11 = " `q1'
Exp_1 11 = 1.7940981
. di "Exp_0 10 = " `n1'-`q1'
Exp_0 10 = 17.205902

(that is, 19 - 1.7941)

. . .

. qui su phat if phat_gr==10
. local q10=r(sum)
. local n10=r(N)
. di "Exp_1 101 = " `q10'
Exp_1 101 = 11.148085
. di "Exp_0 100 = " `n10'-`q10'
Exp_0 100 = 6.8519148


$$H^* = \sum_{k=1}^{g} \frac{(o_k - e_k)^2}{e_k} = \frac{(0 - 1.8)^2}{1.8} + \frac{(19 - 17.2)^2}{17.2} + \dots$$
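The full sum can be evaluated from the 10-group table for the lwt, race, smoke model. The Python sketch below uses the observed and expected counts exactly as printed (expected counts rounded to one decimal), so the result differs slightly from Stata's 7.35, which uses unrounded values. The helper `chi2_sf_even_df` is an illustrative name; it uses the closed-form chi-square tail that holds for even degrees of freedom.

```python
import math

# (Obs_1, Exp_1, Obs_0, Exp_0) per group, copied from the lfit table
# for the model with lwt, race and smoke.
table = [
    (0, 1.8, 19, 17.2), (2, 2.7, 17, 16.3), (5, 3.5, 14, 15.5),
    (7, 5.1, 12, 13.9), (6, 6.9, 16, 15.1), (5, 5.4, 11, 10.6),
    (10, 7.3, 10, 12.7), (8, 7.0, 10, 11.0), (7, 8.2, 12, 10.8),
    (9, 11.1, 9, 6.9),
]

# H* sums (o - e)^2 / e over both outcome cells of every group.
h_star = sum((o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
             for o1, e1, o0, e0 in table)

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series

df = len(table) - 2            # g - 2 = 8
p_value = chi2_sf_even_df(h_star, df)
print(round(h_star, 2), df, round(p_value, 3))
```

With the rounded table entries H* comes out near 7.3 with a p-value near 0.50, in line with the reported chi2(8) = 7.35 and Prob > chi2 = 0.4996.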



Classification Tables

. quietly xi: sw logit low age lwt i.race ftv, pr(.25) pe(.20) forward
. lstat

Logistic model for low

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         6             6  |         12
     -     |        53           124  |        177
-----------+--------------------------+-----------
   Total   |        59           130  |        189

Classified + if predicted Pr(D) >= .5
True D defined as low ~= 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   10.17%
Specificity                     Pr( -|~D)   95.38%
Positive predictive value       Pr( D| +)   50.00%
Negative predictive value       Pr(~D| -)   70.06%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    4.62%
False - rate for true D         Pr( -| D)   89.83%
False + rate for classified +   Pr(~D| +)   50.00%
False - rate for classified -   Pr( D| -)   29.94%
--------------------------------------------------
Correctly classified                        68.78%
--------------------------------------------------
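Every percentage in the classification output follows from the four cell counts of the 2x2 table. The Python sketch below recomputes them as a check; the variable names (`tp`, `fp`, `fn`, `tn`) are illustrative labels for the cells, not Stata terminology.

```python
# Cell counts from the lstat table (classified + if predicted Pr(D) >= .5).
tp, fp = 6, 6        # classified +: true D, true ~D
fn, tn = 53, 124     # classified -: true D, true ~D

total = tp + fp + fn + tn

sensitivity = tp / (tp + fn)          # Pr(+ | D)
specificity = tn / (tn + fp)          # Pr(- | ~D)
ppv = tp / (tp + fp)                  # Pr(D | +)
npv = tn / (tn + fn)                  # Pr(~D | -)
correct = (tp + tn) / total           # correctly classified

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("Correct", correct)]:
    print(f"{name:12s} {100 * value:6.2f}%")
```

The low sensitivity next to the high specificity shows that, at the .5 cutoff, this model classifies almost every subject as not low birth weight.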

. lroc

Logistic model for low

number of observations =      189
area under ROC curve   =   0.6473

Area Under the Receiver Operating Characteristic (ROC) Curve

General rule for the area under the ROC curve:

ROC = 0.5          no discrimination, so we might as well flip a coin
0.5 < ROC < 0.7    poor discrimination, not much better than a coin toss
0.7 <= ROC < 0.8   acceptable discrimination
0.8 <= ROC < 0.9   excellent discrimination
ROC >= 0.9         outstanding discrimination

- In practice it is extremely unusual to observe areas under the ROC curve greater than 0.90.

- Complete separation would be required for the area under the ROC curve to exceed 0.90.

- When there is complete separation, it is impossible to estimate the coefficients of a logistic regression model.

Other Summary Measures: R^2

- McFadden's Pseudo R^2, Efron's Pseudo R^2, etc.

Efron's Pseudo R^2:

$$R^2_{ef} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{\pi}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

McFadden's Pseudo R^2:

$$R^2_{mf} = 1 - \frac{L_p}{L_0}$$

$L_0$ = log likelihood for the model containing only the intercept
$L_p$ = log likelihood for the model containing the intercept plus the p covariates

Nagelkerke's R^2 (Cragg & Uhler R^2):

$$R^2_{Nagelkerke} = \frac{1 - e^{-LR/n}}{1 - e^{(2\,ll_0)/n}}$$

$$ll_0 = n_1\ln\frac{n_1}{n} + n_0\ln\frac{n_0}{n}$$

$$ll_1 = \sum_{i=1}^{n}\left[y_i\ln\hat{p}_i + (1 - y_i)\ln(1 - \hat{p}_i)\right]$$

$$LR = 2(ll_1 - ll_0)$$

$ll_0$ is the log likelihood of the model without regressors
$ll_1$ is the log likelihood of the full model
$LR$ is the likelihood ratio test statistic
$n$ is the sample size
$n_1$, $n_0$ are the numbers of observations whose response $y_i$ is 1 or 0
$\hat{p}_i$ is the probability predicted from the logit model

Hosmer & Lemeshow (2000, p. 167):
- do not recommend routine publishing of R^2;
- however, it may be helpful in the model building stage.

. xi: logit low lwt i.race
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood =  -111.7491
Iteration 2:   log likelihood = -111.62983
Iteration 3:   log likelihood = -111.62955

Logistic regression                               Number of obs   =        189
                                                  LR chi2(3)      =      11.41
                                                  Prob > chi2     =     0.0097
Log likelihood = -111.62955                       Pseudo R2       =     0.0486

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0152231   .0064393    -2.36   0.018    -.0278439   -.0026023
    _Irace_2 |   1.081066   .4880512     2.22   0.027     .1245034    2.037629
    _Irace_3 |   .4806033   .3566733     1.35   0.178    -.2184636     1.17967
       _cons |   .8057535   .8451625     0.95   0.340    -.8507345    2.462241
------------------------------------------------------------------------------

. di 1-((-111.62955)/(-117.336))
.04863341

McFadden's Pseudo R^2:

$$R^2_{mf} = 1 - \frac{L_p}{L_0}$$


. xi: sw logit low age lwt i.race ftv, pr(.25) pe(.20) forward

Logistic regression                               Number of obs   =        189
                                                  LR chi2(3)      =      11.41
                                                  Prob > chi2     =     0.0097
Log likelihood = -111.62955                       Pseudo R2       =     0.0486

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0152231   .0064393    -2.36   0.018    -.0278439   -.0026023
    _Irace_2 |   1.081066   .4880512     2.22   0.027     .1245034    2.037629
    _Irace_3 |   .4806033   .3566733     1.35   0.178    -.2184636     1.17967
       _cons |   .8057535   .8451625     0.95   0.340    -.8507345    2.462241
------------------------------------------------------------------------------

. fitstat

Measures of Fit for logit of low

Log-Lik Intercept Only:     -117.336     Log-Lik Full Model:         -111.630
D(185):                      223.259     LR(3):                        11.413
                                         Prob > LR:                     0.010
McFadden's R2:                 0.049     McFadden's Adj R2:             0.015
Maximum Likelihood R2:         0.059     Cragg & Uhler's R2:            0.082
McKelvey and Zavoina's R2:     0.092     Efron's R2:                    0.058
Variance of y*:                3.622     Variance of error:             3.290
Count R2:                      0.688     Adj Count R2:                  0.000
AIC:                           1.224     AIC*n:                       231.259
BIC:                        -746.464     BIC':                          4.312
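Several of the `fitstat` measures depend only on the two log likelihoods, the sample size, and the number of parameters, so they can be recomputed by hand. The Python sketch below does this for the lwt + race model. The definitions used for McFadden's adjusted R^2 and AIC*n are assumptions chosen because they reproduce the reported values (k = 4 parameters, counting the intercept); they are not taken from the slides.

```python
import math

ll0 = -117.336      # log likelihood, intercept only
ll1 = -111.62955    # log likelihood, model with lwt and race
n = 189             # number of observations
k = 4               # ASSUMPTION: parameters counted as 3 slopes + intercept

lr = 2.0 * (ll1 - ll0)                              # likelihood ratio statistic
mcfadden = 1.0 - ll1 / ll0                          # McFadden's R2
mcfadden_adj = 1.0 - (ll1 - k) / ll0                # assumed adjusted form
ml_r2 = 1.0 - math.exp(-lr / n)                     # maximum likelihood R2
cragg_uhler = ml_r2 / (1.0 - math.exp(2.0 * ll0 / n))   # Nagelkerke / Cragg & Uhler
deviance = -2.0 * ll1                               # D(185)
aic_n = deviance + 2.0 * k                          # assumed AIC*n = -2*ll1 + 2k

for name, value in [("LR", lr), ("McFadden", mcfadden),
                    ("McFadden adj", mcfadden_adj), ("ML R2", ml_r2),
                    ("Cragg-Uhler", cragg_uhler), ("D", deviance),
                    ("AIC*n", aic_n), ("AIC", aic_n / n)]:
    print(f"{name:14s} {value:9.3f}")
```

Under these assumed definitions the values round to the figures in the `fitstat` table above.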

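Logistic Regression Diagnostics

Diagnostics examine the fit observation by observation. They are based on:
- residuals: $\Delta X^2_i$, $\Delta D_i$
- influence: $\Delta\hat{\beta}_i$

Pearson residual:

$$r_i = \frac{y_i - m_i\hat{\pi}_i}{\sqrt{m_i\hat{\pi}_i(1 - \hat{\pi}_i)}}$$

Deviance residual:

$$d_i = \pm\sqrt{2\left[y_i\ln\frac{y_i}{m_i\hat{\pi}_i} + (m_i - y_i)\ln\frac{m_i - y_i}{m_i(1 - \hat{\pi}_i)}\right]}$$

with the sign of $(y_i - m_i\hat{\pi}_i)$, and the special cases

$$d_i = -\sqrt{2m_i\,|\ln(1 - \hat{\pi}_i)|} \;\; (y_i = 0), \qquad d_i = \sqrt{2m_i\,|\ln\hat{\pi}_i|} \;\; (y_i = m_i)$$

Leverage (diagonal of the matrix $H = V^{1/2}X(X'VX)^{-1}X'V^{1/2}$):

$$h_i = v_i\,b_i, \qquad v_i = m_i\hat{\pi}_i(1 - \hat{\pi}_i), \qquad b_i = x_i'(X'VX)^{-1}x_i$$

Change in the Pearson chi-square:

$$\Delta X^2_i = \frac{r_i^2}{1 - h_i}$$

Change in the deviance:

$$\Delta D_i = d_i^2 + \frac{r_i^2 h_i}{1 - h_i} \approx \frac{d_i^2}{1 - h_i}$$

Change in the estimated coefficients:

$$\Delta\hat{\beta}_i = (\hat{\beta} - \hat{\beta}_{(-i)})'(X'VX)(\hat{\beta} - \hat{\beta}_{(-i)}) = \frac{r_i^2 h_i}{(1 - h_i)^2}$$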
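To make the diagnostic formulas concrete, the Python sketch below evaluates them for a single covariate pattern. The inputs in the usage line (y = 1, m = 1, fitted probability 0.2, leverage 0.1) are hypothetical values chosen for illustration, not taken from the data in this document, and the function name `diagnostics` is likewise an assumption.

```python
import math

def diagnostics(y: int, m: int, pi_hat: float, h: float) -> dict:
    """Pearson/deviance residuals and the delta statistics for one pattern.

    y: number of responses, m: number of subjects in the pattern,
    pi_hat: fitted probability, h: leverage (0 <= h < 1).
    """
    # Pearson residual.
    r = (y - m * pi_hat) / math.sqrt(m * pi_hat * (1.0 - pi_hat))
    # Deviance residual, with the boundary cases y = 0 and y = m handled apart.
    if y == 0:
        d = -math.sqrt(2.0 * m * abs(math.log(1.0 - pi_hat)))
    elif y == m:
        d = math.sqrt(2.0 * m * abs(math.log(pi_hat)))
    else:
        dev = 2.0 * (y * math.log(y / (m * pi_hat))
                     + (m - y) * math.log((m - y) / (m * (1.0 - pi_hat))))
        d = math.copysign(math.sqrt(dev), y - m * pi_hat)
    return {
        "r": r,
        "d": d,
        "dx2": r ** 2 / (1.0 - h),                  # change in Pearson chi-square
        "dd": d ** 2 + r ** 2 * h / (1.0 - h),      # change in deviance
        "dbeta": r ** 2 * h / (1.0 - h) ** 2,       # change in coefficients
    }

# Hypothetical pattern: one subject with y = 1 but a fitted probability of 0.2.
result = diagnostics(y=1, m=1, pi_hat=0.2, h=0.1)
print({name: round(value, 4) for name, value in result.items()})
```

A poorly fitted pattern like this one gives a large change in the Pearson chi-square even with modest leverage, which is what the plots against the fitted probability are meant to reveal.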

   

 

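Logistic Regression Diagnostics: plots

- Plot $\Delta X^2_i$ versus $\hat{\pi}_i$
- Plot $\Delta D_i$ versus $\hat{\pi}_i$
- Plot $\Delta\hat{\beta}_i$ versus $\hat{\pi}_i$

Other plots

- Plot $\Delta X^2_i$ versus $h_i$
- Plot $\Delta D_i$ versus $h_i$
- Plot $\Delta\hat{\beta}_i$ versus $h_i$

Reference values

- $\Delta X^2_i$, $\Delta D_i$: compare with the upper 95th percentile of the chi-square distribution with 1 df, $\chi^2_{.95}(1) = 3.84$; crude approximation 4.
- $\Delta\hat{\beta}_i$: the influence diagnostic must be larger than 1 to indicate an effect on the estimated coefficients.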

. use "G:\hosmer_data\logistic\uis.dta", clear
. gen ndrgfp1 = ((ndrugtx+1)/10)^(-1)
. gen ndrgfp2 = ndrgfp1*log((ndrugtx+1)/10)
. gen agendrgfp1 = age*ndrgfp1
. gen racesite = race*site

Logistic Regression Diagnostics (1): Plot $\Delta X^2_i$ versus $\hat{\pi}_i$

. xi: logit dfree age ndrgfp1 ndrgfp2 i.ivhx race treat site agendrgfp1 racesite
. predict p
. predict dx, dx2
. graph twoway scatter dx p, xlabel(0(.2)1) ylabel(0(10)30)


. xi: sw logit low age lwt i.race ftv, pr(.25) pe(.20) forward
. predict p
. predict dx, dx2
. graph twoway scatter dx p, xlabel(0(.2)1) ylabel(0(10)30) ///
      title(Figure I Plot Delta X^2 Versus Phat)

[Figure I: Plot of Delta X^2 versus Phat. Vertical axis: H-L dX^2 (0 to 30); horizontal axis: Pr(low) (0 to 1)]

Logistic Regression Diagnostics (2): Plot $\Delta D_i$ versus $\hat{\pi}_i$

. xi: logit dfree age ndrgfp1 ndrgfp2 i.ivhx race treat site agendrgfp1 racesite
. predict p
. predict dd, dd
. graph twoway scatter dd p, xlabel(0(.2)1) ylabel(0 3.5 7)

. xi: sw logit low age lwt i.race ftv, pr(.25) pe(.20) forward
. predict p
. predict dd, dd
. graph twoway scatter dd p, xlabel(0(.2)1) ylabel(0 3.5 7) ///
      title(Fig. II Plot Delta Di Versus Phat)

[Figure II: Plot of Delta D_i versus Phat. Vertical axis: H-L dD (0 to 7); horizontal axis: Pr(low) (0 to 1)]

Logistic Regression Diagnostics (3): Plot $\Delta\hat{\beta}_i$ versus $\hat{\pi}_i$

. xi: logit dfree age ndrgfp1 ndrgfp2 i.ivhx race treat site agendrgfp1 racesite
. predict p
. predict db, db
. graph twoway scatter db p, xlabel(0(.2)1) ylabel(0.15 .3)

. xi: sw logit low age lwt i.race ftv, pr(.25) pe(.20) forward
. predict p
. predict db, db
. graph twoway scatter db p, xlabel(0(.2)1) ylabel(0.15 .3) ///
      title(Fig. III Plot Delta Beta Versus Phat)

[Figure III: Plot of Delta Beta versus Phat. Vertical axis: Pregibon's dbeta (ticks at .15 and .3); horizontal axis: Pr(low) (0 to 1)]
