Logistic Regression

1. Choose Analyze on the menu bar

2. Choose Regression

3. Choose Binary Logistic...

4. Dependent: Select the dependent variable from the source list on the left and then click on the arrow next to the dependent variable box.

5. Covariate(s): Select the independent variable and then click on the arrow next to the Covariate(s) box. Repeat the process until you have selected all the independent variables you want.

6. Choose Enter as the Method. Enter is the default method for independent variable entry. Other methods of variable entry can be selected by clicking on the down arrow and clicking on the desired method of entry.

7. Choose OK

Additional options are available under >a*>b, Categorical..., Save..., Method, or Options... For example:

>a*>b (for adding two-way interactions) You can add an interaction between two independent variables to the regression model by selecting both variables from the source list on the left (hold down the Ctrl key while selecting them) and then clicking on >a*>b. The >a*>b button becomes available once two variables are highlighted in the source list.

Categorical... You can use the Categorical option to have SPSS create indicator (dummy) variables for categorical variables.

1. Choose Categorical

2. Categorical Covariates: Select a covariate that is categorical and then click on the arrow next to the Covariates box.

3. Choose Indicator as the Contrast: Indicator is the default method for creating indicator variables. Other methods can be selected by clicking on the down arrow and clicking on the desired method.

4. Choose the reference category as either the last category (i.e., the category with the largest numeric coding value) or the first category (i.e., the category with the smallest numeric coding value).

5. Choose Change.

6. Repeat steps 2 through 5 until you have defined all categorical variables.

7. Choose Continue.

Save...

• Predicted Values (Probabilities and Group Membership). This option creates new variables that hold the predicted probabilities and the predicted group membership. The predicted group membership (0 or 1) is based on whether the predicted probability is less than (group membership=0) or greater than or equal to (group membership=1) the classification cutoff. By default the classification cutoff value is 0.5. You can change the cutoff value using Options...

• Residuals (Unstandardized, Logit, Studentized, Standardized, Deviance)

Note that SPSS creates a new variable for each selected Save... option and adds the new variables to the data file. The variable names are defined in the Viewer window. Once you are done using these variables you may want to delete them from the data file or save them (by re-saving the data file).
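The cutoff rule for predicted group membership can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not SPSS code; the function name and the sample probabilities are hypothetical.

```python
def predicted_group(probability, cutoff=0.5):
    """Return 1 if the predicted probability is at or above the cutoff, else 0."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1 if probability >= cutoff else 0


# Hypothetical predicted probabilities, for illustration only.
probabilities = [0.12, 0.50, 0.83]
groups = [predicted_group(p) for p in probabilities]
print(groups)
```

A probability exactly equal to the cutoff is assigned to group 1, which matches the "greater than or equal to" wording above. Passing a different cutoff argument plays the role of changing the cutoff under Options...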

Method… Click on the down arrow to the right of Method to display the methods available for independent variable entry (enter, forward:conditional, forward:LR, forward:Wald, backward:conditional, backward:LR, backward:Wald).

Options...

• Confidence interval for odds ratio (CI for exp(B))

• Hosmer-Lemeshow goodness-of-fit

• You can modify the entry and removal criteria used by the backward and forward variable entry methods.

Previous, Block # of #, Next You can use these options to enter independent variables in blocks into the regression model. You can select different methods of variable entry for each block.

Example. Logistic regression will be used to determine the relationship between any use of health services (coded 0 = no use, 1 = any use) and age, health index, gender and race. Subjects in the study (Model Cities Data Set) were followed for a varying amount of time, so the number of months followed (expos) will also be included as an independent variable in the logistic regression model.

The dependent variable, anyuse, is binary.

There are 5 independent variables. Female and Race are categorical/nominal variables.

Case Processing Summary

Unweighted Cases(a)                            N    Percent
Selected Cases     Included in Analysis     3199       73.1
                   Missing Cases            1175       26.9
                   Total                    4374      100.0
Unselected Cases                               0         .0
Total                                       4374      100.0

a If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding

Original Value    Internal Value
 .00              0
1.00              1

You can use the Categorical option to define which variables are categorical and SPSS will create the indicator variables.

By default the category with the largest numerical value (last) will be the reference group. Here, the category with the smallest numerical value was selected as the reference group.

Under Options you can choose to have the 95% confidence intervals for the odds ratios displayed in the output. You can also run the Hosmer-Lemeshow goodness-of-fit test.

Information on the number of observations used in the logistic regression. Subjects with missing data are excluded.

SPSS will always recode the dependent variable to a 0 or 1 binary variable (internal value), and will estimate the odds ratio for the event coded as 1 (vs the event coded as 0). If your dependent variable is not coded 0 or 1, check this table to determine the interpretation of the odds ratios.

Categorical Variables Codings

                    Frequency    Parameter coding
                                 (1)       (2)
race      white        497        .000      .000
          other        455       1.000      .000
          black       2247        .000     1.000
female    male        1450        .000
          female      1749       1.000

Caution! Make sure you understand the interpretation of the indicator variables that SPSS creates. It is very easy to get confused. In this example the variable race is coded 1=white, 2=other, 3=black, and white was selected as the reference category, so race(1) = other and race(2) = black. A common mistake would be to interpret race(1) = white and race(2) = other.
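To see why race(1) ends up meaning "other" rather than "white", the indicator coding can be reproduced with a short Python sketch. This is an illustration of indicator (dummy) coding under the first/last reference rule described in this handout, not SPSS's own routine; the function name is hypothetical.

```python
def indicator_coding(categories, reference="last"):
    """Map each category code to its tuple of 0/1 indicator values.

    categories: the distinct numeric codes of the categorical variable.
    reference:  "first" or "last"; that category gets all zeros.
    """
    ordered = sorted(categories)
    if reference == "first":
        ref = ordered[0]
    elif reference == "last":
        ref = ordered[-1]
    else:
        raise ValueError("reference must be 'first' or 'last'")
    # One indicator per non-reference category, in coding order.
    non_reference = [c for c in ordered if c != ref]
    return {c: tuple(1 if c == other else 0 for other in non_reference)
            for c in ordered}


labels = {1: "white", 2: "other", 3: "black"}
coding = indicator_coding(labels.keys(), reference="first")
for code in sorted(coding):
    print(labels[code], coding[code])
```

With the first category as reference, white is coded (0, 0), other (1, 0) and black (0, 1), which agrees with the Parameter coding columns in the Categorical Variables Codings table.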

Block 0: Beginning Block

Ignore all the output under Block 0. The output displays information for the logistic regression model with no independent variables in the model.

Block 1: Method = Enter

Omnibus Tests of Model Coefficients

                   Chi-square    df    Sig.
Step 1    Step       301.534      6    .000
          Block      301.534      6    .000
          Model      301.534      6    .000

Model Summary

Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       2609.415(a)          .090                    .151

a Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Classification Table(a)

                                   Predicted
                                   anyuse             Percent
Observed                           .00      1.00      correct
Step 1    anyuse        .00          0       542          .0
                        1.00         0      2657       100.0
          Overall percentage                             83.1

a The cut value is .500

The Categorical Variables Codings table gives the definition of the indicator variables. E.g.,

race(1) = other
race(2) = black
(race = white is the reference group)

female(1) = female
(male is the reference group)

Unless you are using stepwise methods to enter variables or entering variables in different blocks, you can ignore the Omnibus Tests of Model Coefficients output.

The Model Summary reports “R-square” measures for logistic regression, which are usually not very useful.

Ignore the Classification Table also. It describes how well the logistic regression predicts any use if a predicted probability of 0.5 or greater is used to indicate any use. Here, all subjects are predicted to have use.
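The percentages in the Classification Table follow directly from its counts, as the Python sketch below illustrates. The counts are copied from the table in this example; the function name and the dictionary layout are assumptions made for the illustration.

```python
def percent_correct(table):
    """Row-wise and overall percent correct for a classification table.

    table[observed][predicted] holds the number of cases.
    """
    per_row = {}
    for observed, row in table.items():
        per_row[observed] = 100.0 * row[observed] / sum(row.values())
    correct = sum(table[k][k] for k in table)
    total = sum(sum(row.values()) for row in table.values())
    return per_row, 100.0 * correct / total


# Counts from the Classification Table above (rows: observed anyuse).
table = {0: {0: 0, 1: 542}, 1: {0: 0, 1: 2657}}
per_row, overall = percent_correct(table)
print(round(per_row[0], 1), round(per_row[1], 1), round(overall, 1))
```

Because every subject is predicted to have use, the overall percent correct (83.1) is just the proportion of subjects who actually had any use, which is one reason the table is not very informative here.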

Hosmer and Lemeshow Test

Step    Chi-square    df    Sig.
1       8.368          8    .398

Contingency Table for Hosmer and Lemeshow Test

                 anyuse = .00            anyuse = 1.00           Total
                 Observed    Expected    Observed    Expected
Step 1     1        124      123.653        197      197.347      321
           2        101       97.310        218      221.690      319
           3         79       81.589        241      238.411      320
           4         73       67.769        248      253.231      321
           5         57       54.600        263      265.400      320
           6         33       41.820        287      278.180      320
           7         32       29.724        288      290.276      320
           8         16       21.258        304      298.742      320
           9         13       15.538        307      304.462      320
          10         14        8.740        304      309.260      318

Large differences between the observed and expected values can be used to help identify where there is lack of fit, when present.

The last table of the output usually has the results we are most interested in. It lists the odds ratios, p-values and 95% confidence intervals for the odds ratios.

Variables in the Equation

                           B     S.E.      Wald    df    Sig.    Exp(B)    95.0% C.I.for EXP(B)
                                                                           Lower      Upper
Step 1(a)   expos        .077    .006   167.398     1    .000    1.080     1.068      1.093
            age          .009    .003     8.118     1    .004    1.009     1.003      1.016
            female(1)    .501    .099    25.363     1    .000    1.650     1.358      2.005
            race                         12.715     2    .002
            race(1)     -.424    .190     4.964     1    .026     .655      .451       .950
            race(2)     -.530    .149    12.689     1    .000     .588      .440       .788
            health       .048    .010    23.603     1    .000    1.049     1.029      1.070
            Constant    -.337    .196     2.958     1    .085     .714

a Variable(s) entered on step 1: expos, age, female, race, health.
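As an illustration of how the B column turns into a predicted probability, the Python sketch below plugs the rounded coefficients from this table into the logistic function. The subject's values are hypothetical and chosen only for the example, and the function and dictionary names are not part of SPSS.

```python
import math

# Coefficients (B) copied from the Variables in the Equation table.
COEFFICIENTS = {
    "expos": 0.077,
    "age": 0.009,
    "female(1)": 0.501,
    "race(1)": -0.424,   # other vs white
    "race(2)": -0.530,   # black vs white
    "health": 0.048,
}
CONSTANT = -0.337


def predicted_probability(values):
    """Predicted probability of anyuse = 1 for one subject."""
    logit = CONSTANT + sum(
        COEFFICIENTS[name] * values.get(name, 0.0) for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical subject: white female, age 40, followed 12 months, health index 10.
subject = {"expos": 12, "age": 40, "female(1)": 1,
           "race(1)": 0, "race(2)": 0, "health": 10}
p = predicted_probability(subject)
print(round(p, 3))
```

For this hypothetical subject the log odds are 1.928, giving a predicted probability of about 0.87. Because the coefficients are rounded to three decimals, values saved by SPSS itself may differ slightly.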

Exp(B) = Odds Ratio

95.0% C.I. for EXP(B) = 95% confidence interval for the odds ratio

Sig. = P-value for the individual odds ratio, or the overall significance of a categorical/nominal variable if there is no Exp(B) listed.

The Hosmer-Lemeshow goodness-of-fit statistic is formed by grouping the data into g groups (usually g=10) based on the percentiles of the estimated probabilities and calculating the Pearson chi-square statistic from the 2 x g table of observed and estimated expected frequencies. A small p-value indicates a lack of fit.
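The calculation can be sketched in Python using the observed and expected counts from the Contingency Table for Hosmer and Lemeshow Test in this example. This is an illustration rather than SPSS's own code: the helper names are hypothetical, the degrees of freedom are taken as g - 2 (the usual choice, and consistent with df = 8 in the output), and the p-value helper uses a closed form that holds only for even degrees of freedom.

```python
import math

# (observed, expected) for anyuse = .00, then (observed, expected) for anyuse = 1.00
GROUPS = [
    (124, 123.653, 197, 197.347),
    (101, 97.310, 218, 221.690),
    (79, 81.589, 241, 238.411),
    (73, 67.769, 248, 253.231),
    (57, 54.600, 263, 265.400),
    (33, 41.820, 287, 278.180),
    (32, 29.724, 288, 290.276),
    (16, 21.258, 304, 298.742),
    (13, 15.538, 307, 304.462),
    (14, 8.740, 304, 309.260),
]


def hosmer_lemeshow(groups):
    """Pearson chi-square over the 2 x g table, with df assumed to be g - 2."""
    chi_square = 0.0
    for obs0, exp0, obs1, exp1 in groups:
        chi_square += (obs0 - exp0) ** 2 / exp0 + (obs1 - exp1) ** 2 / exp1
    return chi_square, len(groups) - 2


def chi_square_p_value_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


chi_square, df = hosmer_lemeshow(GROUPS)
p_value = chi_square_p_value_even_df(chi_square, df)
print(round(chi_square, 3), df, round(p_value, 3))
```

The result agrees with the Hosmer and Lemeshow Test table (chi-square 8.368, df 8, Sig. .398). The per-group terms also show where the fit is weakest: groups 6, 8 and 10 contribute most of the statistic.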

B = the logistic regression coefficient, the log odds ratio

S.E. = the standard error of the logistic regression coefficient

Wald = the Wald test statistic for testing if B=0 (or equivalently odds ratio = 1), or if all B’s = 0 for a categorical variable with more than 2 categories (i.e., 2 or more indicator variables).

d.f. = degrees of freedom of the test statistic.
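The relationship between B, S.E., Wald, Exp(B) and the confidence interval can be checked with a short Python sketch. It assumes the usual Wald interval, exp(B ± 1.96 × S.E.), and a single-coefficient Wald statistic of (B / S.E.) squared; the function name is hypothetical, and the inputs are the rounded values for female(1) from the output above.

```python
import math


def odds_ratio_summary(b, se, z=1.96):
    """Odds ratio, 95% confidence limits and Wald statistic from B and S.E."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    wald = (b / se) ** 2
    return odds_ratio, lower, upper, wald


# female(1): B = .501, S.E. = .099
odds_ratio, lower, upper, wald = odds_ratio_summary(0.501, 0.099)
print(round(odds_ratio, 3), round(lower, 2), round(upper, 2), round(wald, 1))
```

The odds ratio and limits come out close to the 1.650, 1.358 and 2.005 shown by SPSS. The Wald value computed this way is about 25.6 rather than 25.363, because B and S.E. are only available here rounded to three decimals.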

It is often helpful to write the definition of the indicator variables on your output, so you don’t get confused about the interpretation of the results. It is also helpful to change Exp(B) to odds ratio, and Sig. to P-value.

                                 Odds      95.0% C.I.for odds ratio
                                 Ratio     Lower      Upper      P-value
Step 1(a)   expos                1.080     1.068      1.093       .000
            age                  1.009     1.003      1.016       .004
            female (vs male)     1.650     1.358      2.005       .000
            race                                                  .002
              other vs white      .655      .451       .950       .026
              black vs white      .588      .440       .788       .000
            health               1.049     1.029      1.070       .000

In document INTRODUCTION TO SPSS FOR WINDOWS Version 19.0 (Page 83-88)