1. Choose Analyze on the menu bar
2. Choose Regression
3. Choose Binary Logistic...
4. Dependent: Select the dependent variable from the source list on the left and then click on the arrow next to the dependent variable box.
5. Covariate(s): Select the independent variable and then click on the arrow next to the Covariate(s) box. Repeat the process until you have selected all the independent variables you want.
6. Choose Enter as the Method. Enter is the default method for independent variable entry. Other methods of variable entry can be selected by clicking on the down arrow and clicking on the desired method of entry.
7. Choose OK
Additional options are available under >a*>b, Categorical..., Save..., Method, or Options... For example:
>a*>b (for adding two-way interactions)
You can add an interaction between two independent variables to the regression model by selecting the two variables from the source list on the left (hold down the Ctrl key while selecting them) and then clicking on >a*>b. The >a*>b button becomes available once two variables are highlighted in the source list.
Categorical...
You can use the Categorical option to have SPSS create indicator (dummy) variables for categorical variables.
1. Choose Categorical
2. Categorical Covariates: Select a covariate that is categorical and then click on the arrow next to the Covariates box.
3. Choose Indicator as the Contrast: Indicator is the default method for creating indicator variables. Other methods can be selected by clicking on the down arrow and clicking on the desired method.
4. Choose the reference category: either the last category (i.e., the category with the largest numeric coding value) or the first category (i.e., the category with the smallest numeric coding value).
5. Choose Change.
6. Repeat steps 2 through 5 until you have defined all categorical variables.
7. Choose Continue.
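The indicator coding that the Categorical option produces can be sketched in a few lines of Python. This is only an illustration of the idea, not SPSS's own implementation; the function name indicator_coding is made up for this sketch, and the race codes (1=white, 2=other, 3=black) are the ones used in the example in this handout.

```python
def indicator_coding(codes, reference="last"):
    """Map each category code to its list of 0/1 indicator values.

    reference: "first" (smallest code) or "last" (largest code) is the
    reference group, which gets 0 on every indicator variable.
    """
    levels = sorted(codes)
    ref = levels[0] if reference == "first" else levels[-1]
    non_ref = [lev for lev in levels if lev != ref]
    # Indicator k is 1 only for the k-th non-reference category.
    return {lev: [1 if lev == other else 0 for other in non_ref]
            for lev in levels}


labels = {1: "white", 2: "other", 3: "black"}
coding_first = indicator_coding(labels.keys(), reference="first")
coding_last = indicator_coding(labels.keys(), reference="last")

for code, row in coding_first.items():
    print(labels[code], row)
```

With the first category as the reference, white is 0 on both indicators, so the first indicator stands for "other" and the second for "black". Switching to reference="last" changes which group each indicator represents, which is why the coding table in the output should always be checked.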
Save...
Predicted Values (Probabilities and Group Membership). This option creates new variables that hold the predicted probabilities and the predicted group membership. The predicted group membership (0 or 1) is based on whether the predicted probability is less than (group membership = 0) or greater than or equal to (group membership = 1) the classification cutoff. By default the classification cutoff value is 0.5. You can change the cutoff value using Options...
Residuals (Unstandardized, Logit, Studentized, Standardized, Deviance)
Note that SPSS creates a new variable for each selected Save... option and adds the new variables to the data file. The variable names are defined in the Viewer window. Once you are done using these variables you may want to delete them from the data file or save them (by re-saving the data file).
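The cutoff rule for predicted group membership described under Save... can be written out as a small Python sketch. The function name and the probabilities below are hypothetical and serve only to show the rule, including what happens exactly at the cutoff.

```python
def predicted_group(probability, cutoff=0.5):
    """Return 1 if the predicted probability is at or above the cutoff, else 0."""
    return 1 if probability >= cutoff else 0


# Hypothetical predicted probabilities, not taken from the example data.
probs = [0.20, 0.49, 0.50, 0.83]
groups = [predicted_group(p) for p in probs]
print(groups)
```

A probability equal to the cutoff is assigned to group 1. Passing a different cutoff value mimics changing the classification cutoff under Options...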
Method...
Click on the down arrow to the right of Method to display the methods available for independent variable entry (enter, forward:conditional, forward:LR, forward:Wald, backward:conditional, backward:LR, backward:Wald).
Options...
Confidence interval for odds ratio (CI for exp(B))
Hosmer-Lemeshow goodness-of-fit
You can modify the entry and removal criteria used by the backward and forward variable entry methods.
Previous, Block # of #, Next
You can use these options to enter independent variables in blocks into the regression model. You can select different methods of variable entry for each block.
Example. Logistic regression will be used to determine the relationship between any use of health services (coded 0 = no use, 1 = any use) and age, health index, gender and race. Subjects in the study (Model Cities Data Set) were followed for a varying amount of time, so the number of months followed (expos) will also be included as an independent variable in the logistic regression model.
The dependent variable, anyuse, is binary.
There are 5 independent variables. Female and Race are categorical/nominal variables.
Logistic Regression
Case Processing Summary
Unweighted Cases(a)                          N       Percent
Selected Cases     Included in Analysis      3199    73.1
                   Missing Cases             1175    26.9
                   Total                     4374    100.0
Unselected Cases                             0       .0
Total                                        4374    100.0
a If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding
Original Value     Internal Value
.00                0
1.00               1
You can use the Categorical option to define which variables are categorical and SPSS will create the indicator variables.
By default the category with the largest numerical value (last) will be the reference group. Here, the category with the smallest numerical value was selected as the reference group.
Under Options you can choose to have the 95% confidence intervals for the odds ratios displayed in the output. You can also run the Hosmer-Lemeshow goodness-of-fit test.
The Case Processing Summary gives information on the number of observations used in the logistic regression. Subjects with missing data are excluded.
SPSS will always recode the dependent variable to a 0 or 1 binary variable (internal value), and will estimate the odds ratio for the event coded as 1 (vs the event coded as 0). If your dependent variable is not coded 0 or 1, check the Dependent Variable Encoding table to determine the interpretation of the odds ratios.
Categorical Variables Codings
                     Frequency    Parameter coding
                                  (1)       (2)
race      white      497          .000      .000
          other      455          1.000     .000
          black      2247         .000      1.000
female    male       1450         .000
          female     1749         1.000
Caution! Make sure you understand the interpretation of the indicator variables that SPSS creates. It is very easy to get confused. In this example the variable race is coded 1=white, 2=other, 3=black, and white was chosen as the reference group. A common mistake would be to interpret race(1) = white and race(2) = other, when in fact race(1) = other and race(2) = black.
Block 0: Beginning Block
Ignore all the output under Block 0. The output displays information for the logistic regression model with no independent variables in the model.
Block 1: Method = Enter
Omnibus Tests of Model Coefficients
                   Chi-square    df    Sig.
Step 1    Step     301.534       6     .000
          Block    301.534       6     .000
          Model    301.534       6     .000

Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       2609.415(a)          .090                    .151
a Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
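As a rough check on how the Omnibus Tests and Model Summary tables relate, the two R-square values can be recomputed from the model chi-square and the -2 log likelihood. The sketch below uses the usual textbook definitions of the Cox & Snell and Nagelkerke measures and assumes the Model chi-square is the drop in -2 log likelihood from the model with no independent variables; the variable names are made up for this illustration.

```python
import math

n = 3199                    # cases included in the analysis
model_chi_square = 301.534  # "Model" row of the Omnibus Tests table
neg2ll_model = 2609.415     # -2 log likelihood from the Model Summary

# Assumption: -2 log likelihood of the model with no independent variables.
neg2ll_null = neg2ll_model + model_chi_square

# Cox & Snell: 1 - (L_null / L_model)^(2/n)
cox_snell = 1 - math.exp(-model_chi_square / n)

# Nagelkerke: Cox & Snell divided by its maximum attainable value.
max_cox_snell = 1 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Both values agree with the Model Summary table to three decimals.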
Classification Table(a)
                                  Predicted
                                  anyuse              Percentage
Observed                          .00       1.00      correct
Step 1    anyuse      .00         0         542       .0
                      1.00        0         2657      100.0
          Overall percentage                          83.1
a The cut value is .500
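The percentages in the Classification Table follow directly from the four cell counts. A minimal Python sketch, with the counts copied from the table and the dictionary layout chosen only for this illustration:

```python
# counts[observed][predicted], copied from the Classification Table
counts = {
    0: {0: 0, 1: 542},
    1: {0: 0, 1: 2657},
}

total = sum(sum(row.values()) for row in counts.values())
correct = counts[0][0] + counts[1][1]
overall = 100 * correct / total

# Percentage correct within each observed group.
pct_by_group = {obs: 100 * row[obs] / sum(row.values())
                for obs, row in counts.items()}

print(total, round(overall, 1), pct_by_group)
```

Because every subject has a predicted probability at or above the 0.5 cut value, everyone is classified as having use, so the overall percentage correct is simply the observed proportion with any use.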
The Categorical Variables Codings table gives the definition of the indicator variables. For example:
race(1) = other
race(2) = black
(race = white is the reference group)
female(1) = female
(male is the reference group)
Unless you are using stepwise methods to enter variables or entering variables in different blocks, you can ignore the Omnibus Tests of Model Coefficients output.
The Model Summary reports "R-square" measures for logistic regression; these are usually not very useful.
Ignore the Classification Table also. It describes how well the logistic regression predicts any use when a predicted probability at or above the 0.5 cutoff is used to indicate any use. Here, all subjects are predicted to have use.
Hosmer and Lemeshow Test
Step Chi-square df Sig.
1 8.368 8 .398
Contingency Table for Hosmer and Lemeshow Test
                anyuse = .00             anyuse = 1.00            Total
                Observed    Expected     Observed    Expected     Observed
Step 1    1     124         123.653      197         197.347      321
          2     101         97.310       218         221.690      319
          3     79          81.589       241         238.411      320
          4     73          67.769       248         253.231      321
          5     57          54.600       263         265.400      320
          6     33          41.820       287         278.180      320
          7     32          29.724       288         290.276      320
          8     16          21.258       304         298.742      320
          9     13          15.538       307         304.462      320
          10    14          8.740        304         309.260      318
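The Hosmer and Lemeshow chi-square can be reproduced from the contingency table as a Pearson chi-square over the 2 x 10 cells. The Python sketch below copies the observed and expected counts from the table and takes the degrees of freedom as the number of groups minus 2, which matches the df of 8 reported by SPSS. The helper chi2_sf_even_df is written for this illustration and uses a closed form that is valid only when the degrees of freedom are even.

```python
import math

# (observed 0, expected 0, observed 1, expected 1) for each of the 10 groups
table = [
    (124, 123.653, 197, 197.347),
    (101, 97.310, 218, 221.690),
    (79, 81.589, 241, 238.411),
    (73, 67.769, 248, 253.231),
    (57, 54.600, 263, 265.400),
    (33, 41.820, 287, 278.180),
    (32, 29.724, 288, 290.276),
    (16, 21.258, 304, 298.742),
    (13, 15.538, 307, 304.462),
    (14, 8.740, 304, 309.260),
]


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
hl = sum((o0 - e0) ** 2 / e0 + (o1 - e1) ** 2 / e1
         for o0, e0, o1, e1 in table)
df = len(table) - 2
p_value = chi2_sf_even_df(hl, df)

print(round(hl, 3), df, round(p_value, 3))
```

The result agrees with the Hosmer and Lemeshow Test table (chi-square 8.368, df 8, Sig. .398). The largest contributions come from groups 6, 8 and 10, where the observed and expected counts differ most.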
The Hosmer-Lemeshow goodness-of-fit statistic is formed by grouping the data into g groups (usually g=10) based on the percentiles of the estimated probabilities and calculating the Pearson chi-square statistic from the 2 x g table of observed and estimated expected frequencies. A small p-value indicates a lack of fit. Large differences between the observed and expected values can be used to help identify where there is lack of fit, when present.
The last table of the output usually has the results we are most interested in. It lists the odds ratios, p-values and 95% confidence intervals for the odds ratios.
Variables in the Equation
                         B        S.E.    Wald       df    Sig.    Exp(B)    95.0% C.I. for EXP(B)
                                                                             Lower     Upper
Step 1(a)    expos       .077     .006    167.398    1     .000    1.080     1.068     1.093
             age         .009     .003    8.118      1     .004    1.009     1.003     1.016
             female(1)   .501     .099    25.363     1     .000    1.650     1.358     2.005
             race                         12.715     2     .002
             race(1)     -.424    .190    4.964      1     .026    .655      .451      .950
             race(2)     -.530    .149    12.689     1     .000    .588      .440      .788
             health      .048     .010    23.603     1     .000    1.049     1.029     1.070
             Constant    -.337    .196    2.958      1     .085    .714
a Variable(s) entered on step 1: expos, age, female, race, health.
B = the logistic regression coefficient, the log odds ratio
S.E. = the standard error of the logistic regression coefficient
Wald = the Wald test statistic for testing if B=0 (or equivalently odds ratio = 1), or if all B's = 0 for a categorical variable with two or more indicator variables
d.f. = degrees of freedom of the test statistic
Sig. = P-value for the individual odds ratio, or the overall significance of a categorical/nominal variable if there is no Exp(B) listed
Exp(B) = Odds Ratio
95.0% C.I. for EXP(B) = 95% confidence interval for the odds ratio
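The columns of the Variables in the Equation table are tied together by simple formulas, which can be checked with a short Python sketch. The function name summarize is made up for this illustration, the 1.96 multiplier is the usual normal approximation for a 95% interval, and the inputs are the rounded B and S.E. values from the output, so the results match the table only approximately.

```python
import math


def summarize(b, se, z=1.96):
    """Wald statistic, p-value, odds ratio and 95% CI from B and its S.E."""
    wald = (b / se) ** 2
    # Upper-tail probability of a chi-square with 1 df at `wald`.
    p_value = math.erfc(math.sqrt(wald / 2))
    return {
        "wald": wald,
        "p_value": p_value,
        "odds_ratio": math.exp(b),
        "ci_lower": math.exp(b - z * se),
        "ci_upper": math.exp(b + z * se),
    }


female = summarize(0.501, 0.099)   # female(1): female vs male
race1 = summarize(-0.424, 0.190)   # race(1): other vs white

for name, row in (("female(1)", female), ("race(1)", race1)):
    print(name, {key: round(value, 3) for key, value in row.items()})
```

The odds ratio is exp(B), and the confidence limits are obtained by exponentiating B plus or minus 1.96 standard errors. Small differences from the SPSS output, such as in the Wald statistic, come from using B and S.E. rounded to three decimals.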
It is often helpful to write the definition of the indicator variables on your output, so you don't get confused about the interpretation of the results. It is also helpful to change Exp(B) to odds ratio, and Sig. to P-value.
                                  Odds      95.0% C.I. for odds ratio
                                  Ratio     Lower     Upper     P-value
Step 1(a)    expos                1.080     1.068     1.093     .000
             age                  1.009     1.003     1.016     .004
             female (vs male)     1.650     1.358     2.005     .000
             race                                               .002
             other vs white       .655      .451      .950      .026
             black vs white       .588      .440      .788      .000
             health               1.049     1.029     1.070     .000
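To see how the fitted model turns into a predicted probability, the B coefficients from the Variables in the Equation table can be plugged into the logistic function. The Python sketch below is an illustration only: the function name and the subject's values (12 months of follow-up, age 40, health index 10) are hypothetical and are not taken from the data set.

```python
import math

# B coefficients copied from the Variables in the Equation table.
coef = {
    "constant": -0.337,
    "expos": 0.077,
    "age": 0.009,
    "female(1)": 0.501,   # female vs male
    "race(1)": -0.424,    # other vs white
    "race(2)": -0.530,    # black vs white
    "health": 0.048,
}


def predicted_probability(expos, age, female, race, health):
    """Predicted probability of any use; race is 'white', 'other' or 'black'."""
    logit = (coef["constant"]
             + coef["expos"] * expos
             + coef["age"] * age
             + coef["female(1)"] * (1 if female else 0)
             + coef["race(1)"] * (1 if race == "other" else 0)
             + coef["race(2)"] * (1 if race == "black" else 0)
             + coef["health"] * health)
    return 1 / (1 + math.exp(-logit))


# Hypothetical subject: values chosen only to show the calculation.
p = predicted_probability(expos=12, age=40, female=True, race="black", health=10)
print(round(p, 3))
```

The reference groups (male, white) contribute nothing to the sum because all of their indicator variables are 0. A probability at or above the classification cutoff of 0.5 would place this hypothetical subject in the any-use group.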