Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541


libname in1 'c:\';

Data first; Set in1.extract;
  A=1;   * constant key, used later to merge the estimates onto every observation;

PROC LOGISTIC DATA=first OUTEST=DD ORDER=DATA;
  MODEL EDUC=POVDUM / MAXITER=100;
  WEIGHT WEIGHT;
  OUTPUT OUT=CC XBETA=XB P=PROB;
run;

/* EDUC is a 4-level ordered variable for level of education. The categories are mutually exclusive. P=PROB gives the probability estimate for the likelihood of reaching particular levels of education. For each observation in the input data, SAS will create 3 output observations, with a different probability estimate for each of these levels. One of the levels is an excluded category, and we can determine the likelihood of that event by subtraction. */

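Data F; set DD;
  Rename povdum=cpov;   * coefficient on POVDUM from the OUTEST= data set;
  drop _type_;
  A=1;

Data G; Merge F CC; by A;
  Xb_npov=xb-cpov*povdum;   * linear predictor with the poverty effect removed;
  Xb_pov=xb_npov+cpov;      * linear predictor with the poverty effect added;
  PR_NPOV=(EXP(XB_NPOV))/(1+EXP(XB_NPOV));
  PR_POV=(EXP(XB_POV))/(1+EXP(XB_POV));

DATA F; RETAIN _LEVEL_; SET G;

PROC SORT; BY _LEVEL_;

PROC MEANS; VAR PROB _LEVEL_ PR_NPOV PR_POV;
  BY _LEVEL_; run;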
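The arithmetic in the Data G step can be checked by hand. The sketch below is in Python rather than SAS, purely as a calculator. It takes the INTERCP2 and POVDUM estimates reported in the Results section and applies them to a hypothetical respondent who grew up poor (povdum = 1). The function name inv_logit and the respondent are illustrative assumptions, not part of the SAS program.

```python
import math

def inv_logit(xb):
    """Convert a log odds value into a probability."""
    return math.exp(xb) / (1.0 + math.exp(xb))

# Estimates copied from the "Analysis of Maximum Likelihood Estimates" table
intercept2 = 0.8646   # INTERCP2: some college or more
cpov = -1.3713        # coefficient on POVDUM

# Hypothetical respondent who grew up poor
povdum = 1
xb = intercept2 + cpov * povdum   # what XBETA=XB would hold for this level

# Same two steps as in Data G
xb_npov = xb - cpov * povdum      # strip out this person's own poverty effect
xb_pov = xb_npov + cpov           # add the poverty effect back for everyone

pr_npov = inv_logit(xb_npov)
pr_pov = inv_logit(xb_pov)
print(round(pr_npov, 4), round(pr_pov, 4))
```

Because POVDUM is the only covariate, PR_NPOV and PR_POV come out the same for every observation within a level, which is why PROC MEANS reports a standard deviation of 0 for them.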

SAS produces probability estimates for 3 different levels, one for each of the intercept values. To get the probability estimates for each level, we simply run a PROC MEANS by level. In other words, SAS creates a probability estimate for 3 of the levels (out of 4) for each individual. Thus for person 1 (or case 1), SAS creates 3 observations, each holding the probability estimate for one level or category of education, along with a newly created variable named _LEVEL_ that indicates which level the probability estimate is for. To determine the probability estimate for a given level, we need only examine those observations whose _LEVEL_ is that level. What I've done above is determine mean values (by using PROC MEANS) by _LEVEL_, which gives separate mean values for the different levels.

Level 1 is the excluded category from the analysis, so we will only get probabilities for levels 2, 3 and 4. The probability estimate for level 2 gives the probability of being a college graduate or having some college or being a high school graduate. (If we had a level 1 probability estimate, it would merely tell us the probability of being a college grad or having some college or graduating from high school or dropping out of high school. In other words, its value would always be 1.) For level 3, the probability estimates indicate the probability of having some college or being a college graduate.

The probability estimates for level 4 indicate the likelihood of graduating from college. Hence, the only probability we know directly is the probability of graduating from college. We can then subtract the probability of graduating from college from the probability of either graduating from college or going to college (level 3) to determine the probability of going to college without graduating. If we'd like to determine the probability of graduating from high school only, we subtract the level 3 probability (graduating from college or going to college) from the level 2 probability (graduating from college, going to college or graduating from high school). To determine the probability of dropping out of high school, we subtract the level 2 probability from 1.
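The subtractions described above can be sketched in a few lines of Python. The three cumulative values are the overall PROB means reported in the Probability Estimates section; the dictionary keys and variable names are made up for illustration.

```python
# Mean cumulative probabilities (PROB) reported by PROC MEANS, keyed by _LEVEL_
cumulative = {
    2: 0.9214712,   # high school grad, some college, or college grad
    3: 0.6310367,   # some college or college grad
    4: 0.2740716,   # college grad
}

# Each category is the difference between adjacent cumulative probabilities
college_grad = cumulative[4]
some_college = cumulative[3] - cumulative[4]
high_school_grad = cumulative[2] - cumulative[3]
dropout = 1.0 - cumulative[2]

for name, p in [("college grad", college_grad),
                ("some college", some_college),
                ("high school grad", high_school_grad),
                ("dropout", dropout)]:
    print(f"{name:17s} {p:.3f}")
```

The four category probabilities sum to 1, which is a quick way to confirm the subtractions were done in the right order.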

This difficulty in determining probability estimates arises because the model is based on cumulative probabilities.

Note that the bottom category is being a college graduate. You must look at the order in which SAS puts the different levels, that is, at the ordered values in the SAS output. Here, ordered value=1 is Educ=4, ordered value=2 is Educ=3, etc.

The interpretation of the intercepts is as follows:

Intercept1: log odds of being a college grad versus having some college, being a high school grad or being a high school dropout. In other words, this is the log odds of being in the lowest ordered value category relative to all other categories.

Intercept2: log odds of being a college grad or having some college relative to being a high school graduate or being a high school dropout. Or, the log odds of being in the bottom two ordered categories relative to being in the top two ordered categories.

Intercept3: log odds of being a college grad or having some college or being a high school graduate relative to being a high school dropout. Or, the log odds of being in the bottom 3 ordered categories relative to being in the top ordered category.

For a further explanation of how to use ordered logistic regression, see Categorical Data Analysis Using the SAS System, pages 217-231, by Maura E. Stokes, Charles S. Davis and Gary G. Koch, from the SAS Institute, 1995.

Results

The LOGISTIC Procedure

Data Set: WORK.Z
Response Variable: EDUC
Response Levels: 4
Number of Observations: 1884
Weight Variable: WEIGHT
Sum of Weights: 1884
Link Function: Logit

Response Profile

Ordered                    Total
  Value   EDUC   Count     Weight
      1      4     471   512.86406
      2      3     671   674.85866
      3      2     578   549.99787
      4      1     164   146.27942

Since SAS puts these values in the "wrong" order, I have reordered them with the sort command (above) and the ORDER=DATA option (also above).

Score Test for the Proportional Odds Assumption

Chi-Square = 24.4340 with 2 DF (p=0.0001)

The chi-square test above indicates whether we can assume that the b coefficients have proportional effects on the different levels of the dependent variable. Since we would reject the null hypothesis here (p=0.0001), we reject the proportional effects assumption. Thus, we could instead run separate logistic regression models for each level of the dependent variable.
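As a rough check on how strongly the assumption is rejected, the p value can be recomputed from the chi-square statistic. For a chi-square variable with exactly 2 degrees of freedom the upper-tail probability is exp(-x/2), so no statistics library is needed. This is only a sketch, and it assumes that the 0.0001 printed in the listing is the smallest value SAS displays rather than the exact p value.

```python
import math

chi_square = 24.4340   # score test statistic from the listing
df = 2                 # the exp(-x/2) shortcut below holds only when df == 2

# Upper-tail probability of a chi-square distribution with 2 degrees of freedom
p_value = math.exp(-chi_square / 2.0)

print(f"p = {p_value:.2e}")
print("reject proportional odds at the 0.05 level:", p_value < 0.05)
```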

Model Fitting Information and Testing Global Null Hypothesis BETA=0

             Intercept   Intercept and
Criterion    Only        Covariates      Chi-Square for Covariates
AIC          4828.334    4654.716        .
SC           4844.957    4676.881        .
-2 LOG L     4822.334    4646.716        175.618 with 1 DF (p=0.0001)
Score        .           .               170.368 with 1 DF (p=0.0001)

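The chi-square for covariates on the -2 Log L line tells us whether the model is significant (much like the F value in OLS regression). It is the difference between -2 Log L for the intercept-only model and -2 Log L for the model with covariates. The p value gives the level of significance.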
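The entries in the model fitting table can be reproduced from the two -2 Log L values. The sketch below assumes the usual definitions, AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), where k is the number of estimated parameters (3 intercepts, plus 1 for POVDUM) and n is taken to be the 1884 observations. These definitions are an assumption on my part, although they do match the printed values.

```python
import math

n_obs = 1884
neg2logl_intercept_only = 4822.334
neg2logl_with_covariates = 4646.716

# Likelihood ratio chi-square: the drop in -2 Log L from adding POVDUM
lr_chi_square = neg2logl_intercept_only - neg2logl_with_covariates

def aic(neg2logl, k):
    """Akaike information criterion for k estimated parameters."""
    return neg2logl + 2 * k

def sc(neg2logl, k, n):
    """Schwarz criterion for k estimated parameters and n observations."""
    return neg2logl + k * math.log(n)

aic_null = aic(neg2logl_intercept_only, 3)
aic_full = aic(neg2logl_with_covariates, 4)
sc_null = sc(neg2logl_intercept_only, 3, n_obs)
sc_full = sc(neg2logl_with_covariates, 4, n_obs)

print(round(lr_chi_square, 3), round(aic_null, 3), round(aic_full, 3))
print(round(sc_null, 3), round(sc_full, 3))
```

Lower AIC and SC values for the model with covariates point the same way as the significant chi-square: adding POVDUM improves the fit.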

Analysis of Maximum Likelihood Estimates

                Parameter   Standard   Wald         Pr >         Standardized   Odds
Variable   DF   Estimate    Error      Chi-Square   Chi-Square   Estimate       Ratio
INTERCP1   1    -0.7468     0.0548     186.0003     0.0001       .              .
INTERCP2   1     0.8646     0.0558     239.9376     0.0001       .              .
INTERCP3   1     2.9219     0.0967     913.3127     0.0001       .              .
POVDUM     1    -1.3713     0.1063     166.4219     0.0001       -0.314033      0.254

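The negative POVDUM coefficient indicates that those who grow up poor have less education than those who do not grow up poor: their odds of reaching any given level of education or higher are about one quarter (odds ratio 0.254) of the odds for those who did not grow up poor. We determine probability estimates using these coefficient estimates. The probability estimates are given below.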
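Two columns of the table can be recomputed from the estimate and its standard error. This is a small sketch, assuming the usual definitions: the odds ratio is exp(estimate) and the Wald chi-square is (estimate / standard error) squared. Because the printed estimates are rounded to four decimals, the Wald value only matches the listing approximately.

```python
import math

# POVDUM row of the "Analysis of Maximum Likelihood Estimates" table
estimate = -1.3713
std_error = 0.1063

odds_ratio = math.exp(estimate)               # printed as 0.254
wald_chi_square = (estimate / std_error) ** 2  # printed as 166.4219

print(round(odds_ratio, 3))
print(round(wald_chi_square, 1))
```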

Intercept1 tells us the log odds of being a college grad relative to those who are not college grads. Intercept2 indicates the log odds of being a college graduate or having some college relative to those who are high school graduates or high school dropouts. Intercept3 indicates the log odds of being a college grad, having some college or having a high school degree relative to being a high school dropout. Because POVDUM is in the model, these log odds apply to those who did not grow up poor (POVDUM=0).


Probability Estimates

1. LIKELIHOOD OF COLLEGE GRAD, SOME COLLEGE OR HIGH SCHOOL GRADUATION.

Response Value=2

Variable   Label                   N      Mean        Std Dev     Minimum
PROB       Estimated Probability   1884   0.9214712   0.0514712   0.8250009
_LEVEL_    Response Value          1884   2.0000000   0           2.0000000
PR_NPOV                            1884   0.9489188   0           0.9489188
PR_POV                             1884   0.8250009   0           0.8250009

2. LIKELIHOOD OF COLLEGE GRADUATION OR SOME COLLEGE

Response Value=3

Variable   Label                   N      Mean        Std Dev     Minimum
PROB       Estimated Probability   1884   0.6310367   0.1360967   0.3759564
_LEVEL_    Response Value          1884   3.0000000   0           3.0000000
PR_NPOV                            1884   0.7036118   0           0.7036118
PR_POV                             1884   0.3759564   0           0.3759564

3. LIKELIHOOD OF COLLEGE GRADUATION

Response Value=4

Variable   Label                   N      Mean        Std Dev     Minimum
PROB       Estimated Probability   1884   0.2740716   0.0889561   0.1073450
_LEVEL_    Response Value          1884   4.0000000   0           4.0000000
PR_NPOV                            1884   0.3215084   0           0.3215084
PR_POV                             1884   0.1073450   0           0.1073450

From these probabilities, we know that the overall likelihood of graduating from college is .274, and we can easily determine the probability of dropping out by subtracting .9215 from 1 (= .0785). The likelihood of going to college (but not graduating) = .631 - .274 = .357. The likelihood of getting a high school degree only = .921 - .631 = .290. We could also determine these probability estimates for those who are in poverty during childhood and those who are not.
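The same subtraction can be applied separately to the PR_POV and PR_NPOV values to get category probabilities by childhood poverty status. The sketch below does this in Python with the values from the three tables above; the helper name category_probs and the group labels are illustrative only.

```python
# Cumulative probabilities by _LEVEL_ (2, 3, 4) from the PROC MEANS output
cumulative_by_group = {
    "poor":     {2: 0.8250009, 3: 0.3759564, 4: 0.1073450},   # PR_POV
    "not poor": {2: 0.9489188, 3: 0.7036118, 4: 0.3215084},   # PR_NPOV
}

def category_probs(cum):
    """Turn cumulative probabilities into one probability per education level."""
    return {
        "college grad": cum[4],
        "some college": cum[3] - cum[4],
        "high school grad": cum[2] - cum[3],
        "dropout": 1.0 - cum[2],
    }

results = {group: category_probs(cum) for group, cum in cumulative_by_group.items()}

for group, probs in results.items():
    print(group)
    for name, p in probs.items():
        print(f"  {name:17s} {p:.3f}")
```

Under these estimates, those who grew up poor are roughly three times as likely to drop out of high school (about .175 versus .051) and about one third as likely to graduate from college (about .107 versus .322).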