
Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)


Academic year: 2021


(1)

Overview Classes

12-3 Logistic regression (5)

19-3 Building and applying logistic regression (6)

26-3 Generalizations of logistic regression (7)

2-4 Loglinear models (8)

5-4 15-17 hrs; 5B02 Building and applying loglinear models (9.1-9.3, 9.8)

23-4 Association (9.4-9.6)

3-5 15-17 hrs: 5A37 Matched pairs (10)

7-5 Repeated measurements (11/12)

14-5 Mixture models (13)

(2)

Logistic Regression

Today’s topics:

1. Introduction

2. Parameter interpretation

3. Inference

4. Categorical predictors

5. Multiple predictors

6. Software: SPSS

7. Software: ℓEM

(3)

Introduction: Logistic Regression

The response variable (Y) is dichotomous. We may have one or more predictor variables, which can be continuous or categorical.

For the moment, let's consider one predictor variable X. Denote π(x) = P(Y = 1 | X = x). The logistic regression model is

$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}$$

or equivalently

$$\operatorname{logit}[\pi(x)] = \log\frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x$$

That is, the logit link of π(x) is set equal to the linear predictor α + βx.
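A minimal Python sketch of the two equivalent forms above, using hypothetical values α = −2 and β = 0.5 (not taken from the course material): `prob` maps the linear predictor to a probability, and `logit` maps that probability back to the linear predictor. Both function names are illustrative.

```python
import math

def prob(x, alpha, beta):
    """pi(x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    eta = alpha + beta * x  # linear predictor
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit(p):
    """logit(p) = log(p / (1 - p)); undoes prob() on the linear-predictor scale."""
    return math.log(p / (1.0 - p))

alpha, beta = -2.0, 0.5  # hypothetical parameter values
for x in [0.0, 4.0, 8.0]:
    p = prob(x, alpha, beta)
    print(f"x = {x}: pi(x) = {p:.4f}, logit = {logit(p):.4f}")
```

At x = 4 the linear predictor is 0, so π(x) = .5; the logit of each probability recovers α + βx.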

(4)

Interpretation

How to interpret β?

1. The sign of β determines whether the probability goes up or down with an increase in X.

2. The larger the absolute value of β, the steeper the curve. When β = 0 the curve is flat and X and Y are independent.

3. The relationship between the predictor and the probability follows the logistic curve.

(5)

Interpretation

[Figure: logistic curve of P(Y=1|x); vertical axis from 0 to 1]

(6)

Interpretation

How to interpret β?

1. The odds increase multiplicatively by $e^{\beta}$ for every unit increase in X.

2. $e^{\beta}$ is an odds ratio: the odds at X = x + 1 divided by the odds at X = x.

3. Evaluate π(x) at, for example, the quartiles of X to get a better understanding of the size of the effect.

4. Via a linearization argument: the line tangent to the curve at x has slope βπ(x)[1 − π(x)]. This approximates the change in probability for a unit increase in X.

5. From this it follows that near the x where π(x) = .5 (i.e., x = −α/β), where the slope equals β/4, 1/|β| approximates the distance between that x-value and the x-values where π(x) = .25 or π(x) = .75.
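The interpretation points above can be checked numerically. This is a minimal sketch with hypothetical values α = −2 and β = 0.5; it shows that the odds ratio for a unit step is $e^{\beta}$ at every x, compares the tangent slope with the exact one-unit change, and compares the 1/β approximation with the exact distance log(3)/β (since logit(.75) − logit(.5) = log 3).

```python
import math

alpha, beta = -2.0, 0.5  # hypothetical parameter values

def prob(x):
    """Logistic model pi(x) for the hypothetical alpha and beta."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def odds(x):
    p = prob(x)
    return p / (1.0 - p)

# Points 1-2: the odds ratio for a unit increase in x equals e^beta at every x.
for x in [0.0, 3.0, 6.0]:
    print(f"odds({x + 1}) / odds({x}) = {odds(x + 1) / odds(x):.4f}, e^beta = {math.exp(beta):.4f}")

# Point 4: tangent slope beta * pi(x) * (1 - pi(x)) vs. the exact change over one unit.
x = 1.0
slope = beta * prob(x) * (1.0 - prob(x))
print(f"slope at x = {x}: {slope:.4f}; exact change: {prob(x + 1) - prob(x):.4f}")

# Point 5: at x = -alpha/beta, pi = .5 and the slope is beta/4, so about 1/beta
# units of x move pi from .5 to .75; the exact distance is log(3)/beta.
x_half = -alpha / beta
print(f"pi({x_half}) = {prob(x_half)}, 1/beta = {1 / beta}, exact = {math.log(3) / beta:.4f}")
```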

(7)

Inference

Significance tests usually test $H_0: \beta = 0$. Possible tests (see class 1):

1. Wald statistic: $z = \hat\beta / SE$; under $H_0$, $z^2$ is approximately $\chi^2$ distributed with df = 1.

2. Likelihood ratio statistic: twice the difference between the maximized log-likelihood at $\hat\beta$ and the maximized log-likelihood under β = 0. It is also approximately chi-square distributed with df = 1.

The likelihood ratio statistic is preferred over the Wald statistic: it uses more information and usually has more power.

Confidence intervals for β are usually more informative than tests. They are obtained by inverting a test: the interval contains all values $\beta_0$ for which $H_0: \beta = \beta_0$ is not rejected.
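To make the two tests concrete, here is a minimal pure-Python sketch that fits the one-predictor model by Newton-Raphson on a small invented data set and computes both statistics. The data, the helper names (`loglik`, `score_and_info`, `fit`) and the fixed number of iterations are assumptions for illustration; production software also checks convergence and handles separated data.

```python
import math

# Invented toy data (x, y), for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

def loglik(a, b):
    """Log-likelihood of logit(pi) = a + b*x for the data above."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def score_and_info(a, b):
    """Score vector (g0, g1) and information matrix entries (i00, i01, i11)."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return g0, g1, i00, i01, i11

def fit(iters=25):
    """Newton-Raphson ML estimates of (a, b); also returns the estimated var(b)."""
    a = b = 0.0
    for _ in range(iters):
        g0, g1, i00, i01, i11 = score_and_info(a, b)
        det = i00 * i11 - i01 ** 2
        # Newton step: (a, b) += inverse(information) @ score, written out for 2 x 2.
        a += (i11 * g0 - i01 * g1) / det
        b += (i00 * g1 - i01 * g0) / det
    _, _, i00, i01, i11 = score_and_info(a, b)
    var_b = i00 / (i00 * i11 - i01 ** 2)  # (2, 2) element of the inverse information
    return a, b, var_b

a_hat, b_hat, var_b = fit()

# Wald test of H0: beta = 0.
z = b_hat / math.sqrt(var_b)
p_wald = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value = P(chi2_1 > z^2)

# Likelihood ratio test: with beta = 0 the ML estimate of pi is the sample proportion.
ybar = sum(ys) / len(ys)
a0 = math.log(ybar / (1 - ybar))
lr = 2 * (loglik(a_hat, b_hat) - loglik(a0, 0.0))
p_lr = math.erfc(math.sqrt(lr / 2))  # P(chi2_1 > lr)

print(f"beta_hat = {b_hat:.3f}, Wald z^2 = {z * z:.3f} (p = {p_wald:.3f}), "
      f"LR = {lr:.3f} (p = {p_lr:.3f})")
```

The identity P(χ²₁ > c) = erfc(√(c/2)) is used so that only the standard library is needed.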

(8)

Inference

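Often we would also like a confidence interval for the predicted probability $\hat\pi(x)$.

For a fixed value $x = x_0$, the estimated logit $\operatorname{logit}[\hat\pi(x_0)] = \hat\alpha + \hat\beta x_0$ has a large-sample standard error (SE) given by the square root of

$$\operatorname{var}(\hat\alpha + \hat\beta x_0) = \operatorname{var}(\hat\alpha) + x_0^2 \operatorname{var}(\hat\beta) + 2 x_0 \operatorname{cov}(\hat\alpha, \hat\beta)$$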

The variances and covariances of the regression weights can be obtained from formula (5.20).

A 95% confidence interval for the logit is obtained by adding and subtracting 1.96 SE from the estimated logit.

From this confidence interval we obtain a confidence interval for the probability by applying

$$\pi(x_0) = \frac{\exp(\text{logit})}{1 + \exp(\text{logit})}$$

to both endpoints.
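The steps above can be sketched as follows. The estimates and (co)variances are hypothetical numbers standing in for the output of a fitting program; `expit` and `prob_ci` are illustrative names.

```python
import math
from statistics import NormalDist

# Hypothetical estimates and (co)variances, e.g. as reported by fitting software.
a_hat, b_hat = -3.0, 0.6
var_a, var_b, cov_ab = 1.2, 0.04, -0.2

def expit(eta):
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def prob_ci(x0, level=0.95):
    """Return (lower, estimate, upper) for pi(x0) via an interval on the logit scale."""
    eta = a_hat + b_hat * x0
    se = math.sqrt(var_a + x0 ** 2 * var_b + 2 * x0 * cov_ab)
    zq = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return expit(eta - zq * se), expit(eta), expit(eta + zq * se)

for x0 in [2.0, 5.0, 8.0]:
    lo, est, hi = prob_ci(x0)
    print(f"x0 = {x0}: pi = {est:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Because the interval is built on the logit scale and then transformed, it always stays inside (0, 1) and is generally not symmetric around $\hat\pi(x_0)$.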

(9)

Inference: Goodness-of-fit stats

In practice there is no guarantee that the model fits the data well.

But if no more complex model fits substantially better, this is some evidence that the chosen model is reasonable.

Lack of fit can be detected by searching for any way in which the model fails. The $X^2$ and $G^2$ statistics are used for this; they require grouped data, so continuous variables must be categorized.

An example is the Hosmer and Lemeshow statistic: partition the data into g groups of approximately equal size based on the predicted probabilities. Then form a g × 2 contingency table of the groups against the two response categories, and compare the fitted and observed frequencies.

Such tests can indicate lack of fit, but they give no insight into its nature.
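A minimal sketch of the Hosmer and Lemeshow construction described above, assuming the predicted probabilities are already available. The function name and the toy inputs are hypothetical, and ties in the predicted probabilities are handled naively by position; the statistic is usually compared with a chi-square distribution on g − 2 df.

```python
def hosmer_lemeshow(probs, ys, g=10):
    """Hosmer-Lemeshow statistic for predicted probabilities `probs` and 0/1 outcomes `ys`.

    Cases are sorted by predicted probability and cut into g groups of
    (approximately) equal size; for each group the observed and expected counts
    of Y = 1 and Y = 0 are compared with a Pearson-type term (O - E)^2 / E.
    """
    pairs = sorted(zip(probs, ys))  # order cases by predicted probability
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        if not group:
            continue
        obs1 = sum(y for _, y in group)  # observed Y = 1
        exp1 = sum(p for p, _ in group)  # expected Y = 1 (sum of fitted probabilities)
        obs0 = len(group) - obs1
        exp0 = len(group) - exp1
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat

# Hypothetical fitted probabilities and outcomes for 8 cases, split into 4 groups.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
print(hosmer_lemeshow(probs, ys, g=4))
```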

(10)

Categorical predictors

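Categorical variables are often called factors. For a factor with categories i = 1, …, I:

$$\log\frac{\pi_i}{1 - \pi_i} = \alpha + \beta_i$$

One constraint must be imposed on the $\beta_i$, for example $\beta_1 = 0$ or $\sum_i \beta_i = 0$.

This is analogous to the one-way ANOVA model.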

(11)

Categorical predictors

The same model can be written using dummy variables. A factor with I levels needs I − 1 dummy variables, as in multiple regression with dummy variables.

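Example of coding variables for a three-category factor:

| Category | Effect x1 | Effect x2 | Dummy x1 | Dummy x2 |
|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 | 1 |
| 3 | −1 | −1 | 0 | 0 |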

$$\log\frac{\pi_i}{1 - \pi_i} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots$$

In effect coding, $\beta_i$ represents the deviation of category i from a 'mean' (the unweighted average of the category logits). In dummy coding, $\beta_i$ denotes the deviation from the baseline group, whose parameter is set to 0.

(12)

Categorical predictors

Effect coding corresponds to the constraint $\sum_i \beta_i = 0$ in the ANOVA set-up, whereas dummy coding corresponds to $\beta_I = 0$.

The interpretation of the $\beta_i$ depends on the coding variables chosen; the model fit, however, does not.

Whatever constraint is chosen, $\hat\alpha + \hat\beta_i$ does not change, so the fitted probabilities remain the same.

The differences $\hat\beta_a - \hat\beta_b$ for any pair (a, b) are estimated log odds ratios, and they too are the same under either constraint.
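The invariance can be illustrated with a saturated one-factor model, whose fitted logits equal the sample logits. This sketch assumes hypothetical sample proportions for three categories and uses the effect and dummy coding rows from the coding example on the previous slide (category 3 as baseline); `fitted` is an illustrative helper.

```python
import math

# Hypothetical sample proportions P(Y = 1) for the three categories of a factor.
props = [0.2, 0.5, 0.7]
# A saturated one-factor model reproduces these proportions exactly.
logits = [math.log(p / (1 - p)) for p in props]

# Coding rows (x1, x2) for categories 1, 2, 3.
effect = [(1, 0), (0, 1), (-1, -1)]
dummy = [(1, 0), (0, 1), (0, 0)]

# Dummy coding (beta_3 = 0): alpha is the logit of baseline category 3.
alpha_d = logits[2]
beta_d = [logits[0] - alpha_d, logits[1] - alpha_d]

# Effect coding (sum of beta_i = 0): alpha is the unweighted mean logit.
alpha_e = sum(logits) / 3
beta_e = [logits[0] - alpha_e, logits[1] - alpha_e]

def fitted(alpha, betas, coding):
    """Fitted probability for each category from alpha + beta1*x1 + beta2*x2."""
    out = []
    for x1, x2 in coding:
        eta = alpha + betas[0] * x1 + betas[1] * x2
        out.append(1 / (1 + math.exp(-eta)))
    return out

print("dummy: ", [round(p, 4) for p in fitted(alpha_d, beta_d, dummy)])
print("effect:", [round(p, 4) for p in fitted(alpha_e, beta_e, effect)])
# The log odds ratio of category 1 vs 2 is beta_1 - beta_2 under both codings.
print(beta_d[0] - beta_d[1], beta_e[0] - beta_e[1])
```

The two parameter sets differ, but both reproduce the same probabilities and the same log odds ratio.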

(13)

Ordered Categorical predictors

If there are ordered categorical predictors for which we can find sensible scores $(x_1, x_2, \ldots, x_I)$, these scores can be used and the predictor treated as if it were of interval level.

An advantage is increased power when the relationship between the predictor and the logit is mostly linear, because only one degree of freedom is used instead of I − 1.

Disadvantage: when the relationship between the predictor and the logit is nonlinear, we lose valuable information.

(14)

Multiple predictors

Like in ordinary regression, logistic regression extends to cases with multiple predictors. Let $\pi(\mathbf{x}) = P(Y = 1 \mid X_1 = x_1, X_2 = x_2, \ldots, X_p = x_p)$; then

$$\pi(\mathbf{x}) = \frac{\exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}{1 + \exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}$$

Each parameter $\beta_i$ refers to the effect of $x_i$ on the log odds that Y = 1, controlling for the other $x_j$ (i.e., keeping the other $x_j$ fixed).

The predictor variables can, of course, be categorical (dummy) or continuous. When all predictors are categorical, the data can be represented in a contingency table format (the data then have a 'grouped' format).

With two factors X and Z, the ANOVA-type model is written as

$$\log\frac{\pi_{ik}}{1 - \pi_{ik}} = \alpha + \beta_i^X + \beta_k^Z$$
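The "controlling for the other predictors" interpretation can be checked with a small sketch, assuming hypothetical parameter values for three predictors: raising $x_1$ by one unit while holding $x_2$ and $x_3$ fixed multiplies the odds by $e^{\beta_1}$, whatever values the other predictors are held at.

```python
import math

def prob(x, alpha, betas):
    """pi(x) = exp(eta) / (1 + exp(eta)) with eta = alpha + sum_j beta_j * x_j."""
    eta = alpha + sum(b * xj for b, xj in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    return p / (1.0 - p)

alpha, betas = -1.0, [0.8, -0.5, 0.3]  # hypothetical parameter values

# Increase x1 by one unit while holding x2 and x3 fixed, at two different settings.
for x2, x3 in [(0.0, 0.0), (2.0, 1.0)]:
    before = prob([1.0, x2, x3], alpha, betas)
    after = prob([2.0, x2, x3], alpha, betas)
    print(f"x2 = {x2}, x3 = {x3}: odds ratio = {odds(after) / odds(before):.4f}")
print(f"e^beta1 = {math.exp(betas[0]):.4f}")
```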

(15)

Multiple predictors

Are the predictors important?

1. Use the Wald statistic ($\hat\beta^2 / SE^2$).

2. Use the likelihood ratio test. Compare two nested models, $M_0$ (the simpler one) and $M_1$, with maximized log-likelihood values $L_0$ and $L_1$, respectively. Define

$$G^2(M_0 \mid M_1) = -2(L_0 - L_1)$$

This statistic tests whether $M_0$ holds, assuming that model $M_1$ holds. Under $M_0$ it has an approximate chi-squared distribution with df equal to the difference in the number of (independent!) parameters of the two models.
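A minimal sketch of the nested-model comparison above. The log-likelihood values and parameter counts are invented, and because the Python standard library has no chi-square distribution, `chi2_sf` (a hypothetical helper) computes the upper tail for integer df ≥ 1 from the standard recurrence for the incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df >= 1 exceeds x), via the recurrence
    Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1)."""
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2  # df = 2: Q = exp(-x/2)
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1  # df = 1: Q = erfc(sqrt(x/2))
    while nu < df:
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q

def g2_test(loglik0, loglik1, npar0, npar1):
    """Likelihood ratio test of nested model M0 against M1: returns (G2, df, p)."""
    g2 = -2 * (loglik0 - loglik1)
    df = npar1 - npar0
    return g2, df, chi2_sf(g2, df)

# Hypothetical maximized log-likelihoods: M0 has 2 parameters, M1 has 4.
g2, df, p = g2_test(-120.4, -115.1, npar0=2, npar1=4)
print(f"G2 = {g2:.2f}, df = {df}, p = {p:.4f}")
```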

(16)

SPSS

SPSS has a logistic regression program under

Analyze > Regression > Binary Logistic...

It contains many statistics, such as:

1. many residuals

2. the Hosmer and Lemeshow statistic

3. influence diagnostics (to be discussed next week)

4. etc.

(17)

ℓEM

A free program for categorical data analysis. It can be found at:

http://www.uvt.nl/faculteiten/fsw/organisatie/departementen/mto/software2.html

This program is especially useful for the analysis of contingency tables, but it can do much more (see 'examples').
