Interpreting a Model for a Continuous Predictor

Our first example of predicting the presence of disease based on age was an instance of using a continuous predictor in logistic regression. Here we present another example, based on the churn data set [4]. Suppose that we are interested in predicting churn based on a single continuous variable, day minutes. We first examine an individual value plot of the day minute usage among churners and nonchurners, provided in Figure 4.2. The plot seems to indicate that churners have slightly higher mean day minute usage than nonchurners, meaning that heavier usage may be a predictor of churn. We verify this using the descriptive statistics given in Table 4.8. The mean and

[Figure 4.2: individual value plot of Day Minutes (vertical axis, 0 to 400) by Churn (False, True).]

INTERPRETING A LOGISTIC REGRESSION MODEL 171

TABLE 4.8 Descriptive Statistics for Day Minutes by Churn

Churn     N    Mean  St. Dev.  Min.      Q1  Median      Q3    Max.
False  2850  175.18     50.18  0.00  142.75  177.20  210.30  315.60
True    483  206.91     69.00  0.00  153.10  217.60  266.00  350.80

five-number summary for the churn = true customers indicate higher day minute usage than for the churn = false customers, supporting the observation from Figure 4.2.

Is this difference significant? A two-sample t-test is carried out, the null hypothesis being that there is no difference in true mean day minute usage between churners and nonchurners. The results are shown in Table 4.9. The resulting t-statistic is −9.68, with a p-value rounding to zero, representing strong significance. That is, the null hypothesis that there is no difference in true mean day minute usage between churners and nonchurners is strongly rejected.
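The t-statistic reported in Table 4.9 can be approximately reproduced from the summary statistics in Table 4.8. The following is a minimal Python sketch. It assumes the unpooled (Welch) form of the two-sample statistic, since the output does not say whether the variances were pooled, and the function name `welch_t_statistic` is hypothetical.

```python
import math


def welch_t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t-statistic without pooling the variances (an assumption)."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error


# Summary statistics from Table 4.8: nonchurners (False) first, churners (True) second.
t_value = welch_t_statistic(175.18, 50.18, 2850, 206.91, 69.00, 483)
print(round(t_value, 2))
```

Because the inputs are the rounded means and standard deviations from Table 4.8, the result agrees with the tabled T-Value only to about two decimal places.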

A word of caution is in order here about carrying out inference in data mining problems, or indeed in any problem where the sample size is very large. Most statistical tests become very sensitive at very large sample sizes, rejecting the null hypothesis for tiny effects. The analyst needs to understand that just because the effect is found to be statistically significant because of the huge sample size, it doesn’t necessarily follow that the effect is of practical significance. The analyst should keep in mind the constraints and desiderata of the business or research problem, seek confluence of results from a variety of models, and always retain a clear eye for the interpretability of the model and the applicability of the model to the original problem.

Note that the t-test does not give us an idea of how an increase in day minutes affects the odds that a customer will churn. Neither does the t-test provide a method for finding the probability that a particular customer will churn, based on the customer’s day minutes usage. To learn this, we must turn to logistic regression, which we now carry out, with the results given in Table 4.10.

First, we verify the relationship between the odds ratio for day minutes and its coefficient: $\widehat{OR} = e^{b_1} = e^{0.0112717} = 1.011335 \approx 1.01$, as shown in Table 4.10. We discuss interpreting this value a bit later. In this example we have $b_0 = -3.92929$ and $b_1 = 0.0112717$. Thus, the probability of churning $\pi(x) = e^{\beta_0 + \beta_1 x} / (1 + e^{\beta_0 + \beta_1 x})$ for

TABLE 4.9 Results of a Two-Sample t-Test for Day Minutes by Churn

Two-sample T for Day Mins

Churn     N   Mean  StDev  SE Mean
False  2850  175.2   50.2     0.94
True    483  206.9   69.0      3.1

Difference = mu (False) - mu (True)
Estimate for difference: -31.7383
95% CI for difference: (-38.1752, -25.3015)
T-Test of difference = 0 (vs not =): T-Value = -9.68

SPH SPH

JWDD006-04 JWDD006-Larose November 23, 2005 14:51 Char Count= 0

172 CHAPTER 4 LOGISTIC REGRESSION

TABLE 4.10 Results of Logistic Regression of Churn on Day Minutes

Logistic Regression Table

                                                  Odds     95% CI
Predictor       Coef    SE Coef       Z      P   Ratio  Lower  Upper
Constant    -3.92929   0.202822  -19.37  0.000
Day Mins   0.0112717  0.0009750   11.56  0.000    1.01   1.01   1.01

Log-Likelihood = -1307.129
Test that all slopes are zero: G = 144.035, DF = 1, P-Value = 0.000

a customer with a given number of day minutes is estimated as

$$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-3.92929 + 0.0112717(\text{day mins})}}{1 + e^{-3.92929 + 0.0112717(\text{day mins})}}$$

with the estimated logit

$$\hat{g}(x) = -3.92929 + 0.0112717(\text{day mins})$$

For a customer with 100 day minutes, we can estimate his or her probability of churning:

$$\hat{g}(x) = -3.92929 + 0.0112717(100) = -2.80212$$

and

$$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-2.80212}}{1 + e^{-2.80212}} = 0.0572$$

Thus, the estimated probability that a customer with 100 day minutes will churn is less than 6%. This is less than the overall proportion of churners in the data set, 14.5%, indicating that low day minute usage is associated with a lower estimated risk of churn. However, for a customer with 300 day minutes, we have

$$\hat{g}(x) = -3.92929 + 0.0112717(300) = -0.54778$$

and

$$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-0.54778}}{1 + e^{-0.54778}} = 0.3664$$

The estimated probability that a customer with 300 day minutes will churn is over 36%, which is more than twice the overall proportion of churners in the data set, indicating that heavy-use customers have a higher propensity to churn.
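The two probability estimates above can be checked with a short Python sketch. It uses only the coefficients from Table 4.10; the function names `estimated_logit` and `estimated_churn_probability` are hypothetical and chosen for illustration.

```python
import math

B0 = -3.92929    # intercept from Table 4.10
B1 = 0.0112717   # day minutes coefficient from Table 4.10


def estimated_logit(day_minutes):
    """Estimated logit g(x) = b0 + b1 * x."""
    return B0 + B1 * day_minutes


def estimated_churn_probability(day_minutes):
    """Estimated probability pi(x) = e^g(x) / (1 + e^g(x))."""
    g = estimated_logit(day_minutes)
    return math.exp(g) / (1.0 + math.exp(g))


for minutes in (100, 300):
    logit = estimated_logit(minutes)
    probability = estimated_churn_probability(minutes)
    print(minutes, round(logit, 5), round(probability, 4))
```

Evaluating the same function over a range of day minutes values shows how the estimated probability rises along the logistic curve as usage increases.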

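The deviance difference G for this example is given by

$$\begin{aligned}
G &= \text{deviance (model without predictor)} - \text{deviance (model with predictor)} \\
  &= -2 \ln \left[ \frac{\text{likelihood without predictor}}{\text{likelihood with predictor}} \right] \\
  &= 2 \left\{ \sum_{i=1}^{n} \left[ y_i \ln(\hat{\pi}_i) + (1 - y_i) \ln(1 - \hat{\pi}_i) \right] - \left[ n_1 \ln(n_1) + n_0 \ln(n_0) - n \ln(n) \right] \right\} \\
  &= 2 \left\{ -1307.129 - \left[ 483 \ln(483) + 2850 \ln(2850) - 3333 \ln(3333) \right] \right\} \\
  &= 144.035
\end{aligned}$$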

as indicated in Table 4.10. The p-value for the chi-square test for G, under the assumption that the null hypothesis is true ($\beta_1 = 0$), is given by $P(\chi^2_1 > G_{\text{observed}}) = P(\chi^2_1 > 144.035) \approx 0.000$, as shown in Table 4.10. Thus, the logistic regression concludes that there is strong evidence that day minutes is useful in predicting churn. Applying the Wald test for the significance of the day minutes parameter, we have $b_1 = 0.0112717$ and $SE(b_1) = 0.0009750$, giving us

$$Z_{\text{Wald}} = \frac{0.0112717}{0.0009750} = 11.56$$

as shown in Table 4.10. The associated p-value of $P(|z| > 11.56) \approx 0.000$, using $\alpha = 0.05$, indicates strong evidence for the usefulness of the day minutes variable for predicting churn.
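Both test statistics can be recomputed from the quantities already reported. The sketch below is a minimal Python illustration, not the software's own procedure. The function names are hypothetical, and the chi-square tail probability for 1 degree of freedom is obtained from the standard normal tail through `math.erfc`, which is a choice made here for self-containment.

```python
import math

N_CHURN, N_STAY = 483, 2850        # churners and nonchurners (Table 4.8)
LOG_LIK_MODEL = -1307.129          # log-likelihood with day minutes (Table 4.10)
B1, SE_B1 = 0.0112717, 0.0009750   # coefficient and standard error (Table 4.10)


def null_log_likelihood(n1, n0):
    """Log-likelihood without the predictor: n1 ln(n1) + n0 ln(n0) - n ln(n)."""
    n = n1 + n0
    return n1 * math.log(n1) + n0 * math.log(n0) - n * math.log(n)


def chi_square_1df_upper_tail(value):
    """P(chi-square with 1 df > value), computed as P(|Z| > sqrt(value))."""
    return math.erfc(math.sqrt(value / 2.0))


# Deviance difference G and its p-value.
g_statistic = 2.0 * (LOG_LIK_MODEL - null_log_likelihood(N_CHURN, N_STAY))
p_g = chi_square_1df_upper_tail(g_statistic)

# Wald statistic and its two-sided p-value P(|Z| > z).
z_wald = B1 / SE_B1
p_wald = math.erfc(abs(z_wald) / math.sqrt(2.0))

print(round(g_statistic, 2), p_g)
print(round(z_wald, 2), p_wald)
```

Both p-values are tiny positive numbers rather than exact zeros, which is what the tabled value of 0.000 represents after rounding.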

Examining Table 4.10, note that the coefficient for day minutes is equal to the natural log of its odds ratio:

$$b_{\text{day mins}} = 0.0112717 \approx \ln(1.011335)$$

where 1.011335 is the odds ratio that Table 4.10 reports rounded to 1.01. Also, this coefficient may be derived, similar to equation (4.4), as follows:

$$\ln[OR(\text{day minutes})] = \hat{g}(x+1) - \hat{g}(x) = [b_0 + b_1(x+1)] - [b_0 + b_1(x)] = b_1 = 0.0112717 \tag{4.5}$$

This derivation provides us with the interpretation of the value for $b_1$. That is, $b_1$ represents the estimated change in the log odds for a unit increase in the predictor, which is the log of the odds ratio. In this example, $b_1 = 0.0112717$, which means that for every additional day minute that the customer uses, the log odds of churning increase by 0.0112717.

The value for the odds ratio we found above, $\widehat{OR} = e^{b_1} = e^{0.0112717} = 1.011335 \approx 1.01$, may be interpreted as the odds of a customer with x + 1 minutes churning compared to a customer with x minutes churning. For example, the odds of churning for a customer with 201 minutes are about 1.01 times the odds for a customer with 200 minutes. This unit-increase interpretation may be of limited usefulness, since the analyst may prefer to interpret the results using a different scale, such as 10 minutes or 60 minutes, or even (conceivably) 1 second. We therefore generalize the interpretation of the logistic regression coefficient as follows:

INTERPRETING THE LOGISTIC REGRESSION COEFFICIENT FOR A CONTINUOUS PREDICTOR

For a constant c, the quantity $cb_1$ represents the estimated change in the log odds for an increase of c units in the predictor.

This result can be seen to follow from the substitution of $\hat{g}(x+c) - \hat{g}(x)$ for $\hat{g}(x+1) - \hat{g}(x)$ in equation (4.5):

$$\hat{g}(x+c) - \hat{g}(x) = [b_0 + b_1(x+c)] - [b_0 + b_1(x)] = cb_1$$

For example, let c = 60, so that we are interested in the change in the log odds for an increase of 60 day minutes in cell phone usage. This increase would be estimated as $cb_1 = 60(0.0112717) = 0.676302$. Consider a customer A, who had 60 more day minutes than customer B. Then we would estimate the odds ratio for customer A to churn compared to customer B to be $e^{0.676302} = 1.97$. That is, an increase of 60 day minutes nearly doubles the odds that a customer will churn.
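The c-unit interpretation is easy to check numerically. The following Python sketch, with hypothetical function names, confirms that the difference in estimated logits depends only on c and not on the starting value x, and then converts c times the coefficient into an odds ratio for a few choices of c.

```python
import math

B0, B1 = -3.92929, 0.0112717   # estimates from Table 4.10


def estimated_logit(day_minutes):
    """Estimated logit g(x) = b0 + b1 * x."""
    return B0 + B1 * day_minutes


def odds_ratio_for_increase(c):
    """Estimated odds ratio for an increase of c day minutes: exp(c * b1)."""
    return math.exp(c * B1)


# The logit difference equals c * b1 whatever the starting value x is.
for x in (0, 100, 200):
    difference = estimated_logit(x + 60) - estimated_logit(x)
    assert math.isclose(difference, 60 * B1)

for c in (1, 10, 60):
    print(c, round(c * B1, 6), round(odds_ratio_for_increase(c), 4))
```

Note that the odds ratio for c units is the one-unit odds ratio raised to the power c, so it compounds multiplicatively rather than growing linearly in c.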

Similar to the categorical predictor case, we may calculate 100(1 − α)% confidence intervals for the odds ratios as follows:

$$\exp\left[ b_i \pm z \cdot \widehat{SE}(b_i) \right]$$

For example, a 95% confidence interval for the odds ratio for day minutes is given by

$$\begin{aligned}
\exp\left[ b_1 \pm z \cdot \widehat{SE}(b_1) \right] &= \exp[0.0112717 \pm (1.96)(0.0009750)] \\
  &= (e^{0.0093607}, e^{0.0131827}) \\
  &= (1.0094, 1.0133) \\
  &\approx (1.01, 1.01)
\end{aligned}$$

as reported in Table 4.10. We are 95% confident that the odds ratio for churning for customers with 1 additional day minute lies between 1.0094 and 1.0133. Since the interval does not include $e^0 = 1$, the relationship is significant with 95% confidence.

Confidence intervals may also be found for the odds ratio for the ith predictor when there is a change in c units in the predictor, as follows:

$$\exp\left[ cb_i \pm zc \cdot \widehat{SE}(b_i) \right]$$

For example, earlier we estimated the odds ratio for an increase of c = 60 day minutes to be 1.97. The 99% confidence interval associated with this estimate is given by

$$\begin{aligned}
\exp\left[ cb_i \pm zc \cdot \widehat{SE}(b_i) \right] &= \exp[60(0.0112717) \pm 2.576(60)(0.0009750)] \\
  &= \exp[0.6763 \pm 0.1507] \\
  &= (1.69, 2.29)
\end{aligned}$$

So we are 99% confident that an increase of 60 day minutes will increase the odds of churning by a factor between 1.69 and 2.29.
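Both intervals can be reproduced with a single helper. This is a minimal Python sketch under the same normal approximation used in the text; the function name `odds_ratio_confidence_interval` is hypothetical, and the critical values 1.96 and 2.576 are passed in as given in the text rather than computed.

```python
import math


def odds_ratio_confidence_interval(b, se, z, c=1.0):
    """Interval exp[c*b - z*c*se], exp[c*b + z*c*se] for a c-unit increase."""
    half_width = z * c * se
    return math.exp(c * b - half_width), math.exp(c * b + half_width)


B1, SE_B1 = 0.0112717, 0.0009750   # from Table 4.10

# 95% interval for a 1-minute increase, 99% interval for a 60-minute increase.
low_95, high_95 = odds_ratio_confidence_interval(B1, SE_B1, z=1.96)
low_99, high_99 = odds_ratio_confidence_interval(B1, SE_B1, z=2.576, c=60)

print(round(low_95, 4), round(high_95, 4))
print(round(low_99, 2), round(high_99, 2))
```

The interval is built on the log odds scale and then exponentiated, which is why it is not symmetric about the point estimate of the odds ratio.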

In document Data Mining Methods And Models Larose DT (2006) pdf (Page 188-192)