BUSINESS
ECONOMICS
PAPER NO. :8, FUNDAMENTALS OF ECONOMETRICS
MODULE NO. :28, AN INTRODUCTION TO BINARY DEPENDENT
VARIABLE AND THE LINEAR PROBABILITY MODEL: LOGIT/ PROBIT
MODELS
Subject
BUSINESS ECONOMICS
Paper No and Title
8, Fundamentals of Econometrics
Module No and Title
28, Introduction to Binary Dependent Variable and the
Linear Probability Model: LOGIT/ PROBIT Models
TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
3. Logit and Probit Models
4. Odds Ratio
5. Marginal Effects
6. Goodness of Fit
7. Summary
1. Learning Outcomes
After studying the module you will be able to understand:
What Logit and Probit Models are and when they are useful
The difference between Logit and Probit Models
Estimation of Logit and Probit Models
Marginal Effects in Logit and Probit Models
Goodness of Fit
2. Introduction
In many applications the dependent variable in a regression model is not continuous. One type of such qualitative model is the binary response model, in which the dependent variable Y is a binary random variable that takes only the values 0 and 1. For example, the dependent variable can be whether a person gets a job through the college (1) or does not (0). Another example is abnormal blood pressure (1) versus normal blood pressure (0). The econometric problem is to estimate the conditional probability that Y = 1 as a function of known explanatory variables, i.e. $P(Y = 1 \mid X = x) = f(\beta_0 + \beta_1 x)$.
The functional form $f$ should be such that $f(\beta_0 + \beta_1 x)$ lies between 0 and 1 for all possible values of $\beta_0 + \beta_1 x$, i.e. anywhere between $-\infty$ and $\infty$.
Such binary response models can be estimated by ordinary least squares (OLS), in which case the model is known as the linear probability model (LPM):
$\pi_i = x_i\beta$, where $\pi_i = P(Y_i = 1 \mid x_i)$.
OLS estimation with a binary dependent variable is straightforward and the coefficients are simple to interpret. It is computationally simpler, the marginal effects can be explained easily, and there is no ambiguity over the choice of $f$. At the same time, there are some issues with the OLS estimates. The predicted values are not constrained to the unit interval: they can be greater than 1 or less than 0. With a binary response variable the errors of the OLS model are also necessarily heteroskedastic. According to Horrace and Oaxaca (2006), the OLS estimates with a binary dependent variable are biased and inconsistent (see also Amemiya (1977)). The heteroskedasticity issue can be overcome by using heteroskedasticity-consistent (robust) standard errors. Bias is also not a big issue in itself, as the maximum likelihood (MLE) estimates of the Logit and Probit models (explained below) used for such binary response variables are also biased in finite samples, although they are consistent. Given the sample sizes that we usually work with when modeling binary data, it is consistency and asymptotic efficiency that are of primary importance. Horrace and Oaxaca (2006) show that the bias and inconsistency increase as the predicted probabilities fall outside the unit interval, i.e. outside the range from 0 to 1. The graph below is obtained from a simple OLS framework. The data set is of patients suffering from diarrhea. The data contain a
binary variable for hospitalization (admit = 1, not admit = 0) and the age of the patients. It is a hypothetical dataset and is given in the appendix at the end.
Graph: 1
The probability of hospitalization of a patient suffering from diarrhea decreases with age. The problem with OLS estimation is that the predicted probability is higher than 1 at low ages, and it appears that at higher ages the probability will become negative. This points to another issue with OLS estimation: the probability increases or decreases at a constant rate.
Horrace and Oaxaca (2006) even suggest that a trimming rule, which excludes the observations that push the predicted probability outside the unit interval, may make the OLS estimates unbiased and consistent. Therefore, if the predicted probabilities lie in the unit interval, the OLS estimates may be consistent. Logit and probit models help in overcoming many of the shortcomings of the OLS model, but they impose strict distributional assumptions, violation of which may lead to biased estimates.
[Graph 1: Admit and Fitted values (vertical axis, 0 to 1) plotted against Age (horizontal axis, 0 to 40)]
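The two problems described above, fitted values outside the unit interval and a constant rate of change, can be reproduced with a minimal sketch of the LPM. The ages and admit values below are made up for illustration (they are not the appendix dataset), and `ols_fit` is a hypothetical helper name.

```python
# Hypothetical data, NOT the module's appendix dataset:
# age of the patient and whether the patient was admitted (1) or not (0).
ages = [1, 2, 3, 5, 8, 10, 15, 20, 25, 30, 35, 40]
admit = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]


def ols_fit(x, y):
    """Return (intercept, slope) of a one-regressor OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = ols_fit(ages, admit)

# LPM "probabilities" are just points on a straight line.
fitted = [b0 + b1 * age for age in ages]

for age, p in zip(ages, fitted):
    flag = "" if 0.0 <= p <= 1.0 else "  <- outside [0, 1]"
    print(f"age={age:2d}  fitted={p: .3f}{flag}")

# The change in the fitted probability per extra year of age is always b1.
print("constant marginal effect:", round(b1, 4))
```

With these made-up numbers the fitted value exceeds 1 for the youngest patients and is negative for the oldest ones, which is the pattern the text describes for Graph 1.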
3. Logit and Probit Models
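Logit and probit models are based on a latent variable model. Let $Y_i^* = x_i\beta + \epsilon_i$ be an unobserved (latent) variable, with $Y_i = 1$ when $Y_i^* > 0$ and $Y_i = 0$ otherwise. Then
$P(Y_i = 1 \mid x) = P(Y_i^* > 0 \mid x)$
$P(Y_i = 1 \mid x) = P(x_i\beta + \epsilon_i > 0 \mid x)$
$P(Y_i = 1 \mid x) = P(\epsilon_i > -x_i\beta \mid x)$
$P(Y_i = 1 \mid x) = 1 - F(-x_i\beta)$
$P(Y_i = 1 \mid x) = F(x_i\beta)$
where $F$ is the cumulative distribution function of $\epsilon_i$; the last step uses the symmetry of the distribution about zero.
The Logit and Probit Models differ in their choice of $F(x_i\beta)$. In the Logit Model, $F(x)$ is the cumulative logistic distribution, given by
$F(x) = \frac{1}{1 + e^{-x}} = \frac{e^{x}}{1 + e^{x}}$
and the corresponding probability density function is
$f(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}}$
In the Probit Model, $F(x)$ is the cumulative standard normal distribution, given by
$F(x) = \Phi(x) = \frac{1}{2}\left\{1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right\}$
and the corresponding probability density function is
$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^{2}/2}$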
Graph 2 above gives the cumulative logistic and normal distributions. The logistic distribution gives a higher cumulative probability at low values because the logistic probability density (see Graph: 3) has fat tails in comparison to the normal distribution.
Graph 3
Estimation: Both logit and probit are estimated by maximizing the log-likelihood of the joint density given below. $F(x)$ is the logistic distribution in the case of the logit model and the normal distribution in the case of the probit model, as explained above. The optimal value of the $\beta$ vector is the one that maximizes the log-likelihood. Once $\beta$ is obtained, the probability follows from the respective density or distribution: it is the area under the probability density curve to the left of $x_i\beta$, or equivalently the value of the cumulative distribution at $x_i\beta$.
Joint density:
$f(y \mid x, \beta) = \prod_i F(x_i\beta)^{y_i} [1 - F(x_i\beta)]^{1 - y_i}$
Log-likelihood:
$\ln L = \sum_i \left[ y_i \ln F(x_i\beta) + (1 - y_i) \ln (1 - F(x_i\beta)) \right]$
where, in the case of the logit model,
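$F(x_i\beta) = \frac{1}{1 + e^{-x_i\beta}} = \frac{e^{x_i\beta}}{1 + e^{x_i\beta}}$
and, in the case of the Probit Model,
$F(x_i\beta) = \Phi(x_i\beta) = \frac{1}{2}\left\{1 + \operatorname{erf}\left(\frac{x_i\beta}{\sqrt{2}}\right)\right\}$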
The log likelihood is maximized and the optimal values of the coefficients are estimated. (For details see Cameron & Trivedi 2005)
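As a rough illustration of the maximization, the sketch below fits a logit model with an intercept and one regressor by Newton-Raphson. The data are hypothetical, the function names are assumptions made for this example, and statistical packages use more careful numerical routines; the sketch is only meant to show the log-likelihood being climbed.

```python
import math

# Hypothetical data (not the module's dataset): one regressor and a 0/1 outcome.
ages = [1, 2, 3, 5, 8, 10, 15, 20, 25, 30, 35, 40]
admit = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]


def logistic(z):
    """Cumulative logistic distribution F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, x, y):
    """ln L = sum of y*ln F(xb) + (1 - y)*ln(1 - F(xb))."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_logit(x, y, iterations=25):
    """Newton-Raphson for a two-parameter logit (intercept b0, slope b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient (score) of ln L
        h00 = h01 = h11 = 0.0    # negative Hessian of ln L
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta_new = beta + (negative Hessian)^(-1) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


ll_start = log_likelihood(0.0, 0.0, ages, admit)
b0_hat, b1_hat = fit_logit(ages, admit)
ll_fit = log_likelihood(b0_hat, b1_hat, ages, admit)

print("log-likelihood at beta = 0:", round(ll_start, 4))
print("log-likelihood at estimate:", round(ll_fit, 4))
print("estimates:", round(b0_hat, 4), round(b1_hat, 4))
```

At the maximum the score is zero, so the fitted probabilities sum to the number of observed 1s; that is a convenient check on any hand-written routine of this kind.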
Graph: 4
Suppose that we want to know the probability corresponding to $x_i\beta = 1$. Under probit estimation, this is the non-shaded area under the normal density in Graph: 4. Had we used logit estimation, the probability would have been lower, as can be seen from the cumulative distributions in Graph: 5.
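The comparison at $x_i\beta = 1$ can be checked directly from the two cumulative distributions defined earlier. This is a small sketch using only the Python standard library; the function names are chosen here for illustration.

```python
import math


def logistic_cdf(z):
    """Cumulative logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Cumulative standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


index_value = 1.0  # the value of x_i * beta used in the text

probit_probability = normal_cdf(index_value)
logit_probability = logistic_cdf(index_value)

print("probit:", round(probit_probability, 4))  # about 0.8413
print("logit: ", round(logit_probability, 4))   # about 0.7311
```

Note that the same index value is used in both functions only to compare the shapes of the two distributions; in practice the estimated logit and probit coefficients are on different scales.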
Graph: 5
Logit and probit may not give a better fit in every case, but they certainly improve upon the two crucial shortcomings of OLS estimation. With logit and probit, the predicted probability never goes outside the unit interval (Graph: 6 shows the predicted probabilities from the three models), and the rate of change of the probability (the marginal effect) is not constant (more on this below).
Graph: 6
[Graph 6: Predicted Probability by Logit, Probit and LPM; series Admit, Pr(Logit), Pr(Probit) and Pr(LPM); vertical axis 0 to 1, horizontal axis 0 to 40]
4. Odds Ratio
In the logit model, the coefficient $\beta$ can be understood in terms of the odds.
$\pi_i = F(x_i\beta) = \frac{e^{x_i\beta}}{1 + e^{x_i\beta}}$
Define the odds (denoted OR in this module) as
$OR = \frac{\pi_i}{1 - \pi_i}$
This gives
$\frac{\pi_i}{1 - \pi_i} = e^{x_i\beta}$
$\ln \frac{\pi_i}{1 - \pi_i} = \ln OR = x_i\beta$
From the above we can deduce that, in the logit model, the log odds change by $\beta_j$ when $x_j$ changes by one unit; equivalently, the odds are multiplied by $e^{\beta_j}$, which is the odds ratio for a one-unit change in $x_j$. In the probit model, when $x_j$ changes by one unit the $z$ value changes by $\beta_j$ units, which is harder to explain than the logit case.
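The log-odds interpretation can be verified numerically. The sketch below uses made-up coefficients (`beta_0` and `beta_j` are hypothetical values, not estimates from the module) and checks that a one-unit increase in the regressor shifts the log odds by exactly the coefficient.

```python
import math


def logit_probability(xb):
    """pi = exp(xb) / (1 + exp(xb))."""
    return math.exp(xb) / (1.0 + math.exp(xb))


def odds(p):
    """Odds of the event: pi / (1 - pi)."""
    return p / (1.0 - p)


beta_0, beta_j = -1.0, 0.5   # hypothetical coefficients, for illustration only
x_j = 2.0                    # hypothetical starting value of the regressor

p_before = logit_probability(beta_0 + beta_j * x_j)
p_after = logit_probability(beta_0 + beta_j * (x_j + 1.0))

log_odds_change = math.log(odds(p_after)) - math.log(odds(p_before))
odds_ratio = odds(p_after) / odds(p_before)

print("change in log odds:", round(log_odds_change, 6))  # equals beta_j
print("odds ratio:", round(odds_ratio, 6))               # equals exp(beta_j)
```

The change in the probability itself (`p_after - p_before`) depends on the starting value `x_j`, which is why the logit coefficient is read in terms of odds rather than probabilities.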
5. Marginal Effects
Marginal effects are also called instantaneous rates of change; they are computed for one variable while all other variables are held constant. A marginal effect is the change in probability due to a change in $x_j$:
$\frac{d\pi_i}{dx_j}$
Calculating marginal effects is straightforward in OLS estimation, as $\beta_j$ itself is the marginal effect. Recall that in OLS estimation the rate of change of the probability is constant. In logit and probit models, calculating marginal effects is less straightforward because the change in probability due to a change in $x_j$ is not constant. For the logit model, the marginal effect, usually evaluated at the mean values of the covariates, is given by
$\frac{d\pi_i}{dx_j} = \beta_j \pi_i (1 - \pi_i)$
For the probit model the corresponding expression is $\beta_j \phi(x_i\beta)$, where $\phi$ is the standard normal density.
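A minimal sketch of these marginal-effect formulas follows. The coefficients and the evaluation point are hypothetical; the logit expression is the one given above, and the probit expression (coefficient times the standard normal density) is the standard counterpart. A numerical derivative is included as a sanity check.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def logit_marginal_effect(beta_j, xb):
    """beta_j * pi * (1 - pi), with pi the logit probability at index xb."""
    p = logistic_cdf(xb)
    return beta_j * p * (1.0 - p)


def probit_marginal_effect(beta_j, xb):
    """beta_j * phi(xb), with phi the standard normal density."""
    return beta_j * normal_pdf(xb)


# Hypothetical intercept, slope and evaluation point (e.g. the sample mean).
beta_0, beta_j, x_mean = -2.0, 0.8, 2.5
xb = beta_0 + beta_j * x_mean  # index value at the evaluation point (0 here)

me_logit = logit_marginal_effect(beta_j, xb)
me_probit = probit_marginal_effect(beta_j, xb)

# Central finite differences in x as a check on the analytic formulas.
h = 1e-6
fd_logit = (logistic_cdf(beta_0 + beta_j * (x_mean + h))
            - logistic_cdf(beta_0 + beta_j * (x_mean - h))) / (2 * h)
fd_probit = (normal_cdf(beta_0 + beta_j * (x_mean + h))
             - normal_cdf(beta_0 + beta_j * (x_mean - h))) / (2 * h)

print("logit marginal effect: ", round(me_logit, 6), round(fd_logit, 6))
print("probit marginal effect:", round(me_probit, 6), round(fd_probit, 6))
```

The same numerical coefficients are used for both models only to exercise the formulas; estimated logit and probit coefficients differ in scale, so their marginal effects are what should be compared across models.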
For binary independent variables, marginal effects measure discrete change, i.e. how do predicted probabilities change as the binary independent variable changes from 0 to 1? For categorical variables with more than two possible values, e.g. religion, the marginal effects show you the difference in the predicted probabilities for cases in one category relative to the reference category. So, for example, if religion was coded 1 = Catholic, 2 = Protestant, 3 = Jewish, 4 =
other, the marginal effect for Protestant would show how much more (or less) likely Protestants were to succeed than Catholics, the marginal effect for Jewish would show how much more (or less) likely Jews were to succeed than Catholics, and so on. Keep in mind that these are the marginal effects when all other variables equal their means (hence the term MEMs, marginal effects at the means); the marginal effects will differ at other values of $x$. The marginal effects for our example are given in Table: 1.
Table: 1
Table 1 gives the marginal effects at the means. The marginal effect is lowest for the linear model, and it is the same for all values of Age, as shown in Graph: 7. The marginal effects of the logit and probit models are almost the same: they first decrease with age and then start increasing. This is what the functional form implies in our example: since the probability of hospitalization falls with age, the marginal effect of age is negative throughout; it grows in magnitude at first and, after a certain age, moves back toward zero.
Graph: 7
Examples:
Example 1: Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether or not the candidate is an incumbent.
Example 2: A researcher is interested in how variables such as GRE (Graduate Record Exam scores), GPA (grade point average) and the prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is a binary variable.
This data set has a binary response (outcome, dependent) variable called admit. The data set can be obtained from http://www.ats.ucla.edu/stat/stata/dae/binary. There are three predictor variables: gre, gpa and rank. We treat the variables gre and gpa as continuous. The variable rank takes the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. We are interested in how gre, gpa and rank affect the probability of admission.
We estimate all three models; the output is given in Table: 2.
Table: 2. Regression Result
The table shows the coefficients and the associated p values. Both gre and gpa are statistically significant, as are the three indicator variables for rank, in all three models. A positive coefficient implies that an increase in the predictor increases the probability. The logistic regression coefficients give the change in the log odds of the outcome for a one-unit increase in the predictor variable, while the probit regression coefficients give the change in the $z$ value. For every one-unit change in gre, the log odds of admission (versus non-admission) increase by 0.002. For a one-unit increase in gpa, the log odds of being admitted to graduate school increase by 0.804. The indicator variables for rank have a slightly different interpretation. For example, having attended an undergraduate institution with a rank of 2, versus an institution with a rank of 1, decreases the log odds of admission by 0.675. The regression coefficients in the other models can be explained similarly.
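The reported logit coefficients are often easier to read after exponentiating them, which turns a change in log odds into a multiplicative change in the odds. The sketch below uses the three coefficients quoted in the text (0.002, 0.804 and -0.675); the dictionary keys are labels chosen here, not names from the original output.

```python
import math

# Logit coefficients as quoted in the text (change in log odds per unit).
logit_coefficients = {
    "gre": 0.002,
    "gpa": 0.804,
    "rank 2 vs rank 1": -0.675,
}

# exp(coefficient) is the factor by which the odds are multiplied.
odds_multipliers = {
    name: math.exp(coefficient)
    for name, coefficient in logit_coefficients.items()
}

for name, multiplier in odds_multipliers.items():
    print(f"{name}: odds multiplied by {multiplier:.3f}")

# A 100-point increase in gre changes the log odds by 100 * 0.002 = 0.2.
gre_100_points = math.exp(100 * logit_coefficients["gre"])
print(f"gre + 100 points: odds multiplied by {gre_100_points:.3f}")
```

Read this way, a one-point higher gpa roughly doubles the odds of admission, while coming from a rank 2 rather than a rank 1 institution roughly halves them, other predictors held fixed.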
Table: 3
The table above gives the marginal effects, calculated at the mean values of the predictor variables (both the predictor in question and the other predictors). Marginal effects should be used to calculate the change in probability only for small changes in the predictor variables; for a large change in a predictor they may not give accurate results.
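Why the marginal effect is reliable only for small changes can be seen by comparing the linear approximation (marginal effect times the change) with the exact change in the logit probability. The coefficients and the starting point below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


beta_0, beta_1 = -2.0, 0.8   # hypothetical logit coefficients
x_start = 2.5                # hypothetical evaluation point (index = 0, p = 0.5)


def probability(x):
    return logistic_cdf(beta_0 + beta_1 * x)


p_start = probability(x_start)
marginal_effect = beta_1 * p_start * (1.0 - p_start)

errors = {}
for change in (0.1, 1.0, 3.0):
    approximate = marginal_effect * change               # linear approximation
    exact = probability(x_start + change) - p_start      # true change
    errors[change] = abs(approximate - exact)
    print(f"change={change}: approx={approximate:.4f} exact={exact:.4f}")
```

For the smallest change the two numbers agree to several decimal places, while for the largest change the linear approximation overstates the change in probability by a wide margin, because the logistic curve flattens as the probability approaches 1.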
Graph: 8. Predicted Probability vs GPA for Rank 4 (Linear, Logit and Probit; probability from 0 to .3, GPA from 2 to 4)
Graph: 9. Predicted probability vs GRE score for rank 4 (Linear, Logit and Probit; probability from 0 to .3, GRE score from 200 to 800)
Graphs 8 and 9 give the predicted probability of admission for rank 4 as GPA and GRE score vary, with the other variables kept at their mean values. Graph: 9, for example, gives the predicted probability of admission at different values of GRE score for rank 4 and mean values of the other predictors. Once again we can see the linearity of the OLS predicted probabilities and the non-linearity of the predicted values from the Logit and Probit Models. The important point is that the predicted probabilities from the Logit and Probit models are almost the same; the difference arises only at very low and very high values, because of the differences in the tails of the logistic and normal distributions. At the mean values of GPA and GRE, even the OLS predicted probabilities are the same as those of the Logit and Probit Models. This makes the choice of the correct model a tough one, as explained later.
Graph: 10. Marginal Effects of GRE vs GRE for Rank 4 (Linear, Logit and Probit; GRE from 200 to 800)
Graph: 11. Marginal effects of GPA vs GPA for rank 4 (Linear, Logit and Probit; GPA from 2 to 4)
Graphs 10 and 11 above give the marginal effects of GRE and GPA as GRE and GPA scores vary, for rank 4 and with the other variables kept at their mean values. Once again, the OLS marginal effects are constant, while the marginal effects of the Logit and Probit Models are non-linear. The important point is that the marginal effect from the Logit model is always lower than the marginal effect from the Probit model. At very high values of GRE and GPA, the marginal effects from the Logit and Probit models become almost equal to the marginal effects from OLS.
6. Goodness of Fit
The traditional measure of goodness of fit, $R^2$, used in OLS regression is not applicable to logit and probit models. OLS estimation is based on minimizing the residual sum of squares, whereas logit and probit models are estimated by maximum likelihood, which is an iterative process. Therefore, for logit and probit we use measures of goodness of fit known as pseudo $R^2$. These measures also generally lie between 0 and 1 and a higher value implies a better fit, but they cannot be interpreted like the usual $R^2$. The commonly used pseudo $R^2$ measures are explained below.
Efron's Pseudo $R^2$
$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \pi_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$
where $\pi_i$ is the predicted probability and $\bar{y}$ is the mean of $y$.
McFadden's Pseudo $R^2$
$R^2 = 1 - \frac{\ln L_{full}}{\ln L_{intercept}}$
Most statistical packages report the log likelihood after estimation. We need two log likelihoods: one from the model without predictors (intercept only) and one from the model with predictors. With these, the formula above gives McFadden's Pseudo $R^2$. For the model in the example above, the log likelihood with the intercept only is -249.98826 and the log likelihood with predictors is -229.25875.
McFadden's Pseudo $R^2$
$R^2 = 1 - \frac{-229.25875}{-249.98826} = 0.082921934$
McFadden's Adjusted Pseudo $R^2$
$R^2 = 1 - \frac{\ln L_{full} - K}{\ln L_{intercept}}$
McFadden's Adjusted Pseudo $R^2$ is similar to the adjusted $R^2$ of OLS: it adjusts for the number of predictors, $K$. In our example the number of predictors is 5 and so
McFadden's Adjusted Pseudo $R^2$
$R^2 = 1 - \frac{-229.25875 - 5}{-249.98826} = 0.062920995$
Cox & Snell Pseudo $R^2$
$R^2 = 1 - \left(\frac{L_{intercept}}{L_{full}}\right)^{2/N}$
Here $L$ denotes the likelihood itself, not its logarithm. The ratio $L_{intercept} / L_{full}$ depicts the improvement of the full model with predictor variables over the model with the intercept only: the lower the ratio, the greater the improvement. The value of the Cox & Snell Pseudo $R^2$ cannot reach one, and that is what differentiates it from the other measures.
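The likelihood-based measures above can be reproduced from the two log likelihoods quoted in the text. In the sketch below, the sample size N = 400 is taken from the classification counts used later in this section, and the Cox & Snell ratio of likelihoods is computed as the exponential of the difference in log likelihoods to avoid underflow; the variable names are chosen here for illustration.

```python
import math

# Values quoted in the text.
log_lik_intercept = -249.98826   # model with intercept only
log_lik_full = -229.25875        # model with predictors
num_predictors = 5               # K in the adjusted measure
sample_size = 400                # N, taken from the classification table counts

# McFadden: 1 - lnL_full / lnL_intercept
mcfadden = 1.0 - log_lik_full / log_lik_intercept

# Adjusted McFadden: 1 - (lnL_full - K) / lnL_intercept
mcfadden_adjusted = 1.0 - (log_lik_full - num_predictors) / log_lik_intercept

# Cox & Snell: 1 - (L_intercept / L_full)^(2/N)
#            = 1 - exp((2/N) * (lnL_intercept - lnL_full))
cox_snell = 1.0 - math.exp(
    (2.0 / sample_size) * (log_lik_intercept - log_lik_full)
)

print("McFadden:          ", round(mcfadden, 6))
print("McFadden adjusted: ", round(mcfadden_adjusted, 6))
print("Cox & Snell:       ", round(cox_snell, 6))
```

The first two values match the figures worked out in the text (about 0.0829 and 0.0629); the Cox & Snell value is not reported in the text and is shown here only as an application of the formula.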
Count Pseudo $R^2$
$R^2 = \frac{\text{count of correct predictions}}{\text{total count}}$
After estimating the model we can calculate predicted probabilities. Most statistical packages can compute predicted values after a regression. Once the predicted probabilities are available, we use them to construct a binary variable of 0 and 1: with a cut-off predicted probability of 0.5, observations below the cut-off are assigned 0 and those greater than or equal to it are assigned 1. We then compare this predicted binary variable with the actual one and count the matched 0s and matched 1s. Dividing this count by the total count gives the Count Pseudo $R^2$. Most statistical packages offer such a classification table (the table below was obtained from Stata).
In this table the + sign refers to the number of matches between the original and predicted binary variable for 1, and the - sign refers to the number of matches for 0. Sensitivity is the proportion of 1s that are correctly identified: 30/127 = 0.2362. Specificity is the proportion of 0s correctly identified: 254/273 = 0.9304. The proportion correctly classified, also known as the Count $R^2$, is (254+30)/400 = 0.710.
Adjusted Count Pseudo $R^2$
$R^2 = \frac{\text{count of correct predictions} - n}{\text{total count} - n}$
where $n$ is the count of the most frequent outcome.
The Count $R^2$ can be misleading under certain circumstances. In a binary model one can always classify at least 50% of the cases correctly, without using any predictor variables, by choosing the most common outcome. The Count $R^2$ therefore needs to be adjusted by the largest row marginal total. In our example, the Adjusted Count $R^2$ = ((30+254) - 273)/(400 - 273) = 0.087. The adjusted count can be read as the proportion of correct matches beyond what can be achieved by guessing the most common outcome.
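The classification measures can be recomputed from the four counts quoted in the text (30 correctly predicted 1s out of 127, and 254 correctly predicted 0s out of 273). The sketch below is plain arithmetic; the variable names are chosen here and do not come from the Stata output.

```python
# Counts quoted in the text.
correct_ones = 30      # actual 1s that were predicted as 1
actual_ones = 127      # all actual 1s
correct_zeros = 254    # actual 0s that were predicted as 0
actual_zeros = 273     # all actual 0s

total = actual_ones + actual_zeros            # 400 observations
correct = correct_ones + correct_zeros        # 284 correct predictions
most_frequent = max(actual_ones, actual_zeros)  # n in the adjusted formula

sensitivity = correct_ones / actual_ones
specificity = correct_zeros / actual_zeros
count_r2 = correct / total
adjusted_count_r2 = (correct - most_frequent) / (total - most_frequent)

print("sensitivity:      ", round(sensitivity, 4))
print("specificity:      ", round(specificity, 4))
print("count R2:         ", round(count_r2, 4))
print("adjusted count R2:", round(adjusted_count_r2, 4))
```

Always predicting the most common outcome (0) would already classify 273 of the 400 cases correctly, which is why the adjusted measure is so much smaller than the raw Count $R^2$.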
7. Summary
The LPM avoids the risk of misspecifying the "link function", that is, of choosing wrongly between Logit and Probit. The estimated marginal effects from the LPM, Logit and Probit models are usually very similar, especially with a large sample size. There are complications with Logit or Probit if you have endogenous dummy regressors. In the linear regression model, certain types of misspecification have only mild implications for our inferences. However, these results change if the model is non-linear in the parameters, a fact that is well known. More specifically, these results change (for the worse) in the context of non-linear models such as Logit, Probit, Tobit, etc.
The choice between Logit and Probit is a tough one and, as in any other econometric modeling, it depends mainly on knowledge of the response distribution, theoretical considerations, and the empirical fit to the data. From the point of view of substantive theory, if you think of your covariates as directly connected to the probability of success, then you would typically choose logistic regression because it uses the canonical link. However, consider the following example. You are asked to model high blood pressure as a function of some covariates; that is, we are interested in how various characteristics affect blood pressure. Blood pressure is normally distributed in the population (this seems plausible prima facie, and medical evidence suggests something similar). But during the experiment we recorded blood pressure as a binary variable: 1 for high blood pressure and 0 for normal blood pressure. What we observe as a binary variable is actually an observation from a hidden Gaussian distribution, and in this case probit would be preferable a priori for theoretical reasons. Lastly, note that the empirical fit of the model to the data is unlikely to help in choosing between Logit and Probit, as the shapes of the two link functions do not differ substantially. Still, we can test whether logit or probit is the better fit using various post-estimation model selection tests.