ON PROBLEMS OF STATISTICAL MODELING
B. Venkateswarlu
Associate Professor, Priyadarshini Institute of Technology,
Nellore, India.
Dr.P.Balasiddamuni,
Professor of Statistics , S.V.University, Tirupati , India.
B.Ramana Murthy,
Research Scholar in Statistics, S.V.University, Tirupati , India.
Dr.G.S.G.N.Anjaneyulu, VIT University,
Vellore, India.
The present study describes several problems in statistical modeling and suggests criteria for model specification, for the selection of regressors, and for selecting between two non-nested linear statistical models.
Key words: Statistical model, Regression, Least squares, Specification test, Information matrix
I. INTRODUCTION
A model is a set of structural or functional relationships between two or more variables. Generally, these functional relationships can be expressed in terms of mathematical equations. A set of mathematical equations concerning certain variables is called a mathematical model. By introducing a random error variable, the mathematical model becomes a statistical model.
Traditional statistical model building proceeds along the following lines:
1. Specification or formulation of the statistical model of the theory
2. Estimation of the parameters of the statistical model
3. Testing of hypotheses about the parameters
4. Forecasting or prediction of variables from the estimated model
5. Using the statistical model for control or policy decisions
A statistical model may be either in the form of a set of linear equations (Linear Regression Model) or in the form of a set of non-linear equations (Non-linear Regression Model).
Models can be broadly classified into four types:
i). Deterministic models or mathematical models: These are described by exact functional relationships and do not involve stochastic variables.
ii). Stochastic models or statistical models: In these models, the functional relationships may contain one or more random variables.
iii). Static Models: The variables involved in these models are independent of time.
iv). Dynamic Models: The variables involved in these models depend on time.
Some special types of statistical models which have a wide range of applications in Applied Regression Analysis are
1. Classical or Gauss-Markoff or General Linear Regression Models
2. Non-linear models which are linear in parameters
3. Non-linear models which are nonlinear in parameters. These models are again of two types, namely
(a) Non-linear models that are intrinsically linear and
(b) Non-linear models that are intrinsically nonlinear
4. Qualitative and limited dependent variable models
5. Generalized Linear Regression Models
6. Simultaneous Linear Equations Models
7. Models for pooling of time series and cross section data (Models for panel data). These models include
(a) Error components models
(b) Random Coefficients Regression (RCR) Models
(c) Varying parameter models: Switching Regression and Piecewise Regression Models
8. Distributed Lag Models
9. Forecasting Models
10. Sets of Linear Regression Models or Seemingly Unrelated Regression Equations (SURE) Models
11. Nested and Non-Nested Models, etc.
In the present study, an attempt has been made to describe some important problems related to statistical model building, such as Model Selection Criteria, Model Specification Tests and Selection of Regressors in Statistical Models.
II. SPECIFICATION TESTS FOR STATISTICAL MODELS
The first and foremost step in any statistical research or methodology is the specification or formulation of the model. In classical statistics, model specification deals with the two topics of estimation and hypothesis testing. In applied regression analysis, a research worker is invariably faced with the problem of specification of the model. By specification, we mean (i) the choice of explanatory variables in each equation, (ii) the functional form relating these variables to the dependent variable, and (iii) the stochastic properties of the disturbance term in the model.
Generally, in applied regression analysis, the linear regression relationship may not be correctly specified. Overspecification (including irrelevant regressors) yields unbiased estimates of the regression coefficients, but with larger variance; underspecification (omitting relevant regressors) yields biased estimates of the regression coefficients and understates the variance of these estimates.
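The effect of over- and underspecification on the coefficient estimates can be illustrated with a small Monte Carlo experiment. The sketch below is only an illustration under assumed values: the coefficients, the correlation between regressors, the sample size and the helper names `ols` and `simulate` are all hypothetical and do not come from the study.

```python
import numpy as np

def ols(X, y):
    """Return the least-squares coefficient vector for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(n=200, reps=2000, seed=0):
    """Mean and variance of the estimated coefficient of x1 under three specifications."""
    rng = np.random.default_rng(seed)
    beta1, beta2 = 1.0, 0.5                      # assumed true coefficients
    under, correct, over = [], [], []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = 0.7 * x1 + rng.normal(size=n)       # relevant regressor, correlated with x1
        x3 = 0.7 * x1 + rng.normal(size=n)       # irrelevant regressor, correlated with x1
        y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
        under.append(ols(np.column_stack([x1]), y)[0])          # omits x2
        correct.append(ols(np.column_stack([x1, x2]), y)[0])    # true specification
        over.append(ols(np.column_stack([x1, x2, x3]), y)[0])   # adds x3
    return {name: (float(np.mean(v)), float(np.var(v)))
            for name, v in [("under", under), ("correct", correct), ("over", over)]}

results = simulate()
for name, (mean, var) in results.items():
    print(f"{name:8s} mean={mean:.3f} variance={var:.5f}")
```

In this setup the underspecified estimate centres near 1 + 0.5 × 0.7 = 1.35 instead of 1, while the overspecified estimate stays centred near 1 but with a larger sampling variance than the correctly specified one.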
One of the most important components of statistical model building is testing for specification errors. A model can be misspecified in a number of ways. Two major sources are an incorrect functional form and invalid assumptions on the distribution of the error term in the model. Regarding the functional form, linearity is often assumed for simplicity when a nonlinear function would be more appropriate, and this may be accompanied by the exclusion of relevant variables or the inclusion of irrelevant ones. In respect of the error term, classical regression analysis is based on the assumptions of normality, homoscedasticity and serial independence of the disturbances. Violation of these assumptions affects both estimation and inference results.
The main contributors to the problem of model specification tests are Ramsey (1969, 1974), Thursby (1979, 1982), Ullah (1985), Davidson and MacKinnon (1981), Bera and McAleer (1989), Banerjee and Magnus (2000), and others.
Some important specification tests existing in the literature are:
1. Ramsey’s Regression Specification Error Test (RESET)
2. Utts’ Rainbow Test
3. Plosser, Schwert and White (PSW) Differencing Test
4. White’s Information Matrix Test
5. Hausman’s Specification Test
6. J and JA Tests for Non-Nested Hypotheses
7. Rank Specification Error Test (RASET)
8. Kolmogorov Specification Error Test (KOMSET)
9. Bartlett’s M-Specification Error Test (BAMSET)
10. Test for Functional Misspecification in Regression Analysis
11. Grouping Test for Misspecification
12. Test for Specification Error
13. White Test for Functional Form
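As an illustration of the first test in the list, the following is a minimal sketch of one common form of RESET: the model is refitted with powers of the fitted values added as regressors, and the added terms are tested jointly with an F-statistic. The function names `rss` and `reset_f`, the choice of powers 2 and 3, and the simulated data are assumptions made for this example only.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from the least-squares fit of y on X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)

def reset_f(X, y, max_power=3):
    """F-statistic for adding powers 2..max_power of the fitted values to X."""
    n, k = X.shape
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    extra = np.column_stack([fitted**p for p in range(2, max_power + 1)])
    X_aug = np.hstack([X, extra])
    q = extra.shape[1]                              # number of added terms
    rss_r, rss_u = rss(X, y), rss(X_aug, y)
    f = ((rss_r - rss_u) / q) / (rss_u / (n - k - q))
    return f, (q, n - k - q)

# Hypothetical data: one regressor, with and without a neglected quadratic term
rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])                # intercept plus one regressor
y_linear = 1.0 + x + 0.5 * rng.normal(size=n)
y_quadratic = 1.0 + x + x**2 + 0.5 * rng.normal(size=n)

f_linear, df = reset_f(X, y_linear)
f_quadratic, _ = reset_f(X, y_quadratic)
print(f_linear, f_quadratic, df)
```

A large value of the statistic relative to the F distribution with the returned degrees of freedom points to a misspecified functional form, as in the second data set here.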
In the present study, a test for model specification of linear model is proposed by using restricted least squares estimators.
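RESTRICTED LEAST SQUARES SPECIFICATION TEST
Consider a general statistical linear model in matrix notation as
$$Y_{n \times 1} = X_{n \times k}\,\beta_{k \times 1} + \varepsilon_{n \times 1}$$ [2.1]
or
$$Y_{n \times 1} = X_{1\,(n \times k_1)}\,\beta_{1\,(k_1 \times 1)} + X_{2\,(n \times k_2)}\,\beta_{2\,(k_2 \times 1)} + \varepsilon_{n \times 1}$$ [2.2]
such that $\varepsilon \sim N(0, \sigma^2 I_n)$.
Assume that the rank of $X = [X_1 \;\; X_2]$ is $k = k_1 + k_2$.
The OLS estimator of $\beta$ is given by
$$\hat\beta = \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1'Y \\ X_2'Y \end{bmatrix}$$ [2.3]
The sampling distribution of $\hat\beta$ is given by $N\left(\beta,\; \sigma^2 (X'X)^{-1}\right)$.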
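The partitioned form of the OLS estimator in [2.3] can be checked numerically. The sketch below builds the blocks of X'X and X'Y explicitly and solves the normal equations; the simulated data, dimensions and the known value of sigma are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1, k2 = 50, 2, 3                        # assumed dimensions
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
X = np.hstack([X1, X2])
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])  # assumed true coefficients
sigma = 0.3                                  # assumed known error s.d.
Y = X @ beta + sigma * rng.normal(size=n)

# Block form of X'X and X'Y, as in [2.3]
XtX = np.block([[X1.T @ X1, X1.T @ X2],
                [X2.T @ X1, X2.T @ X2]])
XtY = np.concatenate([X1.T @ Y, X2.T @ Y])

beta_hat = np.linalg.solve(XtX, XtY)         # OLS estimator
beta1_hat, beta2_hat = beta_hat[:k1], beta_hat[k1:]

# Covariance matrix of beta_hat when sigma^2 is known: sigma^2 (X'X)^{-1}
cov_beta_hat = sigma**2 * np.linalg.inv(XtX)
print(beta_hat)
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse for the point estimate; the inverse is computed only where the covariance matrix itself is needed.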
Suppose that the $X_2$ set of regressors appearing in the model is extraneous and is omitted from the linear model. In other words, suppose that $X_2$ has a zero coefficient vector. This may be written in the form of a general linear hypothesis,
$$H_0: R\beta = r$$
where $R = [\,0_{k_2 \times k_1} \;\; I_{k_2}\,]$ and $r = 0_{k_2 \times 1}$. Here $R$ is a known $k_2 \times k$ matrix of rank $k_2$.
The general restricted least squares estimator of $\beta$ is
$$\beta^* = \hat\beta - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r)$$ [2.4]
Under $H_0: \beta_2 = 0$, the restricted least squares estimator of $\beta$ is given by
$$\beta^* = \begin{bmatrix} \beta_1^* \\ 0_{k_2 \times 1} \end{bmatrix} = \begin{bmatrix} (X_1'X_1)^{-1}X_1'Y \\ 0_{k_2 \times 1} \end{bmatrix}$$ [2.5]
Under $H_0: \beta_2 = 0$, it can be shown that $\beta_1^*$ is an unbiased estimate of $\beta_1$ and
$$\operatorname{var}(\beta_1^*) = \sigma^2 (X_1'X_1)^{-1}$$
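The reduction of the general restricted least squares estimator [2.4] to the form [2.5] can be verified numerically. The following sketch, with hypothetical data and a helper name `restricted_ls` chosen for this example, applies the general formula with R = [0  I] and r = 0 and compares the result with the regression of Y on X1 alone.

```python
import numpy as np

def restricted_ls(X, Y, R, r):
    """Restricted least squares estimator of [2.4] for the restriction R beta = r."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y                       # unrestricted OLS
    middle = np.linalg.inv(R @ XtX_inv @ R.T)
    return beta_hat - XtX_inv @ R.T @ middle @ (R @ beta_hat - r)

rng = np.random.default_rng(3)
n, k1, k2 = 60, 2, 3                                   # assumed dimensions
X1 = rng.normal(size=(n, k1))
X2 = 0.5 * X1[:, [0]] + rng.normal(size=(n, k2))       # correlated with X1
X = np.hstack([X1, X2])
Y = X1 @ np.array([1.0, -1.0]) + rng.normal(size=n)    # generated with beta_2 = 0

R = np.hstack([np.zeros((k2, k1)), np.eye(k2)])        # R = [0  I_{k2}]
r = np.zeros(k2)
beta_star = restricted_ls(X, Y, R, r)

# [2.5]: the same estimator obtained by regressing Y on X1 alone
beta1_star = np.linalg.solve(X1.T @ X1, X1.T @ Y)
print(beta_star[:k1], beta1_star)
```

The first k1 elements of `beta_star` agree with `beta1_star` and the last k2 elements are zero up to rounding error, which is the content of [2.5].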
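Also, the difference between the mean squared error (MSE) matrices of $\hat\beta_1$ and $\beta_1^*$ can be obtained as
$$\mathrm{MSE}(\hat\beta_1) - \mathrm{MSE}(\beta_1^*) = (X_1'X_1)^{-1}X_1'X_2\left[\operatorname{var}(\hat\beta_2) - \beta_2\beta_2'\right]X_2'X_1(X_1'X_1)^{-1}$$
This difference matrix will be positive semi-definite if $\operatorname{var}(\hat\beta_2) - \beta_2\beta_2'$ is positive semi-definite.
Under the specification test, one may state
$$H_0: \beta_2 = 0 \quad \text{versus} \quad H_1: \beta_2 \neq 0$$
The proposed specification test statistic is given by
$$Q^* = (\hat\beta_1 - \beta_1^*)'\left[\operatorname{var}(\hat\beta_1) - \operatorname{var}(\beta_1^*)\right]^{-1}(\hat\beta_1 - \beta_1^*)$$
Under $H_0: \beta_2 = 0$, the test statistic $Q^* \sim \chi^2_{k_1}$.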
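A minimal numerical sketch of the proposed statistic is given below. It is not a definitive implementation: it assumes that the unknown error variance is replaced by the usual unbiased estimate from the unrestricted model, and that k2 is at least k1 with X1'X2 of full row rank, so that the difference of the two covariance matrices is invertible. The function name `q_star` and the simulated data are hypothetical.

```python
import numpy as np

def q_star(X1, X2, Y):
    """Quadratic form in (beta1_hat - beta1_star), weighted by the inverse of
    the difference of the two estimated covariance matrices."""
    n, k1 = X1.shape
    k2 = X2.shape[1]
    X = np.hstack([X1, X2])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    beta1_hat = beta_hat[:k1]                       # unrestricted estimate of beta_1
    X1tX1_inv = np.linalg.inv(X1.T @ X1)
    beta1_star = X1tX1_inv @ X1.T @ Y               # restricted estimate of beta_1
    resid = Y - X @ beta_hat
    s2 = resid @ resid / (n - k1 - k2)              # assumption: plug-in for sigma^2
    var_diff = s2 * (XtX_inv[:k1, :k1] - X1tX1_inv) # var(beta1_hat) - var(beta1_star)
    d = beta1_hat - beta1_star
    return float(d @ np.linalg.solve(var_diff, d))

rng = np.random.default_rng(4)
n, k1, k2 = 200, 2, 3                               # assumed dimensions, k2 >= k1
C = np.array([[0.5, 0.0, 0.3],
              [0.0, 0.5, -0.2]])
X1 = rng.normal(size=(n, k1))
X2 = X1 @ C + rng.normal(size=(n, k2))              # X2 correlated with X1
noise = rng.normal(size=n)
Y_null = X1 @ np.array([1.0, -1.0]) + noise         # beta_2 = 0 holds
Y_alt = Y_null + X2 @ np.array([1.0, 1.0, 1.0])     # beta_2 = 0 is violated

q_null = q_star(X1, X2, Y_null)
q_alt = q_star(X1, X2, Y_alt)
print(q_null, q_alt)
```

The value computed from `Y_null` is to be compared with a chi-square critical value with k1 degrees of freedom, while the value computed from `Y_alt` is far larger, reflecting the violation of the null hypothesis.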
III. SELECTION OF REGRESSORS IN STATISTICAL MODELS
Selection of regressors is an important problem in statistics as well as in any other field of science that uses regression analysis. There is considerable literature on the subject of selecting the best subset of regressors in general statistical linear models.
The main contributions relating to the criteria for selection of regressors in statistical models have been made by Gorman and Toman (1966), Lindley (1968), Mallows (1973), Hocking (1976), Thompson (1978), Amemiya (1980), Miller (1984, 1990), George (2000) and others.
Some important criteria for selection of regressors in statistical models existing in the literature are
1. The $R^2$ and $\bar{R}^2$ (adjusted $R^2$) criteria
2. The $C_p$ criterion
3. Amemiya’s unconditional MSE prediction criterion
4. The $S_p$ criterion
5. Akaike’s Information Criterion
6. Sawa’s BIC Criterion
7. The Reformulated AIC
8. Jeffreys-Bayes posterior odds ratio
9. Stein-Rule Formulation for selection of regressors
10. The BIVAR Criterion
11. Forward, backward and stepwise selection of variables in statistical models
12. Stopping rule for variable selection
13. Average Estimated Variance (AEV) criterion
14. Influence Measures for Variable Selection
15. PRESS Criterion, etc.
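Several of the criteria listed above are simple functions of the residual sum of squares. The sketch below computes R-squared, adjusted R-squared, Mallows' Cp and a Gaussian form of AIC (up to an additive constant) for a few candidate subsets. The formulas are the commonly used textbook versions, and the data, subset names and the helper names `fit_rss` and `criteria` are assumptions for illustration.

```python
import numpy as np

def fit_rss(X, y):
    """Residual sum of squares from the least-squares fit of y on X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)

def criteria(X_sub, y, sigma2_full):
    """Selection criteria for one subset; X_sub includes the intercept column."""
    n, p = X_sub.shape
    rss = fit_rss(X_sub, y)
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    cp = rss / sigma2_full - n + 2 * p             # Mallows' Cp
    aic = n * np.log(rss / n) + 2 * p              # Gaussian AIC up to a constant
    return {"R2": r2, "adjR2": adj_r2, "Cp": cp, "AIC": aic}

rng = np.random.default_rng(5)
n = 80
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)   # x3 is irrelevant by construction
ones = np.ones(n)

subsets = {
    "x1": np.column_stack([ones, x1]),
    "x1,x2": np.column_stack([ones, x1, x2]),
    "x1,x2,x3": np.column_stack([ones, x1, x2, x3]),
}
X_full = subsets["x1,x2,x3"]
sigma2_full = fit_rss(X_full, y) / (n - X_full.shape[1])   # error variance from the full model

table = {name: criteria(Xs, y, sigma2_full) for name, Xs in subsets.items()}
for name, row in table.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

By construction Cp for the full model equals its number of parameters, so the criterion is informative only for the proper subsets; here the subset omitting x2 is heavily penalised by both Cp and AIC.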
A MODIFIED PRESS CRITERION FOR SELECTION OF REGRESSORS
In the present study, a modified PRESS Criterion has been suggested for subset selection of regressors in linear statistical models.
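Consider two linear regression models as
i. $$Y_{n \times 1} = X_{1\,(n \times k_1)}\,\beta_{1\,(k_1 \times 1)} + \varepsilon_{n \times 1}$$ [3.1]
ii. $$Y_{n \times 1} = X_{1\,(n \times k_1)}\,\beta_{1\,(k_1 \times 1)} + X_{2\,(n \times k_2)}\,\beta_{2\,(k_2 \times 1)} + \varepsilon_{n \times 1}$$ [3.2]
such that $\varepsilon \sim N(0, \sigma^2 I_n)$.
For the subset selection of regressors, we state the null hypothesis as $H_0: \beta_2 = 0$.
The PRESS statistic may be computed for both the restricted and unrestricted linear statistical models (3.1) and (3.2) respectively as $\mathrm{PRESS}_R$ and $\mathrm{PRESS}_{UR}$.
Under the proposed PRESS criterion, the model (3.1) can be selected under subset selection of regressors if $\mathrm{PRESS}_{UR} > \mathrm{PRESS}_R$.
In other words, to test $H_0: \beta_2 = 0$, the F-statistic based on PRESS residuals is given by
$$F = \frac{(\mathrm{PRESS}_R - \mathrm{PRESS}_{UR})/k_2}{\mathrm{PRESS}_{UR}/(n - k_1 - k_2)} \sim F_{k_2,\; n - k_1 - k_2}$$ [3.3]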
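The PRESS comparison and the statistic in [3.3] can be sketched as follows. PRESS is computed from the ordinary residuals and the leverages (the diagonal of the hat matrix), and a brute-force leave-one-out version is included only as a check. The data, dimensions and function names are assumptions for illustration, and the sketch takes the form of [3.3] as given without examining its distribution.

```python
import numpy as np

def press(X, y):
    """PRESS: sum of squared leave-one-out prediction errors e_i / (1 - h_ii)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # diagonal of the hat matrix
    return float(np.sum((resid / (1.0 - leverage)) ** 2))

def press_brute_force(X, y):
    """Same quantity computed by refitting the model n times (check only)."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        total += (y[i] - X[i] @ coef) ** 2
    return float(total)

rng = np.random.default_rng(6)
n, k1, k2 = 40, 2, 2                                   # assumed dimensions
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
X_full = np.hstack([X1, X2])
y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([1.5, 1.5]) + rng.normal(size=n)

press_r = press(X1, y)           # restricted model (3.1)
press_ur = press(X_full, y)      # unrestricted model (3.2)

# F-statistic of [3.3] based on the two PRESS values
f_press = ((press_r - press_ur) / k2) / (press_ur / (n - k1 - k2))
print(press_r, press_ur, f_press)
```

In this example the data are generated with a non-zero coefficient vector on X2, so the restricted model has the larger PRESS and the criterion favours the unrestricted model (3.2).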
MODEL SELECTION CRITERIA
Selection of the best regression model form is one of the important areas for Research in Statistics. In the process of selecting statistical models, Statisticians have developed a variety of diagnostic tests. These tests have been classified into two categories:
(a) Diagnostic tests of Nested Regression Models and (b) Diagnostic tests of Non-Nested Regression Models.
If model-I can be described as a special case of another model-II, then model-I is said to be nested within model-II. Two models are said to be Non-Nested Models if neither can be derived as a special case of the other.
Some important tests for choosing between nested and Non-Nested Models are:
i. J-Test
ii. JA Test
iii. Davidson, Godfrey and MacKinnon Omitted Variables Test
iv. Wu test based on Recursive Residuals
v. Cross-Validation method for linear model selection
vi. The Bayes Factor for Model Selection, etc.
CRITERION FOR SELECTION BETWEEN TWO LINEAR STATISTICAL MODELS
In the present study a criterion has been suggested for selection between two linear statistical models. Consider the two linear models as
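$$M_I: \; Y_{n \times 1} = X_{1\,(n \times k_1)}\,\beta_{1\,(k_1 \times 1)} + \varepsilon_{1\,(n \times 1)}$$
$$M_{II}: \; Y_{n \times 1} = X_{2\,(n \times k_2)}\,\beta_{2\,(k_2 \times 1)} + \varepsilon_{2\,(n \times 1)}$$
such that each error term is distributed as $N(0, \sigma^2 I_n)$.
Let $M_{III}$ denote the comprehensive model $Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$. When $\beta_2 = 0$, $M_{III}$ reduces to $M_I$, and when $\beta_1 = 0$, $M_{III}$ reduces to $M_{II}$. Under the proposed criterion, we consider the following two tests: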
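(a) $H_0: Y = X_1\beta_1 + \varepsilon_1$
$H_1: Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$
To test $H_0$, the usual F-statistic, written in terms of the residual sums of squares $\mathrm{RSS}_0$ and $\mathrm{RSS}_1$ under $H_0$ and $H_1$ with $\hat\sigma^2 = \mathrm{RSS}_1/(n - k_1 - k_2)$, is given by
$$F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/k_2}{\hat\sigma^2} \sim F_{k_2,\; n - k_1 - k_2}$$ [3.4]
(b) $H_0: Y = X_2\beta_2 + \varepsilon_2$
$H_1: Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$
To test $H_0$, the F-statistic, with $\mathrm{RSS}_0$, $\mathrm{RSS}_1$ and $\hat\sigma^2$ defined in the same way for this pair of hypotheses, is given by
$$F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/k_1}{\hat\sigma^2} \sim F_{k_1,\; n - k_1 - k_2}$$ [3.5]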
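The two tests (a) and (b) can be sketched numerically as follows, assuming that each is carried out with the usual residual-sum-of-squares F-statistic for comparing a restricted model with the comprehensive model containing both sets of regressors. The function names, the simulated data and the dimensions are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from the least-squares fit of y on X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)

def f_against_comprehensive(X_null, X_other, y):
    """F-statistic for H0: y = X_null b + e against H1: y = X_null b + X_other c + e."""
    n = len(y)
    k_null, k_other = X_null.shape[1], X_other.shape[1]
    X_full = np.hstack([X_null, X_other])
    df2 = n - k_null - k_other
    rss_0, rss_1 = rss(X_null, y), rss(X_full, y)
    sigma2_hat = rss_1 / df2                       # error variance under H1
    f = (rss_0 - rss_1) / (k_other * sigma2_hat)
    return f, (k_other, df2)

rng = np.random.default_rng(7)
n, k1, k2 = 100, 2, 3                              # assumed dimensions
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
y = X1 @ np.array([2.0, -1.0]) + rng.normal(size=n)   # data generated from M_I

f_a, df_a = f_against_comprehensive(X1, X2, y)     # test (a): H0 is M_I
f_b, df_b = f_against_comprehensive(X2, X1, y)     # test (b): H0 is M_II
print(f_a, df_a, f_b, df_b)
```

With data generated from the first model, test (b) produces a very large statistic while test (a) does not, so the pair of tests points towards the first model. When both tests reject or neither rejects, the pair of tests alone does not discriminate between the two models.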
CONCLUSIONS
In the present study, different criteria have been suggested for model specification, for the selection of regressors, and for selection between two non-nested linear statistical models. Besides these criteria, the various existing specification tests and selection techniques have been listed.
References:
[1] Amemiya, T. (1980), “Selection of Regressors”, International Economic Review, 21, 331-354.
[2] Ashok Chandra, K. (2007). “Criteria for selection of Regression In Econometrics”, unpublished Ph.D., thesis, S.V. University, Tirupati.
[3] Banerjee, A.N., and Magnus, J.R. (2000), “On the Sensitivity of the Usual t- and F-tests to Covariance Mis-specification”, Journal of Econometrics, 95, 157-176.
[4] Bera, A.K., and McAleer, M. (1989), “Nested and Non-Nested Procedures for Testing Linear and Log-Linear Regression Models”, Sankhya: Series B, 51, 212-224.
[5] Davidson, R., and MacKinnon, J.G. (1981), “Several Tests for Model Specification in the Presence of Alternative Hypotheses”, Econometrica, 49, 781-793.
[6] George, E.I. (2000), “The Variable Selection Problem”, JASA, 95, 1304-1308.
[7] Giri, D. (2006), “Some New Selection Techniques For Linear Statistical Models”, unpublished Ph.D. thesis, S.V. University, Tirupati.
[8] Gorman, J.W., and Toman, R.J. (1966), “Selection of Variables for Fitting Equations to Data”, Technometrics, 8, 27-51.
[9] Hocking, R.R. (1976), “The Analysis and Selection of Variables in Linear Regression”, Biometrics, 32, 1-49.
[10] Lindley, D.V. (1968), “The Choice of Variables in Multiple Regression”, JRSS, Series B, 30, 31-53.
[11] Mallows, C.L. (1973), “Some Comments on Cp”, Technometrics, 15, 661-676.
[12] Miller, A.J. (1984), “Selection of Subsets of Regression Variables”, JRSS, Series A, 147, 389-425.
[13] Miller, A.J. (1990), “Subset Selection in Regression”, London: Chapman and Hall.
[14] Nafeez Umar, S. (2004), “Statistical Inference on Model Specification in Econometrics”, unpublished Ph.D. thesis, S.V. University, Tirupati.
[15] Ramsey, J.B. (1969), “Tests for Specification Errors in Classical Linear Least Squares Analysis”, JRSS, Series B, 31, 350-371.
[16] Thompson, M.L. (1978), “Selection of Variables in Multiple Regression: Part I. A Review and Evaluation”, International Statistical Review, 46, 1-19.
[17] Thompson, M.L. (1978), “Selection of Variables in Multiple Regression: Part II. Chosen Procedures, Computations and Examples”, International Statistical Review, 46, 129-148.
[18] Thursby, J.G. (1979), “Alternative Specification Error Tests: A Comparative Study”, JASA, 74, 222-225.
[19] Thursby, J.G. (1982), “Mis-specification, Heteroscedasticity, and the Chow and Goldfeld-Quandt Tests”, Review of Economics and Statistics, 64, 314-321.