Studying the Effect of Some Variables on the economic Growth Using Latent Roots Method

(1)

Studying the Effect of Some Variables on the economic Growth Using

Latent Roots Method

Mowafaq Muhammed Al-kassab Adnan M.H. Al-sinjary Dilnas S. Younis Department of Mathematics Education Department of Statistics & Informatics Faculty of Education College of Mathematics & Computer Sciences Tishk International University Mosul University

[email protected] [email protected] [email protected]

Abstract

Different kinds of estimators have been proposed as an alternative to the ordinary least squares for estimating the coefficients of the multiple linear regression model in the presence of multicollinearity. We estimated the parameters of this linear model by two methods: the least squares and the latent roots method. A comparison between these two methods is given through the application of the economic growth data of the UAE to study the effect of the population size, exchange rate, total exports and the total imports on the economic growth. It is shown that all the explanatory variables using the latent roots method have an effect on the economic growth and this effect is significant, whereas these variables are not significant using the least squares method.

Keywords

Regression; Multicollinearity; Least Squares; Correlation Matrix; Eigen Values; Eigen Vectors; Latent Roots

1. Multiple Linear Regression Model

The study of any particular phenomenon requires the identification of the factors influencing this phenomenon and the formulation of the relationship between these factors in the form of a model that expresses them. This model may be represented by one or several equations. In terms of a single equation, it may be simple or may be multiple. Common forms of use include a linear one that takes a mathematical form in writing and including more than one explanatory variable. This model will be used in this research and the general formula for this model is (Yan & Gang Su, 2009):

(2)

Where yi is the value of the response variable, i=1,2,3,…,n; n is the size of

sample,βj the regression parameters, j=0,1,2,…,p, p is the number of explanatory

variables , xij is the explanatory variable j for observation i, and ui is the random

error of observation i.

The above equation can be expressed in matrix form as:

y = Xβ + u …. 1

Where:

y is (n×1) vector of observations of the response variable.

X: is (n×k); k=p+1, matrix of observations of the explanatory variables whose first column contains the values of one.

βis (k×1) vector of the parameters to be estimated. uis (n×1) vector of random errors.

In order to estimate the parameters of the model and to ensure that the estimations have desirable properties, there are certain hypotheses that must be met. These assumptions are (Chatterjee & Price, 2000):

1. The variable y is determined as a linear structure of independent variables, X and a random variable u.

2. The variance of the random error u is σ2_{, and}

cov ui,uj = 0 for i ≠ j Where i,j = 1,2,3,…,n

3. The random error is normal with mean zero and variance σ2_.

4. The matrix X is distributed independently from u, so the covariance is zero, i.e.: cov ui,Xj = 0 Where i = 1,2,3,…,n ,j=1,2,3,…,p

5. This assumption is concerned with the rank of matrix X where we assume that: Rank X = p

This assumption includes:

- No multicollinearity between the explanatory variables.

2. Least Squares Method

The least squares estimate of the regression parameters in this method are:

β

OLS = X

'_X −1 _X'_y _{…. 2}

Here are the properties of this method:

1. Linearity: the estimated parameters in this method are linear in terms of the response variables:

β

OLS = X

'_X −1 _X'_{y = X}'_X −1_X' _y

(3)

E β

OLS = β

3. Variance: The variance of the estimated parameters is minimum ,where Var β

OLS = X

'_X −1_σ2 _{…. 3}

andσ2 ₌ y'y−βOLS '

X'y n−p−1

3.The Concept of the Multicollinearity in the Regression Model

Multicollinearity is one of the problems that occur in many cases due to the existence of a relationship between the explanatory variables. The existence of the complete multicollinearity between the variables leads to making the matrix X'_X _{not of full rank, ie, its determinant is zero. Thus, it is difficult to find the}

inverse of this matrix, which means that the regression parameters can not be estimated using the Least Squares method. The existence of an incomplete but powerful multicollinearity leads to the amplification of the variance and inaccurate capabilities (Dounald, 1987) and (Chatterjee et al., 2000).

4.Detecting Multicollinearity in the Regression Model

Multicollinearity between the explanatory variables can be detected by many methods (Draper & Smith, 1981) and (Fisher, 1981&Mason) and (Gunst& Mason, 1980):

1. The correlation coefficients matrix: Notice that the elements are not diagonal in the correlation matrix(X'_X_{), it is a simple indicator of inference on the existence}

of multicollinearity between explanatory variables, where the greater the value of these elements than zero indicates the existence of relations between these variables, whether those relations are positive or negative.

2. Determinant of matrix: The determinant value of the matrix ranges from zero to one , when it is zero ( X'_{X = 0}_{) this indicates the existence of complete}

multicollinearity. On the other hand if it is one ( X'_{X = 1}_{)this indicates the}

nonexistence of multicollinearity. When the value of the determinant is between zero and one there is a multicollinearity of certain degree, that is when (0 X'_X _{1) ,the multicollinearity exists and increases whenever the}

determinant close to zero(Mason, 1986). X'_X _{can be calculated using its latent}

values and through the following equation:

X'_{X =} i=1

p

(4)

Where λi is latent values for (X'X) matrix, when there is a multicollinearity a

number of latent values will be small or close to zero leading to a small value for the matrix determinant.

3. Latent values for (X'_X_{) matrix: The calculation of the latent values and vectors}

is a multicollinearity detection method, if the value of one of the latent values is equal to zero this indicates the presence of complete multicollinearity which indicates the difficulty or impossibility of finding the estimators of the Least Squares. Latent values are also used to calculate two types of auxiliary statistics in the detection of multicollinearity, they are (Kutner et al., 2005 ):

 Condition index (CI): It is calculated according to the relationship:

CIj = λmax_λ

j ;j = 1,2,…,p

Where:

λmax Is the maximum latent value.

λj th jth latent value.

 Condition number: It is calculated according to the relationship:

CN = λ_λmax

min

Where:

λmin Is the minimum latent value.

(Belsley et al., 1980) proposed that if CN values between 30 and 100 this was a sign of very high multicollinearity.

4. Solution of Multicollinearity

There are several methods proposed to minimize the effect of multicollinearity:

1. Delete the explanatory variables that are associated with other variables in order to get rid of the effects of this link and this deletion process according to certain criteria proposed to delete the specific variables.

2. Add new data to the original data. 3. Use biased estimation methods.

5. Latent Roots Method

This method was proposed in 1973 by Hawkins, the idea of this method is to find the latent roots of the correlation matrix and then to exclude the roots that are not important in the prediction process. The following is a detailed explanation of this method (Mason, 1986):

(5)

R = A'_A

Where:

A: Is the standardized information matrix which contains the standardized values of the response variable and the standardized values of the explanatory variables:

A = y X

Where

y_i = yi− y

i=1 n _y

i − y 2

; X_ik = xik− xk

i=1 n _x

ik − xk 2

R: The Correlation matrix between all variable, it is defined as follows:

R =

1 ry1 ry2 ry3 … ryp

r1y 1 r12 r13 … r1p

r2y

r3y

⋮ r_py

r21

r₃₁ ⋮ rp1

1 r32

⋮ rp2

r23 … r2p

1 ⋮ rp3

… r3p

⋱ … 1⋮

Latent roots: latent values and latent vectors, are obtained according to the following formula:

Λ =

λ0 0 … 0

0 λ1 … 0

⋮

0 0⋮ ⋯⋱ λ⋮

Γ =

γ00 γ01 … γ0p

γ10 γ11 … γ1p

⋮ γp0

⋮ γp1

⋱

⋯ γ⋮pp

= γ_γ00 γ01 … γ0p

0 γ1

… γ

p

The estimation of the regression parameters vector by Least Squares Method that based on the latent roots, are as follows:

β

OLS =− j=0 p γ0jγj

λj

j=0 p γoj2

λj

…. 4

To find the Latent Root Estimators, all the values and vectors that are not significant in the prediction are deleted from the equation (4), the roots that meet the following conditions are deleted:

λ_j < 1 h γ0j < 0. = 0,1,2,…,

(6)

β

LR =− j=0 q γ0jγj

λj

j=0 q γoj2

λj

….

Latent Root estimators have the following properties: 1. Bias: The Latent Root estimator is biased and its bias is:

Bais β

LR =

j=q+1 p γ0jγj

λj

j=q+1 p γoj2

λj

…. 6

2. The variance: The variance of the Least Squares estimators in terms of latent roots is:

Var β

OLS = σOLS 2

j=0 p _γ

jγj '

λ_j −

j=0 p γ0jγj

λj j=0

p γ0jγj '

λj

j=0 p γoj2

λj

…. 7

The variance of the Latent Root estimators is:

Var β

LR = σLR 2

j=0 q _γ

jγj '

λ_j −

j=0 q γ0jγj

λj j=0

q γ0jγj '

λj

j=0 q γoj2

λj

…. 8

The variance of the ith estimator is:

Var β

iLR = σLR 2

j=0 q

γ_ij2

λj −

j=0 q γ0jγij

λj 2

j=0 q γij2

λj

(7)

MSE β

LR = σLR 2

i=1 p

j=0 q

γ_ij2

λj −

j=0 q γ0jγij

λj 2

j=0 q γ0j2

λj

+

i=1 p

j=q+1 p γ0jγij

λj

j=q+1 p γoj2

λj 2

Basilevsky in 1994 suggested an approximation of the mean squares error for the Latent Roots estimator as:

MSE β

LR σLR 2

i=1 p

j=0 q

γ_ij2

λj −

j=0 q γ0jγij

λj 2

j=0 q γ0j2

λj

+ γ

0 '_β

LR 2

…. 9

6. Application Part

A comparison between the least squares and the latent roots regression methods is given through the application of the economic growth data of the UAE (Alsaffar, 2016). The data represent the Economic growth (Y) and four explanatory variables for the period (1999-2008). The explanatory variables are: (X1) Population size;(X2) Exchange rate; (X3) Total export; (X4) Total imports.

7.1 The correlation matrix is given in table (1) below:

Table (1): The simple correlation coefficient between the explanatory variables and the independent variable

Y X1 X2 X3 X4

Y 1 0.9461 0.9942 0.99 0.9847

X1 0.9461 1 0.9504 0.9269 0.9057

X2 0.9942 0.9504 1 0.9954 0.9919

X3 0.99 0.9269 0.9954 1 0.9935

X4 0.9847 0.9057 0.9919 0.9935 1

[image:7.612.68.545.74.301.2]

(8)

                  4.8726 0 0 0 0 0 0.1122 0 0 0 0 0 0.0097 0 0 0 0 0 0.0052 0 0 0 0 0 0.0003                   0.4477 0.4301 0.3179 -0.564 -0.4421 -0.4504 0.2633 0.2107 -0.8093 0.1686 -0.4527 0.0717 0.1929 -0.1474 -0.855 0.4338 0.8582 -0.1744 -0.0505 -0.2058 -0.4512 0.0636 0.8871 0.0518 -0.053

-7.3 The following table (table 2) gives the values of the latent roots and vectors for this data set. It also checks the multicollinearity between the variables according to the following conditions:

λj < 1 h γ0j < 0. = 0,1,2,…,

Table (2): Test results

i λt γ0i Conditions

0 0.0003 -0.053 Two holds

1 0.0052 -0.0518 Two holds

2 0.0097 0.8871 One holds

3 0.1122 0.0636 Two holds

4 4.8726 0.4512 One holds

The Latent Root estimators and its variances will depend only on the remaining two values i.e. q = 2.

7.4 The values of the regression parameters estimated using Ordinary Least Squares equation (2) are:

Table (3): Estimators, variances and the t- test values of the regression coefficients in the Least Squares

i __iOLS _V(__iOLS₎ _t(__iOLS₎

1 -0.2037 0.3087 -0.3667(N.S)

2 1.7528 4.4836 0.8278(N.S)

3 -0.0104 0.4892 -0.0148(N.S)

[image:8.612.72.309.72.259.2] [image:8.612.73.544.357.599.2]

(9)

We see from the above table that all variables are not significant. The values of the regression parameters estimated using Latent Roots method equation (5) are:

Table (4): Estimators, variances and the t- test values of the regression coefficients in the Latent Root

i __iLR _V(__iLR₎ _t(__iLR₎

1 0.196 0.0001386 16.6482

2 0.2168 0.000154 17.4742

3 0.2369 0.0001578 18.8614

4 0.3577 0.0001885 26.0558

We see from the above table that all variables are significant.

7.5 MSE is estimated for the two methods: According to equation (4) and Latent Roots method according to equation (9):

Table (5): MSE for Ordinary Least Squares and the Latent Roots before deletion

The method

σ

Part1 Part2 MSE R2

Least Squares 0.0471 6.7401 0 6.7401 98.89%

Latent Roots 0.0497 0.0006388 0.0028 0.0034 98.76% The value of the MSE in the Latent Roots method is lower than that in the Least Squares method.

Ignoring the non-significant variables, we get:

Table (6): MSE for the Least Squares and Latent Roots after deletion

The method

σ

MSE R2

Least Squares There are not significant parameter

Latent Roots 0.0497 0.0034 98.76%

We conclude that the estimated model using the Latent Roots is better than the estimated model in the Least Squares taking in consideration the number of significant variables for both methods and the estimated regression equation in the Latent Roots method is:

(10)

References

1. Alsaffar,A.S.(2016),” Diagnosing and Detecting Extreme Values in Linear Models with Application”, M.sc. thesis, University of Mosul.

2. Bastlevsky, Alexander, (1994), “Statistical Factor Analysis and Related Methods”, John Wilay & Sons, INC.

3. Belsley, D. A., Kuh, E. and Welsch, R. E., (1980), “Regression Diagnostics Identifying Influential Data and Sources of Collinearity”, Wiley, New York.

4. Chatterjee, Samprit and Price, Bertram, (2000), “Regression Analysis by Example”,3rd edition, John Wiley and Sons.

5. Dounald F.M., (1978), “Multivariate statistical Methods”, 2nd _{Ed., Mc} Graw-Hill Book Company, Tokyo.

6. Draper N.R., and Smith H., (1981), “Applied Regression Analysis”, 2n Ed. John Wiley and Sons, Canada.

7. Fisher J.C., and Mason R.L., (1981), “The Analysis of Multicollinear Data in Criminology”, John Wiley and Sons.

8. Gunst R.F., and Mason R.L., (1980), “Regression Analysis and Its Application”, Marcel Dekker, New York, U.S.A.

9. Hawkins, D. M., (1973), “On the Investigation of Alternative Regression by Principal Component Analysis”, Applied Statistics, Vol. 22, No. 3, pp. 275-286. 10.Kutner, Michael H.; Nachtsheim, Christopher J. and Neter, John; Li, William,

(2005), “Applied Linear Statistical Models”, 5th edition, McGraw- Hill Irwin, New York, USA.

11.Mason R.L., (1986), “Latent Root Regression: A Biased Regression Methodology for Use with Collinear Predictor Variables”, Commun Statist theor. Math., 15 (9): 2663.