I A Note on Adaptive Box-Cox Transformation Parameter in Linear Regression

(1)

www.ijrias.org Page 1

A Note on Adaptive Box-Cox Transformation Parameter in Linear Regression

Mezbahur Rahman, Samah Al-thubaiti, Reid W Breitenfeldt, Jinzhu Jiang, Elliott B Light, Rebekka H Molenaar, Mohammad Shaha A Patwary, and Joshua A Wuollet

Minnesota State University, Mankato, MN 56001, USA Correspondence Author: Mezbahur Rahman Abstract: The Box-Cox transformation is a well known family of

power transformations that brings a set of data into agreement with the normality assumption of the residuals and hence the response variable of a postulated model in regression analysis. In this paper we use six different data sets to implement adaptive maximum likelihood Box-Cox transformation parameter estimation in regression analysis. In addition, we perform random permutation and Monte-Carlo simulation to investigate the performances of the adaptive method.

Key Words: Normality tests; Shapiro-Wilk W statistic.

I. INTRODUCTION

n regression analysis, often a key assumption regarding normality of the error variable and hence the response variable are violated. The commonly used remedy is the Box- Cox family of power transformations (Box and Cox (1964)).

The process is to select a parameter in the Box-Cox transformation which maximizes the normal likelihood using the data at hand and then apply regression analysis on the transformed response variable. In practice, the regression model parameters are usually estimated separately after the necessary Box-Cox power transformation parameter is selected.

In literature, the estimation procedures of the Box-Cox power transformation parameter are considered by many authors. The notable ones are the normal likelihood method of Box and Cox (1964), the robustified version of the normal likelihood method of Carroll (1980) and of Bickel and Doksum (1981), the transformation to symmetry method of Hinkley (1975), the quick estimate of Hinkley (1977) and of Taylor (1985). Lin and Vonesh (1989) constructed a nonlinear regression model which is used to estimate the transformation parameter such that the normal probability plot of the data on the transformed scale is as close to linearity as possible.

Following the footsteps of Box and Cox (1982) and Lin and Vonesh (1989), Halawa (1996) considered the power transformation parameter estimation procedure using an artificial regression model which gives the estimates with very small variabilities compared to the normal likelihood procedure. Halawa (1996) conducted an exhaustive comparative study with normal likelihood procedure. In that

study, he also considered estimation procedures of the location and the scale parameters in the likelihood.

Rahman (1999) introduced a method of estimating the Box-Cox power transformation parameter using maximization of the Shapiro-Wilk W (Shapiro and Wilk (1965)) statistic along with a comparison study of the normal likelihood method (Carroll (1980)), and of the artificial regression model method (Halawa (1996)). Rahman and Pearson (2008) showed that their adaptive procedure‟s performance is better than other alternatives in obtaining maximum likelihood estimation procedures. In this paper the estimation procedure for the Box-Cox power transformation parameter is considered using adaptive maximization of the normal likelihood along with the Newton-Raphson root finding method explicitly in multiple regression context.

The motivation of the paper is given in the following section. Shapiro-Wilk W statistic p-values are computed before and after implementation of the transformation parameter. Application data sets are used and comparisons are made using random permutations for normal and non-normal samples using Monte-Carlo simulations.

II. MOTIVATION

Often Box-Cox transformations are implemented for the dependent variable in regression analyses when residuals are found non-normal. Then models are fitted using the transformed data and if normality is checked for residuals again, often those are found non-normal again. To investigate this dilemma, several data sets are considered.

Recently, Rahman and Pearson (2008) showed that adaptive maximum likelihood performs better than two other ways of implementing maximization procedure. But in their simulations, mostly univariate data are considered. Here, we consider a multiple regression environment so that practical implementation in regression can be investigated more closely.

III. BOX-COX TRANSFORMATION

I

(2)

www.ijrias.org Page 2

Let Y

1,Y

2,⋯,Yn be a random sample of size n from a population whose functional form is unknown. Box and Cox (1964) suggested that if the transformation

Y∗=

 



Yλ−1 λ ; λ≠0 ln(Y); λ= 0

(1)

is performed on the data then Y∗ will have an approximate normal distribution with mean μ and variance σ2. In equation (1), λ is unknown and considered as the Box-Cox power transformation parameter and „ln‟ represents the natural logarithm.

IV. ADAPTIVE NEWTON-RAPHSON METHOD IN MULTIPLE REGRESSION

Let us consider Y=Xβ+ε to be a usual multiple regression model, where Y is an n×1 vector of dependent variable measurements, X is an n×p matrix of independent variable measurements, β is a p×1 vector of unknown regression parameters, and ε is an n×1 vector of error variable.

The least square estimate of β is ̂β=(X'X)−1X'Y.

Residuals are computed as e=Y−̂Y. If e does not follow normal distribution, Box-Cox transformation on Y is implemented as in (1). After applying the transformation, the density function of e in terms of y can be written as

where x' is the corresponding row of the X matrix for a given y.

For a fixed λ, the log-likelihood function ℓA=ℓ(x'β,σ2;y1,y

2,⋯,yn) , where the subscript “A”

stands for adaptive, can be written as

Equation (3) is maximized when the partial derivatives of (3) with respect to β and σ2 are equated to zero and the solution to the resulting system is found. This leads to solving the following two equations,

∂ℓ

∂β

|

β=̂β, σ2=̂σ2 = 0 ⟹̂β=(X’X)-1X’Y∗ (4)

and

∂ℓ

∂σ2

|

β=̂β, σ2=̂σ2 = 0 ⟹̂σ2

= 1

n

(

Y∗-X̂β ‟

) (

Y∗-X̂β . (5)

)

Then the pseudo log-likelihood ℓ∗=ℓ(λ;y1,y2,⋯,yn) can be written as

The steps for maximizing ℓ∗ using the Newton-Raphson method are as follows:

where P

i=y∗ i=

  

 

ŷλ



i−1

̂λ , Q

i=

∂y∗ i

∂λ =







̂λ(lnyi)ŷλ



i−ŷλ i+1

̂λ2 ,

Ui=ŷ∗=xi'̂β=x

i'(X'X)−1X'P i, and

̂λ(t+1)=̂λ(t)− h(̂λ(t)) h'(̂λ(t)

) ,(8) where

h'(̂λ)=−n

 

 ^

i=1 n

(

^Pi−Ui

) (

^Ri−Wi

)

+



i=1 n

(

^Qi−Vi

)

2



i=1 n

(

^Pi−Ui

)

2

 



−2

  

 

 

i=1 n

(

^Pⁱ^−Uⁱ

) (

^Qⁱ^−Vⁱ

)

2

 



 

 

i=1 n

(

^Pi−Ui

)

2 2 ,

(3)

www.ijrias.org Page 3

where Ri=

∂2y∗ i

∂λ2=







̂λ2(lnyi)2



ŷλ

i−2̂λ(lnyi)ŷλ i+2ŷλ

i−2

̂λ3 ,

Vi= ∂ŷ∗

∂λ =

∂xi'̂β

∂λ = xi‟(X‟X)-1X‟Q

i, and

Wi= ∂2ŷ∗

∂λ2=

∂2xi'̂β

∂λ2 = x

i‟(X‟X)-1X‟Ri.

The computation algorithm starts with an initial value of ̂λ (an obvious choice is 1) iteratively using (8) and then ̂β and ̂σ2 are obtained using (4) and (5) by substituting ̂λ for λ, if desired.

V. APPLICATION

We consider six different data sets to investigate Box-Cox transformation effects in linear regression models. For each data set, we decide on competing models after implementing usual model selection procedures. For each competing model, Shapiro-Wilk W' statistic and its p-value are computed for the residuals using the exclusive Monte-Carlo procedure given by Rahman and Pearson (2000). The Box-Cox transformation parameter is estimated using the adaptive method described in section 4. Then the Shapiro-Wilk W'

statistic and its p-value are computed for the residuals using the exclusive Monte- Carlo procedure given by Rahman and Pearson (2000) for the transformed data.

We performed simulations to investigate the effect of the estimate of λ using the adaptive method described in section 4. We considered 1000 random permutations of the dependent variable and computed the Shapiro-Wilk W'

statistic and p- values for transformed and non-transformed data. We recorded the number of increases in p-values after transformation indicating closer to normal compared to prior to transformation as given below in tables for respective data.

We also recorded the number of changes in decisions to accept or reject normality after transformation.

We also considered simulated residuals to generate dependent variable values using standard uniform, standard normal, standard exponential, and a mixture of normal distributions which represent a bimodal distribution and then computed similar results to what we did for random permutations. Abreviations used in tables below are as follows:

RR Reject prior transformation and reject after transformation

RA Reject prior transformation and accept after transformation

AR Accept prior transformation and reject after transformation

AA Accept prior transformation and accept after transformation

NI Number of p-values increased after transformation Note that in all rejections of normality, 10% level of significance is considered.

5.1. Jet Turbine Engine Thrust Data

Table 1: Jet Turbine Engine Thrust Data (Table B.13, Montgomery et al., 2012, Page 566)

y x1 x2 x3 x4 x5 x6 y x1 x2 x3 x4 x5 x6

4540 2140 20640 30250 205 1732 99 4330 2062 20500 30190 193 1748 101 4315 2016 20280 30010 195 1697 100 4119 1929 20050 29960 183 1713 100 4095 1905 19860 29780 184 1662 97 3891 1815 19680 29770 173 1684 100 3650 1675 18980 29330 164 1598 97 3467 1595 18890 29360 153 1624 99 3200 1474 18100 28960 144 1541 97 3045 1400 17870 28960 134 1569 100 4833 2239 20740 30083 216 1709 87 4411 2047 20540 30160 193 1746 99 4617 2120 20305 29831 206 1669 87 4203 1935 20160 29940 184 1714 99 4340 1990 19961 29604 196 1640 87 3968 1807 19750 29760 173 1679 99 3820 1702 18916 29088 171 1572 85 3531 1591 18890 29350 153 1621 99 3368 1487 18012 28675 149 1522 85 3074 1388 17870 28910 133 1561 99 4445 2107 20520 30120 195 1740 101 4350 2071 20460 30180 198 1729 102 4188 1973 20130 29920 190 1711 100 4128 1944 20010 29940 186 1692 101 3981 1864 19780 29720 180 1682 100 3940 1831 19640 29750 178 1667 101 3622 1674 19020 29370 161 1630 100 3480 1612 18710 29360 156 1609 101 3125 1440 18030 28940 139 1572 101 3064 1410 17780 28900 136 1552 101 4560 2165 20680 30160 208 1704 98 4402 2066 20520 30170 197 1758 100 4340 2048 20340 29960 199 1679 96 4180 1954 20150 29950 188 1729 99 4115 1916 19860 29710 187 1642 94 3973 1835 19750 29740 178 1690 99 3630 1658 18950 29250 164 1576 94 3530 1616 18850 29320 156 1616 99 3210 1489 18700 28890 145 1528 94 3080 1407 17910 28910 137 1569 100

(4)

www.ijrias.org Page 4

where, y: Thrust (mechanical force generated by the engines)

(lbs), x1: Primary speed of rotation (rpm), x2: Secondary speed of rotation (rpm), x

3: Fuel flow rate (lbs/min), x 4: Pressure (lbs/in2

), x5: Exhaust temperature (∘F), and x6: Ambient temperature at time of test (∘F)

After exhaustive analyses two different competing models are determined. Model 1 includes variables x1, x3, x4, x5, and x6. Model 2 includes variables x

1, x 3, x

5, and x

6. In both the models, intercept is included.

Table 2: Jet Turbine Engine Thrust Data Results

Model Prior Transformation Post Transformation

W'-statistic p-value λ-value W'-statistic p-value

Model 1 0.9882 0.9033 2.1962 0.9691 0.2512

Model 2 0.9764 0.4746 2.0293 0.9522 0.0898

Table 3: Jet Turbine Engine Thrust Data Simulation Results

Model Simulation Type RR RA AR AA NI

Model 1 Random Permutation 61 109 4 826 831

Model 1 Uniform (0,1) 0 179 0 821 576

Model 2 Uniform (0,1) 0 219 0 781 431

Model 1 Normal (0,1) 0 109 0 891 349

Model 2 Normal (0,1) 0 108 0 892 206

Model 1 Exponential (1) 0 910 0 90 973

Model 1 3

4N(0,1)+ 1 4N( 3

2, 1

3) 0 172 0 828 469

Model 2 3

4N(0,1)+ 1 4N( 3

2, 1

3) 0 188 0 812 345

5.2. Hald Cement Data

Table 4: Hald Cement Data (Table B.21, Montgomery et al., 2012, Page 573)

y x₁ x₂ x₃ x₄ y x₁ x₂ x₃ x₄ y x₁ x₂ x₃ x₄

78.5 7 26 6 60 74.3 1 29 15 52 104.3 11 56 8 20

87.6 11 31 8 47 95.9 7 52 6 33 109.2 11 55 9 22

102.7 3 71 17 6 72.5 1 31 22 44 93.1 2 54 18 22

115.9 21 47 4 26 83.8 1 40 23 34 113.3 11 66 9 12 109.4 10 68 8 12

where, y: heat evolved in calories/gram of cement, x1: percentage of tricalcium aluminate, x2: perentage of tetracalcium silicate, x

3: percentage of tetracalcium aluminoferite, x

4: percentage of dicalcium phosphate.

After exhaustive analyses two different competing models are determined. Model 1 includes variables x

1, x

2, and x 4. Model 2 includes variables x1 and x2. In both the models, intercept is included.

Table 5: Hald Cement Data Results

Model 1 0.9662 0.7554 0.0655 0.9737 0.8550

Model 2 0.9157 0.1930 1.2120 0.8942 0.1005

(5)

www.ijrias.org Page 5

Table 6: Hald Cement Data Simulation Results

Model 1 Model 2

Random Permutation Random Permutation

41 18

7 4

18 0

934 978

364 625 Model 1

Model 2

Uniform (0,1) Uniform (0,1)

76 0

0 59

924 15

0 926

3 595 Model 1

Model 2

Normal (0,1) Normal (0,1)

85 19

22 75

724 69

169 837

57 501 Model 1

Model 2

Exponential (1) Exponential (1)

232 168

91 241

583 43

94 548

142 717

Model 1 Model 2

3

4N(0,1)+ 1 4N( 3

2, 1 3) 3

4N(0,1)+ 1 4N( 3

2, 1 3)

76 21

30 79

698 58

196 842

62 535

5.3 Solar Thermal Energy Test Data

Table 7: Solar Thermal Energy Test Data (Table B.2, Montgomery et al., 2012, Page 555)

y x1 x2 x3 x4 x5 y x1 x2 x3 x4 x5

271.8 783.35 33.53 40.55 16.66 13.20 264.0 748.45 36.5 36.19 16.46 14.11 238.8 684.45 34.66 37.31 17.66 15.68 230.7 827.80 33.13 32.52 17.50 10.53 251.6 860.45 35.75 33.71 16.40 11.00 257.9 875.15 34.46 34.14 16.28 11.31 263.9 909.45 34.60 34.85 16.06 11.96 266.5 905.55 35.38 35.89 15.93 12.58 229.1 756.00 35.85 33.53 16.60 10.66 239.3 769.35 35.68 33.79 16.41 10.85 258.0 793.50 35.35 34.72 16.17 11.41 257.6 801.65 35.04 35.22 15.92 11.91 267.3 819.65 34.07 36.50 16.04 12.85 267.0 808.55 32.20 37.60 16.19 13.58 259.6 774.95 34.32 37.89 16.62 14.21 240.4 711.85 31.08 37.71 17.37 15.56 227.2 694.85 35.73 37 18.12 15.83 196.0 638.1 34.11 36.76 18.53 16.41 278.7 774.55 34.79 34.62 15.54 13.10 272.3 757.90 35.77 35.40 15.70 13.63 267.4 753.35 36.44 35.96 16.45 14.51 254.5 704.70 37.82 36.26 17.62 15.38 224.7 666.80 35.07 36.34 18.12 16.10 181.5 568.55 35.26 35.90 19.05 16.73 227.5 653.10 35.56 31.84 16.51 10.58 253.6 704.05 35.73 33.16 16.02 11.28 263.0 709.60 36.46 33.83 15.89 11.91 265.8 726.90 36.26 34.89 15.83 12.65 263.8 697.15 37.20 36.27 16.71 14.06

where, y: total heat flux (kwatts), x

1: insolation (watts/m2, x 2: position of focal point in east direction (inches), x

3: position of focal point in south direction (inches), x4: position of focal point in north direction (inches), and x5: times of day.

After exhaustive analyses two different competing models are determined. Model 1 includes variables x

1, x 2, x

3, x 4, and x5. Model 2 includes variables x1, x2, x3 and x4. Model 3 includes variables x2, x3 and x4. In all the models, intercept is included.

Table 8: Solar Thermal Energy Test Data Results

Model 1 0.9570 0.2449 8.1625 0.9364 0.0767

Model 2 0.9730 0.5524 9.0471 0.9384 0.0908

Model 3 0.9479 0.1521 7.6066 0.9598 0.2761

(6)

www.ijrias.org Page 6

Table 9: Solar Thermal Energy Test Data Simulation Results

Model 1 Uniform (0,1) 86 4 847 63 22

Model 2 Uniform (0,1) 83 30 607 280 106

Model 3 Uniform (0,1) 91 23 187 699 374

Model 1 Normal (0,1) 32 68 83 817 480

Model 2 Normal (0,1) 69 37 609 285 106

Model 3 Normal (0,1) 77 18 212 693 367

Model 1 3

4N(0,1)+ 1 4N( 3

2, 1

3) 41 78 82 799 483

Model 2 3

4N(0,1)+ 1 4N( 3

2, 1

3) 77 65 473 385 203

Model 3 3

4N(0,1)+ 1 4N( 3

2, 1

3) 143 30 151 676 397

5.4 Heat Treating Data

Table 10: Heat Treating Data (TABLE B.12, Montgomery et al., 2012, Page 565)

y x1 x2 x3 x4 x5 y x1 x2 x3 x4 x5

1650 0.58 1.10 0.25 0.90 0.013 1650 0.66 1.10 0.33 0.90 0.016 1650 0.66 1.10 0.33 0.90 0.015 1650 0.66 1.10 0.33 0.95 0.016 1600 0.66 1.15 0.33 1.00 0.015 1600 0.66 1.15 0.33 1.00 0.016 1650 0.66 1.10 0.50 0.80 0.014 1650 1.00 1.10 0.58 0.80 0.021 1650 1.17 1.10 0.58 0.80 0.018 1650 1.17 1.10 0.58 0.80 0.019 1650 1.17 1.10 0.58 0.90 0.021 1650 1.17 1.10 0.58 0.90 0.019 1650 1.17 1.15 0.58 0.90 0.021 1650 1.20 1.15 1.10 0.80 0.025 1650 2.00 1.15 1.00 0.80 0.025 1650 2.00 1.10 1.10 0.80 0.026 1650 2.20 1.10 1.10 0.80 0.024 1650 2.20 1.10 1.10 0.80 0.025 1650 2.20 1.15 1.10 0.80 0.024 1650 2.20 1.10 1.10 0.90 0.025 1650 2.20 1.10 1.10 0.90 0.027 1650 2.20 1.10 1.50 0.90 0.026 1650 3.00 1.15 1.50 0.80 0.029 1650 3.00 1.10 1.50 0.70 0.030 1650 3.00 1.10 1.50 0.75 0.028 1650 3.00 1.15 1.66 0.85 0.032 1650 3.33 1.10 1.50 0.80 0.033 1700 4.00 1.10 1.50 0.70 0.039 1650 4.00 1.10 1.50 0.70 0.040 1650 4.00 1.15 1.50 0.85 0.035 1700 12.5 1.00 1.50 0.70 0.056 1700 18.5 1.00 1.50 0.70 0.068

where, y: Result of the pitch carbon analysis test, x1: Furnace Temperature, x

2: Duration of the carburizing cycle, x 3: Carbon Concentration, x

4: Duration of the diffuse cycle, and x5: Carbon concentration of the diffuse cycle.

After exhaustive analyses two different competing models are determined. Model 1 includes variables x1, x2, x3, x4, and x5. Model 2 includes variables x2 and x4. In all the models, intercept is included.

Table 11: Heat Treating Data Results

Model 1 0.9499 0.1265 0.6728 0.9696 0.4090

Model 2 0.9405 0.0748 0.6562 0.9676 0.3683

(7)

www.ijrias.org Page 7

Table 12: Heat Treating Data Simulation Results

Model 1 Uniform (0,1) 122 0 878 0 0

Model 2 Uniform (0,1) 219 8 754 19 15

Model 1 Normal (0,1) 92 8 841 59 33

Model 2 Normal (0,1) 73 14 605 308 162

Model 1 3

4N(0,1)+ 1 4N( 3

2, 1

3) 103 26 699 172 93

Model 2 3

4N(0,1)+ 1 4N( 3

2, 1

3) 145 52 534 269 231

5.5 Oil Extraction from Peanuts Data

Table 13: Oil Extraction from Peanuts Data (TABLE B.7, Montgomery et al., 2012, Page 560)

x1 x2 x3 x4 x5 y x1 x2 x3 x4 x5 y

415 25 5 40 1.28 63 550 25 5 40 4.05 21

415 95 5 40 4.05 36 550 95 5 40 1.28 99

415 25 15 40 4.05 24 550 25 15 40 1.28 66

415 95 15 40 1.28 71 550 95 15 40 4.05 54

415 25 5 60 4.05 23 550 25 5 60 1.28 74

415 95 5 60 1.28 80 550 95 5 60 4.05 33

415 25 15 60 1.28 63 550 25 15 60 4.05 21

415 95 15 60 4.05 44 550 95 15 60 1.28 96

where, y: total oil yield, x1: CO2 pressure (bar), x2: CO2 temperature (in degress Celsius), x

3: Peanut moisture (percent by weight), x4: CO2 flow rate (L/min), and x5: peanut particle size (mm).

After exhaustive analyses four different competing models are determined. Model 1 includes variables x1, x2, x3, x4, and x5. Model 2 includes variables x

1, x

2, and x

5. In all the models, intercept is included.

Table 14: Oil Extraction from Peanuts Data Results

Model 1 0.9430 0.3305 0.6730 0.9544 0.4731

Model 2 0.9452 0.3539 0.4953 0.9729 0.8072

(8)

www.ijrias.org Page 8

Table 15: Oil Extraction from Peanuts Data Simulation Results

Model 1 Uniform (0,1) 61 0 939 0 0

Model 2 Uniform (0,1) 54 21 594 331 88

Model 1 Normal (0,1) 74 0 924 2 2

Model 2 Normal (0,1) 15 76 39 870 545

Model 1 3

4N(0,1)+ 1 4N( 3

2, 1

3) 77 1 915 7 4

Model 2 3

4N(0,1)+ 1 4N( 3

2, 1

3) 16 78 41 865 559

5.6 Electronic Inverter Data

Table 16: Electronic Inverter Data (TABLE B.14, Montgomery et al., 2012, Page 567)

x1 x2 x3 x4 x5 y x1 x2 x3 x4 x5 y

3 3 3 3 0 0.787 8 30 8 8 0 0.293

3 6 6 6 0 1.710 4 4 4 12 0 0.203

8 7 6 5 0 0.806 10 20 5 5 0 4.713

8 6 3 3 25 0.607 6 24 4 4 25 9.107

4 10 12 4 25 9.210 16 12 8 4 25 1.365

3 10 8 8 25 4.554 8 3 3 3 25 0.293

3 6 3 3 50 2.252 3 8 8 3 50 9.167

4 8 4 8 50 0.694 5 2 2 2 50 0.379

2 2 2 3 50 0.485 10 15 3 3 50 3.345

15 6 2 3 50 0.208 15 6 2 3 75 0.201

10 4 3 3 75 0.329 3 8 2 2 75 4.966

6 6 6 4 75 1.362 2 3 8 6 75 1.515

3 3 8 8 75 0.751

where, y: transient point of PMOS-NMOS Inverters, x

1: width of the NMOS Device, x2: length of the NMOS Device, x3: width of the PMOS Device, x4: length of the PMOS Device, and x

5: a numeric vector.

After exhaustive analyses four different competing models are determined. Model 1 includes variables x

1, x 2, x

3, x 4, and x5. Model 2 includes variables x

1, x 2, x

3, and x

4. In all the models, intercept is included

Table 17: Electronic Inverter Data Results

Model 1 0.9292 0.0797 -0.1788 0.9031 0.0283

Model 2 0.9250 0.0685 -0.1759 0.9027 0.0250

(9)

www.ijrias.org Page 9

Table 18: Electronic Inverter Data Simulation Results

Model 1 Uniform (0,1) 73 0 927 0 0

Model 2 Uniform (0,1) 105 0 895 0 1

Model 1 Normal (0,1) 97 0 903 0 0

Model 2 Normal (0,1) 87 34 665 214 101

Model 1 3

4N(0,1)+ 1 4N( 3

2, 1

3) 134 0 866 0 0

Model 2 3

4N(0,1)+ 1 4N( 3

2, 1

3) 97 37 599 267 110

VI. CONCLUDING REMARKS

When we are implementing Box-Cox transformation, we are expecting that W'

values should be increasing and hence the p-values should also be increasing. For data set 2, model 3 of data set 3, data 4, and data 5, W'

values increase and hence the p-values also increase as expected. But for data set 1, models 1 and 2 of data set 3, and data set 6, the increment of W'

and p-values are reversed.

In random permutation computations, in all data sets, numbers of increases in p-values are high (at least 83%) except in data sets 2 and 5 where the increases are low (at most 64%). In data set 4, numbers of rejections are very high after transformation even though the numbers of increases in p-values are very high, which might be due to the fact that the level of significance is 10% and the changes are not high enough to change the decisions. In data sets 3 and 6, numbers of acceptance of normality are high after transformation as one might expect. In data sets 1, 2, and 5, mostly transformations were not needed. Note that in random permutations, we are permuting existing response measurements by fixing the design matrix.

In simulations, we are simulating random numbers as responses for fixed design matrices. Among all data sets, only in data set 1, results are as expected, that is, higher rates of rejection in cases of exponential errors, and then higher rate of acceptance after transformation. In other data sets, such comments cannot be made. In data set 6, there are high numbers of rejections which were acceptances prior to transformation.

In conclusion, we can summarize that in regression, results are very much dependent on the design matrix. So, in transformation parameter computations, design matrix should be taken into consideration. When Box-Cox transformation is considered, results are not dependable as one might expect.

More research should be done for effective implementation of the Box-Cox transformation.

BIBLIOGRAPHY

[1]. Bickel, P. J. and K. A. Doksum (1981). “An Analysis of Transformations Revisited.” Journal of the American Statistical Association, 76, 296-311.

[2]. Box, G. E. P. and D. R. Cox (1982). “An Analysis of Transformations Revisited (Rebutted).” Journal of the Ameran Statistical Association, 77, 209-210.

[3]. Box, G. E. P. and D. R. Cox (1964). “An Analysis of Transformations.” Journal of the Royal Statistical Society, Series B., 26, 211-252.

[4]. Carroll, R. J. (1980). “A Robust Method for Testing Transformations to Achieve Approximate Normality.” Journal of the Royal Statistical Society, Series B., 42, 71-78.

[5]. Halawa, A. M. (1996). “Estimating the Box-Cox Transformation via an Artificial Regression Model.” Communications in Statistics — Simulation and Computation, 25(2), 331-350.

[6]. Harter, H. L. (1961). “Expected Values of Normal Order Statistics.”

Biometrika, 48, 1 and 2, 151-165.

[7]. Hinkley, D. V. (1975). “On Power Transformation to Symmetry.”

Biometrika, 62, 101-111.

[8]. Hinkley, D. V. (1977). “On Quick Choice of Power Transformation.” Applied Statistics, 26, 67-68.

[9]. Lin, L. I. and E. F. Vonesh (1989). “An Empirical Nonlinear Data- Fitting Approach for Transforming Data to Normality.” American Statistician, 43, 237-243.

[10]. Montgomery, D. C., E. A. Peck, and G. G. Vining (2012).

Introduction to Linear Regression Analysis. John Wiley & Sons, Inc., New Jersey, USA.

[11]. Rahman, M. and L. M. Pearson (2008). “A Note on the Maximum Likelihood Box-Cox Transformation Parameter.” Journal of Probability and Statistical Science, 6(2), 155-168.

[12]. Rahman, M. and L. M. Pearson (2000). “Shapiro-Francia W‟

Statistic Using Exclusive Simulation”, Journal of the Korean Data

& Information Science Society, 11(2), 139-155.

[13]. Rahman, M. (1999). “Estimating the Box-Cox Transformation via Shapiro-Wilk W Statistic.” Communications in Statistics - Simulation and Computation, 28(1), 223-241.

[14]. Shapiro, S. S. and M. B. Wilk (1965). “An Analysis of Variance Test for Normality.” Biometrika, 52, 3 and 4, 591-611.

[15]. Shapiro, S. S., M. B. Wilk, and H. J. Chen (1968). “A Comparative Study of Various Tests of Normality.” Journal of the American Statistical Association, 63, 1343-1372.

[16]. Taylor, J. M. G. (1985). “Power Transformations to Symmetry.”

Annals of Mathemaical Statistics, 33, 1-67.

I A Note on Adaptive Box-Cox Transformation Parameter in Linear Regression

www.ijrias.org Page 1

A Note on Adaptive Box-Cox Transformation Parameter in Linear Regression

Mezbahur Rahman, Samah Al-thubaiti, Reid W Breitenfeldt, Jinzhu Jiang, Elliott B Light, Rebekka H Molenaar, Mohammad Shaha A Patwary, and Joshua A Wuollet

I

www.ijrias.org Page 2

 



|

|

(

) (

)

  

 











 

 

(

) (

)



(

)



(

)

 



  

 

 

(

) (

)

 



 

 

(

)

www.ijrias.org Page 3









www.ijrias.org Page 4

www.ijrias.org Page 5

www.ijrias.org Page 6

www.ijrias.org Page 7

www.ijrias.org Page 8

www.ijrias.org Page 9

 ^