Algorithm for the proposed adaptive robust M-estimator:

Rousseeuw and Leroy (1987) carried out a simulation study of a one-step re-weighted least squares estimator based on the LMS, with a cut-off value of 2.5 used to construct the weights. Gervini and Yohai (2002) used these as the initial values for an adaptive one-step weighted LS estimator, known as the Robust and Efficient Weighted Least Squares Estimator (REWLSE). The cut-off value in the REWLSE weight function is calculated adaptively from the empirical distribution function (EDF) of the standardized absolute residuals, so that REWLSE attains full asymptotic efficiency under the normal error distribution while keeping a high breakdown point and a small maximum bias.
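The one-step re-weighting idea can be illustrated with a minimal sketch. It assumes a simple line fit, takes the robust initial coefficients as given inputs (standing in for an LMS fit), and uses a MAD-based scale; the function names and the scale choice are illustrative assumptions, not the implementation used in the cited studies.

```python
import statistics

def hard_rejection_weights(std_residuals, cutoff=2.5):
    """Weight 1 if the standardized absolute residual is within the cutoff, else 0."""
    return [1.0 if abs(r) <= cutoff else 0.0 for r in std_residuals]

def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for the line y = a + b * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def one_step_reweighted_ls(x, y, a0, b0, cutoff=2.5):
    """One re-weighting step starting from a robust initial fit (a0, b0)."""
    residuals = [yi - (a0 + b0 * xi) for xi, yi in zip(x, y)]
    # Assumption: MAD-based scale as a stand-in for the scale of the initial fit.
    scale = 1.4826 * statistics.median(abs(r) for r in residuals)
    weights = hard_rejection_weights([r / scale for r in residuals], cutoff)
    a1, b1 = weighted_line_fit(x, y, weights)
    return a1, b1, weights

# Usage: a line with small noise and one gross outlier in the last point.
x = list(range(10))
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2, 0.1, 79.0]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
a1, b1, weights = one_step_reweighted_ls(x, y, a0=1.0, b0=2.0)
```

In this toy data set the last observation receives weight zero and the remaining nine points determine the fit.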

The adaptive robust M-estimator proposed in this chapter is motivated by, and extends, the Robust and Efficient Weighted Least Squares Estimator (REWLSE). As in REWLSE, the empirical distribution function (EDF) of the standardized absolute residuals is used to derive the adaptive tuning constant. The differences are: (1) The tail weight index (TWI) and the Anderson-Darling (AD) statistic are used in the adaptive scheme to obtain a better initial tuning constant, replacing the fixed initial tuning constant used in REWLSE. (2) The weights are generated from the adaptive tuning constant, the $\psi(x)$ function, and the standardized absolute residuals from the previous iteration. (3) The standard Iteratively Re-weighted Least Squares approach (a multiple-step procedure) is applied instead of the one-step REWLSE to achieve better asymptotic efficiency.

For the initial robust estimator in the adaptive IRWLS procedure, LTS is preferred for its high breakdown point. The adaptive cutoff (tuning constant) is redefined at each step of IRWLS. The initial LTS estimate carries the superscript $(0)$. At the $k$-th iteration, let $F_n^{+}$ denote the empirical distribution function of the standardized absolute residuals, where the residuals are standardized by the default scale estimate at the $k$-th iteration.

In order to identify outliers, one intuitive idea is to compare the empirical distribution function $F_n^{+}$ with $F^{+}$, the distribution function of $|x|$ where $x \sim \Phi$, the standard normal distribution function.

If $F_n^{+}(t) < F^{+}(t)$, the sample proportion of the standardized absolute residuals that exceed $t$ is greater than the assumed theoretical proportion. If this happens for a relatively large $t$, outliers are probably present in the sample. For convenience, let $|r|_{(1)} \le |r|_{(2)} \le \dots \le |r|_{(n)}$ be the order statistics of the standardized absolute residuals.
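The comparison of the two distribution functions can be sketched as follows. The function names are illustrative assumptions; the only facts used are that the distribution function of $|x|$ for standard normal $x$ equals $2\Phi(t) - 1 = \operatorname{erf}(t/\sqrt{2})$ and that the EDF is the share of standardized absolute residuals not exceeding $t$.

```python
import math

def folded_normal_cdf(t):
    """F+(t) = P(|X| <= t) for X ~ N(0, 1), i.e. erf(t / sqrt(2))."""
    return math.erf(t / math.sqrt(2.0)) if t > 0 else 0.0

def empirical_abs_cdf(std_residuals, t):
    """F_n+(t): share of standardized absolute residuals not exceeding t."""
    return sum(1 for r in std_residuals if abs(r) <= t) / len(std_residuals)

def excess_tail_share(std_residuals, t):
    """Positive part of F+(t) - F_n+(t); positive when too many residuals exceed t."""
    return max(folded_normal_cdf(t) - empirical_abs_cdf(std_residuals, t), 0.0)

# Usage: eight well-behaved residuals and two gross outliers, inspected at t = 3.
sample = [0.1, -0.3, 0.5, -0.8, 1.0, -1.2, 0.2, 0.4, 6.0, -7.5]
excess_at_3 = excess_tail_share(sample, 3.0)
```

A clearly positive excess at a large $t$, as in this toy sample, is the signal that outliers are probably present.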

In the approach proposed here, a screening Anderson-Darling (AD) test is used in each iteration … an extremely small portion as the sample size becomes very large. Some papers (O’Gorman, 2001) use larger cutoff points for the tail portion in the calculation of the TWI.

Table 4.1 shows, for some sample sizes, how the tail proportions are chosen and how the threshold values are derived (refer to Appendix (E) for more detail).

Table 4.1

Sample size   Tail proportion   Threshold values
 20           0.056             1.680     3.216
 40           0.028             1.526     3.355
 60           0.019             1.456     3.425
 80           0.014             1.413     3.472
100           0.011             1.384     3.508
120           0.010             1.357     3.496
140           0.010             1.331     3.428
160           0.010             1.309     3.373
180           0.010             1.292     3.327
200           0.010             1.277     3.289
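The tabulated tail proportions happen to match, to three decimals, the hypothetical rule $\max\{1/(0.9n),\ 0.01\}$. This is only a pattern observed in the ten tabulated values, not a rule stated in this section (the derivation is in Appendix (E)); the sketch below merely checks the pattern, and the function name and the rule itself are assumptions.

```python
# Tail proportions as tabulated in Table 4.1 (sample size -> tail proportion).
TABLE_4_1_TAIL = {20: 0.056, 40: 0.028, 60: 0.019, 80: 0.014, 100: 0.011,
                  120: 0.010, 140: 0.010, 160: 0.010, 180: 0.010, 200: 0.010}

def tail_proportion(n, floor=0.01):
    """Hypothetical rule fitted to Table 4.1 only: 1 / (0.9 n), floored at 1%."""
    return max(1.0 / (0.9 * n), floor)

# Sample sizes for which the hypothetical rule disagrees with the table.
mismatches = [n for n, p in TABLE_4_1_TAIL.items()
              if round(tail_proportion(n), 3) != p]
```

For other sample sizes the rule should be treated as an interpolation guess rather than as the author's definition.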

The logic for choosing the initial threshold is shown in the flow chart in Figure 4.1.
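One ingredient of this logic, the Anderson-Darling statistic, can be sketched as follows. The sketch computes the standard $A^2$ statistic of the standardized residuals against a fully specified $N(0,1)$ distribution. The decision helper is hypothetical: the threshold and the two candidate cutoffs are left as required arguments because their values are not given in this section, and the TWI branch of the flow chart is not sketched because its definition is not given here.

```python
import math
from statistics import NormalDist

def anderson_darling_normal(std_residuals, eps=1e-12):
    """A^2 statistic of standardized residuals against the fully specified N(0, 1)."""
    z = sorted(std_residuals)
    n = len(z)
    # Clamp the CDF away from 0 and 1 so the logarithms stay finite.
    cdf = [min(max(NormalDist().cdf(v), eps), 1.0 - eps) for v in z]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

def choose_initial_cutoff(a2, a2_threshold, higher_cutoff, lower_cutoff):
    """Hypothetical rule: use the lower cutoff when normality looks violated."""
    return lower_cutoff if a2 > a2_threshold else higher_cutoff

# Usage: normal-looking residuals versus the same sample with five gross outliers.
clean = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
contaminated = clean[:45] + [6.0] * 5
a2_clean = anderson_darling_normal(clean)
a2_contaminated = anderson_darling_normal(contaminated)
```

The contaminated sample yields a much larger statistic than the clean one, which is the situation in which a lower initial cutoff would be chosen.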

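As a measure of the proportion of outliers, we define the largest value, over all $t$ at or above the initial cutoff, of $\{F^{+}(t) - F_n^{+}(t)\}^{+}$, where $\{\cdot\}^{+}$ denotes the positive part. In the final sample, it is convenient to compute this measure from the ordered standardized absolute residuals.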
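For orientation, the REWLSE construction of Gervini and Yohai (2002), on which this procedure builds, can be written as below. This is a sketch of the published REWLSE definitions as recalled here, not a recovery of this chapter's own displays; the symbols $d_n$, $t_0$ (initial cutoff), $t_a$ (adaptive cutoff), $i_0$ and $i_a$ are assumed names.

```latex
\begin{align*}
d_n &= \sup_{t \ge t_0} \left\{ F^{+}(t) - F_n^{+}(t) \right\}^{+},
  & t_a &= \min\left\{ t : F_n^{+}(t) \ge 1 - d_n \right\}, \\
i_0 &= \max\left\{ i : |r|_{(i)} < t_0 \right\},
  & d_n &= \max_{i > i_0} \left\{ F^{+}\!\left(|r|_{(i)}\right) - \frac{i-1}{n} \right\}^{+}, \\
i_a &= n - \lfloor n\, d_n \rfloor,
  & t_a &= |r|_{(i_a)} .
\end{align*}
```

The order-statistic form follows from the supremum form because $F_n^{+}$ stays constant at $(i-1)/n$ just below $|r|_{(i)}$ while $F^{+}$ keeps increasing, so the largest gap on each step is approached at its right end.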

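Formulas (4.8) and (4.9) can be shown to be equivalent. If all the standardized absolute residuals are less than the initial cutoff, the best choice is to make the adaptive cutoff as large as possible for better asymptotic efficiency. A maximum cutoff can be set for the adaptive IRWLS; it is set to a large value, such as 11, in this approach. This procedure guarantees that $|r|_{(i_a)} \ge |r|_{(i_0)}$, where $i_0$ and $i_a$ index the order statistics associated with the initial and the adaptive cutoff, and hence that the adaptive cutoff is at least as large as the initial cutoff. (4.10)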
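The computation of an adaptive cutoff from the ordered standardized absolute residuals can be sketched as follows. The sketch follows the REWLSE-style order-statistic formula of Gervini and Yohai (2002) and adds the maximum cutoff of 11 mentioned above; the function names, the default initial cutoff of 2.5 and the floating-point guard are assumptions for illustration, not the chapter's exact procedure.

```python
import math

def folded_normal_cdf(t):
    """F+(t) = P(|X| <= t) for X ~ N(0, 1)."""
    return math.erf(t / math.sqrt(2.0)) if t > 0 else 0.0

def adaptive_cutoff(std_residuals, initial_cutoff=2.5, max_cutoff=11.0):
    """EDF-based adaptive cutoff, never below initial_cutoff, capped at max_cutoff."""
    abs_sorted = sorted(abs(r) for r in std_residuals)
    n = len(abs_sorted)
    # i0: number of order statistics strictly below the initial cutoff.
    i0 = sum(1 for a in abs_sorted if a < initial_cutoff)
    if i0 == n:
        return max_cutoff  # no residual reaches the initial cutoff
    # d: largest positive gap F+(|r|_(i)) - (i - 1)/n over i > i0 (1-based i).
    d = max(
        max(folded_normal_cdf(abs_sorted[i - 1]) - (i - 1) / n, 0.0)
        for i in range(i0 + 1, n + 1)
    )
    # Guard against floating-point rounding pushing the index down to i0.
    i_a = max(n - math.floor(n * d), i0 + 1)
    return min(max(abs_sorted[i_a - 1], initial_cutoff), max_cutoff)

# Usage: eight well-behaved residuals and two gross outliers.
sample = [0.1, -0.3, 0.5, -0.8, 1.0, -1.2, 0.2, 0.4, 6.0, -7.5]
cutoff = adaptive_cutoff(sample)
```

Because the selected index is always above $i_0$, the returned cutoff cannot fall below the initial cutoff, which mirrors the guarantee stated in (4.10).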

The guarantee in (4.10) can be derived from (4.6) and, consequently, from (4.8). Throughout, $\psi(x)$ is the score function derived from Tukey’s biweight M-estimator.
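Tukey's biweight score function and the weight it induces can be sketched as below, using the standard form $\psi(x) = x\,(1 - (x/c)^2)^2$ for $|x| < c$ and $0$ otherwise, with weight $\psi(x)/x$. The function names and the argument name `c` for the tuning constant are assumptions.

```python
def tukey_psi(x, c):
    """Tukey biweight score: x * (1 - (x/c)^2)^2 inside (-c, c), zero outside."""
    if abs(x) >= c:
        return 0.0
    u = x / c
    return x * (1.0 - u * u) ** 2

def tukey_weight(x, c):
    """Weight psi(x) / x: (1 - (x/c)^2)^2 inside (-c, c), zero outside."""
    if abs(x) >= c:
        return 0.0
    u = x / c
    return (1.0 - u * u) ** 2

# Usage: a residual at half the tuning constant keeps weight 0.5625.
w_half = tukey_weight(2.0, 4.0)
```

A larger tuning constant keeps the weights closer to 1 for moderate residuals, while any residual at or beyond the constant is rejected outright.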

The algorithm (see Appendix Figure G (2.1a)) is implemented with the option ADAPT=TW in the program. It can be described as an IRWLS algorithm with a dynamically adaptive tuning constant based on the EDF of the standardized absolute residuals.
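The overall loop can be illustrated with a minimal sketch for a simple line fit. It is not the program referred to above: the initial coefficients are taken as given (standing in for an LTS fit), the scale is a MAD-based assumption, the initial cutoff is held fixed at a hypothetical 2.5 instead of being chosen from the AD test and the TWI, and all function names are illustrative.

```python
import math
import statistics

def folded_normal_cdf(t):
    """F+(t) = P(|X| <= t) for X ~ N(0, 1)."""
    return math.erf(t / math.sqrt(2.0)) if t > 0 else 0.0

def adaptive_cutoff(std_residuals, initial_cutoff, max_cutoff=11.0):
    """EDF-based cutoff in the style of REWLSE, capped at max_cutoff."""
    abs_sorted = sorted(abs(r) for r in std_residuals)
    n = len(abs_sorted)
    i0 = sum(1 for a in abs_sorted if a < initial_cutoff)
    if i0 == n:
        return max_cutoff
    d = max(max(folded_normal_cdf(abs_sorted[i - 1]) - (i - 1) / n, 0.0)
            for i in range(i0 + 1, n + 1))
    i_a = max(n - math.floor(n * d), i0 + 1)  # guard against rounding
    return min(abs_sorted[i_a - 1], max_cutoff)

def tukey_weight(u, c):
    """Tukey biweight weight psi(u) / u with tuning constant c."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0

def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for the line y = a + b * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def adaptive_irwls(x, y, a, b, initial_cutoff=2.5, max_iter=50, tol=1e-8):
    """IRWLS in which the tuning constant is recomputed at every iteration."""
    cutoff = initial_cutoff
    for _ in range(max_iter):
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Assumption: MAD-based scale as the per-iteration scale estimate.
        scale = 1.4826 * statistics.median(abs(r) for r in residuals)
        if scale == 0.0:
            break  # at least half the points are fitted exactly
        std = [r / scale for r in residuals]
        cutoff = adaptive_cutoff(std, initial_cutoff)
        weights = [tukey_weight(s, cutoff) for s in std]
        a_new, b_new = weighted_line_fit(x, y, weights)
        change = abs(a_new - a) + abs(b_new - b)
        a, b = a_new, b_new
        if change < tol:
            break
    return a, b, cutoff

# Usage: twenty points on y = 1 + 2x with small noise and two gross outliers.
x = list(range(20))
pattern = [0.1, -0.1, 0.2, -0.2]
y = [1.0 + 2.0 * xi + pattern[xi % 4] + (50.0 if xi >= 18 else 0.0) for xi in x]
a_hat, b_hat, final_cutoff = adaptive_irwls(x, y, a=1.0, b=2.0)
```

In this toy example the two outliers are far beyond the maximum cutoff, so they receive zero weight at every iteration while the clean points keep weights close to 1.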

This proposed adaptive procedure can be justified and summarized in this way:

Firstly, following the flow chart in Figure 4.1, the initial threshold is chosen within each iteration from both the Anderson-Darling (AD) normality test and the tail weight index (TWI). When the AD test shows that the normality assumption is not seriously violated, or the TWI is not too large, the initial cutoff is set higher for better asymptotic efficiency. Conversely, when the AD test shows that the normality assumption may be seriously violated, or the TWI is too large, the initial cutoff is set lower for a higher breakdown point.

Secondly, the potential outliers, whose weights are set to zero by the weight function $w_i$, must satisfy not only that $|r|_{(i)}$ exceeds the initial cutoff but also that $|r|_{(i)}$ exceeds the adaptive cutoff. The adaptive cutoff tends to be large when the error assumption is approximately valid, which leads to better asymptotic efficiency than the M-estimator with the fixed tuning constant. It tends to stay close to the fixed tuning constant when the error assumption is possibly violated, which keeps the breakdown point of the M-estimator high. In summary, the resulting tuning constant produces weights that down-weight influential residuals to achieve an adaptive fit.

Compared with the non-adaptive M-estimators, the adaptive M-estimator based on the empirical distribution function of the standardized absolute residuals, with the TWI and the Anderson-Darling test, uses the same IRWLS algorithm. The only difference is that the tuning constant is chosen adaptively by the procedure above within each IRWLS step, whereas it has to be specified in advance for the non-adaptive M-estimators. The price of the better robustness is a little extra computation time for the TWI and the normality test, which is a good time-robustness tradeoff.

In document Adaptive Robust Regression Approaches in data analysis and their Applications (Page 96-102)