# General Theory of Antithetic Time Series



How to cite this paper: Ngnepieba, P. and Ridley, D. (2015) General Theory of Antithetic Time Series. Journal of Applied Mathematics and Physics, 3, 1726-1741. http://dx.doi.org/10.4236/jamp.2015.312197

### P. Ngnepieba¹, Dennis Ridley²,³*

¹Department of Mathematics, Florida A&M University, Tallahassee, USA
²SBI, Florida A&M University, Tallahassee, USA
³Department of Scientific Computing, Florida State University, Tallahassee, USA

Received 6 November 2015; accepted 27 December 2015; published 30 December 2015

### Abstract

A generalized antithetic time series theory for exponentially derived antithetic random variables is developed. The correlation function between a generalized gamma distributed random variable and its pth exponent is derived. We prove that the correlation approaches minus one as the exponent approaches zero from the left and the shape parameter approaches infinity.

### Keywords

Antithetic Time Series Theory, Antithetic Random Variables, Bias Reduction, Gamma Distribution, Inverse Correlation, Serial Correlation

### 1. Introduction

Serially correlated random variables arise in ways that both benefit and bias mathematical models for science, engineering and economics. In one widespread example, a mathematical formula is used to create uniformly distributed pseudo random numbers for use in Monte Carlo simulation. The numbers are serially correlated because they are generated by a formula. The benefit is that the same pseudo random numbers can be recreated at will, and two or more simulation experiments can be compared without regard to the pseudo random numbers. The correlation is designed to be very small so as not to bias the results of a simulation experiment. Still, some bias is unavoidable when using serially correlated numbers (Ferrenberg, Landau, and Wong [1]).

Another widespread example is a regression model in which the dependent variable is serially correlated. The result is biased model parameter estimates because the independence assumption of the Gauss-Markov theorem is violated (see Griliches [2], Nerlove [3], Koyck [4], Klein [5]). Similarly, the independence assumption of Fuller and Hasza [6] and Dufour [7] would not apply. The absence of any relevant information from a model will express itself in the patterns of the error term. If complete avoidance of bias requires normally distributed data, then the absence of normality is like missing information. Bias may also be due to missing data points (Chandan and Jones [8], Li, Nychka and Ammann [9]). Assume that a perfect model is postulated for a given application in which the population to which the data belong is known exactly. The model must be fitted to a sample of data, not the population. However, once the sample is taken, the distribution is automatically truncated and distorted, and the fitted model is biased. Regardless of the method of fitting, sampling bias, however small, is unavoidable. One approach aimed at improving model performance is to combine the results from different models. For an extensive discussion and review of traditional combining see Bunn [10], Diebold [11], Clemen [12], Makridakis et al. [13], and Winkler [14].

Economics researchers have commented on serial correlation bias. Hendry and Mizon [15] and Hendry [16] considered common factor analysis (Mizon [17]) and suggested that serial correlation is a feature for representing dynamic relationships in economic models. This in turn implies that economics allows for serial correlation (see Pindyck and Rubinfeld [18]). Time domain methods for detecting the nature and presence of serial correlation were considered by Durbin and Watson [19] and Durbin [20]. Spectral methods were considered by Hendry [16], Osborn [21] and Espasa [22]. Even if serial correlation can be a tool for studying the nature of economics, it is detrimental to long range forecasting models. Whatever the source of bias may be, the only possibility for long range forecasting is to completely eliminate the bias.

### 1.1. Background

Inversely correlated random numbers were suggested by Hammersley and Morton [23] for use in Monte Carlo computer simulation experiments. In that application, a single computer simulation is replaced by two simulations. One simulation uses uniformly distributed $U(0,1)$ random numbers $r$. The other simulation uses $1 - r$. The expectation is that the average of the results of these two simulations has a smaller variance than either one alone. In practice, the variance sometimes decreases, but sometimes it increases. See also Kleijnen [24].

The theory of combining antithetic lognormally distributed random variables that contain negatively correlated components was introduced by Ridley [25]. The Ridley [25] antithetic time series theorem states that "if $X_t > 0$, $t = 0, 1, 2, 3, \ldots$ is a discrete realization of a lognormal stochastic process, such that $\ln X_t \sim N(\mu, \sigma^2)$, and the correlation between $X_t$ and $X_t^p$ is $\rho$, then $\lim_{p \to 0^-,\, \sigma \to 0^+} \rho = -1$." Antithetic variables can be combined so as to eliminate bias in fitted values associated with any autoregressive time series model (see the Ridley [25] antithetic fitted function theorem and antithetic fitted error variance function theorem). Similarly, antithetic forecasts obtained from a time series model can be combined so as to eliminate bias in the forecast error. Ridley [26] applied combined antithetic forecasting to a wide range of data distributions. Ridley [27] demonstrated the methodology for optimizing weights for combining antithetic forecasts. See also Ridley and Ngnepieba [28] and Ridley, Ngnepieba, and Duke [29]. The antithetic variables proof in Ridley [25] was for the special case of $X_t$ lognormally distributed.

The implication for using a biased mathematical model to investigate economic, engineering and scientific phenomena is that estimates obtained from the model are biased. Estimates of future values extrapolated from the model are also biased. As the forecast horizon increases, the bias accumulates and the extrapolations diverge from the actual values. This is most pronounced in the case of investigations into global warming phenomena. There, the horizon is by definition very far into the future. The smallest bias will accumulate, so much so that conclusions may be as much an artifact of the mathematical model as they are about climate dynamics. Combining antithetic extrapolations can dynamically remove the bias in the extrapolated values.

### 1.2. Proposed Research

The antithetic gamma variables discussed in this research are defined as follows.

Definition 1. Two random variables are antithetic if their correlation is negative. A bivariate collection of random variables is asymptotically antithetic if its limiting correlation approaches minus one asymptotically (see antithetic gamma variables theorem below).

Definition 2. $\{X(\xi, t)\}$ is an ensemble of random variables, where $\xi$ belongs to a sample space and $t$ belongs to an index set representing time, such that $X_t$ is a discrete realization of a gamma stationary stochastic process from the ensemble, $X_t \sim \Gamma(\alpha, \beta)$, and $X_t$, $t = 1, 2, 3, \ldots$ are serially correlated.

The gamma distribution is the parent of the exponential distribution and can explain many other distributions. That is, a wide range of distributions can be represented by the gamma distribution. We will explore these possibilities by examining the correlation between $X$ and $X^p$ when $X$ is gamma distributed. Of particular interest is the correlation between $X$ and $X^p$ as $p \to 0^-$. A graph of the correlation between $X$ and $X^p$ as $p \to 0^+$ and $p \to 0^-$ is shown in Figure 1. We begin by reviewing the obvious results for the case when p is positive. The correlation is positive when p is positive and exactly one when p is one. This is expected. As p moves away from one, the correlation decreases. As p approaches zero from the right, the correlation falls, albeit very slowly. This is also expected. In the case when p is negative, the correlation behaves quite differently. The result is entirely counterintuitive. As expected, the correlation is negative. But, unlike when p is positive, as p approaches zero from the left, the absolute value of the correlation increases. Furthermore, the correlation approaches minus one, not zero.

One purpose of this paper is to derive an analytical function for the correlation between $X$ and $X^p$ when $X$ is gamma distributed. A second purpose is to explore, by extensive computation, the behavior of the correlation as p approaches zero from the left. The trivial case of p equal to zero, where the correlation is zero, is of no interest. We are interested in p inside a delta neighborhood of zero, not zero itself. Finally, we prove that the limiting value of the correlation is minus one.

The paper is organized as follows. In Section 2 we review the gamma distribution. In Section 3 we derive the analytic function for the correlation. In Section 4 we prove its limiting value. In Section 5 we use MATLAB [30] to compute correlations for a wide range of values generated from the gamma distribution. In Section 6 we outline the method for using antithetic variables to dynamically remove bias from the fitted and forecast values obtained from a time series model. Examples include computer simulated data. Section 7 contains conclusions and suggestions for further research.

### 2. The Gamma Distribution

The gamma distribution is very important for technical reasons, since it is the parent of the exponential distribution and can explain many other distributions. Its probability density function (pdf) (see Hogg and Ledolter [31]) is:

$$ f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} \exp\!\left(-\dfrac{x}{\beta}\right), & \text{if } x \ge 0, \\[4pt] 0, & \text{if } x < 0, \end{cases} \tag{1} $$

where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the scale parameter. The gamma function is defined as

$$ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} \mathrm{e}^{-x}\, \mathrm{d}x. \tag{2} $$

A graph of the gamma probability density function for $\beta = 0.6$ and various values of $\alpha = 1.1, 3, 5, 9, 11$ is shown in Figure 2.
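Equation (1) can be sanity-checked numerically. The sketch below (Python rather than the paper's MATLAB; the grid and the cutoff at 30 are arbitrary illustrative choices) evaluates the pdf in log space to avoid overflow and integrates it by the trapezoidal rule; the total probability should be close to 1.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma pdf of Equation (1), shape alpha > 0 and scale beta > 0."""
    if x <= 0.0:
        return 0.0  # Equation (1) is zero for x < 0; for alpha > 1 also at x = 0
    log_f = (-math.lgamma(alpha) - alpha * math.log(beta)
             + (alpha - 1.0) * math.log(x) - x / beta)
    return math.exp(log_f)

# Trapezoidal rule on [0, 30]; the tail beyond 30 is negligible for these
# parameters, so the total probability should be close to 1.
alpha, beta, h = 3.0, 0.6, 0.001
grid = [i * h for i in range(30001)]
area = h * (sum(gamma_pdf(v, alpha, beta) for v in grid)
            - 0.5 * (gamma_pdf(grid[0], alpha, beta) + gamma_pdf(grid[-1], alpha, beta)))
print(round(area, 4))
```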

Figure 2. Exploring the effect of varying parameter values in the pdf of the gamma distribution.

### 3. The Correlation between X and X^p

Let $X_t > 0$, $t = 1, 2, 3, \ldots$, be a discrete realization of a generalized gamma stochastic process. For $p \in \mathbb{R}$, from Appendix A, the pth moment of the gamma distribution is given by

$$ E\!\left[X_t^p\right] = \frac{\beta^{p}\,\Gamma(\alpha + p)}{\Gamma(\alpha)}. \tag{3} $$

Therefore, since $\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$, the mean is

$$ E\!\left[X_t\right] = \frac{\beta\,\Gamma(\alpha+1)}{\Gamma(\alpha)} = \alpha\beta, $$

the second moment is

$$ E\!\left[X_t^2\right] = \frac{\beta^2\,\Gamma(\alpha+2)}{\Gamma(\alpha)} = \alpha(\alpha+1)\beta^2, $$

and the variance is

$$ \sigma^2 = E\!\left[X_t^2\right] - \left(E\!\left[X_t\right]\right)^2 = \alpha\beta^2. \tag{4} $$
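Equation (3) and the moments that follow from it are easy to verify by simulation. In the Python sketch below (a stand-in for the paper's MATLAB; `random.gammavariate` uses the same shape/scale convention as Equation (1)), Monte Carlo averages are compared with the closed forms for the mean $\alpha\beta$, the variance $\alpha\beta^2$, and the $p = -0.5$ moment.

```python
import math
import random

def gamma_pth_moment(p, alpha, beta):
    """E[X**p] = beta**p * Gamma(alpha + p) / Gamma(alpha), Equation (3)."""
    return beta ** p * math.exp(math.lgamma(alpha + p) - math.lgamma(alpha))

rng = random.Random(42)
alpha, beta, n = 5.0, 0.6, 20000
xs = [rng.gammavariate(alpha, beta) for _ in range(n)]

m1 = sum(xs) / n                            # should approach alpha*beta = 3.0
var = sum((v - m1) ** 2 for v in xs) / n    # should approach alpha*beta**2 = 1.8
m_neg = sum(v ** -0.5 for v in xs) / n      # Monte Carlo p = -0.5 moment
print(round(m1, 2), round(var, 2),
      round(m_neg, 3), round(gamma_pth_moment(-0.5, alpha, beta), 3))
```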

Let $\rho$ be the correlation between $X_t$ and $X_t^p$. Then

$$ \rho = \frac{\operatorname{Cov}\!\left(X_t, X_t^p\right)}{\sqrt{\operatorname{Var} X_t\, \operatorname{Var} X_t^p}} = \frac{E\!\left[X_t^{p+1}\right] - E\!\left[X_t\right] E\!\left[X_t^p\right]}{\left(\operatorname{Var} X_t\right)^{1/2}\left(\operatorname{Var} X_t^p\right)^{1/2}}, \tag{5} $$

and

$$ \operatorname{Var} X_t^p = E\!\left[X_t^{2p}\right] - \left(E\!\left[X_t^p\right]\right)^2. $$

Using Equation (3),

$$ \operatorname{Var} X_t^p = \frac{\beta^{2p}\,\Gamma(\alpha+2p)}{\Gamma(\alpha)} - \frac{\beta^{2p}\,\Gamma(\alpha+p)^2}{\Gamma(\alpha)^2}. \tag{6} $$

Therefore

$$ \rho = \frac{\dfrac{\beta^{p+1}\Gamma(\alpha+p+1)}{\Gamma(\alpha)} - \alpha\beta\,\dfrac{\beta^{p}\Gamma(\alpha+p)}{\Gamma(\alpha)}}{\left(\alpha\beta^2\right)^{1/2}\left(\dfrac{\beta^{2p}\Gamma(\alpha+2p)}{\Gamma(\alpha)} - \dfrac{\beta^{2p}\Gamma(\alpha+p)^2}{\Gamma(\alpha)^2}\right)^{1/2}} = \frac{\Gamma(\alpha+p+1) - \alpha\,\Gamma(\alpha+p)}{\alpha^{1/2}\left\{\Gamma(\alpha)\Gamma(\alpha+2p) - \Gamma(\alpha+p)^2\right\}^{1/2}}. \tag{7} $$

Since $\Gamma(\alpha+p+1) = (\alpha+p)\,\Gamma(\alpha+p)$, Equation (7) becomes

$$ \rho = \frac{p\,\Gamma(\alpha+p)}{\alpha^{1/2}\left\{\Gamma(\alpha)\Gamma(\alpha+2p) - \Gamma(\alpha+p)^2\right\}^{1/2}}. \tag{8} $$

From Equation (4), $\alpha^{1/2} = \sigma/\beta$ and $\alpha = \sigma^2/\beta^2$, so the correlation can be expressed in terms of $\sigma$ as

$$ \rho = \frac{p\,\Gamma\!\left(\dfrac{\sigma^2}{\beta^2}+p\right)}{\dfrac{\sigma}{\beta}\left\{\Gamma\!\left(\dfrac{\sigma^2}{\beta^2}\right)\Gamma\!\left(\dfrac{\sigma^2}{\beta^2}+2p\right) - \Gamma\!\left(\dfrac{\sigma^2}{\beta^2}+p\right)^{2}\right\}^{1/2}}. \tag{9} $$

The gamma function $\Gamma(\cdot)$ is problematic when its argument is negative, so every argument in Equation (8) must remain positive; this requires $\alpha + 2p > 0$, that is, $p > -\alpha/2$. In any case, since we are only interested in p approaching zero from the left, this condition is always satisfied when $\alpha > 0$.

### 4. Antithetic Gamma Variables Theorem

Theorem 1. If $X_t > 0$, $t = 1, 2, 3, \ldots$, is a discrete realization of a generalized gamma stochastic process with shape parameter $\alpha$, and $\rho$ is the correlation between $X_t$ and $X_t^p$, then

$$ \lim_{p \to 0^-,\, \alpha \to \infty} \rho = \lim_{p \to 0^-,\, \alpha \to \infty} \frac{p\,\Gamma(\alpha+p)}{\alpha^{1/2}\left\{\Gamma(\alpha)\Gamma(\alpha+2p) - \Gamma(\alpha+p)^2\right\}^{1/2}} = -1. $$

See proof in Appendix B.

### 5. The Effect of p on the Correlation

The effect of p on the correlation is demonstrated by calculating the correlation coefficient from Equation (8) for various values of $\alpha$. The correlation coefficients are listed in Table 1 and plotted in Figure 3 and Figure 4. From Figure 3 and Figure 4, for all values of $\alpha$, the correlation coefficient gets closer to $-1$ as $p \to 0^-$.
Figure 3. $\lim_{p \to 0^-} \rho$ using Equation (8) for $\alpha = 5, 10, 15, 20, 25$; $p = -0.5$ to $-0.005$.

Figure 4. $\lim_{p \to 0^-,\, \alpha \to \infty} \rho$ using Equation (8).

As $\alpha$ increases, the gamma pdf takes shapes that are more symmetrical. Also, as $\alpha$ increases the standard deviation increases. This is indicative of greater spread about both sides of the mode of X. Equation (9) expresses the correlation in terms of $\sigma$. In practice the value of $\alpha$ will be that for the actual data under study. It cannot be modified. Still, it appears that the effect of p on reversing the correlation is greatest for symmetrical distributions.

To validate Equation (8), the MATLAB [30] random number generator GAMRND($\alpha$, $\beta$, n) is used to generate $n = 10000$ random numbers $x_t$ from the gamma distribution in Equation (1) with parameters $\alpha = 25$, $\beta = 0.5$. The correlation is estimated from the sample correlation coefficient ($\hat{\rho}$). One application of the correlation reversal is to remove bias in values extrapolated from a time series model (see Appendix C). The gamma distribution is immediately applicable when $\alpha \ge 25$, such that $\rho$ is approximately minus one.

$$ \hat{\rho} = \frac{\sum_{t=1}^{n}\left(x_t - \bar{x}\right)\left(x_t^p - \bar{x^p}\right)}{\left\{\sum_{t=1}^{n}\left(x_t - \bar{x}\right)^2 \sum_{t=1}^{n}\left(x_t^p - \bar{x^p}\right)^2\right\}^{1/2}}, \quad \text{where } \bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t, \quad \bar{x^p} = \frac{1}{n}\sum_{t=1}^{n} x_t^p, \quad t = 1, 2, \ldots, n. $$

The results are shown in Table 2. The coefficients are almost identical to the theoretical values obtained from Equation (8) and listed in Table 1. In practice, the data may include relatively few observations. To investigate the small sample correlation coefficient, the correlation coefficient is also calculated for $n = 100$.
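The same validation can be sketched outside MATLAB. In the snippet below (Python; `random.gammavariate` plays the role of GAMRND, and `sample_corr` is our helper name), $n = 10000$ gamma variates are drawn with $\alpha = 25$, $\beta = 0.5$ and the correlation between $x_t$ and $x_t^p$ is estimated for $p = -0.05$; the estimate should land close to the corresponding entries of Table 1 and Table 2 (about $-0.99$).

```python
import math
import random

def sample_corr(u, v):
    """Sample correlation coefficient, as in the expression above."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den

rng = random.Random(7)
alpha, beta, p, n = 25.0, 0.5, -0.05, 10000
x = [rng.gammavariate(alpha, beta) for _ in range(n)]  # stand-in for GAMRND
xp = [v ** p for v in x]
r = sample_corr(x, xp)
print(round(r, 3))
```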

### 6. Bias Reduction

Consider an autoregressive time series $x_t$ of n discrete observations obtained from a gamma distribution with a large shape parameter, to which a least squares model $x_t = \Phi x_{t-1} + \varepsilon_t$, $t = 2, \ldots, n$ is fitted. Let the fitted values be $\hat{x}_t$. Next, consider the combined weighted average fitted values $\hat{x}_{c,t} = \omega \hat{x}_t + (1-\omega)\hat{x}'_t$, $-\infty \le \omega \le \infty$. The parameter $\omega$ is a combining weight. The fitted values $\hat{x}_t$ and $\hat{x}'_t$ are antithetic in the sense that they contain components of error $\hat{\varepsilon}_t$ and $\hat{\varepsilon}'_t$, respectively, that are biased and, when weighted, $\omega\hat{\varepsilon}_t$ and $(1-\omega)\hat{\varepsilon}'_t$ are perfectly negatively correlated. The antithetic component $\hat{x}'_t$ is estimated from

$$ \hat{x}'_t = \bar{\hat{x}} + r_{\hat{x}\hat{x}^p}\left(s_{\hat{x}} / s_{\hat{x}^p}\right)\left(\hat{x}_t^p - \bar{\hat{x}^p}\right), \quad t = 1, 2, 3, \ldots, n, $$

Table 1. Behavior of $\rho$ as $p \to 0^-$ using Equation (8) with various values of $\alpha$.

| $p \to 0^-$ | $\alpha = 5$ | $\alpha = 10$ | $\alpha = 15$ | $\alpha = 20$ | $\alpha = 25$ |
|---|---|---|---|---|---|
| p = −0.5 | −0.8817 | −0.9423 | −0.9618 | −0.9715 | −0.9772 |
| p = −0.25 | −0.9204 | −0.9605 | −0.9737 | −0.9803 | −0.9843 |
| p = −0.1 | −0.9395 | −0.9697 | −0.9789 | −0.9848 | −0.9878 |
| p = −0.05 | −0.9452 | −0.9724 | −0.9816 | −0.9862 | −0.9888 |
| p = −0.025 | −0.9479 | −0.9738 | −0.9825 | −0.9868 | −0.9895 |
| p = −0.015 | −0.9490 | −0.9743 | −0.9828 | −0.9871 | −0.9897 |
| p = −0.005 | −0.9500 | −0.9748 | −0.9832 | −0.9874 | −0.9899 |
| p = −0.0025 | −0.9503 | −0.9759 | −0.9833 | −0.9833 | −0.9875 |
| p = −0.0013 | −0.9504 | −0.9751 | −0.9833 | −0.9874 | −0.9900 |
| p = −0.0005 | −0.9505 | −0.9751 | −0.9833 | −0.9875 | −0.9900 |
| p = −0.00025 | −0.9505 | −0.9751 | −0.9833 | −0.9872 | −0.9900 |
| p = −0.0001 | −0.9505 | −0.9751 | −0.9833 | −0.9875 | −0.9900 |

Table 2. Values of $\hat{\rho}$ for $p \to 0^-$, $\alpha = 25$, $\beta = 0.5$.

| $p \to 0^-$ | n = 100 | n = 10000 |
|---|---|---|
| p = −0.5000 | −0.9785 | −0.9772 |
| p = −0.4500 | −0.9799 | −0.9802 |
| p = −0.3500 | −0.9826 | −0.9816 |
| p = −0.3000 | −0.9839 | −0.9830 |
| p = −0.2500 | −0.9851 | −0.9842 |
| p = −0.2000 | −0.9863 | −0.9855 |
| p = −0.1500 | −0.9874 | −0.9867 |
| p = −0.1000 | −0.9888 | −0.9879 |
| p = −0.0500 | −0.9905 | −0.9890 |
| p = −0.0001 | | |

where the exponent of the power transformation is set to the small negative value $p = -0.001$, r denotes the sample correlation coefficient and s denotes the sample standard deviation (see Appendix C for an outline of how inverse correlation can be used to eliminate bias in $\hat{x}_{c,t}$). The expectation is that if $\hat{x}_t$ are biased, then $\hat{x}_{c,t}$ will exhibit diminishing bias as $p \to 0^-$. If $\hat{x}_t$ are unbiased then $\omega = 1$ and the combined fitted values are just the original fitted values. The corresponding combined forecast values are $\hat{x}_{c,n+\tau} = \omega \hat{x}_{n+\tau} + (1-\omega)\hat{x}'_{n+\tau}$.

A shift parameter $\lambda$ similar to that discussed by Box and Cox [32] is used to facilitate the power transformation and further improve the combined fitted mse. $\lambda$ (determined by grid search) can be added to each value of $x_t$ to obtain $z_t = x_t + \lambda$ prior to applying the power transformation, and subtracted after conversion back to the original units, leaving the mean unchanged. While the data may be from a stationary time series, they are of necessity a truncated sample. Any truncated data sample will fall short of the complete distributional properties of the population from which it is drawn, and therefore of the property of stationary data. The antithetic time series is rewritten and computed from

$$ \hat{x}'_t = \bar{\hat{x}} + \hat{\rho}_{\hat{z}\hat{z}^p}\left(s_{\hat{z}} / s_{\hat{z}^p}\right)\left(\hat{z}_t^p - \bar{\hat{z}^p}\right), \quad t = 1, 2, 3, \ldots, n, $$

where $p = -0.001$, $\hat{\rho}_{\hat{z}\hat{z}^p}$ is the sample correlation between $\hat{z}$ and $\hat{z}^p$, $s_{\hat{z}}$ and $s_{\hat{z}^p}$ are the sample standard deviations of $\hat{z}_t$ and $\hat{z}_t^p$ respectively, and $\lambda$ and $\omega$ are chosen so as to minimize the combined fitted mse for $\hat{x}_{c,t} = \omega\hat{x}_t + (1-\omega)\hat{x}'_t$. The antithetic forecast values are computed from

$$ \hat{x}'_{n+\tau} = \bar{\hat{x}} + \hat{\rho}_{\hat{z}\hat{z}^p}\left(s_{\hat{z}} / s_{\hat{z}^p}\right)\left(\hat{z}_{n+\tau}^p - \bar{\hat{z}^p}\right), \quad \tau = 1, 2, 3, \ldots, N. $$

Of the terms p, $\lambda$, and $\omega$, only p is unique to antithetic time series analysis, and it is not a fitted parameter. When implemented, p is actually a constant set to $-0.001$, an approximation of $0^-$. Also, since $\hat{\rho}_{\hat{z}\hat{z}^p} \approx -1$, the transformation involving p is linear, and does not imply that the original model should have been non-linear. Like the use of $\lambda$ here, it is common practice to apply various transformations such as logarithm, square root and Box and Cox [32] that add no new information, but make the data better conform to the assumptions of a postulated model. If there were no bias, or if antithetic combining did not reduce bias, then $\omega$ would simply be equal to one and only the original postulated model would apply.
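The combining recipe above can be illustrated concretely. The Python sketch below (in place of the paper's MATLAB; the function names, the fixed $\lambda = 0$ shift, and the $\omega$ grid are our illustrative choices, not the paper's grid search settings) fits the no-intercept AR(1) model by least squares, forms the antithetic fitted series via the power transform, and picks $\omega$ by a coarse grid search over the combined fitted mse. Because the grid contains $\omega = 1$, the combined fitted mse can never exceed the original fitted mse.

```python
import math
import random

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

def antithetic_combine(x, p=-0.001, lam=0.0):
    """Sketch of the Section 6 recipe: least squares AR(1) fit, antithetic
    fitted series via the power transform, and a grid search for omega."""
    n = len(x)
    phi = sum(x[t] * x[t - 1] for t in range(1, n)) / sum(v * v for v in x[:-1])
    fit = [phi * x[t - 1] for t in range(1, n)]
    actual = x[1:]
    z = [v + lam for v in fit]          # shifted fitted values
    zp = [v ** p for v in z]            # power transform, p near 0^-
    m, mp = sum(z) / len(z), sum(zp) / len(zp)
    s = math.sqrt(sum((v - m) ** 2 for v in z) / len(z))
    sp = math.sqrt(sum((v - mp) ** 2 for v in zp) / len(zp))
    r = sum((a - m) * (b - mp) for a, b in zip(z, zp)) / (len(z) * s * sp)
    # Antithetic fitted values, shifted back to the original units.
    anti = [m + r * (s / sp) * (b - mp) - lam for b in zp]
    # Grid search for omega; the grid contains omega = 1 (the original fit),
    # so the combined fitted mse can never exceed the original fitted mse.
    combos = ([w * f + (1.0 - w) * a for f, a in zip(fit, anti)]
              for w in [0.5 * k for k in range(-120, 121)])
    best = min(combos, key=lambda c: mse(c, actual))
    return fit, anti, best, actual

rng = random.Random(1)
e = [rng.gammavariate(25.0, 0.6) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.8 * y[-1] + e[t])
x = [v + 20.0 for v in y[250:]]         # drop warm-up values, keep x_t > 0
fit, anti, comb, actual = antithetic_combine(x)
orig_mse, comb_mse = mse(fit, actual), mse(comb, actual)
print(round(orig_mse, 3), round(comb_mse, 3))
```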

### Computer Simulation

To illustrate, consider a model fitted to computer simulated data based on stationary autoregressive processes, containing 1060 observations generated from $x_t = y_{t+250} + e_t$, $t = 1, 2, 3, \ldots, 1060$, where, to avoid initialization problems, the first 250 values are dropped from $y_t = 0.8 y_{t-1} + e_t$, $t = 1, 2, 3, \ldots, 1310$, $y_0 = 0$, and $e_t \sim \Gamma(\alpha, \beta)$, $\beta = 0.6$, $\alpha = 5, 10, 15, 20, 25$ are obtained from MATLAB [30]. From the 1060 values, different models are fitted from the first 50, 51, ..., 60 values. Each model is used to forecast 1000 one-step-ahead forecast values corresponding to periods 51 - 1050, 52 - 1051, ..., 61 - 1060. This simple first order autoregressive model is chosen for its ease of understanding and transparency. It is perfect for the population from which the data are sampled. The sample sizes are typical of what can be expected in practice, and the outcomes from model fitting are subject to sampling bias.

The results are shown in Table 3. As $\alpha$ increases, the fitted mse's increase, indicative, as expected, of the increase in the variance of the data. The combined fitted mse's are all lower than the original fitted mse's. The average gain is a reduction in fitted mse of 5.5%. This demonstrates that for a wide range of gamma distributions, combining antithetic fitted values can reduce the component of error that is due to systematic bias, leaving only random error. The fitted mse and 1000 period forecast horizon mse sensitivities to forecast origin (n) are shown in Table 4. As n increases from 51 to 60, the combined fitted mse's are lower than the original fitted mse's. The average gain is a reduction of 11.1%. The average gain in the combined forecast mse over the original forecast mse is a reduction of 6.9%. The forecast mse sensitivities to forecast horizon (N) are shown in Table 5. As N increases from 100 to 700, the combined forecast mse's are lower than the original forecast mse's. The average gain is a reduction of 6.1%.

Table 3. Fitted mean square error (mse) for gamma distributed autoregressive processes of length n = 50, p = −0.001, β = 0.6 and various values of α.

| α | λ | ω | Original mse | Combined mse | Reduction % |
|---|---|---|---|---|---|
| 5 | 370 | −46 | 1.377 | 1.209 | 12.2 |
| 10 | 520 | −39 | 4.753 | 4.394 | 7.6 |
| 15 | 481 | −17 | 6.892 | 6.732 | 2.3 |
| 20 | 575 | −11 | 6.922 | 6.824 | 1.4 |
| 25 | 529 | −43 | 9.686 | 9.304 | 3.9 |
| Average | | | | | 5.5 |

Table 4. Fitted mean square error (mse) and one thousand period forecast mean square error (mse) for gamma distributed autoregressive processes of length n, p = −0.001, β = 0.6, α = 5.

| Forecast origin n | λ | ω | Original fitted mse | Combined fitted mse | Original forecast mse | Combined forecast mse |
|---|---|---|---|---|---|---|
| 50 | 370 | −46 | 1.377 | 1.209 | 6.158 | 5.760 |
| 51 | 411 | −46 | 1.351 | 1.185 | 6.230 | 5.407 |
| 52 | 252 | −46 | 1.354 | 1.202 | 6.493 | 5.641 |
| 53 | 446 | −39 | 1.397 | 1.266 | 6.071 | 5.943 |
| 54 | 310 | −55 | 1.371 | 1.160 | 6.070 | 5.611 |
| 55 | 261 | −37 | 1.356 | 1.254 | 5.941 | 5.549 |
| 56 | 465 | −41 | 1.345 | 1.189 | 6.108 | 5.720 |
| 57 | 400 | −39 | 1.331 | 1.218 | 6.234 | 5.592 |
| 58 | 507 | −36 | 1.312 | 1.191 | 6.142 | 6.224 |
| 59 | 405 | −32 | 1.290 | 1.175 | 6.044 | 5.687 |
| 60 | 354 | −62 | 1.434 | 1.207 | 5.600 | 5.300 |
| Average | | | 1.356 | 1.205 | 6.099 | 5.676 |
| Combined reduction % | | | | 11.1 | | 6.9 |

Table 5. Forecast mean square error (mse) for gamma distributed autoregressive processes of length n = 50, p = −0.001, β = 0.6, α = 5.

| Forecast horizon N | Original mse | Combined mse | Reduction % |
|---|---|---|---|
| 100 | 5.973 | 5.581 | 6.6 |
| 150 | 6.529 | 6.059 | 7.2 |
| 200 | 5.426 | 5.220 | 3.8 |
| 250 | 6.559 | 6.152 | 6.2 |
| 300 | 6.455 | 6.181 | 4.2 |
| 350 | 6.390 | 6.051 | 5.3 |
| 400 | 5.982 | 5.630 | 5.9 |
| 450 | 5.906 | 5.554 | 6.0 |
| 500 | 5.972 | 5.607 | 6.1 |
| 550 | 6.114 | 5.699 | 6.8 |
| 600 | 6.063 | 5.616 | 7.4 |
| 650 | 5.818 | 5.401 | 7.2 |
| 700 | 5.632 | 5.243 | 6.9 |

### 7. Conclusion

Antithetic time series analysis can be generalized to all data distributions that are likely to occur in practice. The gamma distribution is unimodal. A suggestion for future research is to investigate the correlation between a random variable and its pth power when its distribution is multimodal. Another suggestion is to compare the effectiveness of the Hammersley and Morton [23] antithetic random numbers with antithetic random numbers constructed by the method described in this paper. Combining antithetic extrapolations can dynamically reduce bias due to model misspecifications such as serial correlation, non-normality or truncation of the distribution due to data sampling. Removing bias will eliminate the divergence between the extrapolated and actual values. In the particular case of climate models, removing bias can reveal the true long range climate dynamics. This will be most useful in models designed to investigate the phenomenon of global warming. Beyond the examples discussed here, antithetic combining has broad implications for mathematical statistics, statistical process control, engineering and scientific modeling.

### Acknowledgements

The authors would like to thank Dennis Duke for probing questions and good discussions.

### References

[1] Ferrenberg, A.M., Landau, D.P. and Wong, Y.J. (1992) Monte Carlo Simulations: Hidden Errors from Good Random Number Generators. Physical Review Letters, 69, 3382-3384. http://dx.doi.org/10.1103/PhysRevLett.69.3382

[2] Griliches, Z. (1961) A Note on Serial Correlation Bias in Estimates of Distributed Lags. Econometrica, 29, 65-73. http://dx.doi.org/10.2307/1907688

[3] Nerlove, M. (1958) Distributed Lags and Demand Analysis for Agricultural and Other Commodities. U.S.D.A Agri-cultural Handbook No. 141, Washington.

[4] Koyck, L.M. (1954) Distributed Lags and Investment Analysis. North-Holland Publishing Co., Amsterdam.

[5] Klein, L.R. (1958) The Estimation of Distributed Lags. Econometrica, 26, 553-565. http://dx.doi.org/10.2307/1907516

[6] Fuller, W.A. and Hasza, D.P. (1981) Properties of Predictors from Autoregressive Time Series. Journal of the American Statistical Association, 76, 155-161. http://dx.doi.org/10.1080/01621459.1981.10477622

[7] Dufour, J. (1985) Unbiasedness of Predictions from Estimated Vector Autoregressions. Econometric Theory, 1, 381- 402. http://dx.doi.org/10.1017/S0266466600011270

[8] Chandan, S. and Jones, P. (2005) Asymptotic Bias in the Linear Mixed Effects Model under Non-Ignorable Missing Data Mechanisms. Journal of the Royal Statistical Society: Series B, 67, 167-182.

http://dx.doi.org/10.1111/j.1467-9868.2005.00494.x

[9] Li, B., Nychka, D.W. and Ammann, C.M. (2010) The Value of Multiproxy Reconstruction of Past Climate. Journal of the American Statistical Association, 105, 883-911. http://dx.doi.org/10.1198/jasa.2010.ap09379

[10] Bunn, D.W. (1979) The Synthesis of Predictive Models in Marketing Research. Journal of Marketing Research, 16, 280-283. http://dx.doi.org/10.2307/3150692

[11] Diebold, F.X. (1989) Forecast Combination and Encompassing: Reconciling Two Divergent Literatures. International Journal of Forecasting, 5, 589-592. http://dx.doi.org/10.1016/0169-2070(89)90014-9

[12] Clemen, R.T. (1989) Combining Forecasts: A Review and Annotated Bibliography. International Journal of Forecasting, 5, 559-583. http://dx.doi.org/10.1016/0169-2070(89)90012-5

[13] Makridakis, S., Anderson, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E. and Winkler, R. (1982) The Accuracy of Extrapolation (Time Series) Methods: Results of a Forecasting Competition. Journal of Forecasting, 1, 111-153. http://dx.doi.org/10.1002/for.3980010202

[14] Winkler, R.L. (1989) Combining Forecasts: A Philosophical Basis and Some Current Issues. International Journal of Forecasting, 5, 605-609. http://dx.doi.org/10.1016/0169-2070(89)90018-6

[15] Hendry, D.F. and Mizon, G.E. (1978) Serial Correlation as a Convenient Simplification, Not a Nuisance: A Commentary on a Study of the Demand for Money by the Bank of England. The Economic Journal, 88, 549-563. http://dx.doi.org/10.2307/2232053

[16] Hendry, D.F. (1976) The Structure of Simultaneous Equations Estimators. Journal of Econometrics, 4, 551-588. http://dx.doi.org/10.1016/0304-4076(76)90017-8

[17] Mizon, G.E. (1977) Model Selection Procedures. In: Artis, M.J. and Nobay, A.D., Eds., Studies in Modern Economic Analysis, Basil Blackwell, Oxford.


[19] Durbin, J. and Watson, G.S. (1950) Testing for Serial Correlation in Least Squares Regression: I. Biometrika, 37, 409-428.

[20] Durbin, J. (1970) Testing for Serial Correlation in Least-Squares Regression When Some of the Regressors Are Lagged Dependent Variables. Econometrica, 38, 410-421. http://dx.doi.org/10.2307/1909547

[21] Osborn, D.R. (1976) Maximum Likelihood Estimation of Moving Average Processes. Journal of Economic and Social Measurement, 5, 75-87.

[22] Espasa, D. (1977) The Spectral Maximum Likelihood Estimation of Econometric Models with Stationary Errors. 3, Applied Statistics and Economics Series. Vanderhoeck and Ruprecht, Gottingen.

[23] Hammersley, J.M. and Morton, K.W. (1956) A New Monte Carlo Technique: Antithetic Variates. Mathematical Proceedings of the Cambridge Philosophical Society, 52, 449-475. http://dx.doi.org/10.1017/S0305004100031455

[24] Kleijnen, J.P.C. (1975) Antithetic Variates, Common Random Numbers and Optimal Computer Time Allocation in Simulations. Management Science, 21, 1176-1185. http://dx.doi.org/10.1287/mnsc.21.10.1176

[25] Ridley, A.D. (1999) Optimal Antithetic Weights for Lognormal Time Series Forecasting. Computers & Operations Research, 26, 189-209. http://dx.doi.org/10.1016/s0305-0548(98)00058-6

[26] Ridley, A.D. (1995) Combining Global Antithetic Forecasts. International Transactions in Operational Research, 4, 387-398. http://dx.doi.org/10.1111/j.1475-3995.1995.tb00030.x

[27] Ridley, A.D. (1997) Optimal Weights for Combining Antithetic Forecasts. Computers & Industrial Engineering, 2, 371-381. http://dx.doi.org/10.1016/s0360-8352(96)00296-3

[28] Ridley, A.D. and Ngnepieba, P. (2014) Antithetic Time Series Analysis and the CompanyX Data. Journal of the Royal Statistical Society: Series A, 177, 83-94. http://dx.doi.org/10.1111/j.1467-985x.2012.12001.x

[29] Ridley, A.D., Ngnepieba, P. and Duke, D. (2013) Parameter Optimization for Combining Lognormal Antithetic Time Series. European Journal of Mathematical Sciences, 2, 235-245.

[30] MATLAB (2008) Application Program Interface Reference, Version 8. The MathWorks, Inc.

[31] Hogg, R.V. and Ledolter, J. (2010) Applied Statistics for Engineers and Physical Scientists. 3rd Edition, Prentice Hall, Upper Saddle River, 174.

[32] Box, G.E.P. and Cox, D.R. (1964) An Analysis of Transformations. Journal of the Royal Statistical Society: Series B, 26, 211-252.

[33] Abramowitz, M. and Stegun, I.A. (1964) Handbook of Mathematical Functions. Dover Publications, New York, 260 p.

[34] Bernardo, J.M. (1976) Algorithm AS 103: Psi (Digamma) Function. Journal of the Royal Statistical Society: Series C (Applied Statistics), 25, 315-317.

### Appendix A: The pth Order Moment of the Gamma Distribution

The pth moment of the gamma distribution is derived as follows:

$$ E\!\left[X_t^p\right] = \int_0^{\infty} x^p\, \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} \mathrm{e}^{-x/\beta}\, \mathrm{d}x = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} x^{\alpha+p-1} \mathrm{e}^{-x/\beta}\, \mathrm{d}x. \tag{A.1} $$

Multiplying and dividing by $\beta^{\alpha+p}\,\Gamma(\alpha+p)$, Equation (A.1) becomes

$$ E\!\left[X_t^p\right] = \frac{\beta^{\alpha+p}\,\Gamma(\alpha+p)}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} \frac{1}{\Gamma(\alpha+p)\beta^{\alpha+p}}\, x^{\alpha+p-1} \mathrm{e}^{-x/\beta}\, \mathrm{d}x = \frac{\beta^{p}\,\Gamma(\alpha+p)}{\Gamma(\alpha)} \int_0^{\infty} \frac{1}{\Gamma(\alpha+p)\beta^{\alpha+p}}\, x^{\alpha+p-1} \mathrm{e}^{-x/\beta}\, \mathrm{d}x. \tag{A.2} $$

Since

$$ \frac{1}{\Gamma(\alpha+p)\beta^{\alpha+p}}\, x^{\alpha+p-1} \mathrm{e}^{-x/\beta} = f(x; \alpha+p, \beta) $$

is the pdf of a gamma distribution with shape parameter $\alpha + p$,

$$ \int_0^{\infty} \frac{1}{\Gamma(\alpha+p)\beta^{\alpha+p}}\, x^{\alpha+p-1} \mathrm{e}^{-x/\beta}\, \mathrm{d}x = 1, $$

and Equation (A.2) becomes

$$ E\!\left[X_t^p\right] = \frac{\beta^{p}\,\Gamma(\alpha+p)}{\Gamma(\alpha)}. $$

### Appendix B: Proof of the Antithetic Gamma Variables Theorem

By applying the Taylor expansion around $p = 0$ to Equation (8) we have

$$ \Gamma(\alpha+p) = \Gamma(\alpha) + p\,\Gamma'(\alpha) + \frac{p^2}{2}\Gamma''(\alpha) + O\!\left(p^3\right), \tag{B.1} $$

where $\Gamma'$, $\Gamma''$ are the first and second derivatives of $\Gamma$ and $O(p^3)$ represents the remainder. Similarly,

$$ \Gamma(\alpha+p)^2 = \Gamma(\alpha)^2 + 2p\,\Gamma(\alpha)\Gamma'(\alpha) + p^2\,\Gamma'(\alpha)^2 + p^2\,\Gamma(\alpha)\Gamma''(\alpha) + O\!\left(p^3\right), \tag{B.2} $$

$$ \Gamma(\alpha)\Gamma(\alpha+2p) = \Gamma(\alpha)^2 + 2p\,\Gamma(\alpha)\Gamma'(\alpha) + 2p^2\,\Gamma(\alpha)\Gamma''(\alpha) + O\!\left(p^3\right). \tag{B.3} $$

The combination of Equations (B.1)-(B.3) reduces Equation (8) to

$$ \rho = \frac{p\left\{\Gamma(\alpha) + p\,\Gamma'(\alpha) + \frac{p^2}{2}\Gamma''(\alpha) + O\!\left(p^3\right)\right\}}{\alpha^{1/2}\left\{p^2\left(\Gamma(\alpha)\Gamma''(\alpha) - \Gamma'(\alpha)^2\right) + O\!\left(p^3\right)\right\}^{1/2}}. $$

Since $p < 0$, $p/\left(p^2\right)^{1/2} = p/|p| = -1$, so that

$$ \lim_{p \to 0^-} \rho = -\frac{\Gamma(\alpha)}{\alpha^{1/2}\left\{\Gamma(\alpha)\Gamma''(\alpha) - \Gamma'(\alpha)^2\right\}^{1/2}}. \tag{B.4} $$

By using the polygamma function (see Abramowitz and Stegun [33])

$$ \Psi^{(1)}(\alpha) = \frac{\mathrm{d}^2}{\mathrm{d}\alpha^2} \log \Gamma(\alpha) = \frac{\Gamma(\alpha)\Gamma''(\alpha) - \Gamma'(\alpha)^2}{\Gamma(\alpha)^2}, $$

Equation (B.4) is transformed into

$$ \lim_{p \to 0^-} \rho = -\frac{1}{\left\{\alpha\,\Psi^{(1)}(\alpha)\right\}^{1/2}}. \tag{B.5} $$

The digamma function for real α>0, as

→ ∞ is

### ( )

2 4 6 8

1 1 1 1 1

log

2 12 120 252 O

α α

α α α α α

 

Ψ = − − + − +  

 

 

Its derivative is the polygamma function

( )1

### ( )

9

2 3 5 7

1 1 1 1 1 1

.

2 6 30 42 O

α

α α α α α α

 

Ψ = + + − + +  

 

 

And,

( )1

### ( )

8

2 4 6

1 1 1 1 1

1 .

2 6 30 42 O

α α

α α α α α

 

Ψ = + + − + +  

 

 

From which, lim ( )1

### ( )

1

α→∞αΨ α = . Finally, the limit in Equation (B.5) is

( )

### }

1

0 , 0

1 2

1

lim lim lim lim 1.

p α α p α

ρ ρ α α − →∞→∞ → →∞ →       =  = − = −   Ψ    
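The double limit can be checked numerically by evaluating the exact correlation $\rho_{p}=p\,\Gamma(\alpha+p)\big/\big\{\sqrt{\alpha}\left[\Gamma(\alpha)\Gamma(\alpha+2p)-\Gamma^{2}(\alpha+p)\right]^{1/2}\big\}$ in log-gamma form. The sketch below is illustrative (the function name is introduced here); factoring $\Gamma(\alpha+p)$ out of the square root avoids overflow for large $\alpha$:

```python
import math

def antithetic_corr(alpha, p):
    """Exact correlation between a gamma(alpha) variate X and X^p:
    rho = p*Gamma(a+p) / (sqrt(a) * sqrt(Gamma(a)Gamma(a+2p) - Gamma(a+p)^2)).
    Factoring Gamma(a+p)^2 out of the radicand gives an overflow-free form:
    rho = p / sqrt(alpha * (exp(d) - 1)), with d computed from log-gammas."""
    d = (math.lgamma(alpha + 2 * p) + math.lgamma(alpha)
         - 2 * math.lgamma(alpha + p))
    # expm1 keeps precision since d ~ p^2 * polygamma(1, alpha) is tiny
    return p / math.sqrt(alpha * math.expm1(d))

# rho approaches -1 as p -> 0- and alpha -> infinity (Equation (B.5))
print(antithetic_corr(100.0, -0.001))  # about -0.9975 = -1/sqrt(alpha*psi1(alpha))
print(antithetic_corr(1.0, -0.001))    # about -0.78; weaker for small alpha
assert -1.0 < antithetic_corr(100.0, -0.001) < -0.99
assert antithetic_corr(1.0, -0.001) > antithetic_corr(100.0, -0.001)
```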

### Appendix C: Inverse Correlation and Bias Elimination

Consider a gamma distributed time series $X_t$ with a large shape parameter from which $x_t$ are observations. We have shown that for very small negative $p$, $x_t$ and $x_t^{p}$ are nearly perfectly correlated, albeit negatively, so we can express $x_t^{p}$ in the original units of $x_t$ by means of the linear regression of $x_t$ on $x_t^{p}$ as follows:

$$x_t=c_0+c_1x_t^{p}+\varepsilon_t,$$

where $\varepsilon_t$ is an error term. As $p$ approaches zero from the left, near perfect correlation between $x_t$ and $x_t^{p}$ ensures that the error term becomes negligible, and a near perfect estimate is obtained from

$$x_t'=c_0+c_1x_t^{p}. \quad (C.1)$$

Now, suppose that

$$x_t=f\left(x_{t-1}\right)+\varepsilon_t,\quad t=2,\ldots,n,$$

is a model fitted to the observations, and that serial correlation in the errors renders the fitted values $\hat{x}_t$ biased.

To remove this bias, we power transform $\hat{x}_t$ to obtain $\hat{x}_t^{p}$. Then, we use Equation (C.1) to convert $\hat{x}_t^{p}$ back to the original units of $\hat{x}_t$. Hence

$$\hat{x}_t'=\hat{c}_0+\hat{c}_1\hat{x}_t^{p},$$

where $\hat{c}_0$ and $\hat{c}_1$ are least squares estimates obtained from the regression of $\hat{x}_t$ on $\hat{x}_t^{p}$, and the error approaches 0. Since $\hat{c}_0=\bar{\hat{x}}-\hat{c}_1\overline{\hat{x}^{p}}$,

$$\hat{x}_t'=\bar{\hat{x}}-\hat{c}_1\overline{\hat{x}^{p}}+\hat{c}_1\hat{x}_t^{p}=\bar{\hat{x}}+\hat{c}_1\left(\hat{x}_t^{p}-\overline{\hat{x}^{p}}\right).$$

Denoting the sample standard deviation by $s$ and the correlation coefficient by $\hat{\rho}_{\hat{x}\hat{x}^{p}}$,

$$\hat{x}_t'=\bar{\hat{x}}+\hat{\rho}_{\hat{x}\hat{x}^{p}}\left(s_{\hat{x}}/s_{\hat{x}^{p}}\right)\left(\hat{x}_t^{p}-\overline{\hat{x}^{p}}\right).$$

Both estimates $\hat{x}_t$ and $\hat{x}_t'$ contain errors. These errors contain two components. One component is purely random and one component is bias. Combining the estimates dynamically cancels the bias components, leaving only the purely random components. See Appendix D for the proof of how this can occur. The combining weights discussed in Appendix D are theoretical, expressed in terms of errors that are unknown and unobservable, so we must rely on the approximation as follows. The combined estimate $\hat{x}_{c,t}$ is obtained from

$$\hat{x}_{c,t}=\omega\hat{x}_t+\left(1-\omega\right)\hat{x}_t',$$

where $-\infty<\omega<\infty$, and the value of $\omega$ is chosen so as to minimize the mse

$$\frac{1}{n-1}\sum_{t=2}^{n}\left(\hat{x}_{c,t}-x_t\right)^{2}.$$

Consider the error in $\hat{x}_{c,t}$, $\hat{e}_t=\hat{x}_{c,t}-x_t$. Then

$$\sum_{t=2}^{n}\hat{e}_t^{2}=\sum_{t=2}^{n}\left[\omega\hat{x}_t+\left(1-\omega\right)\hat{x}_t'-x_t\right]^{2}.$$

Differentiating with respect to $\omega$,

$$\frac{\mathrm{d}}{\mathrm{d}\omega}\sum_{t=2}^{n}\hat{e}_t^{2}=2\sum_{t=2}^{n}\left(\hat{x}_t-\hat{x}_t'\right)\left[\omega\hat{x}_t+\left(1-\omega\right)\hat{x}_t'-x_t\right].$$

Setting the derivative to zero and solving for $\omega$,

$$\omega=\sum_{t=2}^{n}\left(x_t-\hat{x}_t'\right)\left(\hat{x}_t-\hat{x}_t'\right)\Big/\sum_{t=2}^{n}\left(\hat{x}_t-\hat{x}_t'\right)^{2}.$$

This optimal $\omega$ yields the minimum mse, because $\frac{\mathrm{d}^{2}}{\mathrm{d}\omega^{2}}\sum_{t=2}^{n}\hat{e}_t^{2}=2\sum_{t=2}^{n}\left(\hat{x}_t-\hat{x}_t'\right)^{2}=0$ iff $\hat{x}_t'=\hat{x}_t\ \forall t$, in which case $\omega=1$, and $\frac{\mathrm{d}^{2}}{\mathrm{d}\omega^{2}}\sum_{t=2}^{n}\hat{e}_t^{2}>0$ otherwise.

The steps for obtaining the combined antithetic fitted values are outlined as follows:

Step 1: Estimate the model parameters and fitted values $\hat{x}_t=f\left(x_{t-1}\right),\ t=2,3,\ldots,n$.

Step 2: Set $p=-0.001$.

Step 3: Calculate $\omega=\sum_{t=2}^{n}\left(x_t-\hat{x}_t'\right)\left(\hat{x}_t-\hat{x}_t'\right)\big/\sum_{t=2}^{n}\left(\hat{x}_t-\hat{x}_t'\right)^{2}$.

Step 4: Calculate $\hat{x}_{c,t}=\omega\hat{x}_t+\left(1-\omega\right)\left[\bar{\hat{x}}+\hat{\rho}_{\hat{x}\hat{x}^{p}}\left(s_{\hat{x}}/s_{\hat{x}^{p}}\right)\left(\hat{x}_t^{p}-\overline{\hat{x}^{p}}\right)\right],\ t=2,3,\ldots,n$.

Likewise, the unbiased combined estimate of a future value $\tau$ steps ahead of time origin $n$ is obtained from

$$\hat{x}_{c,n}\left(\tau\right)=\omega\hat{x}_n\left(\tau\right)+\left(1-\omega\right)\hat{x}_n'\left(\tau\right).$$
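Steps 1-4 can be sketched in code. The fragment below is a minimal illustration, not the authors' implementation: the function name `antithetic_combine` is introduced here, and a multiplicatively biased copy of the series stands in for the fitted values that Step 1 would produce from an actual model.

```python
import random

def antithetic_combine(x, xhat, p=-0.001):
    """Steps 2-4 of Appendix C: regress xhat on xhat**p to express the power
    transform in original units (Equation (C.1)), then combine xhat and xhat'
    with the mse-minimising weight omega. Minimal sketch."""
    n = len(x)
    xp = [v ** p for v in xhat]                      # Step 2: power transform
    mx, mxp = sum(xhat) / n, sum(xp) / n
    c1 = (sum((h - mx) * (q - mxp) for h, q in zip(xhat, xp))
          / sum((q - mxp) ** 2 for q in xp))         # slope of xhat on xhat^p
    xprime = [mx + c1 * (q - mxp) for q in xp]       # xhat' via Equation (C.1)
    den = sum((h - q) ** 2 for h, q in zip(xhat, xprime))
    num = sum((a - q) * (h - q) for a, h, q in zip(x, xhat, xprime))
    omega = num / den if den > 0 else 1.0            # Step 3
    return [omega * h + (1 - omega) * q for h, q in zip(xhat, xprime)]  # Step 4

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

random.seed(7)
x = [random.gammavariate(50.0, 1.0) for _ in range(300)]  # large shape parameter
xhat = [1.1 * v for v in x]    # stand-in for biased fitted values from Step 1
xc = antithetic_combine(x, xhat)
# omega minimises the in-sample mse over all combinations, including omega = 1,
# so the combined fit is never worse in-sample than xhat itself
assert mse(xc, x) <= mse(xhat, x) + 1e-9
```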

### Appendix D: Antithetic Fitted Error Variance Reduction

Consider a gamma distributed time series $X_t$ with a large shape parameter. Next, consider a minimum mean square error fitted value obtained from a stationary first-order autoregressive process $X_t-\mu_X=\varphi\left(X_{t-1}-\mu_X\right)+\varepsilon_t$, given by $\hat{X}_t=\hat{\mu}_X+\hat{\varphi}\left(X_{t-1}-\hat{\mu}_X\right)$, where $\mu_X=E\left[X_t\right]$ and $\hat{\mu}_X$ and $\hat{\varphi}$ are least squares estimates of $\mu_X$ and $\varphi$, respectively, such that

$$\hat{\varphi}=\frac{\sum_{t=2}^{n}\left(X_t-\hat{\mu}_X\right)\left(X_{t-1}-\hat{\mu}_X\right)}{\sum_{t=2}^{n}\left(X_{t-1}-\hat{\mu}_X\right)^{2}}.$$

Substituting $X_t-\mu_X=\varphi\left(X_{t-1}-\mu_X\right)+\varepsilon_t$,

$$\hat{\varphi}=\frac{\sum_{t=2}^{n}\left[\varphi\left(X_{t-1}-\mu_X\right)+\varepsilon_t\right]\left(X_{t-1}-\mu_X\right)}{\sum_{t=2}^{n}\left(X_{t-1}-\mu_X\right)^{2}}=\varphi+\frac{\sum_{t=2}^{n}\varepsilon_t\left(X_{t-1}-\mu_X\right)}{\sum_{t=2}^{n}\left(X_{t-1}-\mu_X\right)^{2}}.$$

Therefore, as $n\to\infty$, and since $X_t$ is stationary so that $\mathrm{Var}\left(X_t\right)=\mathrm{Var}\left(X_{t-1}\right)$, and since the errors are serially correlated so that $\mathrm{Cov}\left(\varepsilon_t,X_{t-1}\right)\neq 0$,

$$\hat{\varphi}=\varphi+\mathrm{Cov}\left(\varepsilon_t,X_{t-1}\right)\big/\mathrm{Var}\left(X_{t-1}\right) \quad (D.1)$$

(see also Fuller [35], p. 404). Consider $\hat{X}_t$ as an estimate of $X_t$. From (D.1), and given that the time series is stationary, then as $n\to\infty$ and $\hat{\mu}_X\to\mu_X$,

$$\hat{X}_t=\mu_X+\varphi\left(X_{t-1}-\mu_X\right)+\left[\mathrm{Cov}\left(\varepsilon_t,X_{t-1}\right)\big/\mathrm{Var}\left(X_{t-1}\right)\right]\left(X_{t-1}-\mu_X\right)=X_t+u_t,$$

where

$$u_t=\left[\mathrm{Cov}\left(\varepsilon_t,X_{t-1}\right)\big/\mathrm{Var}\left(X_{t-1}\right)\right]\left(X_{t-1}-\mu_X\right) \quad (D.2)$$

due only to errors resulting from serial correlation. Therefore,

$$\mathrm{Var}\left(u_t\right)=\mathrm{Cov}^{2}\left(\varepsilon_t,X_{t-1}\right)\big/\mathrm{Var}\left(X_t\right). \quad (D.3)$$

Next, consider another fitted value $\hat{X}_t'$, obtained from the linear projection of the asymptotically antithetic series $X_t^{p}$ on $X_t$, without the introduction of any new error,

$$\hat{X}_t'=\lim_{p\to 0^{-},\,\sigma^{2}\to 0}\left\{\bar{X}+\rho\left[\mathrm{Var}\left(X_t\right)\right]^{1/2}\left[\mathrm{Var}\left(X_t^{p}\right)\right]^{-1/2}\left(\hat{X}_t^{p}-\bar{X}^{p}\right)\right\}.$$

Substituting for $\hat{X}_t$,

$$\hat{X}_t'=\lim_{p\to 0^{-},\,\sigma^{2}\to 0}\left\{\bar{X}+\rho\left[\mathrm{Var}\left(X_t\right)\right]^{1/2}\left[\mathrm{Var}\left(X_t^{p}\right)\right]^{-1/2}\left(\left[\hat{\mu}_X+\hat{\varphi}\left(X_{t-1}-\hat{\mu}_X\right)\right]^{p}-\bar{X}^{p}\right)\right\}=X_t+u_t', \quad (D.4)$$

where $\rho$ is the correlation between $X_t$ and $X_t^{p}$, and

$$u_t'=\lim_{p\to 0^{-},\,\sigma^{2}\to 0}\left\{\bar{X}+\rho\left[\mathrm{Var}\left(X_t\right)\right]^{1/2}\left[\mathrm{Var}\left(X_t^{p}\right)\right]^{-1/2}\left(\left[\hat{\mu}_X+\hat{\varphi}\left(X_{t-1}-\hat{\mu}_X\right)\right]^{p}-\bar{X}^{p}\right)-X_t\right\} \quad (D.5)$$

is the antithetic error due to the serial correlation, but corresponding to $\hat{X}_t'$.

The expansion of $\left[\hat{\mu}_X+\hat{\varphi}\left(X_{t-1}-\hat{\mu}_X\right)\right]^{p}=\left(\hat{\mu}_X-\hat{\varphi}\hat{\mu}_X+\hat{\varphi}X_{t-1}\right)^{p}$ will contain the constant $\left(\hat{\mu}_X-\hat{\varphi}\hat{\mu}_X\right)^{p}$, the product of $p$ and some function of $X_{t-1}$, and $\hat{\varphi}^{p}X_{t-1}^{p}$ as follows:

$$\left(\hat{\mu}_X-\hat{\varphi}\hat{\mu}_X+\hat{\varphi}X_{t-1}\right)^{p}=\left(\hat{\mu}_X-\hat{\varphi}\hat{\mu}_X\right)^{p}+p\,f\left(X_{t-1}\right)+\hat{\varphi}^{p}X_{t-1}^{p}\to\left(\hat{\mu}_X-\hat{\varphi}\hat{\mu}_X\right)^{p}+\hat{\varphi}^{p}X_{t-1}^{p}\ \text{as}\ p\to 0.$$

Substituting into Equation (D.5),

$$u_t'=\lim_{p\to 0^{-},\,\sigma^{2}\to 0}\left\{\bar{X}+\rho\left[\mathrm{Var}\left(X_t\right)\right]^{1/2}\left[\mathrm{Var}\left(X_t^{p}\right)\right]^{-1/2}\left(\left(\hat{\mu}_X-\hat{\varphi}\hat{\mu}_X\right)^{p}+\hat{\varphi}^{p}X_{t-1}^{p}-\bar{X}^{p}\right)-X_t\right\}. \quad (D.6)$$

Now

$$\mathrm{Var}\left[\omega u_t+\left(1-\omega\right)u_t'\right]=\omega^{2}\,\mathrm{Var}\left(u_t\right)+\left(1-\omega\right)^{2}\mathrm{Var}\left(u_t'\right)+2\omega\left(1-\omega\right)\mathrm{Cov}\left(u_t,u_t'\right).$$

Substituting from Equations (D.2) and (D.6), and since $\hat{\mu}_X$ and $\hat{\varphi}$ are fixed for the data and model,

$$\mathrm{Var}\left[\omega u_t+\left(1-\omega\right)u_t'\right]=\lim_{p\to 0^{-},\,\sigma^{2}\to 0}\Big\{\omega^{2}\,\mathrm{Cov}^{2}\left(\varepsilon_t,X_{t-1}\right)\mathrm{Var}\left(X_t\right)^{-1}+\left(1-\omega\right)^{2}\rho^{2}\hat{\varphi}^{2p}\,\mathrm{Var}\left(X_t\right)\mathrm{Var}\left(X_t^{p}\right)^{-1}\mathrm{Var}\left(X_{t-1}^{p}\right)$$
$$\qquad\qquad+\,2\omega\left(1-\omega\right)\rho\,\hat{\varphi}^{p}\,\mathrm{Cov}\left(\varepsilon_t,X_{t-1}\right)\mathrm{Var}\left(X_{t-1}\right)^{-1}\left[\mathrm{Var}\left(X_t\right)\right]^{1/2}\left[\mathrm{Var}\left(X_t^{p}\right)\right]^{-1/2}\mathrm{Cov}\left(X_{t-1},X_{t-1}^{p}\right)\Big\}.$$

Substituting $\mathrm{Cov}\left(X_{t-1},X_{t-1}^{p}\right)=\rho\left[\mathrm{Var}\left(X_{t-1}\right)\right]^{1/2}\left[\mathrm{Var}\left(X_{t-1}^{p}\right)\right]^{1/2}$ and, by stationarity, $\mathrm{Var}\left(X_{t-1}\right)=\mathrm{Var}\left(X_t\right)$ and $\mathrm{Var}\left(X_{t-1}^{p}\right)=\mathrm{Var}\left(X_t^{p}\right)$, and since $\rho^{2}\to 1$ (see Appendix B) and $\hat{\varphi}^{p},\hat{\varphi}^{2p}\to 1$ as $p\to 0^{-}$, then

$$\mathrm{Var}\left[\omega u_t+\left(1-\omega\right)u_t'\right]=\left\{\omega\,\mathrm{Cov}\left(\varepsilon_t,X_{t-1}\right)\left[\mathrm{Var}\left(X_t\right)\right]^{-1}+\left(1-\omega\right)\right\}^{2}\mathrm{Var}\left(X_t\right),$$

from which we see that there are many ways in which the combined error variance can be less than the original error variance in Equation (D.3). In particular, when

$$\omega=\left\{1-\mathrm{Cov}\left(\varepsilon_t,X_{t-1}\right)\left[\mathrm{Var}\left(X_t\right)\right]^{-1}\right\}^{-1},$$

the combined error variance is zero.
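As a quick arithmetic check of the closing claim (toy numbers, not estimated from any series): writing $C$ for $\mathrm{Cov}\left(\varepsilon_t,X_{t-1}\right)$ and $V$ for $\mathrm{Var}\left(X_t\right)$, the weight $\omega=\left(1-C/V\right)^{-1}$ drives the combined error variance $\left\{\omega C/V+\left(1-\omega\right)\right\}^{2}V$ to zero while the original error variance $C^{2}/V$ from Equation (D.3) stays positive:

```python
# Toy check: C stands for Cov(eps_t, X_{t-1}), V for Var(X_t); values arbitrary.
C, V = 0.3, 2.0
omega = 1.0 / (1.0 - C / V)                       # the variance-cancelling weight
combined_var = (omega * C / V + (1.0 - omega)) ** 2 * V
original_var = C ** 2 / V                         # Equation (D.3)
assert abs(combined_var) < 1e-12                  # combined error variance vanishes
assert original_var > 0.0                         # original error variance does not
```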
