
**General Theory of Antithetic Time Series **

**Pierre Ngnepieba¹, Dennis Ridley²,³***

¹Department of Mathematics, Florida A&M University, Tallahassee, USA
²SBI, Florida A&M University, Tallahassee, USA
³Department of Scientific Computing, Florida State University, Tallahassee, USA

Received 6 November 2015; accepted 27 December 2015; published 30 December 2015

Copyright © 2015 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY).

http://creativecommons.org/licenses/by/4.0/

**Abstract **

**A generalized antithetic time series theory for exponentially derived antithetic random variables is developed. The correlation function between a generalized gamma distributed random variable and its *p*th exponent is derived. We prove that the correlation approaches minus one as the exponent approaches zero from the left and the shape parameter approaches infinity.**

**Keywords **

**Antithetic Time Series Theory, Antithetic Random Variables, Bias Reduction, Gamma Distribution, **
**Inverse Correlation, Serial Correlation **

**1. Introduction **

Serially correlated random variables arise in ways that both benefit and bias mathematical models for science, engineering and economics. In one widespread example, a mathematical formula is used to create uniformly distributed pseudo random numbers for use in Monte Carlo simulation. The numbers are serially correlated because they are generated by a formula. The benefit is that the same pseudo random numbers can be recreated at will, and two or more simulation experiments can be compared without regard to the pseudo random numbers. The correlation is designed to be very small so as not to bias the results of a simulation experiment. Still, some bias is unavoidable when using serially correlated numbers (Ferrenberg, Landau, and Wong [1]).

Another widespread example is a regression model in which the dependent variable is serially correlated. The result is biased model parameter estimates, because the independence assumption of the Gauss-Markov theorem is violated (see Griliches [2], Nerlove [3], Koyck [4], Klein [5]). Similarly, the independence assumption of Fuller and Hasza [6] and Dufour [7] would not apply. The absence of any relevant information from a model will express itself in the patterns of the error term. If complete avoidance of bias requires normally distributed

data, then the absence of normality is like missing information. Bias may also be due to missing data points
(Chandan and Jones [8], Li, Nychka and Amman [9]). Assume that a perfect model is postulated for a given
application in which the population to which the data belong is known exactly. The model must be fitted to a
sample of data, not the population. However, once the sample is taken, the distribution is automatically
truncated and distorted, and the fitted model is biased. Regardless of the method of fitting, sampling bias,
however small, is unavoidable. One approach aimed at improving model performance is to combine the results
from different models. For an extensive discussion and review of traditional combining see Bunn [10], Diebold
[11], Clemen [12], Makridakis *et al.* [13], and Winkler [14].

Economics researchers have commented on serial correlation bias. Hendry and Mizon [15] and Hendry [16] considered common factor analysis (Mizon [17]) and suggested that serial correlation is a feature for representing dynamic relationships in economic models. This in turn implies that economics allows for serial correlation (see Pindyck and Rubinfeld [18]). Time domain methods for detecting the nature and presence of serial correlation were considered by Durbin and Watson [19] and Durbin [20]. Spectral methods were considered by Hendry [16], Osborn [21] and Espasa [22]. Even if serial correlation can be a tool for studying the nature of economics, it is detrimental to long range forecasting models. Whatever the source of bias may be, the only possibility for long range forecasting is to completely eliminate the bias.

**1.1. Background **

Inversely correlated random numbers were suggested by Hammersley and Morton [23] for use in Monte Carlo computer simulation experiments. In that application, a single computer simulation is replaced by two simulations. One simulation uses uniformly distributed $U(0,1)$ random numbers $r$. The other simulation uses $1-r$. The expectation is that the average of the results of these two simulations has a smaller variance than either one alone. In practice, the variance sometimes decreases, but sometimes it increases. See also Kleijnen [24].
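The following MATLAB sketch (ours, not from the paper) illustrates the Hammersley-Morton pairing; the integrand $\mathrm{e}^r$ is an arbitrary illustrative choice:

```matlab
% Antithetic variates: estimate E[exp(r)], r ~ U(0,1), two ways.
r = rand(1e5, 1);
h = exp(r);                        % plain Monte Carlo draws
h_anti = (exp(r) + exp(1 - r))/2;  % each draw paired with its antithetic 1-r
[var(h) var(h_anti)]               % the paired estimator has smaller variance
                                   % here because exp is monotone in r
```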
The theory of combining antithetic lognormally distributed random variables that contain negatively correlated components was introduced by Ridley [25]. The Ridley [25] antithetic time series theorem states that "if $X_t > 0$, $t = 0, 1, 2, 3, \ldots$ is a discrete realization of a lognormal stochastic process, such that $\ln X_t \sim N(\mu, \sigma^2)$, and the correlation between $X_t$ and $X_t^p$ is $\rho$, then $\lim_{p \to 0^-,\, \sigma \to 0^+} \rho = -1$." Antithetic variables can be combined so as to eliminate bias in fitted values associated with any autoregressive time series model (see the Ridley [25] antithetic fitted function theorem and antithetic fitted error variance function theorem). Similarly, antithetic forecasts obtained from a time series model can be combined so as to eliminate bias in the forecast error. Ridley [26] applied combined antithetic forecasting to a wide range of data distributions. Ridley [27] demonstrated the methodology for optimizing weights for combining antithetic forecasts. See also Ridley and Ngnepieba [28] and Ridley, Ngnepieba, and Duke [29]. The antithetic variables proof in Ridley [25] was for the special case of $X_t$ lognormally distributed.

The implication for using a biased mathematical model to investigate economic, engineering and scientific phenomena is that estimates obtained from the model are biased. Estimates of future values extrapolated from the model are also biased. As the forecast horizon increases, the bias accumulates and the extrapolations diverge from the actual values. This is most pronounced in the case of investigations into global warming phenomena. There, the horizon is by definition very far into the future. The smallest bias will accumulate, so much so that conclusions may be as much an artifact of the mathematical model as they are about climate dynamics. Combining antithetic extrapolations can dynamically remove the bias in the extrapolated values.

**1.2. Proposed Research **

The antithetic gamma variables discussed in this research are defined as follows.

**Definition 1.** *Two random variables are antithetic if their correlation is negative. A bivariate collection of random variables is asymptotically antithetic if its limiting correlation approaches minus one asymptotically* (*see the antithetic gamma variables theorem below*).

**Definition 2.** $\{X(\xi, t)\}$ *is an ensemble of random variables, where* $\xi$ *belongs to a sample space and* $t$ *belongs to an index set representing time, such that* $X_t$ *is a discrete realization of a gamma stationary stochastic process from the ensemble,* $X_t \sim \Gamma(\alpha, \beta)$, *and* $X_t$, $t = 1, 2, 3, \ldots$ *are serially correlated.*

The gamma distribution is the parent of the exponential distribution and can explain many other distributions. That is, a wide range of distributions can be represented by the gamma distribution. We will explore these possibilities by examining the correlation between $X$ and $X^p$ when $X$ is gamma distributed. Of particular interest is the correlation between $X$ and $X^p$ as $p \to 0^-$. A graph of the correlation between $X$ and $X^p$ as $p \to 0^+$ and $p \to 0^-$ is shown in **Figure 1**. We begin by reviewing the obvious results for the case when $p$ is positive. The correlation is positive when $p$ is positive and exactly one when $p$ is one. This is expected. As $p$ moves away from one, the correlation decreases. As $p$ approaches zero from the right, the correlation falls, albeit very slowly. This is also expected. In the case when $p$ is negative, the correlation behaves quite differently. The result is entirely counterintuitive. As expected, the correlation is negative. But, unlike when $p$ is positive, as $p$ approaches zero from the left, the absolute value of the correlation increases. Furthermore, the actual correlation approaches minus one, not zero.

One purpose of this paper is to derive an analytical function for the correlation between $X$ and $X^p$ when $X$ is gamma distributed. A second purpose is to explore, by extensive computation, the behavior of the correlation as $p$ approaches zero from the left. The trivial case of $p$ equal to zero, where the correlation is zero, is of no interest. We are interested in $p$ inside a delta neighborhood of zero, not zero itself. Finally, we prove that the limiting value of the correlation is minus one.

The paper is organized as follows. In Section 2 we review the gamma distribution. In Section 3 we derive the analytic function for the correlation. In Section 4 we prove its limiting value. In Section 5 we use MATLAB [30] to compute correlations for a wide range of values generated from the gamma distribution. In Section 6 we outline the method for using antithetic variables to dynamically remove bias from the fitted and forecast values obtained from a time series model. Examples include computer simulated data. Section 7 contains conclusions and suggestions for further research.

**2. The Gamma Distribution **

The gamma distribution is very important for technical reasons, since it is the parent of the exponential distribution and can explain many other distributions. Its probability density function (pdf) (see Hogg and Ledolter [31]) is:

$$
f(X; \alpha, \beta) =
\begin{cases}
\dfrac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, X^{\alpha-1} \exp\left(-\dfrac{X}{\beta}\right), & \text{if } X \geq 0, \\[2mm]
0, & \text{if } X < 0,
\end{cases}
\tag{1}
$$

where α>0 is the shape parameter and β>0 is the scale parameter. The gamma function is defined as

$$
\Gamma(\alpha) = \int_0^{\infty} X^{\alpha-1}\, \mathrm{e}^{-X}\, \mathrm{d}X. \tag{2}
$$
A graph of the gamma probability density function for $\beta = 0.6$ and various values of $\alpha = 1.1, 3, 5, 9, 11$ is shown in **Figure 2**.

**Figure 2.** Exploring the effect of varying parameter values in the pdf of the gamma distribution.
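A figure in the spirit of Figure 2 can be regenerated in a few lines of MATLAB (a sketch; `gampdf` is part of the Statistics Toolbox):

```matlab
% Gamma pdfs for beta = 0.6 and the alpha values used in Figure 2.
x = linspace(0, 15, 500);
hold on
for a = [1.1 3 5 9 11]
    plot(x, gampdf(x, a, 0.6))   % shape a, scale 0.6
end
hold off, xlabel('X'), ylabel('f(X)')
```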

**3. Correlation between $X$ and $X^p$**

Let $X_t > 0$, $t = 1, 2, 3, \ldots$, be a discrete realization of a generalized gamma stochastic process. For $p \in \mathbb{R}$, from Appendix A, the $p$th moment of the gamma distribution is given by

$$
E\left[X_t^p\right] = \frac{\beta^{p}\,\Gamma(\alpha+p)}{\Gamma(\alpha)}. \tag{3}
$$

Therefore, since $\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha)$, the mean is

$$
E\left[X_t\right] = \frac{\beta\,\Gamma(\alpha+1)}{\Gamma(\alpha)} = \alpha\beta,
$$

the second moment is

$$
E\left[X_t^2\right] = \frac{\beta^{2}\,\Gamma(\alpha+2)}{\Gamma(\alpha)} = \beta^{2}\alpha(\alpha+1),
$$

and the variance is

$$
\sigma^{2} = E\left[X_t^2\right] - \left(E\left[X_t\right]\right)^{2} = \alpha\beta^{2}. \tag{4}
$$

Let $\rho$ be the correlation between $X_t$ and $X_t^p$. Then

$$
\rho = \frac{\operatorname{Cov}\left(X_t, X_t^p\right)}{\left\{\operatorname{Var}\left(X_t\right)\operatorname{Var}\left(X_t^p\right)\right\}^{1/2}} = \frac{E\left[X_t^{p+1}\right] - E\left[X_t\right]E\left[X_t^p\right]}{\left\{\operatorname{Var}\left(X_t\right)\operatorname{Var}\left(X_t^p\right)\right\}^{1/2}}, \tag{5}
$$

and

$$
\operatorname{Var}\left(X_t^p\right) = E\left[X_t^{2p}\right] - \left(E\left[X_t^p\right]\right)^{2}.
$$

Using Equation (3),

$$
\operatorname{Var}\left(X_t^p\right) = \frac{\beta^{2p}\,\Gamma(\alpha+2p)}{\Gamma(\alpha)} - \left(\frac{\beta^{p}\,\Gamma(\alpha+p)}{\Gamma(\alpha)}\right)^{2}. \tag{6}
$$

Substituting Equations (3), (4) and (6) into Equation (5),

$$
\rho = \frac{\dfrac{\beta^{p+1}\,\Gamma(\alpha+p+1)}{\Gamma(\alpha)} - \alpha\beta\,\dfrac{\beta^{p}\,\Gamma(\alpha+p)}{\Gamma(\alpha)}}{\left\{\alpha\beta^{2}\left[\dfrac{\beta^{2p}\,\Gamma(\alpha+2p)}{\Gamma(\alpha)} - \dfrac{\beta^{2p}\,\Gamma^{2}(\alpha+p)}{\Gamma^{2}(\alpha)}\right]\right\}^{1/2}}
= \frac{\Gamma(\alpha+p+1) - \alpha\,\Gamma(\alpha+p)}{\alpha^{1/2}\left\{\Gamma(\alpha)\,\Gamma(\alpha+2p) - \Gamma^{2}(\alpha+p)\right\}^{1/2}}. \tag{7}
$$

Since $\Gamma(\alpha+p+1) = (\alpha+p)\,\Gamma(\alpha+p)$, Equation (7) becomes

$$
\rho = \frac{p\,\Gamma(\alpha+p)}{\alpha^{1/2}\left\{\Gamma(\alpha)\,\Gamma(\alpha+2p) - \Gamma^{2}(\alpha+p)\right\}^{1/2}}. \tag{8}
$$
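For large $\alpha$ the gamma functions in Equation (8) overflow double precision, so a practical evaluation works on the log scale. The following MATLAB sketch is our rearrangement (not from the paper), using the identity $\Gamma(\alpha)\Gamma(\alpha+2p) - \Gamma^{2}(\alpha+p) = \Gamma^{2}(\alpha+p)\left[\mathrm{e}^{\ln\Gamma(\alpha)+\ln\Gamma(\alpha+2p)-2\ln\Gamma(\alpha+p)} - 1\right]$, under which the $\Gamma(\alpha+p)$ factors cancel:

```matlab
% Equation (8) on the log scale: the gammaln/expm1 form stays finite
% even when Gamma(alpha) itself would overflow.
rho = @(p, a) p ./ sqrt(a .* expm1(gammaln(a) + gammaln(a + 2*p) ...
                                   - 2*gammaln(a + p)));
rho(-0.5, 5)      % -0.8817, matching the first entry of Table 1
rho(-0.0001, 25)  % approximately -0.9900
```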

From Equation (4), $\alpha^{1/2} = \sigma/\beta$, and the correlation can be expressed in terms of $\sigma$ as

$$
\rho = \frac{\beta\, p\,\Gamma(\alpha+p)}{\sigma\left\{\Gamma(\alpha)\,\Gamma(\alpha+2p) - \Gamma^{2}(\alpha+p)\right\}^{1/2}},
$$

or

$$
\rho = \frac{\beta\, p\,\Gamma\!\left(\dfrac{\sigma^{2}}{\beta^{2}} + p\right)}{\sigma\left\{\Gamma\!\left(\dfrac{\sigma^{2}}{\beta^{2}}\right)\Gamma\!\left(\dfrac{\sigma^{2}}{\beta^{2}} + 2p\right) - \Gamma^{2}\!\left(\dfrac{\sigma^{2}}{\beta^{2}} + p\right)\right\}^{1/2}}. \tag{9}
$$

The gamma function $\Gamma(\alpha)$ results in a complex number when the argument is negative. This is avoided if $2|p| < \alpha$. In any case, since we are only interested in $p$ approaching zero from the left, this condition will always be satisfied when $\alpha > 0$.

**4. Antithetic Gamma Variables Theorem**

**Theorem 1.** *If* $X_t > 0$, $t = 1, 2, 3, \ldots$, *is a discrete realization of a generalized gamma stochastic process with shape parameter* $\alpha$, *and* $\rho$ *is the correlation between* $X_t$ *and* $X_t^p$, *then*

$$
\lim_{p \to 0^-,\, \alpha \to \infty} \rho = \lim_{p \to 0^-,\, \alpha \to \infty} \frac{p\,\Gamma(\alpha+p)}{\alpha^{1/2}\left\{\Gamma(\alpha)\,\Gamma(\alpha+2p) - \Gamma^{2}(\alpha+p)\right\}^{1/2}} = -1.
$$
See proof in Appendix B.

**5. Correlation versus $p$**

The effect of $p$ on the correlation is demonstrated by calculating the correlation coefficient from Equation (8) for various values of $\alpha$ and $\beta$. The correlation coefficients are listed in **Table 1** and plotted in **Figure 3** and **Figure 4**. From **Figure 3** and **Figure 4**, for all values of $\alpha$, the correlation coefficient gets closer to $-1$ as $p \to 0^-$.

**Figure 3.** $\lim_{p \to 0^-} \rho$ using Equation (8) for $\alpha = 5, 10, 15, 20, 25$; $p = -0.5$ to $-0.005$.

**Figure 4.** $\lim_{p \to 0^-,\, \alpha \to \infty} \rho$ using Equation (8).

Larger values of $\alpha$ correspond to gamma distributions that are more symmetrical. Also, as $\alpha$ increases, the standard deviation increases. This is indicative of greater spread about both sides of the mode of $X$. Equation (9) expresses the correlation in terms of $\sigma$. In practice the value of $\alpha$ will be that for the actual data under study. It cannot be modified. Still, one might say that it appears that the effect of $p$ on reversing the correlation is greatest for symmetrical distributions.

To validate Equation (8), the MATLAB [30] random number generator GAMRND($\alpha$, $\beta$, $n$) is used to generate $n = 10000$ random numbers $x_t$ from the gamma distribution in Equation (1) with parameters $\alpha = 25$, $\beta = 0.5$. The correlation is estimated from the sample correlation coefficient ($\hat{\rho}$). One application of the correlation reversal is to remove bias in values extrapolated from a time series model (see Appendix C). The gamma distribution is immediately applicable when $\alpha \geq 25$, such that $\rho$ is approximately minus one. For $\alpha = 25$, the sample correlation coefficient is computed from

$$
\hat{\rho} = \frac{\sum_{t=1}^{n}\left(x_t - \bar{x}\right)\left(x_t^p - \overline{x^p}\right)}{\left\{\sum_{t=1}^{n}\left(x_t - \bar{x}\right)^{2}\sum_{t=1}^{n}\left(x_t^p - \overline{x^p}\right)^{2}\right\}^{1/2}},
$$

where $\bar{x} = \sum_{t=1}^{n} x_t / n$ and $\overline{x^p} = \sum_{t=1}^{n} x_t^p / n$, $t = 1, 2, \ldots, n$.

The results are shown in **Table 2**. The coefficients are almost identical to the theoretical values obtained from
Equation (8) and listed in **Table 1**. In practice, the data may include relatively few observations. To investigate
the small sample correlation coefficient, the correlation coefficient is calculated for *n*=100.
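The validation run is easy to reproduce (a sketch; `gamrnd` requires the Statistics Toolbox):

```matlab
% Sample correlation between x and x.^p for gamma data, cf. Table 2.
p  = -0.05;
x  = gamrnd(25, 0.5, 10000, 1);  % alpha = 25, beta = 0.5
C  = corrcoef(x, x.^p);
rho_hat = C(1, 2)                % close to -0.989 for n = 10000
```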

**6. Bias Reduction**

Consider an autoregressive time series $x_t$ of $n$ discrete observations obtained from a gamma distribution with a large shape parameter, to which a least squares model $x_t = \Phi x_{t-1} + \varepsilon_t$, $t = 2, \ldots, n$ is fitted. Let the fitted values be $\hat{x}_t$. Next, consider the combined weighted average fitted values $\hat{x}_{c,t} = \omega \hat{x}_t + (1-\omega)\hat{x}'_t$, $-\infty \leq \omega \leq \infty$. The parameter $\omega$ is a combining weight. The fitted values $\hat{x}_t$ and $\hat{x}'_t$ are antithetic in the sense that they contain components of error $\hat{\varepsilon}_t$ and $\hat{\varepsilon}'_t$, respectively, that are biased and, when weighted, $\omega\hat{\varepsilon}_t$ and $(1-\omega)\hat{\varepsilon}'_t$ are perfectly negatively correlated. The antithetic component $\hat{x}'_t$ is estimated from

$$
\hat{x}'_t = \bar{\hat{x}} + r_{\hat{x}\hat{x}^p}\left(s_{\hat{x}}/s_{\hat{x}^p}\right)\left(\hat{x}_t^p - \overline{\hat{x}^p}\right), \quad t = 1, 2, 3, \ldots, n,
$$

**Table 1.** Behavior of $\rho$ as $p \to 0^-$ using Equation (8) with various values of $\alpha$.

| $p \to 0^-$ | $\alpha=5$ | $\alpha=10$ | $\alpha=15$ | $\alpha=20$ | $\alpha=25$ |
|---|---|---|---|---|---|
| $p=-0.5$ | −0.8817 | −0.9423 | −0.9618 | −0.9715 | −0.9772 |
| $p=-0.25$ | −0.9204 | −0.9605 | −0.9737 | −0.9803 | −0.9843 |
| $p=-0.1$ | −0.9395 | −0.9697 | −0.9789 | −0.9848 | −0.9878 |
| $p=-0.05$ | −0.9452 | −0.9724 | −0.9816 | −0.9862 | −0.9888 |
| $p=-0.025$ | −0.9479 | −0.9738 | −0.9825 | −0.9868 | −0.9895 |
| $p=-0.015$ | −0.9490 | −0.9743 | −0.9828 | −0.9871 | −0.9897 |
| $p=-0.005$ | −0.9500 | −0.9748 | −0.9832 | −0.9874 | −0.9899 |
| $p=-0.0025$ | −0.9503 | −0.9759 | −0.9833 | −0.9833 | −0.9875 |
| $p=-0.0013$ | −0.9504 | −0.9751 | −0.9833 | −0.9874 | −0.9900 |
| $p=-0.0005$ | −0.9505 | −0.9751 | −0.9833 | −0.9875 | −0.9900 |
| $p=-0.00025$ | −0.9505 | −0.9751 | −0.9833 | −0.9872 | −0.9900 |
| $p=-0.0001$ | −0.9505 | −0.9751 | −0.9833 | −0.9875 | −0.9900 |

**Table 2.** Values of $\hat{\rho}$ for $p \to 0^-$, $\alpha = 25$, $\beta = 0.5$.

| $p \to 0^-$ | $n=100$ | $n=10000$ |
|---|---|---|
| $p=-0.5000$ | −0.9785 | −0.9772 |
| $p=-0.4500$ | −0.9799 | −0.9802 |
| $p=-0.3500$ | −0.9826 | −0.9816 |
| $p=-0.3000$ | −0.9839 | −0.9830 |
| $p=-0.2500$ | −0.9851 | −0.9842 |
| $p=-0.2000$ | −0.9863 | −0.9855 |
| $p=-0.1500$ | −0.9874 | −0.9867 |
| $p=-0.1000$ | −0.9888 | −0.9879 |
| $p=-0.0500$ | −0.9905 | −0.9890 |
| $p=-0.0001$ |  |  |

where the exponent of the power transformation is set to the small negative value $p = -0.001$, $r$ denotes the sample correlation coefficient and $s$ denotes the sample standard deviation (see Appendix C for an outline of how inverse correlation can be used to eliminate bias in $\hat{x}_{c,t}$). The expectation is that if the $\hat{x}_t$ are biased, then $\hat{x}_{c,t}$ will exhibit diminishing bias as $p \to 0^-$. If the $\hat{x}_t$ are unbiased, then $\omega = 1$ and the combined fitted values are just the original fitted values. The corresponding combined forecast values are $\hat{x}_{c,n}(\tau) = \omega \hat{x}_n(\tau) + (1-\omega)\hat{x}'_n(\tau)$.

_{n}A shift parameter λ similar to that discussed by Box and Cox [32] is used to facilitate the power trans-
formation and further improve the combined fitted mse. λ (determined by grid search) can be added to each
value of *x _{t}* to obtain

*z*= +

_{t}*x*λ prior to applying the power transformation and subtracted after conversion back to their original units, leaving the mean unchanged. While the data may be from stationary time series, they are of necessity a truncated sample. Any truncated data sample will fall short of the complete distributional properties of the population from which they are drawn, and therefore the property of stationary data. The

_{t}antithetic time series is rewritten and computed from

### (

_{ˆ}

### )

### (

### )

ˆˆ ˆ

ˆ

ˆ *p* *p* ˆ ˆ

*p* *p*

*t* _{zz}*z* _{z}*t*

*x*′ = +*x* ρ *s* *s* *z* −*z* , *t*=1, 2, 3,,*n*, where

0.001

*p*= − ,

ˆˆ

ˆ_{zz}p

ρ is the sample correlation between ˆ*z* and ˆ*p*

*z* , where *s*_{ˆ}* _{z}* and

ˆ*p*
*z*

*s* are sample standard
deviations in *z*ˆ*t* and ˆ

*p*
*t*

*z* respectively, and where λ and

### ω

are chosen so as to minimize the combined fitted mse for*x*ˆ

_{c t}_{,}=ω

*x*ˆ

*+ −*

_{t}### (

1 ω### )

*x*ˆ

*′. The antithetic forecast values are computed from*

_{t}### ( )

ˆ_{ˆˆ}

### (

ˆ_{ˆ}

### )

### (

### ( )

### )

ˆ *p* *p* ˆ ˆ

*p* *p*

*n* _{zz}*z* _{z}*n*

*x*′ τ = +*x* ρ *s* *s* *z* τ −*z* , τ =1, 2, 3,,*N*.
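A compact MATLAB sketch of the grid search described above (the variables `x` and `xhat`, holding the data and fitted values, and the grid bounds are illustrative assumptions, not from the paper):

```matlab
% Grid search over the shift lambda; omega and the antithetic series are
% recomputed for each candidate, and the pair minimizing fitted mse is kept.
p = -0.001; best = Inf;
for lam = 0:10:600                       % candidate shifts (illustrative grid)
    z  = xhat + lam;                     % shifted fitted values (kept positive)
    zp = z.^p;
    C  = corrcoef(z, zp);
    xa = mean(xhat) + C(1,2)*(std(z)/std(zp))*(zp - mean(zp)); % antithetic fit
    w  = sum((x - xa).*(xhat - xa)) / sum((xhat - xa).^2);     % Appendix C weight
    mse = mean((x - (w*xhat + (1-w)*xa)).^2);
    if mse < best, best = mse; lambda = lam; omega = w; end
end
```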

Of the terms $p$, $\lambda$, and $\omega$, only $p$ is unique to antithetic time series analysis, and it is not a fitted parameter. When implemented, $p$ is actually a constant set to $-0.001$, an approximation of $0^-$. Also, since $\hat{\rho}_{\hat{z}\hat{z}^p} = -1$, the transformation involving $p$ is linear, and does not imply that the original model should have been non-linear. Like the use of $\lambda$ here, it is common practice to apply various transformations, such as the logarithm, square root, and Box and Cox [32], that add no new information but make the data better conform to the assumptions of a postulated model. If there were no bias, or if antithetic combining did not reduce bias, then $\omega$ would simply be equal to one and only the original postulated model would apply.

**Computer Simulation**

To illustrate, consider a model fitted to computer simulated data based on stationary autoregressive processes, containing 1060 observations generated from $x_t = y_{t+250} + e_t$, $t = 1, 2, 3, \ldots, 1060$, where, to avoid initialization problems, the first 250 values are dropped from $y_t = 0.8 y_{t-1} + e_t$, $t = 1, 2, 3, \ldots, 1310$, $y_0 = 0$, and $e_t \sim \Gamma(\alpha, \beta)$, $\beta = 0.6$, $\alpha = 5, 10, 15, 20, 25$ are obtained from MATLAB [30]. From the 1060 values, different models are fitted from the first 50, 51, $\ldots$, 60 values. Each model is used to forecast 1000 one-step-ahead forecast values corresponding to periods 51 - 1050, 52 - 1051, $\ldots$, 61 - 1060. This simple first order autoregressive model is chosen for its ease of understanding and transparency. It is perfect for the population from which the data are sampled. The sample sizes are typical of what can be expected in practice, and the outcomes from model fitting are subject to sampling bias.
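A sketch of the data-generating step for one value of $\alpha$ (the text does not say whether the observation noise $e_t$ reuses the AR innovations, so fresh draws are used here):

```matlab
% Gamma-driven AR(1) simulation from Section 6.1 (alpha = 5 shown).
alpha = 5; beta = 0.6;
e = gamrnd(alpha, beta, 1310, 1);   % innovations e_t
y = zeros(1310, 1);
y(1) = e(1);                        % y_1 = 0.8*y_0 + e_1 with y_0 = 0
for t = 2:1310
    y(t) = 0.8*y(t-1) + e(t);
end
x = y(251:1310) + gamrnd(alpha, beta, 1060, 1);  % x_t = y_{t+250} + e_t
```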

The results are shown in **Table 3**. As $\alpha$ increases, the fitted mse's increase, indicative, as expected, of the increase in the variance of the data. The combined fitted mse's are all lower than the original fitted mse's. The average gain is a reduction in fitted mse of 5.5%. This demonstrates that for a wide range of gamma distributions, combining antithetic fitted values can reduce the component of error that is due to systematic bias, leaving only random error. The fitted mse and 1000 period forecast horizon mse sensitivities to forecast origin ($n$) are shown in **Table 4**. As $n$ increases from 51 to 60, the combined fitted mse's are lower than the original fitted mse's. The average gain is a reduction of 11.1%. The average gain in the combined forecast mse over the original forecast mse is a reduction of 6.9%. The forecast mse sensitivities to forecast horizon ($N$) are shown in **Table 5**. As $N$ increases from 100 to 700, the combined forecast mse's are lower than the original forecast mse's. The average gain is a reduction of 6.1%.

**7. Conclusion **

**Table 3.** Fitted mean square error (mse) for gamma distributed autoregressive processes of length $n = 50$, $p = -0.001$, $\beta = 0.6$ and various values of $\alpha$.

| $\alpha$ | $\lambda$ | $\omega$ | Original mse | Combined mse | Reduction % |
|---|---|---|---|---|---|
| 5 | 370 | −46 | 1.377 | 1.209 | 12.2 |
| 10 | 520 | −39 | 4.753 | 4.394 | 7.6 |
| 15 | 481 | −17 | 6.892 | 6.732 | 2.3 |
| 20 | 575 | −11 | 6.922 | 6.824 | 1.4 |
| 25 | 529 | −43 | 9.686 | 9.304 | 3.9 |
| Average | | | | | 5.5 |

**Table 4.** Fitted mean square error (mse) and one thousand period forecast mean square error (mse) for gamma distributed autoregressive processes of length $n$, $p = -0.001$, $\beta = 0.6$, $\alpha = 5$.

| Forecast origin $n$ | $\lambda$ | $\omega$ | Fitted mse Original | Fitted mse Combined | Forecast mse Original | Forecast mse Combined |
|---|---|---|---|---|---|---|
| 50 | 370 | −46 | 1.377 | 1.209 | 6.158 | 5.760 |
| 51 | 411 | −46 | 1.351 | 1.185 | 6.230 | 5.407 |
| 52 | 252 | −46 | 1.354 | 1.202 | 6.493 | 5.641 |
| 53 | 446 | −39 | 1.397 | 1.266 | 6.071 | 5.943 |
| 54 | 310 | −55 | 1.371 | 1.160 | 6.070 | 5.611 |
| 55 | 261 | −37 | 1.356 | 1.254 | 5.941 | 5.549 |
| 56 | 465 | −41 | 1.345 | 1.189 | 6.108 | 5.720 |
| 57 | 400 | −39 | 1.331 | 1.218 | 6.234 | 5.592 |
| 58 | 507 | −36 | 1.312 | 1.191 | 6.142 | 6.224 |
| 59 | 405 | −32 | 1.290 | 1.175 | 6.044 | 5.687 |
| 60 | 354 | −62 | 1.434 | 1.207 | 5.600 | 5.300 |
| Average | | | 1.356 | 1.205 | 6.099 | 5.676 |
| Combined reduction % | | | | 11.1 | | 6.9 |

**Table 5.** Forecast mean square error (mse) for gamma distributed autoregressive processes of length $n = 50$, $p = -0.001$, $\beta = 0.6$, $\alpha = 5$.

| Forecast horizon $N$ | Original mse | Combined mse | Reduction % |
|---|---|---|---|
| 100 | 5.973 | 5.581 | 6.6 |
| 150 | 6.529 | 6.059 | 7.2 |
| 200 | 5.426 | 5.220 | 3.8 |
| 250 | 6.559 | 6.152 | 6.2 |
| 300 | 6.455 | 6.181 | 4.2 |
| 350 | 6.390 | 6.051 | 5.3 |
| 400 | 5.982 | 5.630 | 5.9 |
| 450 | 5.906 | 5.554 | 6.0 |
| 500 | 5.972 | 5.607 | 6.1 |
| 550 | 6.114 | 5.699 | 6.8 |
| 600 | 6.063 | 5.616 | 7.4 |
| 650 | 5.818 | 5.401 | 7.2 |
| 700 | 5.632 | 5.243 | 6.9 |

This paper shows that antithetic time series analysis can be generalized to all data distributions that are likely to occur in practice. The
gamma distribution is unimodal. A suggestion for future research is to investigate the correlation between a
random variable and its *p*th power when its distribution is multimodal. Another suggestion is to compare the
effectiveness of the Hammersley and Morton [23] antithetic random numbers with antithetic random numbers
constructed from the method described in this paper. Combining antithetic extrapolations can dynamically
reduce bias due to model misspecifications such as serial correlation, non-normality or truncation of the
distribution due to data sampling. Removing bias will eliminate the divergence between the extrapolated and actual
values. In the particular case of climate models, removing bias can reveal the true long range climate dynamics.
This will be most useful in models designed to investigate the phenomenon of global warming. Beyond the
examples discussed here, antithetic combining has broad implications for mathematical statistics, statistical
process control, engineering and scientific modeling.

**Acknowledgements **

The authors would like to thank Dennis Duke for probing questions and good discussions.

**References **

[1] Ferrenberg, A.M., Landau, D.P. and Wong, Y.J. (1992) Monte Carlo Simulations: Hidden Errors from Good Random
Number Generators. *Physical Review Letters*, **69**, 3382-3384. http://dx.doi.org/10.1103/PhysRevLett.69.3382

[2] Griliches, Z. (1961) A Note on Serial Correlation Bias in Estimates of Distributed Lags. *Econometrica*, **29**, 65-73.
http://dx.doi.org/10.2307/1907688

[3] Nerlove, M. (1958) Distributed Lags and Demand Analysis for Agricultural and Other Commodities. U.S.D.A Agricultural Handbook No. 141, Washington.

[4] Koyck, L.M. (1954) Distributed Lags and Investment Analysis. North-Holland Publishing Co., Amsterdam.

[5] Klein, L.R. (1958) The Estimation of Distributed Lags. *Econometrica*, **26**, 553-565. http://dx.doi.org/10.2307/1907516
[6] Fuller, W.A. and Hasza, D.P. (1981) Properties of Predictors from Autoregressive Time Series. *Journal of the American Statistical Association*, **76**, 155-161. http://dx.doi.org/10.1080/01621459.1981.10477622

[7] Dufour, J. (1985) Unbiasedness of Predictions from Estimated Vector Autoregressions. *Econometric Theory*, **1**, 381-
402. http://dx.doi.org/10.1017/S0266466600011270

[8] Chandan, S. and Jones, P. (2005) Asymptotic Bias in the Linear Mixed Effects Model under Non-Ignorable Missing
Data Mechanisms. *Journal of the Royal Statistical Society*: *Series B*, **67**, 167-182.

http://dx.doi.org/10.1111/j.1467-9868.2005.00494.x

[9] Li, B., Nychka, D.W. and Ammann, C.M. (2010) The Value of Multiproxy Reconstruction of Past Climate. *Journal of *
*the American Statistical Association*, **105**, 883-911. http://dx.doi.org/10.1198/jasa.2010.ap09379

[10] Bunn, D.W. (1979) The Synthesis of Predictive Models in Marketing Research. *Journal of Marketing Research*, **16**,
280-283. http://dx.doi.org/10.2307/3150692

[11] Diebold, F.X. (1989) Forecast Combination and Encompassing: Reconciling Two Divergent Literatures.* International *
*Journal of Forecasting*, **5**, 589-592. http://dx.doi.org/10.1016/0169-2070(89)90014-9

[12] Clemen, R.T. (1989) Combining Forecasts: A Review and Annotated Bibliography. *International Journal of Forecasting*, **5**, 559-583. http://dx.doi.org/10.1016/0169-2070(89)90012-5

[13] Makridakis, S., Anderson, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E. and Winkler, R. (1982) The Accuracy of Extrapolation (Time Series) Methods: Results of a Forecasting Competition. *Journal of Forecasting*, **1**, 111-153. http://dx.doi.org/10.1002/for.3980010202

[14] Winkler, R.L. (1989) Combining Forecasts: A Philosophical Basis and Some Current Issues. *International Journal of *
*Forecasting*, **5**, 605-609. http://dx.doi.org/10.1016/0169-2070(89)90018-6

[15] Hendry, D.F. and Mizon, G.E. (1978) Serial Correlation as a Convenient Simplification, Not a Nuisance: A Commentary on a Study of the Demand for Money by the Bank of England. *The Economic Journal*, **88**, 549-563. http://dx.doi.org/10.2307/2232053

[16] Hendry, D.F. (1976) The Structure of Simultaneous Equations Estimators.* Journal of Econometrics*, **4**, 551-588.
http://dx.doi.org/10.1016/0304-4076(76)90017-8

[17] Mizon, G.E. (1977) Model Selection Procedures. In: Artis, M.J. and Nobay, A.D., Eds., *Studies in Modern Economic Analysis*, Basil Blackwell, Oxford.
[18] Pindyck, R.S. and Rubinfeld, D.L. (1998) Econometric Models and Economic Forecasts. 4th Edition, Irwin/McGraw-Hill, Boston.

[19] Durbin, J. and Watson, G.S. (1950) Testing for Serial Correlation in Least Squares Regression: I.* Biometrika*, **37**,
409-428.

[20] Durbin, J. (1970) Testing for Serial Correlation in Least-Squares Regression When Some of the Regressors Are
Lagged Dependent Variables.* Econometrica*, **38**, 410-421. http://dx.doi.org/10.2307/1909547

[21] Osborn, D.R. (1976) Maximum Likelihood Estimation of Moving Average Processes.* Journal of Economic and Social *
*Measurement*, **5**, 75-87.

[22] Espasa, D. (1977) The Spectral Maximum Likelihood Estimation of Econometric Models with Stationary Errors. 3, Applied Statistics and Economics Series. Vanderhoeck and Ruprecht, Gottingen.

[23] Hammersley, J.M. and Morton, K.W. (1956) A New Monte Carlo Technique: Antithetic Variates. *Mathematical Proceedings of the Cambridge Philosophical Society*, **52**, 449-475. http://dx.doi.org/10.1017/S0305004100031455
[24] Kleijnen, J.P.C. (1975) Antithetic Variates, Common Random Numbers and Optimal Computer Time Allocation in

Simulations.* Management Science*, **21**, 1176-1185. http://dx.doi.org/10.1287/mnsc.21.10.1176

[25] Ridley, A.D. (1999) Optimal Antithetic Weights for Lognormal Time Series Forecasting.* Computers & Operations *
*Research*, **26**, 189-209. http://dx.doi.org/10.1016/s0305-0548(98)00058-6

[26] Ridley, A.D. (1995) Combining Global Antithetic Forecasts. *International Transactions in Operational Research*, **4**, 387-398. http://dx.doi.org/10.1111/j.1475-3995.1995.tb00030.x

[27] Ridley, A.D. (1997) Optimal Weights for Combining Antithetic Forecasts.* Computers & Industrial Engineering*, **2**,
371-381. http://dx.doi.org/10.1016/s0360-8352(96)00296-3

[28] Ridley, A.D. and Ngnepieba, P. (2014) Antithetic Time Series Analysis and the CompanyX Data.* Journal of the Royal *
*Statistical Society*: *Series A*, **177**, 83-94. http://dx.doi.org/10.1111/j.1467-985x.2012.12001.x

[29] Ridley, A.D., Ngnepieba, P. and Duke, D. (2013) Parameter Optimization for Combining Lognormal Antithetic Time
Series.* European Journal of Mathematical Sciences*, **2**, 235-245.

[30] MATLAB (2008) Application Program Interface Reference, Version 8. The MathWorks, Inc.

[31] Hogg, R.V. and Ledolter, J. (2010) Applied Statistics for Engineers and Physical Scientists. 3rd Edition, Prentice Hall, Upper Saddle River, 174.

[32] Box, G.E.P. and Cox, D.R. (1964) An Analysis of Transformations.* Journal of the Royal Statistical Society*: *Series B*,
**26**, 211-252.

[33] Abramowitz, M. and Stegun, I.A. (1964) Handbook of Mathematical Functions. Dover Publications, New York, 260 p.

[34] Bernado, J.M. (1976) Algorithm AS 103: Psi (Digamma) Function.* Journal of the Royal Statistical Society*:* Series C *

(*Applied Statistics*), **25**, 315-317.

[35] Fuller, W.A. (1996) Introduction to Statistical Time Series. 2nd Edition, John Wiley & Sons, New York.

**Appendix **

**Appendix A: *p*th Order Moment for the Gamma Distribution**

The $p$th moment of the gamma distribution is derived as follows:

$$
E\left[X_t^p\right] = \int_0^{\infty} x^{p}\, \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1}\mathrm{e}^{-x/\beta}\, \mathrm{d}x = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} x^{\alpha+p-1}\mathrm{e}^{-x/\beta}\, \mathrm{d}x. \tag{A.1}
$$

Multiplying and dividing by $\beta^{\alpha+p}\,\Gamma(\alpha+p)$, Equation (A.1) becomes

$$
E\left[X_t^p\right] = \frac{\beta^{\alpha+p}\,\Gamma(\alpha+p)}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} \frac{1}{\Gamma(\alpha+p)\beta^{\alpha+p}}\, x^{\alpha+p-1}\mathrm{e}^{-x/\beta}\, \mathrm{d}x = \frac{\beta^{p}\,\Gamma(\alpha+p)}{\Gamma(\alpha)} \int_0^{\infty} \frac{1}{\Gamma(\alpha+p)\beta^{\alpha+p}}\, x^{\alpha+p-1}\mathrm{e}^{-x/\beta}\, \mathrm{d}x. \tag{A.2}
$$

Since $\dfrac{1}{\Gamma(\alpha+p)\beta^{\alpha+p}}\, x^{\alpha+p-1}\mathrm{e}^{-x/\beta} = f(x; \alpha+p, \beta)$ is the pdf of a gamma distribution with shape parameter $\alpha+p$,

$$
\int_0^{\infty} \frac{1}{\Gamma(\alpha+p)\beta^{\alpha+p}}\, x^{\alpha+p-1}\mathrm{e}^{-x/\beta}\, \mathrm{d}x = 1,
$$

and Equation (A.2) becomes

$$
E\left[X_t^p\right] = \frac{\beta^{p}\,\Gamma(\alpha+p)}{\Gamma(\alpha)}.
$$
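The moment formula can be checked numerically (a sketch; `gammaln` avoids overflow of $\Gamma(\alpha)$ for large $\alpha$):

```matlab
% Check E[X^p] = beta^p * Gamma(alpha+p)/Gamma(alpha) by simulation.
p = -0.3; a = 25; b = 0.5;
theory = b^p * exp(gammaln(a + p) - gammaln(a))
sample = mean(gamrnd(a, b, 1e6, 1).^p)   % agrees to a few decimals
```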

**Appendix B: Proof of the Antithetic Gamma Variables Theorem **

By applying the Taylor expansion around $p = 0$ to Equation (8) we have

$$
\Gamma(\alpha+p) = \Gamma(\alpha) + p\,\Gamma'(\alpha) + \frac{p^{2}}{2}\,\Gamma''(\alpha) + O(p^{3}), \tag{B.1}
$$

where $\Gamma'$, $\Gamma''$ are the first and second derivatives of $\Gamma$ and $O(p^{3})$ represents the remainder. Squaring (B.1), and expanding $\Gamma(\alpha+2p)$ in the same way,

$$
\Gamma^{2}(\alpha+p) = \Gamma^{2}(\alpha) + 2p\,\Gamma(\alpha)\Gamma'(\alpha) + p^{2}\left\{\left(\Gamma'(\alpha)\right)^{2} + \Gamma(\alpha)\Gamma''(\alpha)\right\} + O(p^{3}), \tag{B.2}
$$

$$
\Gamma(\alpha)\Gamma(\alpha+2p) = \Gamma^{2}(\alpha) + 2p\,\Gamma(\alpha)\Gamma'(\alpha) + 2p^{2}\,\Gamma(\alpha)\Gamma''(\alpha) + O(p^{3}). \tag{B.3}
$$

The combination of Equations (B.1)-(B.3) gives $\Gamma(\alpha)\Gamma(\alpha+2p) - \Gamma^{2}(\alpha+p) = p^{2}\left\{\Gamma(\alpha)\Gamma''(\alpha) - \left(\Gamma'(\alpha)\right)^{2}\right\} + O(p^{3})$, which reduces Equation (8) to

$$
\rho = \frac{p\left\{\Gamma(\alpha) + p\,\Gamma'(\alpha) + \frac{p^{2}}{2}\Gamma''(\alpha) + O(p^{3})\right\}}{\alpha^{1/2}\left[p^{2}\left\{\Gamma(\alpha)\Gamma''(\alpha) - \left(\Gamma'(\alpha)\right)^{2}\right\} + O(p^{3})\right]^{1/2}}.
$$

Since $\left(p^{2}\right)^{1/2} = -p$ for $p < 0$,

$$
\lim_{p \to 0^-} \rho = \frac{-\Gamma(\alpha)}{\left[\alpha\left\{\Gamma(\alpha)\Gamma''(\alpha) - \left(\Gamma'(\alpha)\right)^{2}\right\}\right]^{1/2}}. \tag{B.4}
$$

By using the polygamma function (see Abramowitz and Stegun [33])

$$
\Psi^{(1)}(\alpha) = \frac{\mathrm{d}^{2}}{\mathrm{d}\alpha^{2}}\ln\Gamma(\alpha) = \frac{\Gamma''(\alpha)\Gamma(\alpha) - \left(\Gamma'(\alpha)\right)^{2}}{\left(\Gamma(\alpha)\right)^{2}},
$$

Equation (B.4) is transformed into

$$
\lim_{p \to 0^-} \rho = -\frac{1}{\left\{\alpha\,\Psi^{(1)}(\alpha)\right\}^{1/2}}. \tag{B.5}
$$

The digamma function for real $\alpha > 0$, as $\alpha \to \infty$, is

$$
\Psi(\alpha) = \log\alpha - \frac{1}{2\alpha} - \frac{1}{12\alpha^{2}} + \frac{1}{120\alpha^{4}} - \frac{1}{252\alpha^{6}} + O\!\left(\frac{1}{\alpha^{8}}\right)
$$

(see also Bernado [34]). Its derivative is the polygamma function

$$
\Psi^{(1)}(\alpha) = \frac{1}{\alpha} + \frac{1}{2\alpha^{2}} + \frac{1}{6\alpha^{3}} - \frac{1}{30\alpha^{5}} + \frac{1}{42\alpha^{7}} + O\!\left(\frac{1}{\alpha^{9}}\right).
$$

And,

$$
\alpha\,\Psi^{(1)}(\alpha) = 1 + \frac{1}{2\alpha} + \frac{1}{6\alpha^{2}} - \frac{1}{30\alpha^{4}} + \frac{1}{42\alpha^{6}} + O\!\left(\frac{1}{\alpha^{8}}\right),
$$

from which $\lim_{\alpha\to\infty} \alpha\,\Psi^{(1)}(\alpha) = 1$. Finally, the limit in Equation (B.5) is

$$
\lim_{p \to 0^-,\, \alpha \to \infty} \rho = \lim_{\alpha \to \infty}\,\lim_{p \to 0^-} \rho = -\lim_{\alpha \to \infty} \frac{1}{\left\{\alpha\,\Psi^{(1)}(\alpha)\right\}^{1/2}} = -1.
$$
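Equation (B.5) can be verified directly against Table 1: MATLAB's `psi(1, a)` returns the trigamma function $\Psi^{(1)}(\alpha)$, so a one-liner reproduces the $p \to 0^-$ values:

```matlab
% Limiting correlation from Equation (B.5) for the Table 1 shape values.
a = [5 10 15 20 25];
-1 ./ sqrt(a .* psi(1, a))
% -0.9505  -0.9751  -0.9833  -0.9875  -0.9900  (cf. Table 1, p = -0.0001)
```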

**Appendix C: Inverse Correlation and Bias Elimination **

Consider a gamma distributed time series $X_t$ with a large shape parameter, from which $x_t$ are observations. We have shown that for very small negative $p$, $x_t$ and $x_t^p$ are nearly perfectly correlated, albeit negatively, so we can express $x_t^p$ in the original units of $x_t$ by means of the linear regression of $x_t$ on $x_t^p$ as follows:

$$
x_t = c_0 + c_1 x_t^p + \varepsilon_t,
$$

where $\varepsilon_t$ is an error term. As $p$ approaches zero from the left, near perfect correlation between $x_t$ and $x_t^p$ ensures that the error term becomes negligible, and a near perfect estimate is obtained from

$$
x'_t = c_0 + c_1 x_t^p. \tag{C.1}
$$

Now, suppose that a model $x_t = f(x_{t-1}) + \varepsilon_t$, $t = 2, \ldots, n$, is fitted, and that the fitted values $\hat{x}_t$ are biased.

To remove this bias, we power transform $\hat{x}_t$ to obtain $\hat{x}_t^p$. Then, we use Equation (C.1) to convert $\hat{x}_t^p$ back to the original units of $\hat{x}_t$. Hence

$$
\hat{x}'_t = \hat{c}_0 + \hat{c}_1 \hat{x}_t^p,
$$

where $\hat{c}_0$ and $\hat{c}_1$ are least squares estimates obtained from the regression of $\hat{x}_t$ on $\hat{x}_t^p$, and the error approaches 0. Since $\hat{c}_0 = \bar{\hat{x}} - \hat{c}_1 \overline{\hat{x}^p}$,

$$
\hat{x}'_t = \bar{\hat{x}} - \hat{c}_1 \overline{\hat{x}^p} + \hat{c}_1 \hat{x}_t^p = \bar{\hat{x}} + \hat{c}_1\left(\hat{x}_t^p - \overline{\hat{x}^p}\right).
$$

Denoting the sample standard deviation by $s$ and the correlation coefficient by $\hat{\rho}_{\hat{x}\hat{x}^p}$,

$$
\hat{x}'_t = \bar{\hat{x}} + \hat{\rho}_{\hat{x}\hat{x}^p}\left(s_{\hat{x}}/s_{\hat{x}^p}\right)\left(\hat{x}_t^p - \overline{\hat{x}^p}\right)
$$

(see also the Ridley [25] antithetic fitted function theorem).

Both estimates $\hat{x}_t$ and $\hat{x}'_t$ contain errors. These errors contain two components: one component is purely random, and one component is bias. Combining the estimates dynamically cancels the bias components, leaving only the purely random components. See Appendix D for the proof of how this can occur. The combining weights discussed in Appendix D are theoretical, expressed in terms of errors that are unknown and unobservable, so we must rely on the following approximation. The combined estimate $\hat{x}_{c,t}$ is obtained from

$$
\hat{x}_{c,t} = \omega \hat{x}_t + (1-\omega)\hat{x}'_t,
$$

where $-\infty < \omega < \infty$, and the value of $\omega$ is chosen so as to minimize the mse

$$
\sum_{t=2}^{n}\left(x_t - \hat{x}_{c,t}\right)^{2} \Big/ (n-1).
$$

Consider the error in $\hat{x}_{c,t}$, $\hat{e}_t = \hat{x}_{c,t} - x_t$. Then

$$
\sum_{t=2}^{n}\hat{e}_t^{2} = \sum_{t=2}^{n}\left\{\omega\hat{x}_t + (1-\omega)\hat{x}'_t - x_t\right\}^{2}.
$$

Differentiating with respect to $\omega$,

$$
\mathrm{d}\sum_{t=2}^{n}\hat{e}_t^{2}\Big/\mathrm{d}\omega = 2\sum_{t=2}^{n}\left\{\omega\left(\hat{x}_t - \hat{x}'_t\right) + \hat{x}'_t - x_t\right\}\left(\hat{x}_t - \hat{x}'_t\right).
$$

Setting the derivative to zero and solving for $\omega$,

$$
\omega = \sum_{t=2}^{n}\left(x_t - \hat{x}'_t\right)\left(\hat{x}_t - \hat{x}'_t\right) \Big/ \sum_{t=2}^{n}\left(\hat{x}_t - \hat{x}'_t\right)^{2}.
$$

This optimal $\omega$ yields the minimum mse, because $\mathrm{d}^{2}\sum_{t=2}^{n}\hat{e}_t^{2}/\mathrm{d}\omega^{2} = 2\sum_{t=2}^{n}\left(\hat{x}_t - \hat{x}'_t\right)^{2} = 0$ iff $\hat{x}'_t = \hat{x}_t\ \forall t$, in which case $\omega = 1$, and $\mathrm{d}^{2}\sum_{t=2}^{n}\hat{e}_t^{2}/\mathrm{d}\omega^{2} > 0$ otherwise.

The steps for obtaining the combined antithetic fitted values are outlined as follows:

**Step 1:** Estimate the model parameters and fitted values $\hat{x}_t = f(x_{t-1})$, $t = 2, 3, \ldots, n$.

**Step 2:** Set $p = -0.001$.

**Step 3:** Calculate $\omega = \sum_{t=2}^{n}\left(x_t - \hat{x}'_t\right)\left(\hat{x}_t - \hat{x}'_t\right) \Big/ \sum_{t=2}^{n}\left(\hat{x}_t - \hat{x}'_t\right)^{2}$.

**Step 4:** Calculate $\hat{x}_{c,t} = \omega\hat{x}_t + (1-\omega)\left(\bar{\hat{x}} + \hat{\rho}_{\hat{x}\hat{x}^p}\left(s_{\hat{x}}/s_{\hat{x}^p}\right)\left(\hat{x}_t^p - \overline{\hat{x}^p}\right)\right)$, $t = 2, 3, \ldots, n$.

Likewise, the unbiased combined estimate of a future value at time $\tau$ is obtained from

$$
\hat{x}_{c,n}(\tau) = \omega\hat{x}_n(\tau) + (1-\omega)\hat{x}'_n(\tau).
$$
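A direct MATLAB transcription of Steps 1-4 for the AR(1) case of Section 6 (a sketch; `x` is assumed to be a column vector of positive observations, and the model is fitted without an intercept):

```matlab
% Steps 1-4: combined antithetic fitted values for an AR(1) model.
n    = numel(x);
phi  = x(1:n-1) \ x(2:n);          % Step 1: least squares slope
xhat = [x(1); phi * x(1:n-1)];     % fitted values (x(1) kept as a seed)
p    = -0.001;                     % Step 2
xp   = xhat.^p;                    % power transform (xhat must be positive)
C    = corrcoef(xhat, xp);
xa   = mean(xhat) + C(1,2)*(std(xhat)/std(xp))*(xp - mean(xp)); % antithetic fit
w    = sum((x(2:n) - xa(2:n)).*(xhat(2:n) - xa(2:n))) ...
       / sum((xhat(2:n) - xa(2:n)).^2);                         % Step 3
xc   = w*xhat + (1 - w)*xa;        % Step 4: combined fitted values
```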

**Appendix D: Antithetic Fitted Error Variance Reduction **

Consider a gamma distributed time series $X_t$ with a large shape parameter. Next, consider a minimum mean square error fitted value obtained from a stationary first-order autoregressive process $X_t = \mu_X + \phi\left(X_{t-1} - \mu_X\right) + \varepsilon_t$, given by $\hat{X}_t = \hat{\mu}_X + \hat{\phi}\left(X_{t-1} - \hat{\mu}_X\right)$, where $\mu_X = E\left[X_t\right]$, and $\hat{\mu}_X$ and $\hat{\phi}$ are least-squares estimates of $\mu_X$ and $\phi$, respectively, such that

$$
\hat{\phi} = \sum_{t=2}^{n}\left(X_t - \mu_X\right)\left(X_{t-1} - \mu_X\right) \Big/ \sum_{t=2}^{n}\left(X_{t-1} - \mu_X\right)^{2},
$$

$$
\hat{\phi} = \sum_{t=2}^{n}\left\{\phi\left(X_{t-1} - \mu_X\right) + \varepsilon_t\right\}\left(X_{t-1} - \mu_X\right) \Big/ \sum_{t=2}^{n}\left(X_{t-1} - \mu_X\right)^{2},
$$

$$
\hat{\phi} = \phi + \sum_{t=2}^{n}\varepsilon_t\left(X_{t-1} - \mu_X\right) \Big/ \sum_{t=2}^{n}\left(X_{t-1} - \mu_X\right)^{2}.
$$

Therefore, as $n \to \infty$, since $X_t$ is stationary so that $\operatorname{Var}\left(X_t\right) = \operatorname{Var}\left(X_{t-1}\right)$, and since the errors are serially correlated so that $\operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right) \neq 0$,

$$
\hat{\phi} = \phi + \operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right) \tag{D.1}
$$

(see also Fuller [35], p. 404). Consider $\hat{X}_t$ as an estimate of $X_t$. From (D.1), and given that the time series is stationary, then as $n \to \infty$ and $\hat{\mu}_X \to \mu_X$,

$$
\hat{X}_t = \mu_X + \left\{\phi + \operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right)\right\}\left(X_{t-1} - \mu_X\right) = X_t + u_t,
$$

where

$$
u_t = \left\{\operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right)\right\}\left(X_{t-1} - \mu_X\right) \tag{D.2}
$$

is due only to errors resulting from serial correlation. Therefore,

$$
\operatorname{Var}\left(u_t\right) = \left\{\operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right)\right\}^{2}\operatorname{Var}\left(X_t\right). \tag{D.3}
$$

Next, consider another fitted value $\hat{X}'_t$, obtained from the linear projection of the asymptotically antithetic series $X_t^p$ on $X_t$, without the introduction of any new error,

$$
\hat{X}'_t = E\left[X_t\right] + \lim_{p \to 0^-,\, \alpha \to \infty} \rho\left\{\operatorname{Var}\left(X_t\right)\big/\operatorname{Var}\left(X_t^p\right)\right\}^{1/2}\left(\hat{X}_t^p - E\left[X_t^p\right]\right).
$$

Substituting for $\hat{X}_t$,

$$
\hat{X}'_t = E\left[X_t\right] + \lim_{p \to 0^-,\, \alpha \to \infty} \rho\left\{\operatorname{Var}\left(X_t\right)\big/\operatorname{Var}\left(X_t^p\right)\right\}^{1/2}\left(\left\{\hat{\mu}_X + \hat{\phi}\left(X_{t-1} - \hat{\mu}_X\right)\right\}^{p} - E\left[X_t^p\right]\right) = X_t + u'_t, \tag{D.4}
$$

where $\rho$ is the correlation between $X_t$ and $X_t^p$, and

$$
u'_t = \lim_{p \to 0^-,\, \alpha \to \infty} \rho\left\{\operatorname{Var}\left(X_t\right)\big/\operatorname{Var}\left(X_t^p\right)\right\}^{1/2}\left(\left\{\hat{\mu}_X + \hat{\phi}\left(X_{t-1} - \hat{\mu}_X\right)\right\}^{p} - E\left[X_t^p\right]\right) \tag{D.5}
$$

is the antithetic error due to the serial correlation, but corresponding to $\hat{X}'_t$.

The expansion of $\left(\hat{\mu}_X + \hat{\phi}\left(X_{t-1} - \hat{\mu}_X\right)\right)^{p} = \left(\hat{\mu}_X + \hat{\phi}X_{t-1} - \hat{\phi}\hat{\mu}_X\right)^{p}$ will contain the constant $\left(\hat{\mu}_X - \hat{\phi}\hat{\mu}_X\right)^{p}$, the product of $p$ and some function of $X_{t-1}$, and $\left(\hat{\phi}X_{t-1}\right)^{p}$, as follows:

$$
\left(\hat{\mu}_X + \hat{\phi}X_{t-1} - \hat{\phi}\hat{\mu}_X\right)^{p} = \left(\hat{\mu}_X - \hat{\phi}\hat{\mu}_X\right)^{p} + p f\left(X_{t-1}\right) + \left(\hat{\phi}X_{t-1}\right)^{p} \to \left(\hat{\mu}_X - \hat{\phi}\hat{\mu}_X\right)^{p} + \hat{\phi}^{p}X_{t-1}^{p} \quad \text{as } p \to 0.
$$

Substituting into Equation (D.5),

$$
u'_t = \lim_{p \to 0^-,\, \alpha \to \infty} \rho\left\{\operatorname{Var}\left(X_t\right)\big/\operatorname{Var}\left(X_t^p\right)\right\}^{1/2}\left\{\left(\hat{\mu}_X - \hat{\phi}\hat{\mu}_X\right)^{p} + \hat{\phi}^{p}X_{t-1}^{p} - E\left[X_t^p\right]\right\}. \tag{D.6}
$$

Now

$$
\operatorname{Var}\left(\omega u_t + (1-\omega)u'_t\right) = \omega^{2}\operatorname{Var}\left(u_t\right) + (1-\omega)^{2}\operatorname{Var}\left(u'_t\right) + 2\omega(1-\omega)\operatorname{Cov}\left(u_t, u'_t\right).
$$

Substituting from Equations (D.2) and (D.6), and since $\hat{\mu}_X$ and $\hat{\phi}$ are fixed for the data and model,

$$
\begin{aligned}
\operatorname{Var}\left(\omega u_t + (1-\omega)u'_t\right) = \lim_{p \to 0^-,\, \alpha \to \infty}\Big[\,&\omega^{2}\left\{\operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right)\right\}^{2}\operatorname{Var}\left(X_{t-1}\right)\\
&+ (1-\omega)^{2}\rho^{2}\left\{\operatorname{Var}\left(X_t\right)\big/\operatorname{Var}\left(X_t^p\right)\right\}\hat{\phi}^{2p}\operatorname{Var}\left(X_{t-1}^{p}\right)\\
&+ 2\omega(1-\omega)\left\{\operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right)\right\}\rho\left\{\operatorname{Var}\left(X_t\right)\big/\operatorname{Var}\left(X_t^p\right)\right\}^{1/2}\hat{\phi}^{p}\operatorname{Cov}\left(X_{t-1}, X_{t-1}^{p}\right)\Big].
\end{aligned}
$$

Substituting $\operatorname{Cov}\left(X_{t-1}, X_{t-1}^{p}\right) = \rho\left\{\operatorname{Var}\left(X_{t-1}\right)\operatorname{Var}\left(X_{t-1}^{p}\right)\right\}^{1/2}$ and $\operatorname{Var}\left(X_{t-1}\right) = \operatorname{Var}\left(X_t\right)$,

$$
\begin{aligned}
\operatorname{Var}\left(\omega u_t + (1-\omega)u'_t\right) = \lim_{p \to 0^-,\, \alpha \to \infty}\Big[\,&\omega^{2}\left\{\operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right)\right\}^{2}\operatorname{Var}\left(X_t\right) + (1-\omega)^{2}\rho^{2}\hat{\phi}^{2p}\operatorname{Var}\left(X_t\right)\\
&+ 2\omega(1-\omega)\left\{\operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right)\right\}\rho^{2}\hat{\phi}^{p}\operatorname{Var}\left(X_t\right)\Big],
\end{aligned}
$$

and since $\rho^{2} \to 1$ (see Appendix B), and $\hat{\phi}^{p}$ and $\hat{\phi}^{2p} \to 1$ as $p \to 0^-$,

$$
\operatorname{Var}\left(\omega u_t + (1-\omega)u'_t\right) = \left\{\omega\operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right) + (1-\omega)\right\}^{2}\operatorname{Var}\left(X_t\right),
$$

from which we see that there are many ways in which the combined error variance can be less than the original error variance in Equation (D.3). In particular, when $\omega = \left\{1 - \operatorname{Cov}\left(\varepsilon_t, X_{t-1}\right)\big/\operatorname{Var}\left(X_t\right)\right\}^{-1}$, the combined error variance is zero.