Data Snooping and Market-Timing Rule Performance
2.2 Testing Procedures: The “Reality Check” and the SPA Test
The dangers of data snooping have long been recognized as a serious problem in empirical finance (see, for instance, Lo and MacKinlay (1990), Brock, Lakonishok, and LeBaron (1992), and Ferson, Sarkissian, and Simin (2003)). Because we investigate a large universe of market-timing rules, we need a robust methodology that guards against spurious statistical inference due to data snooping. We therefore employ the “Reality Check” (RC), introduced by White (2000), and the test for superior predictive ability (SPA), introduced by Hansen (2005). Both procedures permit an extensive search over models while controlling for the possibility that the best-performing model appears superior by mere chance. Both build on the work of Diebold and Mariano (1995) and West (1996). In this section, we briefly outline both testing procedures and refer the reader to the original articles for rigorous derivations.
The RC tests the null hypothesis that the best model, selected from the full set of models considered, has no superior predictive ability over a benchmark model, against the alternative that it does have superior predictive ability. The test is based on the $l \times 1$ performance statistic
$$\bar{f} = \frac{1}{n}\sum_{t=R}^{T} \hat{f}_{t+1}, \qquad (2.1)$$
where $l$ is the number of market-timing rules and $n$ is the number of prediction periods, indexed from $R$ through $T$, so that $T = R + n - 1$. $\hat{f}_{t+1} = f(Z_t, \hat{\beta}_t)$ is the performance measure, where $Z_t$ is a matrix that contains a vector of dependent variables and a vector of predictor variables, and $\hat{\beta}_t$ is a vector of estimated parameters. It is assumed that these parameters satisfy the conditions of Diebold and Mariano (1995) and West (1996), so that parameter uncertainty vanishes asymptotically. In our experiment, we consider various parameterizations of each trading rule ($\beta_k$, $k = 1, \dots, l$). These parameterizations directly produce returns, so no parameters need to be estimated.
We use the returns generated by the $l$ timing rules as a performance measure.⁷ For a timing rule $k$, we follow Sullivan, Timmermann, and White (1999) and specify $f_{k,t+1}$ as
$$f_{k,t+1} = \ln\left[1 + \frac{S_{t+1} - S_t}{S_t}\,X_k(Z_t, \beta_k)\right] - \ln\left[1 + \frac{S_{t+1} - S_t}{S_t}\,X_0(Z_t, \beta_0)\right], \qquad (2.2)$$
where $Z_t$ consists of the predictor variables (described in Section 2.3) and $\beta_k$ denotes the different parameterizations of the timing rules; subscript 0 refers to the benchmark model. $S_t$ is the price of the S&P index at time $t$. $X_k$ and $X_0$ are “timing functions” that take the value 1 for “invest in the stock market” and 0 for “hold cash.” Based on this performance statistic, we test whether any timing rule delivers superior performance over a simple buy-and-hold strategy, for which $X_0 = 1$ at all times. Formally, the null hypothesis is:
$$H_0:\ \max_{k=1,\dots,l}\{E(f_k)\} \le 0. \qquad (2.3)$$
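To make the specification of $f_{k,t+1}$ and the null hypothesis (2.3) concrete, the following Python sketch computes $f_{k,t+1}$ for a set of 0/1 timing signals against buy-and-hold, together with the vector of sample means $\bar f$. It is a minimal illustration under our own assumptions: the function names and the array layout (rules in rows, prediction periods in columns) are hypothetical, and holding cash earns a zero return, as implied by setting $X_k = 0$ inside the logarithm.

```python
import numpy as np


def relative_log_performance(prices, signals):
    """Return f_{k,t+1} for every rule k and period t (illustrative sketch).

    prices  : index levels S_R, ..., S_{T+1}, shape (n + 1,)
    signals : timing functions X_k(Z_t, beta_k) in {0, 1}, shape (l, n)
    Output  : array of shape (l, n); entry [k, i] is f_{k,t+1} for t = R + i.
    """
    prices = np.asarray(prices, dtype=float)
    signals = np.asarray(signals, dtype=float)
    simple_ret = prices[1:] / prices[:-1] - 1.0   # (S_{t+1} - S_t) / S_t
    rule = np.log1p(simple_ret * signals)         # ln[1 + r_{t+1} X_k]; cash earns 0
    benchmark = np.log1p(simple_ret)              # buy-and-hold benchmark, X_0 = 1
    return rule - benchmark


def mean_performance(f):
    """The l x 1 vector f_bar of sample means over the n prediction periods."""
    return np.asarray(f, dtype=float).mean(axis=1)


if __name__ == "__main__":
    prices = [100.0, 110.0, 99.0, 108.9]   # toy data, not from the study
    signals = [[1, 1, 1],                  # always invested (identical to benchmark)
               [1, 0, 1]]                  # steps out of the market before the fall
    f_bar = mean_performance(relative_log_performance(prices, signals))
    print(f_bar)                           # H0 in (2.3) concerns max_k E(f_k)
```

The test in (2.3) concerns only the largest entry of $E(f_k)$, which is why the procedures below work with the maximum over rules rather than with each rule in isolation.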
If the null can be rejected, there is evidence that at least one timing rule outperforms the benchmark. White (2000) shows that, under weak assumptions about the stationarity, dependence structure, and moments of $\hat{f}_t$, the distribution of the test statistic can be approximated by applying the stationary bootstrap of Politis and Romano (1994) as follows. In step 1, for each timing rule $k = 1, \dots, l$, we generate a resample of $\{f_{k,t+1}\}_{t=R}^{T}$ by drawing blocks of geometrically distributed length from the observed return series, with mean block length $1/q$.⁸ We denote the resampled series by $\{f^*_{k,t+1,j}\}$, where subscript $j$ indicates the $j$-th repetition of the bootstrap, and repeat the process $J$ times. In step 2, we calculate the mean of each bootstrapped return series, $\bar{f}^*_{k,j} = n^{-1}\sum_{t=R}^{T} f^*_{k,t+1,j}$. In step 3, we compute the test statistic $V_{RC} = \max_{k=1,\dots,l} \sqrt{n}\,\bar{f}_k$ and its bootstrapped counterparts $V^*_{RC,j} = \max_{k=1,\dots,l} \sqrt{n}\,(\bar{f}^*_{k,j} - \bar{f}_k)$, $j = 1, \dots, J$.
7 We first outline the procedure for using raw returns, but we will also use the Sharpe ratio (SR) as a performance measure.
8 The choice of the block length is discussed in more detail in Section 2.7.4.
We then compare $V_{RC}$ with the quantiles of $V^*_{RC,j}$. A p-value of the RC is computed as
$$p_{RC} = \frac{1}{J}\sum_{j=1}^{J} \mathbf{1}\{V^*_{RC,j} > V_{RC}\},$$
where $\mathbf{1}\{\cdot\}$ denotes the indicator function. In our empirical analysis, we set $J = 1000$ and choose a smoothing parameter of $q = 0.5$.⁹
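The three steps of the RC can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind our results: the function names are hypothetical, the bootstrap indices wrap around the end of the sample (the circular convention of Politis and Romano (1994)), and, following White (2000), one set of resampled time indices is shared by all $l$ rules in each repetition so that the dependence across rules is preserved.

```python
import numpy as np


def stationary_bootstrap_indices(n, q, rng):
    """One resample of time indices 0..n-1 (Politis and Romano, 1994).

    Each block starts at a uniformly drawn position and continues with
    probability 1 - q, so block lengths are geometric with mean 1/q.
    Indices wrap around the end of the sample.
    """
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < q:                 # start a new block
            idx[t] = rng.integers(n)
        else:                                # continue the current block
            idx[t] = (idx[t - 1] + 1) % n
    return idx


def reality_check(f, J=1000, q=0.5, seed=0):
    """White's (2000) Reality Check p-value for an (l, n) array of f_{k,t+1}."""
    f = np.asarray(f, dtype=float)
    _, n = f.shape
    rng = np.random.default_rng(seed)
    f_bar = f.mean(axis=1)                                  # sample means f_bar_k
    v_rc = np.sqrt(n) * f_bar.max()                         # V_RC
    v_star = np.empty(J)
    for j in range(J):
        idx = stationary_bootstrap_indices(n, q, rng)       # step 1, shared by all rules
        f_bar_star = f[:, idx].mean(axis=1)                 # step 2
        v_star[j] = np.sqrt(n) * (f_bar_star - f_bar).max() # step 3
    return float(np.mean(v_star > v_rc))                    # share of V*_RC,j above V_RC
```

With the defaults `J=1000` and `q=0.5`, the call `reality_check(f)` mirrors the settings stated above. The explicit Python loop favours readability over speed.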
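Hansen’s (2005) SPA is very similar to the RC, but it includes refinements that generally improve the power of the test, in particular by making it less sensitive to the inclusion of poor and irrelevant models. The SPA uses the studentized test statistic
$$V_{SPA} = \max\left[\max_{k=1,\dots,l} \frac{\sqrt{n}\,\bar{f}_k}{\hat{\sigma}_k},\ 0\right],$$
where $\hat{\sigma}_k^2$ is a consistent estimate of $\operatorname{var}(\sqrt{n}\,\bar{f}_k)$. This studentization is due to Hansen (2005), who also suggests invoking a different null distribution based on $N(\hat{\mu}, \hat{\Omega})$, where $\hat{\Omega}$ denotes a consistent estimate of the asymptotic covariance matrix of $\bar{f}$ and $\hat{\mu}$ is an estimate of $E(f_t)$. Hansen (2005) advocates the following estimator: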
$$\hat{\mu}_k = \bar{f}_k\,\mathbf{1}\left\{\sqrt{n}\,\bar{f}_k/\hat{\sigma}_k \le -\sqrt{2\ln\ln n}\right\}. \qquad (2.8)$$
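By choosing this estimator, we ensure that irrelevant models do not influence the distribution of the test statistic asymptotically. This follows from the law of the iterated logarithm, which implies that, with probability one, $\limsup_{n\to\infty} \left|\sqrt{n}(\bar{f}_k - \mu_k)/\sigma_k\right| / \sqrt{2\ln\ln n} = 1$, so the sampling error of the studentized mean is asymptotically bounded by $\sqrt{2\ln\ln n}$. Because this bound diverges more slowly than $\sqrt{n}$, a rule with $\mu_k < 0$ eventually falls below the threshold in (2.8) and retains its negative mean under the simulated null, so it no longer affects the null distribution, whereas a rule with $\mu_k = 0$ is, with probability approaching one, not misclassified as poor.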
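A short Python sketch of the estimator (2.8) shows how the threshold $-\sqrt{2\ln\ln n}$ separates clearly poor rules, which keep their negative sample mean, from all others, which are assigned $\hat{\mu}_k = 0$. The function name is hypothetical, and $\hat{\sigma}_k$ is simply taken as given here.

```python
import numpy as np


def consistent_mu_hat(f_bar, sigma_hat, n):
    """mu_hat_k = f_bar_k * 1{sqrt(n) f_bar_k / sigma_hat_k <= -sqrt(2 ln ln n)}.

    f_bar     : sample mean performance of each rule, shape (l,)
    sigma_hat : estimated std. dev. of sqrt(n) * f_bar_k, shape (l,)
    n         : number of prediction periods (n > e, so that ln ln n > 0)
    """
    f_bar = np.asarray(f_bar, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    threshold = -np.sqrt(2.0 * np.log(np.log(n)))     # about -1.97 for n = 1000
    poor = np.sqrt(n) * f_bar / sigma_hat <= threshold
    return np.where(poor, f_bar, 0.0)                  # poor rules keep f_bar_k, others 0
```

For example, with $n = 1000$ and $\hat{\sigma}_k = 1$, sample means of $-0.1$, $-0.01$, and $0.05$ give studentized values of roughly $-3.16$, $-0.32$, and $1.58$. Only the first rule lies below the threshold, so only it keeps its sample mean.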
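The implementation of the SPA is also very similar to that of the RC. In step 1, for each timing rule $k = 1, \dots, l$, we generate a resample of $\{f_{k,t+1}\}_{t=R}^{T}$ by drawing blocks of geometrically distributed length from the observed return series. We denote the resampled series by $\{f^*_{k,t+1,j}\}$, where subscript $j$ indicates the $j$-th repetition of the bootstrap. In step 2, we calculate
$$Z^*_{k,t+1,j} = f^*_{k,t+1,j} - \bar{f}_k\,\mathbf{1}\left\{\sqrt{n}\,\bar{f}_k/\hat{\sigma}_k > -\sqrt{2\ln\ln n}\right\} \qquad (2.9)$$
and its average $\bar{Z}^*_{k,j} = n^{-1}\sum_{t=R}^{T} Z^*_{k,t+1,j}$. In step 3, we compute the bootstrapped statistics
$$V^*_{SPA,j} = \max\left[\max_{k=1,\dots,l} \frac{\sqrt{n}\,\bar{Z}^*_{k,j}}{\hat{\sigma}_k},\ 0\right], \qquad j = 1, \dots, J. \qquad (2.10)$$
The p-value of the SPA is given by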
9 Results of a sensitivity analysis for changes in the smoothing parameter are shown in Section 2.7.4.
$$p_{SPA} = \frac{1}{J}\sum_{j=1}^{J} \mathbf{1}\{V^*_{SPA,j} > V_{SPA}\}. \qquad (2.11)$$
Throughout the paper, we refer to this p-value as the consistent SPA p-value (SPA-c). In addition, we calculate an inconsistent lower bound for the consistent SPA p-value, called SPA-l, which Hansen (2005) computes by replacing $\bar{Z}^*_{k,j}$ in equation (2.10) with $\bar{Z}^{l*}_{k,j}$, the average of the bootstrapped series $Z^{l*}_{k,t+1,j} = f^*_{k,t+1,j} - \max(\bar{f}_k, 0)$.
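The SPA steps, including both SPA-c and SPA-l, can be sketched in Python in the same spirit. The function names are hypothetical, and bootstrap indices are again shared across rules. Beyond the text, the sketch assumes that $\hat{\sigma}_k^2$ is estimated as the bootstrap average of $n(\bar{f}^*_{k,j} - \bar{f}_k)^2$. It also uses the fact that the average of $Z^*_{k,t+1,j}$ equals $\bar{f}^*_{k,j}$ minus the rule's recentring term, so the per-period series need not be stored.

```python
import numpy as np


def stationary_bootstrap_indices(n, q, rng):
    """Politis-Romano stationary bootstrap indices with mean block length 1/q."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        # With probability q start a new block; otherwise continue (wrapping around).
        idx[t] = rng.integers(n) if rng.random() < q else (idx[t - 1] + 1) % n
    return idx


def spa_test(f, J=1000, q=0.5, seed=0):
    """Return (SPA-c, SPA-l) p-values for an (l, n) array of f_{k,t+1}.

    Rules with a constant f series (e.g. identical to the benchmark) give
    sigma_hat = 0 and should be removed before calling this function.
    """
    f = np.asarray(f, dtype=float)
    l, n = f.shape
    rng = np.random.default_rng(seed)
    f_bar = f.mean(axis=1)

    # Steps 1-2: bootstrap means f_bar*_{k,j}, one index draw shared by all rules.
    f_bar_star = np.empty((J, l))
    for j in range(J):
        f_bar_star[j] = f[:, stationary_bootstrap_indices(n, q, rng)].mean(axis=1)

    # Assumption: sigma_hat_k^2 = bootstrap average of n * (f_bar*_{k,j} - f_bar_k)^2.
    sigma_hat = np.sqrt(n * np.mean((f_bar_star - f_bar) ** 2, axis=0))
    t_stat = np.sqrt(n) * f_bar / sigma_hat
    v_spa = max(t_stat.max(), 0.0)                          # V_SPA
    if v_spa == 0.0:
        # Design choice: no rule beats the benchmark in sample, so H0 is not
        # rejected. Return 1.0 (what a weak inequality would give) instead of
        # the strict inequality of (2.11), which can return 0 when all V* are 0.
        return 1.0, 1.0

    # Recentring terms: consistent version (rules above the threshold centred at
    # f_bar_k, clearly poor rules at 0) and lower-bound version max(f_bar_k, 0).
    keep = t_stat > -np.sqrt(2.0 * np.log(np.log(n)))
    centre_c = np.where(keep, f_bar, 0.0)
    centre_l = np.maximum(f_bar, 0.0)

    def p_value(centre):
        z_bar = f_bar_star - centre                          # averages of Z*_{k,t+1,j}
        v_star = np.maximum((np.sqrt(n) * z_bar / sigma_hat).max(axis=1), 0.0)
        return float(np.mean(v_star > v_spa))                # equation (2.11)

    return p_value(centre_c), p_value(centre_l)
```

Because `centre_l` is never smaller than `centre_c`, every bootstrapped statistic for SPA-l is at most its SPA-c counterpart. The SPA-l p-value therefore cannot exceed the SPA-c p-value, consistent with its role as a lower bound.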
For both testing procedures, we use the Sharpe ratio, in addition to raw returns, as a performance measure. If the Sharpe ratio is used as a performance measure, the null hypothesis is:
$$H_0:\ \max_{k=1,\dots,l}\{g(E(h_k))\} \le g(E(h_0)), \qquad (2.12)$$
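where $h_{k,t+1}$ is a $3 \times 1$ vector with elements
$$h^1_{k,t+1} = \frac{S_{t+1} - S_t}{S_t}\,X_k(Z_t, \beta_k), \qquad (2.13)$$
$$h^2_{k,t+1} = \left(\frac{S_{t+1} - S_t}{S_t}\,X_k(Z_t, \beta_k)\right)^2, \qquad (2.14)$$
$$h^3_{k,t+1} = r^f_{t+1}; \qquad (2.15)$$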
$r^f_{t+1}$ is the risk-free interest rate at time $t + 1$, and $g(\cdot)$ is given by
$$g(E(h_{k,t+1})) = \frac{E(h^1_{k,t+1}) - E(r^f_{t+1})}{\sqrt{E(h^2_{k,t+1}) - \left(E(h^1_{k,t+1})\right)^2}}. \qquad (2.16)$$
The expectations are estimated by sample means. We then construct the relevant statistic
$$\bar{f}_k = g(\bar{h}_k) - g(\bar{h}_0), \qquad (2.17)$$
where $\bar{h}_k$ and $\bar{h}_0$ are arithmetic averages calculated over the sample period for trading rule $k$ and the buy-and-hold strategy, respectively. The bootstrap procedure is applied to the difference in Sharpe ratios in the same way as to the difference in raw returns (for details, see Sullivan, Timmermann, and White (1999), p. 1653).
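For the Sharpe-ratio version, the sample analogue of (2.16) and the statistic (2.17) can be sketched in Python as below. The function names are hypothetical. The sketch uses the simple-return series of (2.13), so cash earns zero, and the risk-free series of (2.15). It assumes each rule's return series is not constant: a rule that always holds cash has zero variance and an undefined Sharpe ratio. The bootstrap step is omitted.

```python
import numpy as np


def sharpe_g(rule_ret, rf):
    """Sample analogue of g in (2.16): (mean(h1) - mean(h3)) / sqrt(mean(h2) - mean(h1)^2)."""
    rule_ret = np.asarray(rule_ret, dtype=float)
    h1 = rule_ret.mean()              # estimate of E(h^1)
    h2 = np.mean(rule_ret ** 2)       # estimate of E(h^2)
    h3 = np.mean(rf)                  # estimate of E(h^3) = E(r^f)
    return (h1 - h3) / np.sqrt(h2 - h1 ** 2)


def sharpe_difference(prices, signals, rf):
    """f_bar_k = g(h_bar_k) - g(h_bar_0) for each row of 0/1 timing signals."""
    prices = np.asarray(prices, dtype=float)
    simple_ret = prices[1:] / prices[:-1] - 1.0        # (S_{t+1} - S_t) / S_t
    signals = np.atleast_2d(np.asarray(signals, dtype=float))
    g0 = sharpe_g(simple_ret, rf)                      # buy-and-hold, X_0 = 1
    return np.array([sharpe_g(simple_ret * x, rf) - g0 for x in signals])
```

Because $E(h^2) - (E(h^1))^2$ is estimated with sample means, the denominator equals the population-style standard deviation (no degrees-of-freedom correction) of the rule's returns.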