This section summarizes simulation exercises to evaluate small sample biases in the estimators and their standard errors. We use bootstrap simulations, drawing samples of the same sizes as in our actual data. In each of 1000 draws, artificial data are
generated from a mixture of three fund distributions, determined by given values of the π fractions and alphas. A given draw is generated by resampling months at random
from the hedge fund data, where the relevant fractions of the funds have the associated alphas added to each of their returns, after the sample estimates of their alphas have been subtracted. We use the hedge fund data because, as Table 1 suggests, the departures from normality are likely to be greater for hedge funds than for mutual funds, providing a tougher test of the finite sample performance. Initially, the values of the π fractions are set to π0=0.10, πg=0.60 and πb=0.30, and the bad and good alphas are set
to -0.138 and +0.262% per month.15
For each draw of artificial data from the mixture with known parameters, we run the estimation by simulation, for a fixed value of the size of the tests, γ/2. The π
fractions are estimated in each of these trials from the three simulations as described earlier in the paper, each using 1000 artificial samples generated by resampling from the one draw from the mixture distribution, treating that draw the same way we treat the original sample. The α parameters are held fixed for these simulations. The Avg. estimates in Table 5 are the averages over the 1000 draws from the mixture distribution. The empirical SD’s are the standard deviations taken across the 1000 draws. The Root MSE’s are the square roots of the averages over the 1000 draws, of the squared
difference between an estimated and true parameter value.16
15 We experiment with setting all the π fractions to 1/3 and the results, shown in the Internet Appendix, are broadly similar. We also report experiments where we set the π fractions to the banner values reported by BSW: π0=0.75, πg=0.01 and πb=0.24, and find broadly similar results.
16 In the event that a parameter estimate is on the boundary of the parameter space (a π fraction is zero or 1.0), we drop the estimated standard error for that simulation trial for the calculations. This choice has only a small effect on the results.
Consistent with our analytical results, Table 5 shows that the BSW estimator of π0 is badly biased in favor of finding too many zero-alpha funds, and the estimators of the fractions of good and bad funds are biased toward zero. When 10% of the funds have zero alphas, the BSW estimates are 75-96%, depending on the size of the tests. The negative point estimate of πbwhich we observed in the sample is consistent with the
negative average value in the simulations, for the 5% and 10% test sizes. Our point estimates are also biased. Like the BSW estimates we find too many zero alpha funds and too few good and bad funds. But our point estimates are much less biased than the BSW estimators, finding for example, 20-42% zero alpha funds when the true fraction is 10%.
Our point estimates are typically within one empirical standard deviation of the true values of the π fractions at the 5% test size, while the BSW estimators are much further away. Our estimator is more accurate at the 10% size and slightly more accurate still at the 20% size, where the expected point estimate is within 0.3-0.6 standard errors of the true parameter value. Consistent with the analytical results, the BSW estimates are less biased at the larger test sizes, but they are still are badly biased. The point estimates remain more than six empirical standard errors away from the true values when the test size is 20%.
Our results indicate that smaller fractions of funds have zero alphas and larger fractions have nonzero alphas, compared with the evidence using the BSW
methodology. We also find that large fractions of the hedge funds are estimated to have either positive or negative alphas. The finite sample performance of the estimators
says that these claims are likely conservative. Given the biases in the estimators, the fractions of zero alpha funds are likely even smaller than we report above.
The BSW reported standard errors understate the sampling variability of their estimates for all test sizes. Even when γ/2=0.20 or 0.30 (Panels C and D), where they perform the best, the average BSW standard errors range from 20% to 60% of the
empirical standard errors.17 Our standard error estimates are also biased. When the test size is 5% they are too far small for π0 and too large for πg and πb by as much as a factor of two. When the size is 10% the standard errors are reasonably accurate for π0 but still too large for πg and πb by 50-100%.18
The empirical standard errors in Table 5 get smaller as the size of the tests is increased from 5% to 20%. The average empirical standard error at the 20% size is about 60% of the value at the 5% size. This is the opposite of the pattern in the average reported standard errors, which are larger at the larger test sizes. Given this tradeoff, the 10% test size appears to be the best choice if relying on reported standard errors. The point estimates are as accurate as at the larger test sizes. The reported standard errors for π0 are fairly accurate, and they are overstated for πg and πb, and thus conservative.
Finally, the simulations show that the BSW estimators display lower sampling variability than our estimators. This makes sense, given the relative simplicity of their estimators that results from setting the power to 100% and the confusion to zero.
17As a check, the reported standard error in BSW (Table II) for πb is 2.3%. Our simulations of the reported BSW standard errors, when γ/2=0.20, average 2.2%, suggesting that we are accurately representing their standard error calculations.
18We conduct some experiments where we set the correlation of the tests across funds to zero, as assumed by BSW, and we find that the standard errors are then an order of magnitude too small.
However, the BSW estimators concentrate around biased values. Comparing the root mean squared errors (RMSEs), they are substantially higher for the BSW estimators. When the test size is 5%, the RMSE of the BSW estimator is larger than for our estimator by 100-200%. As the test size is increased the RMSEs of both estimators is smaller, but the BSW estimator still has a larger RMSE. When the size is 20%, the BSW estimators RMSEs are larger than for our estimators by 150-300%.
Panel D of Table 5 takes the size of the tests to 30% in each tail. The average point estimates, RMSEs and empirical standard errors of our estimators are similar to those in Panel C, but the average reported standard errors are more overstated. The BSW point estimates have similar biases, and the BSW reported standard errors still average only 30-60% of the empirical standard errors. The RMSEs are slightly improved, but still larger than the RMSEs of our estimators, by 160-300%.
We conduct some experiments where we expand the number of time-series observations in the simulations to 5,000, in order to see which of the biases are finite sample issues, and which are inconsistencies. (These experiments are reported in the Internet Appendix.) The large-sample experiments suggest that the BSW standard error estimators are consistent. The experiments also suggest that the BSW point estimator of π0 is inconsistent. For example, the average estimated value is about 30% when the true value is 10%, and the expected estimate is 49% when the true value is 1/3. Our
estimates are much closer to the true values when T=5,000, suggesting that their biases in Table 5 are likely finite sample biases.