STATISTICAL SIGNIFICANCE
4.9 MARGIN OF ERROR AND ADJUSTED STATISTICS
Adjusted statistics, which allow us to account for the effects of playing conditions, opponents, coaches’ decisions, and so on, play an important role in the analysis of sports data. However, it is important to keep in mind that all such adjustments are the results of statistical procedures. Although they often yield adjusted statistics that are improvements over the original ones, the process of adjustment generally introduces additional variability into the statistic, in the sense that the margin of error of the adjusted statistic is larger than that of the original, unadjusted statistic.
First consider “direct adjustment,” as discussed in Sections 3.11 and 3.12. This method applies to statistics that can be viewed as weighted averages over specific subclasses and proceeds by reweighting these subclass-specific statistics according to some standard weights. A potential drawback of this method is that it may give a large weight to a statistic based on few data, leading to a large margin of error for the adjusted statistic.
Consider the case of Jason Hanson, kicker for the Detroit Lions in 2011. That year, he had 9 attempts in the 20- to 29-yard range, making all 9; 9 attempts in the 30- to 39-yard range, making 8; 4 attempts in the 40- to 49-yard range, making 2; and 7 attempts of 50 yards or greater, making 5. Overall, he made 24 field goals in 29 attempts, for a success proportion of 0.828. Using a simple binomial model for his field goal attempts, the margin of error of his success proportion is
$$2\sqrt{\frac{0.828\,(1-0.828)}{29}} = 0.140;$$
using a more sophisticated approach that takes into account the fact that his success proportion is different in different ranges, the margin of error is 0.133.
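As a check on this arithmetic, the following minimal Python sketch computes the binomial margin of error, taken here, as in the calculation above, to be two standard errors. It assumes that Hanson’s attempts are independent with a common success probability and uses the unrounded proportion 24/29; the function name is illustrative. It does not attempt the more sophisticated, range-specific calculation behind the 0.133 figure.

```python
import math


def binomial_margin_of_error(successes: int, attempts: int) -> float:
    """Margin of error (two standard errors) of a success proportion,
    assuming independent attempts with a common success probability."""
    p = successes / attempts
    return 2 * math.sqrt(p * (1 - p) / attempts)


# Hanson, 2011: 24 field goals made in 29 attempts
moe = binomial_margin_of_error(24, 29)
print(f"{24 / 29:.3f} {moe:.3f}")  # 0.828 0.140
```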
Note that Hanson’s overall field goal success ratio can be written in terms of the distance-specific success ratios, together with the weights based on the number of attempts, as follows:
$$\frac{24}{29} = \frac{9}{29}\cdot\frac{9}{9} + \frac{9}{29}\cdot\frac{8}{9} + \frac{4}{29}\cdot\frac{2}{4} + \frac{7}{29}\cdot\frac{5}{7}.$$
Therefore, the success ratio in the 40- to 49-yard range, which is based on only 4 attempts, naturally receives little weight.
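This weighted-average identity can be verified exactly with Python’s standard `fractions` module. The sketch below uses the attempt counts given above; the dictionary layout and variable names are illustrative. Each range’s weight is its share of the 29 attempts, so the 4-attempt 40- to 49-yard range receives the smallest weight.

```python
from fractions import Fraction

# Hanson, 2011: (made, attempts) by distance range
ranges = {
    "20-29": (9, 9),
    "30-39": (8, 9),
    "40-49": (2, 4),
    "50+": (5, 7),
}

total_attempts = sum(n for _, n in ranges.values())

# Weight for each range: its share of all attempts
weights = {r: Fraction(n, total_attempts) for r, (_, n) in ranges.items()}

# Weighted average of the range-specific success ratios
overall = sum(weights[r] * Fraction(made, n) for r, (made, n) in ranges.items())

print(overall)  # 24/29
for r, w in weights.items():
    print(r, f"{float(w):.3f}")
```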
Now, consider the adjusted field goal proportion, as presented in Section 3.12. For Hanson, this is
$$0.778 = 0.291\left(\frac{9}{9}\right) + 0.265\left(\frac{8}{9}\right) + 0.306\left(\frac{2}{4}\right) + 0.138\left(\frac{5}{7}\right).$$
Note that the result from the 40- to 49-yard range, based on only 4 attempts, receives the largest weight in this computation. This leads to a relatively large margin of error of 0.175 for the adjusted statistic, about 1.25 times the margin of error of the original statistic. Therefore, although adjusting for the distance of a kicker’s attempts makes the success proportions more directly comparable, it also introduces additional error into the comparisons.
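The direct-adjustment calculation can be sketched the same way, using the standard weights 0.291, 0.265, 0.306, and 0.138 from the computation above. The second half of the sketch is a rough illustration added here, not the method used to obtain the 0.175 figure: it treats each distance range as an independent binomial sample and computes that range’s contribution $w^2 p(1-p)/n$ to the variance of the adjusted proportion, where $w$ is the standard weight, $p$ the observed success ratio, and $n$ the number of attempts.

```python
# Hanson, 2011: (made, attempts) by distance range
ranges = {"20-29": (9, 9), "30-39": (8, 9), "40-49": (2, 4), "50+": (5, 7)}

# Standard weights used for the adjustment in the text
standard_weights = {"20-29": 0.291, "30-39": 0.265, "40-49": 0.306, "50+": 0.138}

# Directly adjusted proportion: reweight the range-specific success ratios
adjusted = sum(standard_weights[r] * made / n for r, (made, n) in ranges.items())
print(f"{adjusted:.3f}")  # 0.778

# Rough variance contribution of each range, treating each range as an
# independent binomial sample: w^2 * p * (1 - p) / n
contributions = {
    r: standard_weights[r] ** 2 * (made / n) * (1 - made / n) / n
    for r, (made, n) in ranges.items()
}
for r, c in contributions.items():
    print(r, f"{c:.5f}")
```

Under this approximation, the 40- to 49-yard range contributes far more to the variance than any other range, because it combines the largest weight with the fewest attempts. The 20- to 29-yard range contributes nothing only because its observed success ratio is 1; plugging an observed proportion of 0 or 1 into the binomial variance understates the uncertainty.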
Similar considerations apply to other types of adjustment. For instance, let $Y$ denote some statistic, and suppose we observe $Y_0$, the value of that statistic for a given player in a given year (“year 0”). Suppose that we want to adjust this value to obtain an “equivalent” value for a different year (“year 1”). Let $Y_1$ denote the (unobserved) value of the statistic for that player in year 1. To estimate $Y_1$, we need to make some assumption about how $Y_0$ and $Y_1$ are related. For instance, we might assume that
$$\frac{Y_0}{\bar{Y}_0} = \frac{Y_1}{\bar{Y}_1},$$
where $\bar{Y}_0$ and $\bar{Y}_1$ denote the league averages of the statistic for years 0 and 1, respectively. Under this assumption, the ratio of a player’s statistic to the league average is constant over years. It follows that the value of $Y_0$ adjusted to year 1 is
$$\frac{\bar{Y}_1}{\bar{Y}_0}\,Y_0.$$
For instance, two of the highest single-season values of OPS (on-base plus slugging) are Barry Bonds’s 1.422 in 2004 and Babe Ruth’s 1.379 in 1920. Direct comparison of these values is distorted by the differences between baseball in 1920 and in 2004; hence, we might consider adjusting both values to some standard year, which we take to be 2012. The MLB average OPS was 0.724 in 2012, 0.763 in 2004, and 0.707 in 1920. Therefore, Bonds’s 2004 OPS, adjusted to 2012, is
$$\frac{0.724}{0.763}\,(1.422) = 1.349;$$
Ruth’s 1920 OPS, adjusted to 2012, is
$$\frac{0.724}{0.707}\,(1.379) = 1.412.$$
It follows that Ruth’s 1920 season is more impressive than Bonds’s 2004 season, at least in terms of OPS and the type of adjustment used here.
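The multiplicative adjustment is a one-line calculation. The sketch below applies it to the Bonds and Ruth example, using the league averages given above; the function name `adjust_to_year` is hypothetical.

```python
def adjust_to_year(value: float, league_avg_from: float, league_avg_to: float) -> float:
    """Multiplicative adjustment: keep the ratio of a statistic to the
    league average fixed when moving it from one year to another."""
    return value * league_avg_to / league_avg_from


MLB_OPS_2012 = 0.724  # standard year used in the text

bonds_2004 = adjust_to_year(1.422, league_avg_from=0.763, league_avg_to=MLB_OPS_2012)
ruth_1920 = adjust_to_year(1.379, league_avg_from=0.707, league_avg_to=MLB_OPS_2012)

print(f"Bonds 2004: {bonds_2004:.3f}")  # Bonds 2004: 1.349
print(f"Ruth 1920:  {ruth_1920:.3f}")   # Ruth 1920:  1.412
```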
In this type of multiplicative adjustment, the adjusted statistic is simply the original statistic multiplied by a factor f. For concreteness, assume that f has the form
$$f = \frac{\bar{Y}_1}{\bar{Y}_0},$$
where $\bar{Y}_0$ and $\bar{Y}_1$ are the league averages of the statistic for years 0 and 1, respectively, as in the Ruth and Bonds example.
Because the adjusted statistic is simply the original statistic multiplied by $f$, its margin of error is $f$ times the margin of error of the original statistic, plus an additional contribution arising because the adjustment factor $f$ is itself a function of data and therefore has its own margin of error. This additional contribution is small if the margins of error of $\bar{Y}_0$ and $\bar{Y}_1$ are small relative to the margin of error of $Y_0$. For instance, if each of $\bar{Y}_0$ and $\bar{Y}_1$ is based on at least 20 times as many observations as $Y_0$, then this additional contribution generally amounts to less than 5% of the margin of error, which might be considered negligible. When $Y_0$ is a statistic for a given player or team and $\bar{Y}_0$, $\bar{Y}_1$ are the corresponding league averages, this condition is usually satisfied.
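One way to see where a rule of thumb like this can come from is a delta-method approximation; this is an assumption of the sketch, not a method stated in this section. If $Y_0$, $\bar{Y}_0$, and $\bar{Y}_1$ are treated as independent (only approximately true, since the player’s own data contribute to $\bar{Y}_0$), the squared relative errors of a product or ratio approximately add. If, in addition, each league average has a relative standard error $1/\sqrt{k}$ times that of $Y_0$, as when it is based on $k$ times as many comparable observations, then the margin of error of $fY_0$ is about $\sqrt{1 + 2/k}$ times $f$ times the margin of error of $Y_0$. The function name is illustrative.

```python
import math


def relative_moe_inflation(k: float) -> float:
    """Approximate factor by which the margin of error of f * Y0 exceeds
    f times the margin of error of Y0.

    Delta-method sketch, assuming Y0 and the two league averages are
    independent and that each league average has a relative standard error
    1/sqrt(k) times that of Y0.
    """
    # Squared relative errors add for a product/ratio of independent estimates:
    # (Y0 term) + (year-0 league average term) + (year-1 league average term)
    return math.sqrt(1 + 1 / k + 1 / k)


for k in (5, 10, 20, 50):
    extra = relative_moe_inflation(k) - 1
    print(f"k = {k:>2}: margin of error larger by about {100 * extra:.1f}%")
```

For $k = 20$ the factor is $\sqrt{1.1} \approx 1.049$, an increase of a little under 5%, in line with the rule of thumb above; smaller values of $k$ make the extra error grow quickly.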
The lesson here is that, when making adjustments to statistics, it is important for the adjustment factors to be accurately determined. To achieve this, it is tempting to use as many data as possible, perhaps including results for several years or leagues. Although this approach is often valid, it is important to keep in mind that, unless the relationships between the variables are constant across the data being used, it has the potential to introduce additional error. Therefore, the analyst must balance the greater accuracy gained by using more data against the possible biases introduced by expanding the range of the analysis.