SUMMARY OBSERVATIONS AND FORMAT OF THIS BOOK
Consider the separate markets for two types of players: grinders (low skill) and stars (high skill). The equilibrium wage for each type of player is determined within its own submarket. Because they generate less revenue, the demand for grinders is lower than the demand for stars. Since it requires less skill to be a grinder, we can also assume that the supply of grinders is greater at each wage than the supply of high-skilled players. The combination of lower demand and higher supply results in a lower wage for grinders ($W_G$) than for stars ($W_S$).
FIGURE 1.A1 Supply and Demand and the Relative Earnings of “Star” versus “Grinder” Hockey Players
[Figure 1.A2 scatter plot: vertical axis, Salary ($US millions, 2010–11); horizontal axis, Goals in 2009–10]
This is a scatter plot of player salaries in 2010–11 against their goals scored the previous year (the 2009–10 season). Each point corresponds to a specific player, relating the number of goals he scored to his salary.
NOTE: Sample includes only players (forwards) who played at least 20 games in the 2009–10 season. Sample size is 369.
SOURCE: Based on data provided by the NHLPA Web site, www.nhlpa.com, accessed January 2011.
FIGURE 1.A2 NHL Player Salaries by Goals Scored in the Previous Year, 2010–11
CHAPTER 1: Introduction to Labour Market Economics
goal scoring. An “eyeball” summary of this graph suggests that players who score more goals also make more money. Regression analysis is little more than a formalization of this data summary exercise.
In Figure 1.A3, we show a plot of the same data, except that we transform the salaries into “ln(Salary).” Why? It is common practice in labour economics to use logarithms, and some of the reasons are outlined in Exhibit 1A.1 and later in Chapter 9. For our purposes here, the primary reason is that the relationship between log salaries and goals is easier to summarize with a straight line than the relationship between raw salary levels and goals. This is especially true given the wide range of salaries paid to players (from $500,000 to $10,000,000).
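A quick calculation shows how much the log transformation compresses that range. The only inputs are the two salary bounds quoted in the paragraph; the variable names are illustrative.

```python
import math

low, high = 500_000, 10_000_000   # salary range quoted in the text ($US)

print(f"ratio of highest to lowest salary: {high / low:.0f}")
print(f"ln({low:,}) = {math.log(low):.2f}, ln({high:,}) = {math.log(high):.2f}")
# The log gap equals ln(high/low) = ln(20), about 3 log points
log_gap = math.log(high) - math.log(low)
print(f"log gap = {log_gap:.2f}")
```

A 20-fold spread in dollars becomes a span of about 3 log points (roughly 13.1 to 16.1), which is why the log-salary axis of Figure 1.A3 only needs to run from about 13 to 17.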
[Figure 1.A3 scatter plot with fitted line: vertical axis, Log Salary ($US, 2010–11); horizontal axis, Goals in 2009–10; series, Log Salary and Predicted Log Salary]
NOTES:
1. Sample includes only players (forwards) who played at least 20 games in the 2009–10 season. Sample size is 369.
2. The predicted log salary is based on the regression results reported in Table 1.A2, where $\ln(\text{Salary}) = 13.450 + 0.062 \times \text{Goals}$.
SOURCE: Based on data provided by the NHLPA Web site, www.nhlpa.com, accessed January 2011.
FIGURE 1.A3 Logarithm of NHL Player Salaries by Goals Scored in the Previous Year, and Fitted Regression Line, 2010–11
This is a scatter plot of the logarithm of player salaries in 2010–11 against their goals scored the previous year (the 2009–10 season). Each point corresponds to a specific player, relating the logarithm of his 2010–11 salary to his goals scored. The estimated regression line, based on a regression of log salary on goals scored, is also shown on this graph.

One simple way to summarize the data in Figures 1.A2 and 1.A3 more formally is to compute average salaries by the number of goals scored. Table 1.A1 illustrates the results of this exercise. Here we can see that player salaries rise from $0.95 million to $2.34 million when “Goals Scored” moves from the 0–9 category to the 10–19 category, an increase of approximately $1,400,000 for the extra 10 goals. This implies a return of about $1,400,000/10 = $140,000 per goal. Salaries continue to rise as players move into the higher categories: average salaries rise by $1.6 million from the 10–19 to the 20–29 category, by almost $1 million from the 20–29 to the 30–39 category, and by $1.7 million from the 30–39 to the 40–49 category. Player salaries appear to decline in moving from the 40–49 category to the 50-plus category. As can be seen in Figure 1.A2, however, this decline is driven by a single “outlier.” The salaries of the two highest goal scorers (Alexander Ovechkin and Sidney Crosby) are identical, at $9 million, and higher than those of the lower goal scorers. But Steven Stamkos scored a similar 51 goals in 2009–10 and earned a salary of only $875,000 from the Tampa Bay Lightning in 2010–11. He was, however, only in the second year of his three-year entry-level contract, and his 2010–11 salary does not include potential bonuses. In time, we expect his salary to adjust to his level of performance. In terms of logarithms, log salary rises by roughly 0.6 to 0.8 per 10 goals in the below-30-goal range, and by about 0.3 to 0.4 per 10 goals as players move toward the 50-goal mark.
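The arithmetic behind this tabulation can be reproduced directly from the category averages in Table 1.A1. The sketch assumes, as the paragraph implies, that the three 50-plus players are Ovechkin and Crosby at $9 million and Stamkos at $875,000; the list names are illustrative.

```python
import math

# Category averages from Table 1.A1 (salary in $US millions, 2010-11)
categories = ["0-9", "10-19", "20-29", "30-39", "40-49", "50+"]
avg_salary = [0.95, 2.34, 3.97, 4.90, 6.60, 6.29]

# Change in the average between adjacent 10-goal categories, and the implied per-goal return
steps = []
for i in range(1, len(avg_salary)):
    step = avg_salary[i] - avg_salary[i - 1]
    steps.append(step)
    print(f"{categories[i - 1]:>5} -> {categories[i]:<5} {step:+.2f}M  (~${step * 1e6 / 10:,.0f} per goal)")

# The 50+ cell averages three players: two at $9M and Stamkos at $0.875M
top_three = [9.0, 9.0, 0.875]
mean_salary = sum(top_three) / len(top_three)
mean_log = sum(math.log(s * 1e6) for s in top_three) / len(top_three)
print(f"50+: mean salary {mean_salary:.2f}M, mean ln(salary) {mean_log:.2f}")
```

The three salaries reproduce both entries in the 50-plus row ($6.29 million and 15.24). Note that the ln(Salary) column is the average of the logs, not the log of the average: ln(6.29 million) is about 15.65, well above 15.24, because the single low salary pulls the mean of the logs down more strongly.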
A regression function calibrates more formally the relationship that was apparent in the table. Rather than letting our eyes fit a line through the data plotted in Figures 1.A2 and 1.A3, or approximating the relationship between two variables from a table, we calculate regressions to provide an estimate of the function that “best fits” the data. Imagine that we want to estimate the following relationship between salary and goals:
$$\text{Salary} = a + b \times \text{Goals}$$
This is the equation for a straight line. In this example, salary is the dependent variable, while goals is the independent, or explanatory, variable. If we believed this were an exactly true and complete model of player salaries, all of the data would lie on the line implied by this equation, and estimation would be very simple. Of course, we know that this is only an approximation, and there are many other factors (leadership ability, “marketability,” past performance, long-term contracting issues, and luck) that affect player salaries. We can lump all of these factors together into an error term, e, so that our augmented model is
$$\text{Salary} = a + b \times \text{Goals} + e$$
For any given line that we can draw through the points in the scatter plots, this “model” will fit perfectly: each player's salary equals the “predicted” salary on the line plus the difference between his actual salary and the predicted salary (the residual, e). However, we are interested in explaining as much of salary as we can with goals, leaving as little as possible to the residual. While there are many ways to formalize this idea, in practice most researchers choose the line that minimizes the sum of the squared residuals, that is, the squared vertical distances between the line and the actual observations. This ordinary least squares (OLS) estimator yields the line that best predicts salary from goals scored, at least according to this criterion. The output of regression analysis is a set of estimates of the parameters of this line: a, the intercept, and b, the slope.
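For a single explanatory variable, the OLS line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the two means. The sketch below uses a small hypothetical data set (not the NHL sample), and `ols_simple` is an illustrative helper, not a library function.

```python
def ols_simple(x, y):
    """Intercept a and slope b that minimize sum((y - a - b*x)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar   # the fitted line passes through (x_bar, y_bar)
    return a, b

# Hypothetical toy data (NOT the NHL sample): goals and salary in $US millions
goals = [2, 8, 15, 22, 31, 44]
salary = [0.7, 1.4, 2.1, 3.9, 4.2, 6.8]

a, b = ols_simple(goals, salary)
residuals = [yi - (a + b * xi) for xi, yi in zip(goals, salary)]
print(f"a = {a:.3f}, b = {b:.3f} per goal")
print(f"sum of residuals = {sum(residuals):.1e}")  # essentially zero when an intercept is included
```

Nudging either a or b away from these values raises the sum of squared residuals, which is exactly the sense in which this line “best fits” the data.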
TABLE 1.A1 Tabulation of NHL Player Salaries by Goals Scored, 2010–11

Goals Scored,    Sample    Salary            ln(Salary)
2009–10          Size      ($US millions)
0–9              128       0.95              13.63
10–19            133       2.34              14.41
20–29            84        3.97              15.03
30–39            17        4.90              15.31
40–49            4         6.60              15.70
50+              3         6.29              15.24

NOTES:
1. Each column indicates the sample size or sample average for the corresponding goal-scoring category.
2. Sample includes only players (forwards) who played at least 20 games in 2009–10. Sample size is 369.
SOURCE: Based on data provided by the NHLPA Web site, www.nhlpa.com, accessed January 2011.

The estimated regression line for these data, with log salary as the dependent variable, is shown in Figure 1.A3. It is upward sloping, though the precise slope is hard to read directly from the graph. The slope of the line tells us how much player salaries rise, on average, with goals scored. Specifically, the estimated slope allows us to conduct the “thought experiment” of how much extra a player would earn if he scored one more goal. The numerical estimates of the regression are reported in Table 1.A2, for both levels (dollars) and logarithms. In column (1), for levels, we see that each additional goal is associated with $144,943 of additional earnings. In logarithms, column (4) suggests that each goal is associated with an additional 0.062 log dollars, which can be interpreted as approximately 6.2 percent more salary.
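Reading 0.062 log points as 6.2 percent relies on the approximation ln(1 + r) ≈ r. The sketch below compares that reading with the exact proportional change, converts it to dollars at the sample mean salary, and evaluates the fitted log line. The coefficient values are those reported for column (4) of Table 1.A2; everything else is illustrative.

```python
import math

a_log, b_log = 13.450, 0.062   # Table 1.A2, column (4): intercept and Goals coefficient

approx_pct = 100 * b_log                  # reading the coefficient directly as a percentage
exact_pct = 100 * (math.exp(b_log) - 1)   # exact percentage change implied by +0.062 log points
print(f"one more goal: ~{approx_pct:.1f}% (approx), {exact_pct:.1f}% (exact)")

# Converting to dollars at the sample mean salary (Table 1.A2) for comparison with column (1)
mean_salary = 2_422_666
print(f"6.2% of the mean salary = ${b_log * mean_salary:,.0f} (column (1) gives $144,943)")

# Salaries implied by the fitted log line at a few goal totals
for g in (0, 15, 30):
    print(f"{g:2d} goals -> ${math.exp(a_log + b_log * g):,.0f}")
```

The exact effect is about 6.4 percent, and at the mean salary the log estimate implies roughly $150,000 per goal, close to the $144,943 from the levels regression.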
In addition to the coefficient, or parameter, estimates, regression analysis and the associated statistical theory yield estimates of the stability, or reliability, of the estimates, at least within the sample. The standard error of each coefficient provides a measure of the precision with which we are likely to have estimated the true parameter. If the estimation procedure were repeated on other samples of similar hockey players, the coefficients would likely vary across samples even if the underlying salary determination were the same, since no two samples are identical (because of e). The standard error is an estimate of the variability (due to sampling error) that we would expect in the estimated coefficient across these samples. Coefficients and standard errors are used together to conduct tests of statistical hypotheses. The most common hypothesis we will encounter is whether a given coefficient is statistically significantly different from zero. Even if one variable has no effect on another, sampling error makes it unlikely that the estimated coefficient would equal zero exactly. For this reason, we generally pay attention only to coefficients that are at least twice as large as their standard errors. This is related to the common “disclaimer” in the reporting of poll results: the polling procedure yields estimated percentages that are within a given range of the true percentage 19 times out of 20, so we should take the specific reported value with a well-defined grain of salt. While this is only an approximation of the formal statistics underlying hypothesis testing, it should be sufficient for most of the empirical research discussed in this book.
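The idea of a standard error as the spread of estimates across repeated samples can be checked by simulation. The sketch below assumes a hypothetical “true” log-salary model (the intercept and slope echo column (4) of Table 1.A2, but the error spread and the Poisson goal distribution are assumptions), draws many samples of 369 players with fresh errors, and compares the spread of the estimated slopes with the textbook formula for the OLS standard error. It uses NumPy's `default_rng` and `polyfit`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 369, 2000                          # n matches the sample size; the rest is hypothetical
a_true, b_true, sigma = 13.45, 0.062, 0.6    # assumed "true" log-salary model and error spread

goals = rng.poisson(14, size=n).astype(float)   # one hypothetical set of goal totals, held fixed

slopes = np.empty(reps)
for r in range(reps):
    e = rng.normal(0.0, sigma, size=n)          # a fresh draw of the error term e
    ln_salary = a_true + b_true * goals + e
    slopes[r] = np.polyfit(goals, ln_salary, 1)[0]   # OLS slope for this sample

# Analytical standard error of the OLS slope for this design (sigma treated as known)
se_formula = sigma / np.sqrt(((goals - goals.mean()) ** 2).sum())
print(f"mean of estimates   {slopes.mean():.4f}")
print(f"spread of estimates {slopes.std():.4f} vs formula SE {se_formula:.4f}")
```

The estimates center on the true slope, and their standard deviation across samples matches the formula closely. In real applications sigma is unknown and is estimated from the residuals, but the interpretation is the same.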
TABLE 1.A2 Estimated Effects of Performance on Player Salary in 2010–11 (standard errors in parentheses)
Dependent variable:        Salary ($US), mean = 2,422,666        ln(Salary), mean = 14.34
                Means      (1)         (2)         (3)           (4)       (5)       (6)
Intercept                  331,796     61,668      31,864        13.450    13.360    13.340
                           (141,586)   (129,445)   (137,831)     (0.056)   (0.052)   (0.055)
Goals           14.43      144,943     57,328      58,700        0.062     0.030     0.031
                           (8,121)     (11,574)    (11,785)      (0.003)   (0.005)   (0.005)
Assists         20.06                  76,462      77,068                  0.027     0.028
                                       (7,873)     (7,937)                 (0.003)   (0.003)
Plus/minus      0.43                               −5,006                            −0.003
                                                   (7,903)                           (0.003)
R-squared                  0.47        0.57        0.58          0.51      0.59      0.59
NOTE: Sample includes only players (forwards) who played at least 20 games in 2009–10. Sample size is 369.
SOURCE: Based on data provided by the NHLPA Web site, www.nhlpa.com, accessed January 2011.
EXHIBIT 1A.1 Natural Logging

In labour economics, especially in the presentation of empirical results, it is common to encounter logarithms. Many students find logarithms (or “logs”) quite intimidating, but in fact the use of logs often simplifies economic theory and empirical work. There are at least two (related) contexts in which you will encounter logs in labour economics.

1. First, logs are helpful in expressing elasticities. The elasticity of labour supply with respect to the wage expresses the effect of a change in the wage on hours of labour supply in proportional or percentage terms. This is usually more convenient than expressing the relationship in the original units of measurement. For example, we could state that a $1 per hour rise in the wage increases annual labour supply by 50 hours. To facilitate comparison across studies, however, it is usually preferable to report that a 10 percent increase in the wage leads to a 2.5 percent increase in labour supply; that is, that the elasticity is 0.25. The elasticity is unit free and gives the proportional (or percentage) change in hours associated with a proportional (or percentage) change in the wage. What does this have to do with logs? The conventional formula for an elasticity is

$$E = \frac{\Delta Y}{\Delta X} \times \frac{X}{Y} = \frac{\Delta Y}{Y} \div \frac{\Delta X}{X}$$

where “Δ” refers to the change in a variable. As it turns out, changes in the logarithms of variables directly yield the proportional changes (approximately, for small changes), so that the elasticity can be expressed as

$$E = \frac{\Delta \ln Y}{\Delta \ln X}$$

This is especially convenient in estimating elasticities, because the estimated slope of a regression of ln Y on ln X gives a direct estimate of the elasticity.

2. Second, for many economic phenomena, changes in key variables lead to multiplicative or proportional changes in others. For example, economic growth is usually expressed in percentage terms, rather than in so many billions of dollars. We usually hear that GDP grew by 2 percent, or some comparable figure. An economy with a constant growth rate will grow exponentially, as each year's growth is added to the base of the next year, much as interest is added to the principal in a savings account with compound interest. An approximation of this growth process, or of any similar relationship between a variable y and a variable X, is

$$y = Ae^{rX}$$

This equation is a “simple” representation of a variable y that grows continuously at rate r with X.

In labour economics, many productive characteristics are best described as having a multiplicative effect on wages. For example, an additional year of education might raise wages by 10 percent. Each additional year would then raise wages by a further 10 percent of an already higher base. In that case, wages can be expressed as a function of years of schooling, as in the equation above. The figure illustrates the relationship between the level of wages and years of schooling for a simple example, where the base wage is $6 per hour and the return to a year of schooling is 10 percent. Also shown in this graph is a plot of the log wage against years of schooling. Notice that, unlike the relationship in levels, the logarithmic relationship is an exact straight line. Again, for the purposes of estimating regressions, this is the preferred linear specification of the relationship between wages and education. Transforming such variables into (natural) logs is thus a common exercise.
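The exhibit's example can be tabulated directly. This sketch assumes the wage function is $w = 6e^{0.10S}$, reading the “10 percent” return as r = 0.10 in $y = Ae^{rX}$; under that reading each year multiplies the wage by $e^{0.1} \approx 1.105$. (With $6 \times 1.10^S$ instead, the log line would have slope ln 1.1 ≈ 0.095.) The helper `wage` is illustrative.

```python
import math

w0, r = 6.0, 0.10   # base wage ($/hour) and return to schooling used in the exhibit's example

def wage(s):
    """Hourly wage after s years of schooling under w = w0 * exp(r * s)."""
    return w0 * math.exp(r * s)

for s in range(0, 17, 4):
    print(f"S = {s:2d}: wage = ${wage(s):5.2f}, ln wage = {math.log(wage(s)):.3f}")

# In logs every extra year adds the same amount, r; in levels the dollar gain keeps growing
log_step_early = math.log(wage(1)) - math.log(wage(0))
log_step_late = math.log(wage(16)) - math.log(wage(15))
dollar_step_early = wage(1) - wage(0)
dollar_step_late = wage(16) - wage(15)
print(f"log gain per year: {log_step_early:.3f} early, {log_step_late:.3f} late")
print(f"dollar gain per year: ${dollar_step_early:.2f} early, ${dollar_step_late:.2f} late")
```

The wage climbs from $6 to just under $30 over 16 years while the log wage rises in equal steps of 0.1, matching the straight dashed line and the curved solid line in the figure.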
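Useful Properties of Logs

$$\begin{aligned}
\ln e &= 1 \\
\ln(ab) &= \ln a + \ln b \\
\ln(a/b) &= \ln a - \ln b \\
\ln a^{x} &= x \ln a \\
\ln(Ae^{rx}) &= \ln A + rx \\
\ln(1 + r) &\approx r \quad \text{(for small } r\text{)}
\end{aligned}$$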
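These identities, and the elasticity example from the exhibit, are easy to verify numerically. The test values below are arbitrary; the last lines compare an elasticity computed from percentage changes with one computed from log changes, showing how the approximation ln(1 + r) ≈ r connects them.

```python
import math

a, b, x, A, r = 3.0, 7.0, 2.5, 6.0, 0.1   # arbitrary positive test values

checks = {
    "ln e = 1": math.isclose(math.log(math.e), 1.0),
    "ln(ab) = ln a + ln b": math.isclose(math.log(a * b), math.log(a) + math.log(b)),
    "ln(a/b) = ln a - ln b": math.isclose(math.log(a / b), math.log(a) - math.log(b)),
    "ln a^x = x ln a": math.isclose(math.log(a ** x), x * math.log(a)),
    "ln(A e^(rx)) = ln A + rx": math.isclose(math.log(A * math.exp(r * x)), math.log(A) + r * x),
}
for rule, ok in checks.items():
    print(f"{rule:26s} {ok}")

# ln(1 + r) ~ r is an approximation that degrades as r grows
for pct in (0.01, 0.10, 0.50):
    print(f"r = {pct:.2f}: ln(1 + r) = {math.log(1 + pct):.4f}")

# Elasticity from the exhibit: a 10% wage rise with a 2.5% rise in hours
pct_elasticity = 0.025 / 0.10                          # ratio of percentage changes
log_elasticity = math.log(1.025) / math.log(1.10)      # ratio of log changes
print(f"elasticity: {pct_elasticity:.3f} (percent changes), {log_elasticity:.3f} (log changes)")
```

For a 10 percent change the two elasticity measures differ only in the second decimal place (0.250 versus about 0.259), and the gap shrinks as the changes get smaller.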
A Comparison of W and ln(W) as Functions of Schooling
[Figure: wage in levels ($, left axis, 5 to 30, solid line) and ln wage (right axis, 1.5 to 3.5, dashed line) plotted against years of schooling, 0 to 16]
NOTES: This figure shows a hypothetical relationship between wages and years of schooling. In this example, we assume that the logarithm of wages is a linear function of schooling. The dashed line corresponds to the log wage, and the second (right-hand) axis provides the relevant scale. While the log wage is linearly related to schooling, the wage in levels is nonlinear, as can be seen from the solid line and the first (left-hand) vertical axis.
The ratio of the coefficient to its standard error is called the t-ratio, and the “test” for statistical significance is essentially a determination of whether this ratio exceeds 2 in absolute value. In the example reported in Table 1.A2, the t-ratio for goals scored in column (1) is much greater than 2 (144,943/8,121 ≈ 17.85), so we can reject the possibility that the estimated relationship between goals and salaries is due to pure chance or sampling error.
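The rule of thumb can be applied to several entries of Table 1.A2 at once. This sketch takes the coefficients and standard errors as reported there, assuming the plus/minus coefficient in column (3) is −5,006 (negative and small, as the text later describes it). The dictionary keys are illustrative labels.

```python
# (coefficient, standard error) pairs read from Table 1.A2
estimates = {
    "Goals, col. (1)": (144_943, 8_121),
    "Goals, col. (2)": (57_328, 11_574),
    "Assists, col. (2)": (76_462, 7_873),
    "Plus/minus, col. (3)": (-5_006, 7_903),
    "Goals, col. (4), logs": (0.062, 0.003),
}

t_ratios = {}
for name, (coef, se) in estimates.items():
    t_ratios[name] = coef / se
    verdict = "|t| > 2" if abs(t_ratios[name]) > 2 else "|t| <= 2"
    print(f"{name:24s} t = {t_ratios[name]:6.2f}  ({verdict})")
```

Goals and assists clear the threshold comfortably in every column shown, while the plus/minus coefficient is smaller than its standard error.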
Another statistic commonly reported in regression analysis is the R-squared. We do not focus much on R-squareds in this text, but the R-squared is a measure of the goodness of fit of the regression. It indicates the proportion of the variance of the dependent variable that is accounted for, or explained, by the explanatory variables. In column (1) of Table 1.A2, the R-squared is 0.47, which indicates that goals scored in the previous year account for 47 percent of the variation in player salaries. Sadly, this is probably as good a fit of regression as we will encounter in labour economics!
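For a regression with an intercept, the R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below computes it for a tiny hypothetical data set fitted by OLS; `r_squared` is an illustrative helper, not a library function.

```python
def r_squared(y, y_hat):
    """Share of the variation of y around its mean accounted for by the fitted values."""
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained (residual) variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation around the mean
    return 1 - ssr / sst

# Hypothetical toy data, fitted by OLS
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
fitted = [a + b * xi for xi in x]
print(f"a = {a:.2f}, b = {b:.2f}, R-squared = {r_squared(y, fitted):.2f}")
```

Here the fitted line is 0.6 + 0.8x and the R-squared is 0.64: the line accounts for 64 percent of the variation of y around its mean. Predicting every observation with the mean would give an R-squared of 0, and a perfect fit gives 1.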
Of course, we do not believe that this simple one-variable regression could possibly explain all of the variation in player salaries. There are other aspects of hockey performance that we also expect to contribute to team victories and revenue. For example, defensive play or setting up goals for other players (assists) may also be rewarded. These other factors can easily be accommodated in multiple regression analysis, which is nothing more than adding more variables to the regression equation. At one level, the effect of additional variables, like assists, is of independent interest, and for this reason the expanded set of regressors may be desirable. However, there is a more important reason to add additional variables.
Our objective is to estimate the additional earnings a player can expect by scoring another goal. We accomplish this by comparing earnings across players who have scored different numbers of goals. We thus attribute the additional earnings of a 30-goal scorer compared to a 10-goal scorer entirely to the extra 20 goals scored. What if the 30- and the 10-goal scorer differ in other ways? What if the 30-goal scorer is also a better player overall (in other dimensions)? We may then over-attribute the salary difference to goal scoring alone. What we really want to do is compare otherwise identical players who differ only in the number of goals scored. We cannot do this perfectly, but by including additional explanatory variables, we can control for, or hold constant, these other factors. Of course, we can hold constant only observable, measured factors, and there will always exist the possibility that any estimated relationship between two variables is purely spurious, with one variable merely reflecting the effects of unobserved variables with which it is correlated. We will encounter many such problems as we evaluate empirical research in labour economics. For example, do highly educated individuals earn more because they have the extra schooling, or is schooling merely an indirect indicator of their inherent ability and earnings capacity? Do union members earn higher wages because of the efforts of the union or because unionized firms are more selective in hiring more productive workers? We must always be careful to note that regression coefficients indicate correlation between variables. As tempting as it may be, it may be very misleading to apply a causal interpretation to this association.
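The over-attribution problem can be illustrated with simulated data. In the sketch below, every number is hypothetical: an unobserved “ability” raises both goals and assists, and the assumed true salary model pays $60,000 per goal and $80,000 per assist (0.06 and 0.08 in $US millions). Regressing salary on goals alone loads part of the assist payoff onto goals; adding assists recovers the assumed return. It uses NumPy's `default_rng` and `linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 369
ability = rng.normal(size=n)                               # unobserved overall quality (hypothetical)
goals = 14 + 6 * ability + rng.normal(scale=4, size=n)     # better players score more...
assists = 20 + 8 * ability + rng.normal(scale=5, size=n)   # ...and also set up more goals
# Assumed "true" salary model in $US millions: both skills are paid
salary = 0.3 + 0.06 * goals + 0.08 * assists + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
b_short = np.linalg.lstsq(np.column_stack([ones, goals]), salary, rcond=None)[0]
b_long = np.linalg.lstsq(np.column_stack([ones, goals, assists]), salary, rcond=None)[0]

print(f"goals only:      goals coef = {b_short[1]:.3f}")
print(f"goals + assists: goals coef = {b_long[1]:.3f}, assists coef = {b_long[2]:.3f}")
```

In this stylized setup the goals-only coefficient comes out at roughly twice the assumed return of 0.06, because goals stand in for the correlated assists. Controlling for assists removes the bias only because assists are observed; anything correlated with goals that remains unmeasured would still contaminate the estimate.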
In columns (2) and (5) we add the variable “Assists” to the regression to see whether goals scored alone was capturing some of the effects of omitted player quality. Apparently this was the case. The estimated return to scoring a goal drops by more than half when we control for the number of assists. Players who score more goals also tend to perform well in other dimensions, like assisting other players to score goals. In fact, assists are rewarded slightly more than goals.⁴ In column (2), the economic return to scoring a goal for a player with a given level of assists is $57,328. In this way, the multiple regression framework holds constant player ability in one dimension to see how variation in another is associated with player salary. Similarly, for players scoring the same number of goals, an extra assist is associated with $76,462 in additional salary.

⁴ As it turns out, we cannot reject the hypothesis that the coefficients on goals and assists are the same; that is, given the sampling error, it is possible that the estimated coefficients on goals and assists were generated by a model in which goals and assists have the same returns.
Finally, in order to see whether forwards are compensated for their defensive abilities, we include a measure of the players’ “plus-minus” (the difference between team goals scored for and against while a player is on the ice—a large “minus” for a player indicates that even though he may score goals, the opposing team often scores when he is playing). In columns (3) and (6) we see that the estimated coefficients on “plus-minus” are small and that includ-