Throughout this book, we emphasize empirical evidence that pertains either to the theoretical model being discussed or the policy question being analyzed. Empirical labour economists address a variety of such questions, a small sample of which is listed below:
• Do labour supply curves slope upward? (Chapter 2)
• Does unemployment insurance reduce incentives to work, increasing the duration that someone is unemployed? (Chapter 3 and Chapter 17)
• Do minimum wages “kill” jobs? (Chapter 7)
• How much does education increase earnings? (Chapter 9)
• How much of the difference between men’s and women’s earnings can be attributed to labour market discrimination? (Chapter 12)
• Does increased immigration adversely affect the employment outcomes of the native born? (Chapter 11)
• Do higher wages adequately compensate individuals for working in unpleasant conditions? (Chapter 8)
• How much do unions affect the wage structure of the economy; that is, what is the premium that workers receive for being organized in unions? (Chapter 15)
Sometimes the evidence is a simple graph, chart, or a table of means. More often in labour economics, because it is relationships between variables that are of direct interest, empirical evidence refers to results from regression analysis. Our objective here is not to dwell on the mathematical aspects of regression analysis; several helpful sources that do are listed in the References. Rather, our objective is to review some of the terminology and issues that will allow a more informed reading of the empirical evidence as it is described in the book. With modern computing capabilities, even the most basic of spreadsheets can estimate regressions, so we shall not discuss computational aspects of estimating regressions. Besides, the difficult part of regression analysis or empirical work is not usually the implementation of the estimation procedure, but rather obtaining informative data and choosing a reasonable regression to run in the first place. Our discussion in this brief appendix will therefore focus on how regression results can be interpreted.
The foundation of all empirical work is data; that is, recorded measurements of the variables we are trying to summarize or explain. Increasingly, access to data is becoming quite easy as data libraries and statistics bureaus cooperate in providing computer-readable files.
These data sets can be downloaded over the Internet or accessed on CD for use on personal computers. Labour economists employ a wide variety of data sets, but there are some sources that we will encounter more frequently in this book. A useful way to catalogue these data sets is by the nature of the unit of observation:
1. Aggregate or time-series data report economy-wide measures like the unemployment rate, inflation, GDP, or CPI. The variables are usually reported for several years (i.e., over time) and often for several economies (countries or provinces). A common database of such series is Statistics Canada’s CANSIM, which has information on literally thousands of variables at national, provincial, and city levels.
2. Cross-section microdata report measures of variables (such as earnings, hours of work, and level of education) for individuals at a point in time. Common data sets of this type (of which considerable use is made throughout the text) are the censuses of Canada conducted every five years; the monthly Labour Force Survey (LFS); and, in the United States, the monthly Current Population Survey (CPS).
3. Panel or longitudinal data combine the features of cross-section and time-series data by following sample individuals for several years. This selection allows economists to study the dynamics of individual behaviour, such as transitions into and out of the labour market. In Canada, the Survey of Labour and Income Dynamics (SLID) is a panel data set that follows individuals for a few years, focusing on labour market outcomes. The National Longitudinal Survey of Children and Youth (NLSCY) follows a sample of young Canadians over a longer period of time. In the United States, the Panel Study of Income Dynamics (PSID) and various National Longitudinal Surveys (NLS) have followed individuals over a longer time, and many panel data–based studies are based on these sources.
Footnote 3: More thorough discussions of regression analysis and statistics can be found in Gujarati (2005) and Wooldridge (2003), or in any other econometrics textbook.
http://www5.statcan.gc.ca/cansim/home-accueil?lang=eng
www.bls.census.gov/cps/cpsmain.htm
CHAPTER 1: Introduction to Labour Market Economics
In order to illustrate the main ideas of regression analysis, we employ a very simple example that should also provide rudimentary insights into the discussion of earnings determination in later chapters. We explore the relationship between hockey-player performance and player compensation. Specifically, we estimate the relationship between goals scored and player salaries. While the policy relevance of this topic is limited, at least that particular labour market should be familiar to many readers. To keep matters “simple,” we also focus only on those players playing forward positions (centre, left wing, and right wing), since the salaries of goaltenders and defencemen are probably determined by different criteria than those used to evaluate players whose principal function is to score goals.
The underlying theory of player compensation is straightforward. As will be discussed in more detail in the text, individual pay will depend primarily on the economic value of the contribution of one’s labour. In this context, we would expect player pay to depend on the amount of additional revenue that he generates (or is expected to generate) for the team.
While it involves some additional assumptions, it is not unreasonable to imagine that the main contribution a forward player makes to a team is scoring goals, helping the team win games, and, ultimately, generating more revenue for the team. What we seek to estimate here is the additional pay a player can expect to make by scoring another goal.
We do this by comparing the salaries of hockey players who score different numbers of goals. We expect the forces of supply and demand to generate higher salaries for the higher-scoring players. For example, imagine that there are only two types of players: “stars” who score lots of goals, and “grinders” who serve a primarily defensive or limited offensive role. Presumably, the teams will place a greater value on the higher goal scorers for reasons described above. If we assume that there are two distinct markets for these types of players, then salary determination may be depicted by supply and demand graphs, such as those in Figure 1.A1. Here we can see that the better players earn more because of higher demand and scarcer supply. What we are trying to estimate is the extra earnings a player could expect if he could acquire the skills necessary to move from the grinder to the star market. Of course, in the real world there is a continuous distribution of hockey talent (goal-scoring ability), so we are estimating the returns to marginal improvements in performance.
As suggested earlier, the first step in such an inquiry is to obtain data on the key variables we believe help explain player salaries. In this case, data on player performance statistics and salary are readily available. We use data on player performance in 2009–10 and their salaries in 2010–11. Data on lifetime performance would probably be better, but these more limited data will be sufficient to illustrate the main ideas of regression analysis.
www.bls.gov/nls
22 Labour Market Economics
FIGURE 1.A1 Supply and Demand and the Relative Earnings of “Star” versus “Grinder” Hockey Players
[Figure: two supply-and-demand diagrams, each with Salary on the vertical axis and Employment on the horizontal axis. Panel (a), Market for grinders: supply S_G, demand D_G, equilibrium salary W_G, employment N_G. Panel (b), Market for stars: supply S_S, demand D_S, equilibrium salary W_S, employment N_S.]
Consider the separate markets for two types of players: grinders (low skill) and stars (high skill). The equilibrium wage for each type of player is determined within each submarket. Because they generate less revenue, the demand for grinders is lower than for stars. Since it requires less skill to be a grinder, we can also assume that the supply is greater at each wage than for high-skilled players. The combination of lower demand and higher supply results in a lower wage for grinders (W_G) than for stars (W_S).

Figure 1.A2 shows a scatter plot of player salaries against the number of goals scored in the previous year. Each point in this scatter plot indicates the combination of goals scored and salary for an individual player. Most of the observations lie below $2,000,000 in annual salary and 30 goals scored. There are a few observations corresponding to high salaries and higher goal scoring. An “eyeball” summary of this graph suggests that players who score more goals also make more money. Regression analysis is little more than a formalization of this data summary exercise.

FIGURE 1.A2 NHL Player Salaries by Goals Scored in the Previous Year, 2010–11
[Figure: scatter plot. Vertical axis: Salary, $US millions, 2010–11 (0 to 10). Horizontal axis: Goals in 2009–10 (0 to 50).]
This is a scatter plot of player salaries in 2010–11 against their goals scored the previous year (the 2009–10 season). Each point corresponds to a specific player, relating the number of goals he scored to his salary.
NOTE: Sample includes only players (forwards) who played at least 20 games in the 2009–10 season. Sample size is 369.
SOURCE: Based on data provided by the NHLPA Web site, www.nhlpa.com, accessed January 2011.
In Figure 1.A3, we show a plot of the same data, except that we transform the salaries into “ln(Salary).” Why? It is common practice in labour economics to use logarithms, and some of the reasons are outlined in Exhibit 1A.1 and later in Chapter 9. For our purposes here, the primary reason is that it is easier to summarize the relationship between log salaries and goals with a straight line than the raw levels of salaries. This is especially true given the range of salaries paid to players (from $500,000 to $10,000,000).
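To see what the log transformation does to the salary range quoted above, the following minimal sketch may help. It uses only the two endpoint salaries mentioned in the text; the variable names are ours and purely illustrative.

```python
import math

# Salary range quoted in the text, in US dollars.
low, high = 500_000, 10_000_000

log_low, log_high = math.log(low), math.log(high)
print(f"levels: {low:,} to {high:,} (ratio {high / low:.0f}x)")
print(f"logs:   {log_low:.2f} to {log_high:.2f} (difference {log_high - log_low:.2f})")

# Equal proportional raises are equal steps in logs, whatever the starting salary.
for salary in (low, high):
    step = math.log(salary * 1.10) - math.log(salary)
    print(f"10% raise from {salary:,}: log salary rises by {step:.4f}")
```

A twentyfold spread in dollars becomes a spread of about three units in logs, which is one reason a straight line tends to summarize log salaries more easily than raw salaries.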
FIGURE 1.A3 Logarithm of NHL Player Salaries by Goals Scored in the Previous Year, and Fitted Regression Line, 2010–11
[Figure: scatter plot with fitted line. Vertical axis: Log Salary, $US millions, 2010–11 (13 to 17). Horizontal axis: Goals in 2009–10 (0 to 50). Legend: Log Salary; Predicted Log Salary.]
This is a scatter plot of the logarithm of player salaries in 2010–11 against their goals scored the previous year (the 2009–10 season). Each point corresponds to a specific player, relating the logarithm of his 2010–11 salary to his goals scored. In addition, the estimated regression line, based on a regression of log salary on goals scored, is also presented on this graph.
NOTES:
1. Sample includes only players (forwards) who played at least 20 games in the 2009–10 season. Sample size is 369.
2. The predicted log salary is based on the regression results reported in Table 1.A2, where Log salary = 13.450 + 0.062 × Goals.
SOURCE: Based on data provided by the NHLPA Web site, www.nhlpa.com, accessed January 2011.

One simple way to more formally summarize the data in Figures 1.A2 and 1.A3 is to array the average salaries by the number of goals scored. Table 1.A1 illustrates the results of this exercise. Here we can see that player salaries rise from $0.95 million to $2.34 million when “Goals Scored” moves from 0–9 goals per year to 10–19 goals per year, an increase of approximately $1,400,000 for the extra 10 goals. This implies a return of about $1,400,000/10 = $140,000 per goal. Salaries continue to rise as players move to the higher categories: average salaries rise by $1.6 million for moving from the 10–19 to the 20–29 goals per year category, by almost $1 million for moving from the 20–29 to the 30–39 category, and by $1.7 million for moving up to the 40–49 goals per year category. It would seem that player salaries decline in moving from the 40–49 category to the 50-plus category. As can be seen in Figure 1.A2, however, this is driven by a single “outlier.” The salaries of the two highest goal scorers (Alexander Ovechkin and Sidney Crosby) are identical, at $9 million, and higher than those of the lower goal scorers. But Steven Stamkos scored a similar 51 goals in 2009–10, earning a salary of only $875,000 from the Tampa Bay Lightning in 2010–11. However, he was only in the second year of his three-year entry-level contract, and the salary for 2010–11 does not include potential bonuses. In time, we expect his salary to adjust to his level of performance. In terms of logarithms, we see that the log salary rises by about 0.6 to 0.8 per 10 goals scored in the below-30-goals range, and by about 0.3 to 0.4 per 10 goals as players move toward the 50-goals-per-season mark.
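The arithmetic behind these comparisons can be reproduced directly from the category means in Table 1.A1. The sketch below is only an illustration: the function name `steps` is ours, and treating every move between categories as a move of 10 goals is an assumption (it is loosest for the open-ended 50+ category).

```python
# Category means copied from Table 1.A1:
# (goals category, mean salary in $US millions, mean ln salary).
table = [
    ("0-9", 0.95, 13.63),
    ("10-19", 2.34, 14.41),
    ("20-29", 3.97, 15.03),
    ("30-39", 4.90, 15.31),
    ("40-49", 6.60, 15.70),
    ("50+", 6.29, 15.24),
]

GOALS_PER_STEP = 10  # assumption: adjacent categories are about 10 goals apart


def steps(rows):
    """Return (category moved into, dollars per goal, change in ln salary)."""
    out = []
    for (_, s0, l0), (label, s1, l1) in zip(rows, rows[1:]):
        d_salary = (s1 - s0) * 1_000_000  # convert millions to dollars
        out.append((label, d_salary / GOALS_PER_STEP, l1 - l0))
    return out


for label, per_goal, d_log in steps(table):
    print(f"moving up to {label:>5}: ${per_goal:>10,.0f} per goal, "
          f"change in ln salary {d_log:+.2f}")
```

The last row comes out negative, which reflects the single low-salary observation in the 50+ category discussed above rather than a genuine penalty for scoring more goals.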
A regression function calibrates more formally the relationship that was apparent in the table. Rather than letting our eyes fit a line through the data plotted in Figure 1.A2 and Figure 1.A3, or approximating the relationship between two variables from a table, regressions are calculated to provide an estimate of the function that “best fits” the data. Imagine that we want to estimate the following relationship between salary and goals:
\[ \text{Salary} = a + b \times \text{Goals} \]
This is the equation for a straight line. In this example, salary is the dependent variable, while goals is the independent, or explanatory, variable. If we believed this were an exactly true and complete model of player salaries, all of the data would lie on the line implied by this equation, and estimation would be very simple. Of course, we know that this is only an approximation, and there are many other factors (leadership ability, “marketability,” past performance, long-term contracting issues, and luck) that affect player salaries. We can lump all of these factors together into an error term, e, so that our augmented model is
\[ \text{Salary} = a + b \times \text{Goals} + e \]
For any given line that we can draw through the points in the scatter plots, this “model” will fit perfectly. Individual player salaries will equal the “predicted” salary that is on the line, plus the difference between the actual salary and the predicted salary. However, we are interested in explaining as much of the salary as we can with goals and would like as little as possible left to the residual, e. While there are many ways to formalize this idea, in practice, most researchers choose the line that minimizes the sum of the (squared) residuals, or distance between the line and the actual observation. This ordinary least squares or OLS estimator yields a line that best predicts salary with goals scored, at least according to this criterion. The output of regression analysis will be estimates for the parameters of this line: a, the intercept, and b, the slope.
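The least squares idea can be sketched in a few lines for the one-variable case, using the standard closed-form formulas for the intercept and slope. The data below are hypothetical numbers made up for illustration, not the NHLPA sample, and the function names are ours.

```python
def ols_fit(x, y):
    """Return (a, b) minimizing the sum of squared residuals of y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


def sum_squared_residuals(a, b, x, y):
    """Sum of squared vertical distances between the data and the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data for illustration only (NOT the sample used in the text).
goals = [2, 5, 11, 18, 24, 31, 40]
salary = [0.6, 0.9, 1.8, 2.9, 3.5, 5.2, 6.4]  # $US millions

a, b = ols_fit(goals, salary)
best = sum_squared_residuals(a, b, goals, salary)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}, SSR = {best:.3f}")

# Nudging the intercept or slope away from the OLS values raises the SSR.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.01), (0.0, -0.01)]:
    other = sum_squared_residuals(a + da, b + db, goals, salary)
    print(f"a{da:+.2f}, b{db:+.2f}: SSR = {other:.3f}")
```

In practice a spreadsheet or statistical package performs this calculation; the point of the sketch is only that the fitted line is the one for which no small change in a or b reduces the sum of squared residuals.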
TABLE 1.A1 Tabulation of NHL Player Salaries by Goals Scored, 2010–11

Goals Scored, 2009–10   Sample Size   Salary ($US millions)   ln(Salary)
0–9                     128           0.95                    13.63
10–19                   133           2.34                    14.41
20–29                   84            3.97                    15.03
30–39                   17            4.90                    15.31
40–49                   4             6.60                    15.70
50+                     3             6.29                    15.24

NOTES:
1. Each column indicates the sample size or sample average for the corresponding goal-scoring category.
2. Sample includes only players (forwards) who played at least 20 games in 2009–10. Sample size is 369.
SOURCE: Based on data provided by the NHLPA Web site, www.nhlpa.com, accessed January 2011.

The estimated regression line for these data is shown in Figure 1.A3 with log salary as the dependent variable. It is upward sloping, though the precise slope is hard to read directly from the graph. The slope of the line will tell us (on average) how much player salaries rise with goals scored. Specifically, the estimated slope will allow us to conduct the “thought experiment” of how much extra a player would earn if he scored one more goal. The numerical estimates of the regression are reported in Table 1.A2. We show the estimated results for both levels (dollars) and logarithms. In column (1) for levels, we see that each additional goal is associated with $144,943 of additional earnings. In logarithms, column (4) suggests that each goal is associated with an additional 0.062 in log salary, which can be interpreted as approximately an additional 6.2 percent in salary.
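The “thought experiment” of one more goal can be worked through with the reported estimates. The sketch below uses the intercepts and slopes from columns (1) and (4) of Table 1.A2; the function names are ours, and exponentiating the predicted log salary is only a rough way of getting back to dollars, not a prediction of mean salary.

```python
import math

# Column (1) of Table 1.A2: salary in dollars regressed on goals.
A_LEVEL, B_LEVEL = 331_796, 144_943
# Column (4) of Table 1.A2: ln(salary) regressed on goals.
A_LOG, B_LOG = 13.450, 0.062


def predicted_salary_levels(goals):
    """Predicted salary in dollars from the levels regression."""
    return A_LEVEL + B_LEVEL * goals


def predicted_salary_logs(goals):
    """exp(predicted ln salary): a rough back-transformation to dollars."""
    return math.exp(A_LOG + B_LOG * goals)


for goals in (10, 11, 30, 31):
    print(f"{goals:>2} goals: levels ${predicted_salary_levels(goals):>12,.0f}   "
          f"logs ${predicted_salary_logs(goals):>12,.0f}")

# In levels, one more goal adds the same dollar amount at any starting point.
extra_dollars = predicted_salary_levels(11) - predicted_salary_levels(10)

# In logs, one more goal multiplies salary by exp(b); b itself is the usual
# approximation to the proportional change when b is small.
exact_pct = (math.exp(B_LOG) - 1) * 100
print(f"levels: +${extra_dollars:,.0f} per goal")
print(f"logs:   about {B_LOG * 100:.1f}% per goal (exactly {exact_pct:.1f}%)")
```

The levels specification adds a constant dollar amount per goal, while the log specification adds a constant percentage, so the implied dollar gain from an extra goal grows with the player's salary.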
In addition to the coefficient, or parameter estimates, regression analysis and the associated statistical theory yield estimates of the stability, or reliability, of the estimates, at least within the sample. The standard errors of each coefficient provide a measure of the precision with which we are likely to have estimated the true parameter. If the estimation procedure were repeated on other samples of similar hockey players, the coefficients would likely vary across samples even if the underlying salary determination were the same, since no two samples are identical (because of e). The standard error is an estimate of the variability (due to sampling error, e) that we would expect for estimates of the coefficient across these samples. Coefficients and standard errors are used together to conduct tests of statistical hypotheses. The most common hypothesis we will encounter is whether a given coefficient is statistically significantly different from zero. Even if one variable has no effect on another, given sampling error it is unlikely that the estimated coefficient would equal zero exactly. For this reason, we generally pay attention only to coefficients that are at least twice as large (in absolute value) as their standard errors. This is related to the common “disclaimer” in the reporting of poll results: that the polling procedure yields estimated percentages that are within a given range of the true percentage 19 times out of 20, so that we should take the specific reported value with a well-defined grain of salt. While this is an approximation of the basic, formal statistics underlying hypothesis testing, it should be sufficient for most empirical research discussed in the book.
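The “twice the standard error” rule of thumb can be applied mechanically to the estimates in column (3) of Table 1.A2. In the sketch below the helper names are ours, and the plus/minus coefficient is entered as negative, which is an assumption on our part; only its magnitude matters for the rule.

```python
# (coefficient, standard error) pairs from column (3) of Table 1.A2.
column_3 = {
    "Goals": (58_700, 11_785),
    "Assists": (77_068, 7_937),
    "Plus/minus": (-5_006, 7_903),  # sign assumed negative
}


def t_ratio(coef, se):
    """Coefficient divided by its standard error."""
    return coef / se


def passes_rule_of_thumb(coef, se, threshold=2.0):
    """True if the coefficient is at least `threshold` standard errors from zero."""
    return abs(t_ratio(coef, se)) >= threshold


def rough_interval(coef, se):
    """Coefficient plus or minus two standard errors (roughly '19 times out of 20')."""
    return coef - 2 * se, coef + 2 * se


for name, (coef, se) in column_3.items():
    lo, hi = rough_interval(coef, se)
    verdict = "distinguishable from zero" if passes_rule_of_thumb(coef, se) else "not distinguishable from zero"
    print(f"{name:<11} ratio = {t_ratio(coef, se):6.2f}  "
          f"range = [{lo:>9,.0f}, {hi:>9,.0f}]  {verdict}")
```

Under this rule, goals and assists clear the threshold comfortably, while the plus/minus estimate does not: its rough range includes zero.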
TABLE 1.A2 Estimated Effects of Performance on Player Salary in 2010–11 (standard errors in parentheses)

                     Salary ($US), mean = 2,422,666        ln(Salary), mean = 14.34
             Means   (1)         (2)         (3)           (4)       (5)       (6)
Intercept            331,796     61,668      31,864        13.450    13.360    13.340
                     (141,586)   (129,445)   (137,831)     (0.056)   (0.052)   (0.055)
Goals        14.43   144,943     57,328      58,700        0.062     0.030     0.031
                     (8,121)     (11,574)    (11,785)      (0.003)   (0.005)   (0.005)
Assists      20.06               76,462      77,068                  0.027     0.028
                                 (7,873)     (7,937)                 (0.003)   (0.003)
Plus/minus   0.43                            −5,006                            −0.003
                                             (7,903)                           (0.003)
R-squared            0.47        0.57        0.58          0.51      0.59      0.59

NOTE: Sample includes only players (forwards) who played at least 20 games in 2009–10. Sample size is 369.
SOURCE: Based on data provided by the NHLPA Web site, www.nhlpa.com, accessed January 2011.
In labour economics, especially in the presentation of empirical results, it is common to encounter logarithms. Many students find logarithms (or “logs”) quite intimidating, but, in fact, the use of logs often simplifies economic theory and empirical work.
There are at least two (related) contexts in which you will encounter logs in labour economics.
1. First, logs are helpful in expressing elasticities. The elasticity of labour supply with respect to the wage expresses the effect of a change in the wage on hours of labour supply in proportional or percentage terms. This is usually more convenient than expressing the relationship in the original units of measurement. For example, we could state that a $1 per hour rise in the wage increases annual labour supply by 50 hours. In order to facilitate comparison across studies, however, it is usually preferable to report that a 10 percent increase in the wage leads to a 2.5 percent increase in labour supply; that is, that the elasticity is 0.25. The elasticity is unit free and gives the proportional (or percentage) change in hours associated with a proportional (or percentage) change in the wage. What does this have to do with logs? The conventional formula for an elasticity is
\[ E = \frac{\Delta Y}{\Delta X} \times \frac{X}{Y} = \frac{\Delta Y}{Y} \div \frac{\Delta X}{X} \]
where “Δ” refers to the change in a variable. As it turns out, changes in the logarithms of variables directly yield the proportional changes, so that the elasticity can be