Examining if High-Team Payroll Leads to
High-Team Performance in Baseball:
A Statistical Study
Nicholas Lambrianou '13
B.S. in Mathematics with Minors in English and Economics
Dr. Nickolas Kintos
Thesis Advisor
Thesis submitted to:
Honors Program of Saint Peter's University
April 2013
Table of Contents
Chapter 1: The Study and its Questions
An introduction to the project, its questions, and a breakdown of the chapters that follow
Chapter 2: The Baseball Statistics
An explanation of the baseball statistics used for the study, including what the statistics measure, how they measure what they do, and their strengths and weaknesses
Chapter 3: Statistical Methods and Procedures
An introduction to the statistical methods applied to each statistic and an explanation of what the possible results would mean
Chapter 4: Results and the Tampa Bay Rays
The results of the study, what they mean against the possibilities and other results, and a short analysis of a team that stood out in the study
Chapter 5: The Continuing Conclusion
A continuation of the results, followed by ideas for future study that continue the project or stem from it
Appendix
Chapter 1: The Study and its Questions
Does high payroll necessarily mean higher performance for all baseball statistics? Major League Baseball (MLB) is a league of teams in cities all across the United States and Canada, and those locations strongly influence the market of each team and thus its payroll. Year after year, a number of teams, including the usual ones in big markets, choose to spend a great amount on payroll in hopes of improving the team and the value its players produce, but at times the statistics produced by these teams do not match the difference in payroll between them and other teams. This observation invites a few questions for investigation.
Are high-payroll teams actually seeing an improvement in results?
Are the results between high-payroll and non-high-payroll teams actually statistically different?
What statistics present the strongest relation with high payroll increase? What statistics present the weakest relation with payroll increase?
The questions and possibilities are endless, so those are just the beginning, but the purpose of this study is to answer the questions raised above and to investigate if high-payroll teams truly perform better, and then interpret what the results actually mean.
To accomplish this, statistical methods will be utilized for the investigation. Chapter two, The Baseball Statistics, will give a brief description of each statistic used in the study, including how it is formed and what exactly it measures. This will include a breakdown of the basic and more advanced baseball statistics that are used in the analysis. Chapter three, Statistical Methods and Procedures, will describe the statistical tests applied to the above statistics, using team payroll either as a designation for groups or as the independent variable for study. The chapter will also discuss what the possible results of each statistical test would imply about what is being tested. Chapter four, Results and the Tampa Bay Rays, will interpret the results of the analysis, and it will also profile the Tampa Bay Rays, a low-payroll team that manages to produce results that are equal to or better than those of teams on the high end of the payroll spectrum. Chapter five, The Continuing Conclusion, will include concluding thoughts, as well as a discussion of future analysis based on the procedures and results of the study.
Chapter 2: The Baseball Statistics
Sabermetrics is the statistical analysis of baseball data. Rather than focusing on the traditional statistics that have been flashed across the bottom of televised baseball games, scoreboards at ballparks, and sports news broadcasts for years, the term “Sabermetrics” has come to define the study of more advanced baseball statistics. The goal of Sabermetrics is to objectively estimate the worth of a player, or of a specific part of that player's game, rather than focusing on data that could be immaterial. This thesis presents both traditional statistics and the more advanced baseball statistics associated with Sabermetrics. The results of the analysis, which are based on compiled team data, rely more heavily on Sabermetrics-associated statistics.
The baseball statistics that are used in this study mostly fall into two categories: those drawn from position players and those drawn from pitching. The one exception is Team Winning Percentage, which takes into account contributions from both categories and is measured by dividing wins by total games played (wins plus losses). Within the two categories, position players and pitching, there is some overlap in the data, and some of the team statistics objectively improve upon others and are thus better measurements. Overall, the purpose of providing this variety is to see how teams might value certain things (or coincidentally simply excel in them), despite those things being less meaningful. Also, since no one measurement is foolproof, covering as many aspects as possible provides a more accurate overall picture. For position players, the team statistics that were utilized were:
Runs Scored Per Year
On-Base Percentage (OBP)
Slugging Percentage (SLG)
Isolated Power (ISO)
Weighted On-Base Average (wOBA)
Weighted Runs Created Plus (wRC+)
Ultimate Base Running (UBR)
Weighted Stolen Base Runs (wSB)
Defensive Runs Saved (DRS)
Ultimate Zone Rating (UZR)
Wins Above Replacement (WAR) sorted by position players
The first statistic listed above (Runs scored per year) is explained by its name. It is calculated by taking the number of runs that cross home plate over the designated number of years and dividing that total by the number of years to find an annual average. The next two statistics, OBP and SLG, are slowly becoming more accepted by the majority of the baseball community. The first of these, OBP, is given by the formula¹:
\[ \mathrm{OBP} = \frac{H + BB + HBP}{AB + BB + HBP + SF} \tag{1} \]
Please note at this point that all abbreviations or further statistics found in formulas that have formulas of their own may be found in the appendix. OBP essentially measures how well a batter reaches base (with exceptions like fielding errors or a fielder's choice). As Fangraphs terms it, OBP measures the “most important thing a batter can do at the plate: not make an out.”²
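The next statistic, SLG, is given by³:
\[ \mathrm{SLG} = \frac{\mathrm{1B} + 2 \times \mathrm{2B} + 3 \times \mathrm{3B} + 4 \times \mathrm{HR}}{AB} \tag{2} \]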
The original purpose of SLG was to help measure a player's power output, but, as is apparent in the formula provided, it takes singles into account and makes assumptions about what each type of hit is worth relative to the others. The criticism of that last point is that, when studied, a double is not worth twice as much as a single, and the rest of the weights do not match up either, so SLG is not a precise measure. Nevertheless, SLG is commonly added to OBP to produce the On-Base Plus Slugging (OPS) statistic. OPS provides a quick snapshot of a player's offensive contributions, but it is not among the statistics listed above. That is because SLG and OBP are typically not seen as equal measures (OBP is worth approximately 1.8 times as much as SLG), just as the hits in SLG are not equal. For all those reasons, wOBA, the next statistic, is used instead of OPS as a statistic that “combines all the aspects of hitting into one metric...more accurately and comprehensively.”⁴
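To make the arithmetic behind OBP, SLG, and OPS concrete, the following Python sketch computes all three from a single batting line. It assumes the standard definitions of OBP and SLG; the function names and the sample totals are hypothetical and are not data from the study.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base over the plate appearances counted for OBP."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles, doubles, triples, hr, ab):
    """Slugging percentage: total bases per at-bat (single=1, double=2, triple=3, HR=4)."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


def ops(obp_value, slg_value):
    """On-base plus slugging: the plain, unweighted sum of the two rates."""
    return obp_value + slg_value


# Hypothetical batting line, for illustration only.
singles, doubles, triples, hr = 100, 30, 5, 15
hits = singles + doubles + triples + hr  # 150 hits in total
ab, bb, hbp, sf = 500, 50, 5, 5

example_obp = obp(hits, bb, hbp, ab, sf)              # 205 / 560
example_slg = slg(singles, doubles, triples, hr, ab)  # 235 / 500
example_ops = ops(example_obp, example_slg)

print(f"OBP={example_obp:.3f} SLG={example_slg:.3f} OPS={example_ops:.3f}")
```

Because OPS adds the two rates with equal weight, it inherits the weighting problems described above, which is the motivation for using wOBA instead.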
wOBA recognizes that all “hits are not created equal,”⁵ and thus, when combining everything into one number, it weights each event in proportion to its actual run value. These weights change annually. To provide an example of how it is typically calculated, 2012's formula may be used⁶:
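\[ \mathrm{wOBA} = \frac{0.691 \times \mathrm{uBB} + 0.722 \times \mathrm{HBP} + 0.884 \times \mathrm{1B} + 1.257 \times \mathrm{2B} + 1.593 \times \mathrm{3B} + 2.058 \times \mathrm{HR}}{AB + BB - IBB + SF + HBP} \tag{3} \]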
² Fangraphs, "OBP."
³ BaseballReference, "Slugging Percentage." Online. http://www.baseball-reference.com/bullpen/Slugging_percentage.
⁴ Fangraphs, "wOBA." Online. http://www.fangraphs.com/library/offense/woba/.
⁵ Fangraphs, "wOBA."
The formula makes clear that, based on the research of Tom Tango (author of The Book and developer of wOBA and other handy statistics), the weights from SLG do not carry over. For example, SLG treats a single as worth half as much as a double and one fourth as much as a home run. In contrast, the research behind wOBA found that a single is worth approximately 70% of the value of a double and 43% of the value of a home run. Equation (3) does not take stolen bases (SB) and “caught stealing” (CS) into account. Instead, it captures only the value produced by performance at the plate. Equation (3) can be used to “determine an ideal batting lineup.”⁷
The modified version of Equation (3) used in this study, however, does take SB and CS into account, weighting SB at about 0.20 to 0.25 and CS at about -0.40 to -0.50 (a caught stealing costs a team more runs than a successful steal adds, so the two do not offset one to one)⁸. The weights are determined via linear run estimators (most often called linear weights), which are calculated from analyzed sample data in order to measure how many runs a team can be expected to score as a result of an event. One slight drawback of wOBA is that it is not park-adjusted or league-adjusted, but that is accounted for in the next measurement, wRC+, a statistic based on Bill James' ideas and again created by Tom Tango. The plus in this statistic, which distinguishes it from wRC, means only that it is scaled against the league average (which is 100), so a wRC+ value of 110 would mean a player (or team) created 10% more runs than the league average, and 90 would mean 10% fewer. The formula for wRC is as follows⁹:
\[ \mathrm{wRC} = \left( \frac{\mathrm{wOBA} - \mathrm{lgwOBA}}{\mathrm{wOBA\ Scale}} + \frac{\mathrm{lgR}}{PA} \right) \times PA \tag{4} \]
⁷ Fangraphs, "wOBA."
⁸ Klaassen, Matt. BeyondTheBoxscore, “Custom wOBA and Linear Weights Through 2010: Baseball Databank Data Dump 2.1.” Last Modified January 01, 2011. Online. http://www.beyondtheboxscore.com/2011/1/4/1912914/custom-woba-and-linear-weights-through-2010-baseball-databank-data.
⁹ Appelman, David. Fangraphs, "wRC and wRAA." Last Modified December 12, 2008.
UBR and wSB are the next two statistics. The former, UBR, indicates the value added by the players of a team through base-running. This is done via linear weights, as in the previously mentioned wOBA, so every base-running event is given a specific weight to compare its contribution with the others.¹⁰ Some examples of the base-running events included in UBR are the following:
• Advancing, not advancing, or making an out on an extra base on a hit
• Getting thrown out going for an XBH (negative event, positive is the not-included XBH)
• Tagging up and advancing on fly ball outs
• Runner on second advancing on a ground ball hit to the left side of the infield
Please note that SB and CS are not included in UBR. They are, however, included in the next statistic, wSB, which estimates the number of runs contributed to a team via those events. It is calculated by comparing stolen base runs created per opportunity against the league average, as shown by the following formula¹¹:
(5)
UBR and wSB can be added together to form “Base Running” (Bsr), which is used in one of the statistics that will be mentioned later on.
¹⁰ Fangraphs, "UBR." Online. http://www.fangraphs.com/library/offense/ubr/.
¹¹ Fangraphs, "wSB." Online. http://www.fangraphs.com/library/offense/wsb/.
One of the biggest uncertainties in baseball statistics is how to accurately measure a player's defensive contributions. Unlike other areas of player contribution, the science of evaluating defense is not as exact, but it still goes beyond simply watching a player make defensive plays. Defensive statistics are more reliable when more data is provided, so while numbers may be uncertain for a single player over a quarter of a season, they become steadier for a team over a span of multiple years, as in this study. Two statistics that are commonly used to judge defense are DRS and UZR. DRS is calculated by an organization called The Fielding Bible, run by John Dewan. DRS values, which are based on percentage points, are calculated via film study and computer comparisons using Baseball Info Solutions (BIS) data. In DRS, if a play is made, points are added depending on what percentage of the league would also make that play, and likewise, points are subtracted using the same method.¹² Some examples of things looked at in DRS include¹³:
• Pitchers controlling the run game
• Catchers throwing out base-runners and preventing them from stealing bases
• Fielders handling bunts
• Fielders turning a double play as opposed to only getting a singular out on a play where a double play is a possibility
• The range and ability of fielders to turn a batted ball into an out
UZR, which uses numbers from BIS as well, is another statistic that attempts to estimate defensive value. UZR includes many of the things DRS does: a player's range, runs from an outfielder's arm, and an infielder's ability to turn a double play.¹⁴ It also rates players on the plays they make and where they make them, and then credits value, either positive or negative, accordingly.¹⁵
¹² Fangraphs, "DRS." Online. http://www.fangraphs.com/library/defense/drs/.
¹³ Fangraphs, "DRS."
The next statistic, WAR, combines almost all of the previously mentioned statistics into one. There are multiple versions of this statistic, but the version used for this study, chosen after comparison with the others, is the one found on Fangraphs. WAR is an all-in-one statistic that essentially provides a quick snapshot of a player's, or team's, value. To refer back to the previously mentioned statistics, for batting it uses wOBA; it then adds a combination of the base-running statistics, UBR along with SB and CS (covered by wSB); and for fielding it uses UZR. For the individual players that make up a team, it then adds positional adjustments to help level off what value is actually added. Catchers, SS, 2B, 3B, and CF are all given positive value for the positions they play, while LF, RF, 1B, and DH, which are easier positions to field and play, are given negative adjustments in value.¹⁶ It is a complex statistic that goes on from there and takes more into account, and in the end it serves a purpose that all who analyze baseball pursue: determining the worth of a player.
The next group of team statistics was drawn from pitching. The statistics used were:
Earned Run Average (ERA)
Strikeouts Per Nine Innings Pitched (K/9 or alternatively SO/9)
Bases on Balls Per Nine Innings Pitched (BB/9)
Expected Fielding Independent Pitching (xFIP)
Skill-Interactive ERA (SIERA)
WAR sorted by pitchers
Earned Run Average Minus (ERA-)
Fielding Independent Pitching Minus (FIP-)
Expected Fielding Independent Pitching Minus (xFIP-)
¹⁴ Fangraphs, "UZR." Online. http://www.fangraphs.com/library/defense/uzr/.
¹⁵ Fangraphs, "UZR."
ERA, the first of these, is the very basis of pitching evaluation and probably one of the most widely used pitching statistics. ERA's simple formula, based on nine innings, is¹⁷:
\[ \mathrm{ERA} = 9 \times \frac{ER}{IP} \tag{6} \]
ERA essentially tells you how many runs a pitcher has allowed to cross the plate during his stay on the mound (with exceptions like unearned runs). The problem with ERA, and the reason the other statistics try to improve upon it, is that it depends on many events a pitcher cannot control, such as the defense behind him or base runners. The two statistics that usually appear with ERA are K/9 (alternatively SO/9) and BB/9. These measurements provide the rates at which a pitcher strikes out¹⁸ and walks¹⁹ batters, respectively, and are calculated similarly to ERA on a nine-inning scale by:
\[ \mathrm{K/9} = 9 \times \frac{K}{IP} \tag{7} \]
\[ \mathrm{BB/9} = 9 \times \frac{BB}{IP} \tag{8} \]
¹⁷ Fangraphs, "ERA." Online. http://www.fangraphs.com/library/pitching/era/.
¹⁸ Fangraphs, "Strikeouts and Walk Rates." Online. http://www.fangraphs.com/library/pitching/rate-stats/.
¹⁹ Fangraphs, "Strikeouts and Walk Rates."
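The three nine-inning rates share one pattern: a counting total scaled to nine innings. A minimal Python sketch of that pattern follows, assuming the standard definitions of ERA, K/9, and BB/9. The function names and the sample pitching line are hypothetical, and innings are assumed to be entered as true decimals (200 1/3 innings as 200.333..., not the box-score shorthand 200.1).

```python
def per_nine(count, innings_pitched):
    """Scale a counting total to a rate per nine innings pitched."""
    return 9 * count / innings_pitched


def era(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings."""
    return per_nine(earned_runs, innings_pitched)


def k_per_9(strikeouts, innings_pitched):
    """Strikeouts per nine innings."""
    return per_nine(strikeouts, innings_pitched)


def bb_per_9(walks, innings_pitched):
    """Walks per nine innings."""
    return per_nine(walks, innings_pitched)


# Hypothetical pitching line, for illustration only.
earned_runs, strikeouts, walks, innings = 70, 180, 50, 200.0

example_era = era(earned_runs, innings)      # 9 * 70 / 200
example_k9 = k_per_9(strikeouts, innings)    # 9 * 180 / 200
example_bb9 = bb_per_9(walks, innings)       # 9 * 50 / 200

print(f"ERA={example_era:.2f} K/9={example_k9:.2f} BB/9={example_bb9:.2f}")
```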
Walks and strikeouts, like home runs, are results that do not involve the fielding of position players, so the three together are sometimes called the “Three True Outcomes,” an idea that underlies the more advanced pitching statistics that attempt to capture a pitcher's true abilities. SIERA and xFIP, listed next, are two of those statistics. xFIP, like SIERA, which follows, attempts to value how well a pitcher actually pitched and helps predict future performance better than ERA. It is calculated by the following²⁰:
(9)
The constant is used to put xFIP on a similar scale to ERA for easier comparison. The xFIP measurement, which is similar to Fielding Independent Pitching (FIP), is based on the idea that pitchers cannot really control what happens to balls in play, so the things valued in the formula are the “Three True Outcomes” listed before. The difference between xFIP and FIP (which is not listed, but is used in pitcher WAR calculations) is that xFIP goes a step further and replaces a “pitcher's home run total with an estimate on how many home runs they should have allowed.” This is done by multiplying the league-average home run to fly ball rate by an individual pitcher's fly ball rate, since home run rates fluctuate over time. Now, continuing the building blocks of Sabermetrics, SIERA is what follows FIP and xFIP, and it is generally regarded as a better value assessment tool. Its formula is shown in the appendix.
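The home run substitution that separates xFIP from FIP can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the exact calculation used in the study: the weights 13, 3, and 2 are the commonly published FIP weights, hit batters are grouped with walks, and the constant, the league rate, and the pitching line are all hypothetical values.

```python
def expected_home_runs(fly_balls, league_hr_per_fb):
    """Home runs a pitcher 'should' have allowed at the league-average HR/FB rate."""
    return fly_balls * league_hr_per_fb


def fip_like(hr, bb, hbp, k, ip, constant):
    """FIP-style estimate; the 13/3/2 weights are an assumption, not taken from the thesis."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip_like(fly_balls, league_hr_per_fb, bb, hbp, k, ip, constant):
    """Same estimate, with actual home runs replaced by expected home runs."""
    exp_hr = expected_home_runs(fly_balls, league_hr_per_fb)
    return fip_like(exp_hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher who allowed 30 HR on 200 fly balls (15% HR/FB)
# in a league where 10% of fly balls leave the park.
CONSTANT = 3.10  # hypothetical scaling constant
fip_value = fip_like(hr=30, bb=50, hbp=5, k=180, ip=200.0, constant=CONSTANT)
xfip_value = xfip_like(fly_balls=200, league_hr_per_fb=0.10,
                       bb=50, hbp=5, k=180, ip=200.0, constant=CONSTANT)

print(f"FIP-like={fip_value:.3f} xFIP-like={xfip_value:.3f}")
```

In this made-up case the xFIP-style figure is lower than the FIP-style one because the pitcher's unusually high home run rate is replaced by the league-average rate.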
Unlike the previously mentioned xFIP and FIP, SIERA does not downplay batted balls in play. SIERA is also park-adjusted, so it takes into account the different stadiums across MLB (Citi Field, for example, does not play the same as the new Yankee Stadium). SIERA points out many things, including, but not limited to²¹:
• there should be a focus on strikeouts,
• walks in small numbers are not that consequential,
• balls in play are complicated.
On the last of these points, the SIERA measurement takes some observations into account, including²²:
• ground balls result in hits more often than fly balls,
• fly balls result more often in XBH,
• a higher ground ball rate corresponds to a higher probability of those ground balls becoming outs,
• pitchers with a higher fly ball rate give up fewer home runs (per fly ball) than other pitchers.
The next statistic, WAR (for pitchers) is carried over from the previously mentioned FIP, which is a little less reliable than the just-mentioned xFIP and SIERA. FIP, which was not used in the study in its initial form, is calculated by the following23:
(10)
²¹ Fangraphs, "SIERA." Online. http://www.fangraphs.com/library/pitching/siera/
²² Fangraphs, "SIERA."
Regardless of those problems, FIP is still used as the focus of pitcher WAR, which because of that calculation is somewhat less reliable than its position player version. The essence of pitcher WAR is that FIP is taken and converted to a run scale that provides the approximate value added by a pitcher²⁴, expressed as a win total (not to be confused with pitcher won/loss records, which are valueless in analysis). As for the theory behind WAR, for both pitchers and the previously mentioned position players it is an attempt to answer the following question: “If this player got injured and their team had to replace them with a minor leaguer or someone from their bench, how much value would the team be losing?”²⁵ Those values for specific players were then added to make team totals, which were then used for the study.
The last statistics are ERA-, FIP-, and xFIP-. These statistics are formed by taking the previously mentioned ERA, FIP, and xFIP and putting them on a scale where 100 is average.²⁶ Unlike wRC+, where a figure above 100, such as 108, represents the percentage above average, here the direction is reversed, since these pitching statistics are better as values are lowered. So 108 would be 8% worse than average, while 92 would be 8% better than average. These statistics are also both park- and league-adjusted, so variations in competition and stadium are accounted for.
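The opposite reading directions of the "plus" and "minus" index statistics are easy to mix up, so a small Python sketch may help. The function name and its `lower_is_better` flag are hypothetical; the sample values are the ones used in the text.

```python
def percent_better_than_average(index_value, lower_is_better=False):
    """Convert a 100-based index into percent better (+) or worse (-) than average.

    For "plus" statistics such as wRC+, values above 100 are better.
    For "minus" statistics such as ERA-, values below 100 are better.
    """
    difference = index_value - 100
    return -difference if lower_is_better else difference


print(percent_better_than_average(110))                        # wRC+ of 110
print(percent_better_than_average(108, lower_is_better=True))  # ERA- of 108
print(percent_better_than_average(92, lower_is_better=True))   # ERA- of 92
```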
²⁴ Fangraphs, "Calculating WAR for Pitchers." Online. http://www.fangraphs.com/library/war/calculating-war-pitchers/.
²⁵ Fangraphs, "What is WAR?"
Chapter 3: Statistical Methods and Procedures
The first step of the study was to select a number of years to provide an accurate sample. Given how much baseball has changed, and continues to change with variables like television contracts, the Collective Bargaining Agreement (CBA), and league alignment changes, it was decided that 2008-2012 would be the best selection, giving a five-year regular season sample. The next step, which is the fundamental basis of the study, was to sort the teams by average payroll. To do this, opening day payrolls had to be used. These payrolls (seen in Table 3.1)²⁷ are not perfect, given that teams add or subtract payroll over the course of a year, but they are the most accurate public data available and provide a thorough picture of how a team operates, or at the very least plans to operate, across a five-year span. Using a full year's payroll data would mean weighting the payroll across every day of a season prior to the trade deadline because of the number of changes made across a full season, and then accounting for other smaller things, such as call-ups, which all in all is impractical. The data for the thirty MLB teams yielded, as expected, a wide range of payrolls. The seven teams with the highest averages, from the New York Yankees through the Detroit Tigers (shown in blue bold in the original table), are designated “high-payroll” teams for the purpose of the analysis in this thesis. The separation between “high-payroll” and all other teams was chosen based on the large break between the Detroit Tigers and Los Angeles Dodgers. Although this separation is subjective, it was a good position to place the cut-off, since it places seven of the thirty teams (just under one-quarter) in the “high-payroll” category and provides a large enough sample to work with for the analysis in this study.
27Baseball Prospectus, "Cot's Baseball Contracts: A contracts clearinghouse named for Cot Tierney, the NL's
Table 3.1
MLB Team                 2008-12 Payroll Avg.
New York Yankees         $208,146,204
Boston Red Sox           $152,463,492
Philadelphia Phillies    $137,504,518
New York Mets            $130,194,289
Los Angeles Angels       $129,435,173
Chicago Cubs             $128,166,767
Detroit Tigers           $125,438,748
Los Angeles Dodgers      $109,256,949
Chicago White Sox        $109,159,266
San Francisco Giants     $101,008,483
St. Louis Cardinals      $100,655,972
Seattle Mariners         $97,453,054
Atlanta Braves           $93,511,275
Minnesota Twins          $86,712,640
Milwaukee Brewers        $86,653,933
Houston Astros           $84,460,066
Texas Rangers            $82,732,397
Toronto Blue Jays        $82,301,711
Cincinnati Reds          $78,496,106
Colorado Rockies         $78,314,362
Baltimore Orioles        $75,831,066
Arizona Diamondbacks     $69,422,375
Washington Nationals     $68,481,172
Cleveland Indians        $67,372,013
Kansas City Royals       $61,185,554
Oakland Athletics        $57,709,805
Tampa Bay Rays           $57,140,854
Miami Marlins            $53,079,644
San Diego Padres         $51,340,431
Pittsburgh Pirates       $46,086,023
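The grouping step can be illustrated with a short Python sketch that ranks the teams in Table 3.1, splits them at the Detroit Tigers, and lists the largest drops between neighboring teams. The payroll figures are copied from the table; the function names are hypothetical, and the gap search is only an illustration of the "large break" described above, not a procedure the study states it used.

```python
# 2008-12 average opening-day payrolls, copied from Table 3.1.
PAYROLLS = {
    "New York Yankees": 208_146_204, "Boston Red Sox": 152_463_492,
    "Philadelphia Phillies": 137_504_518, "New York Mets": 130_194_289,
    "Los Angeles Angels": 129_435_173, "Chicago Cubs": 128_166_767,
    "Detroit Tigers": 125_438_748, "Los Angeles Dodgers": 109_256_949,
    "Chicago White Sox": 109_159_266, "San Francisco Giants": 101_008_483,
    "St. Louis Cardinals": 100_655_972, "Seattle Mariners": 97_453_054,
    "Atlanta Braves": 93_511_275, "Minnesota Twins": 86_712_640,
    "Milwaukee Brewers": 86_653_933, "Houston Astros": 84_460_066,
    "Texas Rangers": 82_732_397, "Toronto Blue Jays": 82_301_711,
    "Cincinnati Reds": 78_496_106, "Colorado Rockies": 78_314_362,
    "Baltimore Orioles": 75_831_066, "Arizona Diamondbacks": 69_422_375,
    "Washington Nationals": 68_481_172, "Cleveland Indians": 67_372_013,
    "Kansas City Royals": 61_185_554, "Oakland Athletics": 57_709_805,
    "Tampa Bay Rays": 57_140_854, "Miami Marlins": 53_079_644,
    "San Diego Padres": 51_340_431, "Pittsburgh Pirates": 46_086_023,
}


def rank_by_payroll(payrolls):
    """Return (team, payroll) pairs sorted from highest to lowest payroll."""
    return sorted(payrolls.items(), key=lambda item: item[1], reverse=True)


def split_at(payrolls, last_high_team):
    """Split the ranking into high-payroll teams (through last_high_team) and the rest."""
    ranked = rank_by_payroll(payrolls)
    cut = [name for name, _ in ranked].index(last_high_team) + 1
    return ranked[:cut], ranked[cut:]


def largest_gaps(payrolls, top=2):
    """Return the largest payroll drops between consecutive teams in the ranking."""
    ranked = rank_by_payroll(payrolls)
    gaps = [(ranked[i][1] - ranked[i + 1][1], ranked[i][0], ranked[i + 1][0])
            for i in range(len(ranked) - 1)]
    return sorted(gaps, reverse=True)[:top]


high, other = split_at(PAYROLLS, "Detroit Tigers")
gaps = largest_gaps(PAYROLLS)

print(len(high), "high-payroll teams;", len(other), "other teams")
for drop, upper, lower in gaps:
    print(f"${drop:,} between {upper} and {lower}")
```

On these figures, the drop from the Tigers to the Dodgers is the largest break in the ranking apart from the one separating the Yankees from every other team, which is consistent with the cut-off described above.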
The three main statistical evaluation methods applied to the above team data are linear regression, the Shapiro-Wilk test, and hypothesis testing. Linear regression was used to model the relationship between team payroll (the independent variable, x) and the different statistics for team performance (the dependent variables, y). For instance, as team payroll goes up, a team's OBP would be expected to do the same, but for statistics like ERA, where a lower number is better, a negative correlation would be expected. For reference, the statistics in which a positive correlation is better are K/9, both WAR statistics (for position players and pitchers), Winning Percentage, Runs per Year, OBP, SLG, wOBA, wRC+, UBR, wSB, DRS, and UZR. The statistics in which a negative correlation is better are ERA, BB/9, xFIP, SIERA, ERA-, FIP-, and xFIP-.
For the linear regression analysis, data for all teams was used collectively and not divided into high-payroll and non-high-payroll teams, as in the other statistical tests. The correlation coefficient (r-value) measures the strength of the linear relationship between team payroll and each of the team statistics. The value of r is such that -1 ≤ r ≤ 1. The following general scale was used to measure the correlation strength²⁸:
Figure 3.1
The r-value is found by the following formula, where x is the independent variable (payroll), x_mean is the mean of all payroll numbers, y is the dependent variable (a specific baseball statistic), and y_mean is the mean of that baseball statistic for all teams:
\[ r = \frac{\sum (x - x_{mean})(y - y_{mean})}{\sqrt{\sum (x - x_{mean})^2 \, \sum (y - y_{mean})^2}} \tag{11} \]
As a further measurement, the r-value may also be squared to give the coefficient of determination, r². This value, converted to a percentage, indicates how much of the variation in each dependent variable, y (each team statistic), can be explained by the independent variable, x (team payroll)²⁹. On the other hand, the quantity 1 - r² indicates the amount that is not explained by the variation in payroll, and thus cannot be explained by the linear regression model.
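The correlation coefficient and the coefficient of determination described above can be computed directly from their definitions. The Python sketch below does so for a small data set; the function name and the five (payroll, winning percentage) pairs are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: payroll in millions of dollars and winning percentage.
payroll_millions = [50, 75, 100, 125, 150]
win_pct = [0.450, 0.500, 0.525, 0.500, 0.525]

r = pearson_r(payroll_millions, win_pct)
r_squared = r ** 2  # share of variation in y explained by x

print(f"r = {r:.3f}, r^2 = {r_squared:.1%}, unexplained = {1 - r_squared:.1%}")
```

For this made-up sample, r is about 0.775, so payroll explains about 60% of the variation in winning percentage and about 40% is left unexplained.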
The Shapiro-Wilk test, which was run in R, a “language and environment for statistical computing”, was used to determine if the data associated with high-payroll teams forms a normal distribution. Since there are only seven teams in the high-payroll category, normally distributed data for the high-payroll teams would allow for direct comparison with the other twenty-three non-high-payroll teams via hypothesis testing. In the Shapiro-Wilk test, a p-value of p ≥ 0.1 was taken to indicate normally distributed data (that is, no significant evidence against normality), while a p-value of p < 0.1 was taken to indicate non-normal data³⁰.
Since all the data for the seven high-payroll teams was normal (see results in Chapter 4), it could be directly compared to the corresponding data for the non-high-payroll teams. This was done via hypothesis testing, which determines if the difference for each measurement (between the seven high-payroll and twenty-three non-high-payroll teams) is statistically significant. The null hypothesis for this, H0, was “No, the difference is not statistically significant.” On the other hand, the alternative hypothesis, HA, was “Yes, the difference between high-payroll and non-high-payroll teams is statistically significant.” For the hypothesis testing, the p-value was found for each statistic by using the T.TEST() command in Microsoft Excel. The significance level used for the hypothesis testing was 0.05, so any p-value ≥ 0.05 would lead to not rejecting the null hypothesis, while any p-value < 0.05 would lead to a rejection of the null hypothesis. Since the significance level of 0.05 is arbitrarily chosen, any results close to this threshold can be re-examined by calculating the corresponding effect size. The effect size was calculated by the following³¹:
(12)
³⁰ Myra L. Samuels, and Jeffrey A. Witmer, Statistics for the Life Sciences, (Pearson, 2012), 139.
³¹ Coe, Robert. "It is the Effect Size, Stupid: What effect size is and why it is important." Online.
The effect size calculation measures (for each statistic) the difference between the high-payroll and non-high-payroll groups by essentially calculating how many standard deviations the sample means of the two groups differ by³². The scale used for these results is as follows:
Figure 3.2
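One common way to express "how many standard deviations the sample means differ by" is to divide the difference in means by a pooled standard deviation. The Python sketch below takes that approach; the use of a pooled standard deviation is an assumption (the text does not say which standard deviation was used), and the function names and sample values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled sample standard deviation of two groups (an assumed choice of SD)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return math.sqrt(pooled_var)


def effect_size(high, non_high):
    """Difference in means (high minus non-high) in units of the pooled SD."""
    return (mean(high) - mean(non_high)) / pooled_sd(high, non_high)


# Hypothetical values of one team statistic for each group.
high_group = [2.0, 4.0, 6.0]
non_high_group = [1.0, 3.0, 5.0]

d = effect_size(high_group, non_high_group)
print(f"effect size = {d:.2f}")
```

With the sign convention used here, a negative result means the non-high-payroll group has the larger mean.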
Chapter 4: Results and the Tampa Bay Rays
The results of the first test, the linear regression, are presented in the following four tables. The first two (Tables 4.1 and 4.2) give the values of r along with the previously mentioned interpretation of each r value (strong negative correlation, moderate negative correlation, etc.; see Figure 3.1). Table 4.1 includes the statistics drawn from position players as well as the one combined statistic (winning percentage). Table 4.2 includes the pitching statistics. The notable results are shown in bold green in Figure 4.1.
Table 4.1
Table 4.2
Next, Tables 4.3 and 4.4 include the previously mentioned values of r² and 1 - r², which indicate how much of the variation in the dependent variables (the individual baseball statistics) is and is not explained by the variation in the independent variable (payroll), as described in the preceding chapter.
Table 4.4
The results found from the linear regression may also be presented in the form of a scatter plot. The lines found in these plots are the least-squares lines that show the linear relationship between team payroll (x) and each of the team statistics (y). These plots are shown in Figures 4.1 – 4.21.
Figures 4.1 to 4.21: scatter plots of each team statistic (y) against team payroll (x), with least-squares lines
Notice that the r values presented a “weak or no correlation” for most of the team statistics (see Tables 4.1 and 4.2). The ones that showed some correlation with high team payroll were Winning Percentage, Runs per Year, OBP, wOBA, and wRC+. These statistics presented a “moderate positive correlation,” so as payroll went up, so did these statistics to a moderate degree. The latter four (Runs per Year, OBP, wOBA, and wRC+) are hitting statistics, the most important being Runs per Year, as it provides the total output of a team's offense. The first statistic (Winning Percentage) measures where a team finally stood after all the player contributions (both pitchers and position players) were combined on the field against other teams in MLB. As a result, its correlation with team payroll is one of the most important measures.
It is also worth noting that many statistics (WAR for pitchers, ERA-, FIP-, xFIP-, SLG, WAR for position players) came close (within ~0.09) to entering either the moderate positive or moderate negative correlation zones. For the r² values, the highest percentage explained by the variation in payroll was 34.93% for OBP, and the highest 1 - r² values were attached to the fielding and base-running statistics, ranging from 97.80% to 99.97%. These very high values of 1 - r² correspond to team statistics that were almost entirely unexplained by the variation in team payroll.
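As a quick check on how these quantities relate, the OBP figure quoted above can be worked through by hand. This is only arithmetic on the reported 34.93%; the sign of r is taken as positive because OBP is described above as showing a moderate positive correlation.

```latex
\[
r = \sqrt{r^2} = \sqrt{0.3493} \approx 0.591,
\qquad
1 - r^2 = 1 - 0.3493 = 0.6507
\]
```

In other words, even for the statistic most closely tied to payroll, roughly 65% of the variation between teams is not explained by the linear model.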
In Table 4.5, shown below, the results of the Shapiro-Wilk test for the seven teams in the high-payroll category are displayed. Since the p-value for each team statistic is ≥ 0.1, the corresponding data for the seven high-payroll teams is treated as normal. As a result, the team data for the seven high-payroll teams can be directly compared to that of the twenty-three non-high-payroll teams via hypothesis testing.
Table 4.6 shows the results of the hypothesis testing for the team statistics involving position players. The hypothesis testing was conducted with a 5% significance level. The team statistics shown in bold green in Table 4.6 had p-values < 0.05, and, as a result, the null hypothesis was rejected in those cases. In other words, the difference between high-payroll and non-high-payroll teams was statistically significant for the team statistics in bold green. The two team statistics shown in bold red in Table 4.6 (SLG and wRC+) had p-values that came very close to the 0.05 threshold. The remaining team statistics had p-values well above 0.05, so the null hypothesis was not rejected for those cases. In other words, the difference between high-payroll and non-high-payroll teams was not statistically significant for the cases where the p-value was well above 0.05.
Thus, to summarize, the team statistics in Table 4.6 that were below the 0.05 threshold and thus led to a rejection of the null hypothesis were: Winning Percentage, Runs per Year, OBP, wOBA, and WAR (pp). Of these, the middle three (Runs per Year, OBP, and wOBA) are hitting statistics, while the last one (WAR (pp)) is an overall position player value statistic. The first in this group (Winning Percentage) is one of the best measurements for total team assessment. The statistics that came close to the 0.05 threshold were SLG and wRC+, which are both hitting statistics.
Table 4.7 shows the results of the hypothesis testing for the team statistics involving pitching. For all pitching team statistics, the p-value was ≥ 0.05, so the null hypothesis was not rejected for any case in Table 4.7. In other words, the difference in pitching between high-payroll and non-high-payroll teams was not statistically significant.
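To make the decision rule behind Tables 4.6 and 4.7 concrete, the following Python sketch compares a p-value against the 5% significance level. It assumes a two-sided, two-sample t-test with unequal variances (Welch's test), which may differ from the exact procedure described in Chapter 3. The winning percentages are hypothetical, and the helper names (t_pdf, two_sided_p, welch_t_test) are illustrative only.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * central_area)

def welch_t_test(a, b):
    """Welch's two-sample t-test; returns (t, df, two-sided p-value)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)

# Hypothetical winning percentages (not the study's data)
high = [0.580, 0.555, 0.540, 0.530, 0.525, 0.510, 0.495]
non_high = [0.520, 0.505, 0.500, 0.490, 0.485, 0.480, 0.470, 0.465, 0.450, 0.440]

t, df, p = welch_t_test(high, non_high)
alpha = 0.05  # 5% significance level, as used in the study
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The p-value is obtained by numerical integration only to keep the sketch free of external libraries; a statistics package would normally supply it directly.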
Effect size was the final calculation, done to delve even deeper into the results. The effect size results are shown in Tables 4.8 and 4.9 (note that the color coordination was kept from the hypothesis testing results in Tables 4.6 and 4.7 to highlight those team statistics that were close to the 0.05 p-value threshold).
Table 4.8
The observations in the last column of Tables 4.8 and 4.9 are based upon Figure 3.2 (see Chapter 3). An effect size essentially measures how many standard deviations separate the high-payroll and non-high-payroll sample means for each team statistic. In Table 4.8, the team statistics that had a large effect size correspond to the same statistics in Table 4.6 that had a p-value below or very close to the 0.05 threshold. These team statistics are Winning Percentage, Runs Per a Year, OBP, SLG, wOBA, wRC+, and WAR (pp). All except Winning Percentage and WAR (pp) are purely offensive statistics. The one exception to this correspondence is ISO, which had a p-value of 0.1241 in Table 4.6 but an effect size close to the 0.8 threshold for a large difference. Note that for UBR and DRS, negative values were found, meaning that the non-high-payroll group actually had a better mean than the high-payroll group (which spends more).
For pitching statistics, as occurred throughout the thesis, only medium differences, at most, were found, and the majority of the pitching statistics presented a small difference. As noted before, most of the pitching statistics, aside from K/9 and WAR for pitchers, are better when they are lower, so a positive value indicates that the non-high-payroll teams did better in that statistic. For example, BB/9 has a positive effect size, so the non-high-payroll teams did better in that statistic.
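The effect size calculation described above can be sketched in a few lines of Python. This sketch assumes the standardized mean difference is computed with a pooled standard deviation (Cohen's d) and uses Cohen's conventional cutoffs of 0.2, 0.5, and 0.8; the exact formula in Chapter 3 and the cutoffs in Figure 3.2 may differ. The BB/9 values and the names cohens_d and label are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(high, non_high):
    """Standardized mean difference (high minus non-high) using a pooled SD."""
    n1, n2 = len(high), len(non_high)
    pooled_var = ((n1 - 1) * variance(high) + (n2 - 1) * variance(non_high)) / (n1 + n2 - 2)
    return (mean(high) - mean(non_high)) / math.sqrt(pooled_var)

def label(d):
    """Cohen's conventional cutoffs: 0.2 small, 0.5 medium, 0.8 large (assumed)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"

# Hypothetical BB/9 values, a statistic where lower is better
high_bb9 = [3.4, 3.3, 3.5, 3.2, 3.6, 3.3, 3.4]
non_high_bb9 = [3.3, 3.2, 3.4, 3.1, 3.5, 3.2, 3.3, 3.4]

d = cohens_d(high_bb9, non_high_bb9)
print(f"d = {d:.2f} ({label(d)})")
```

Because the difference is taken as high-payroll minus non-high-payroll, a positive d for a lower-is-better statistic such as BB/9 favors the non-high-payroll group, while a negative d for a higher-is-better statistic such as UBR or DRS does the same.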
Combining all the results of the thesis, high-payroll teams had significantly better hitting statistics, overall position player numbers, and a better team winning percentage (which is the total measurement of a team's performance). However, that result comes with exceptions. There was no significant difference among base-running, fielding, and pitching numbers (though the pitching statistics came closer to significance than the former two). Going into the study, it was expected that high-payroll teams would excel at almost every team statistic, and, while there was a significant difference for some statistics, the majority of the team statistics that were studied did not show that expected difference. The results of this study open up the possibility for future studies. An example of this would be to examine the “why”, that is, to look at why teams produced the results they did given their spending habits. This will be discussed more in the final chapter, The Continuing Conclusion.
Across the team statistics, one low-payroll team consistently stood out: the Tampa Bay Rays of the AL East. They managed to produce numbers equal to or better than those of the high-payroll teams while having a payroll that ranked 27th among the 30 MLB teams during the 2008-2012 span. On the pitching side they ranked: 5th in ERA, 9th in K/9, 6th in BB/9, 9th in xFIP, 7th in SIERA, 13th in pitcher WAR, 2nd in ERA- (three-way tie), 15th in FIP- (three-way tie), and 6th in xFIP-. Their advanced pitching statistics were good, but the thing to note is their ERA and ERA-. ERA- is park and league adjusted, so once their runs allowed were adjusted, they tied for second among all teams. That is quite a feat for a team 27th in payroll spending.
On the position player statistics they ranked: 8th in Runs Per a Year, 8th in OBP, 13th in SLG, 7th in ISO (three-way tie), 8th in wOBA, 4th in wRC+, 3rd in UBR, 2nd in wSB, 1st in DRS, 1st in UZR, and 2nd in WAR for position players. The first thing that should be observed is that they ranked 1st in two categories, both of which are fielding statistics. That helps explain how they ranked near the top in ERA and ERA-, since pitching results depend partly on fielders. Their base-running also ranked near the top, 2nd and 3rd, and their adjusted total offense ranked 4th in wRC+. They only managed to score enough to be 8th in runs (which is still well above average), but were able to make up for anything they lost there by being so fundamentally complete. WAR reflects this: when their hitting, fielding, base-running, and total value were combined, they ranked 2nd in MLB. The statistic not mentioned up to this point is Winning Percentage, where they ranked third with 0.565. A team that would likely be expected to rank near 27th based on its payroll put up a total accomplishment number that ranked 3rd in all of baseball.
Moreover, their team payroll was only roughly 27% of the annual average of the New York Yankees (the highest-payroll team), and approximately 48% lower than the average MLB salary during the 2008-2012 time period. How could a non-high-payroll team, or more importantly a small-market team, produce these kinds of numbers while spending so little? On the surface they draft well, extend their players at the right time, target cheap and effective players in free agency, embrace advanced statistics, and just seem to be the best-operated franchise in baseball, but there is much more to observe in the way they operate. That, like the reason why the results in this study were achieved, is another case for future study, as will be noted in the next chapter.
Chapter 5: The Continuing Conclusion
The results of the study, like most, are open-ended; they raise more questions than were originally proposed. Going into the study, the expectation was that team statistics would closely track payroll and that the designated high-payroll and non-high-payroll teams would produce differences that were almost all statistically significant (in favor of high-payroll teams). The results showed this somewhat, but not to the degree anticipated, and they invite further study. The study focused on whether there was a difference, not on the “why”, which is a separate question that stems from this thesis. One would assume a big part of the “why” depends on how teams spend their money and then rely on the players they paid for. The Mets and Cubs, for example, were high-payroll teams in this study, but they had trouble with both under-performance and injuries among players they relied on. They directed their resources differently than non-high-payroll teams, and faced issues that stemmed from their high-payroll approach. In other words, a high-payroll team that counts on certain results is hit hard when a great player gets injured and has to be replaced with a lower-production player. In contrast, non-high-payroll teams perhaps diversify from the start and do not incur value losses as much. The things that typically become the most interesting to study are those that produce unexpected results, and the results in this study are just that.
Some other ideas for future study include, but are not limited to: an analysis of the premium between certain statistics, a look at how much a team's division impacts its results, the inclusion of more statistics such as batted-ball data (line drives, ground balls, and fly balls), and a profile of the Tampa Bay Rays discussed above and of the Oakland Athletics before them, another sabermetrics-oriented team.
A similar study might replace payroll with a statistic like Winning Percentage as the variable against which the other team statistics are compared, to see which statistics have the greatest correlation with actual winning. This thesis began as a study that, as the title says, was to examine (expected) high performance versus high payroll. In the end, while there were results and a conclusion, the conclusion is a continuing one in that it opens the door to more interesting future questions and study.
Appendix:
AB=At-Bats
PA=Plate Appearances
H=Hits
BB=Bases on Balls (more commonly called walks)
uBB=Unintentional Bases on Balls
iBB=Intentional Bases on Balls
HBP=Hit by Pitch
SF=Sacrifice Fly
1B=Single
2B=Double
3B=Triple
HR=Home Run
lgwSB=(SB * runSB + CS * runCS) / (1B + BB + HBP – iBB)
wOBAscale=1.15
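As a small illustration of the lgwSB formula listed above, the following Python sketch evaluates it for one set of inputs. The league totals and the run values passed as run_sb and run_cs are hypothetical placeholders, not figures from this study, and the function name lg_wsb is likewise illustrative.

```python
def lg_wsb(sb, cs, singles, bb, hbp, ibb, run_sb, run_cs):
    """League stolen-base runs per opportunity, following the Appendix formula."""
    opportunities = singles + bb + hbp - ibb
    return (sb * run_sb + cs * run_cs) / opportunities

# Hypothetical league totals and run values (illustrative only)
rate = lg_wsb(sb=3000, cs=1000, singles=28000, bb=15000, hbp=1500, ibb=1000,
              run_sb=0.2, run_cs=-0.4)
print(f"lgwSB = {rate:.5f}")
```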
References
1. Fangraphs, "OBP." Online. http://www.fangraphs.com/library/offense/obp/.
2. BaseballReference, "Slugging Percentage." Online. http://www.baseball-reference.com/bullpen/Slugging_percentage.
3. Fangraphs, "wOBA." Online. http://www.fangraphs.com/library/offense/woba/.
4. Klaassen, Matt. BeyondTheBoxscore, "Custom wOBA and Linear Weights Through 2010: Baseball Databank Data Dump 2.1." Last Modified January 01, 2011. Online. http://www.beyondtheboxscore.com/2011/1/4/1912914/custom-woba-and-linear-weights-through-2010-baseball-databank-data.
5. Appelman, David. Fangraphs, "wRC and wRAA." Last Modified December 12, 2008. Online. www.fangraphs.com/blogs/wrc-and-wraa/.
6. Fangraphs, "UBR." Online. http://www.fangraphs.com/library/offense/ubr/.
7. Fangraphs, "wSB." Online. http://www.fangraphs.com/library/offense/wsb/.
8. Fangraphs, "DRS." Online. http://www.fangraphs.com/library/defense/drs/.
9. Fangraphs, "UZR." Online. http://www.fangraphs.com/library/defense/uzr/.
10. Fangraphs, "What is WAR?" Online. http://www.fangraphs.com/library/misc/war/.
11. Fangraphs, "ERA." Online. http://www.fangraphs.com/library/pitching/era/.
12. Fangraphs, "Strikeouts and Walk Rates." Online. http://www.fangraphs.com/library/pitching/rate-stats/.
13. Fangraphs, "xFIP." Online. http://www.fangraphs.com/library/pitching/xfip/.
14. Fangraphs, "Calculating WAR for Pitchers." Online.
15. Fangraphs, "ERA-/FIP-/xFIP-." Online. http://www.fangraphs.com/library/pitching/era-fip-xfip/.
16. Baseball Prospectus, "Cot's Baseball Contracts: A contracts clearinghouse named for Cot Tierney, the NL's fifth-leading hitter in 1922." Online. http://www.baseballprospectus.com/compensation/cots/.
17. Perkowski, Debra A., and Michael Perkowski. Data and Probability Connections. Pearson, 2007.
18. Brase, Charles Henry, and Corrinne Pellillo Brase. Understanding Basic Statistics. Brooks/Cole, 2013.
19. Samuels, Myra L., and Jeffrey A. Witmer. Statistics for the Life Sciences. Pearson, 2012.
20. Coe, Robert. "It is the Effect Size, Stupid: What effect size is and why it is important." Online. http://www.leeds.ac.uk/educol/documents/00002182.htm.