Mixed - Advanced Baseball Statistics

Our final model is a mix of the linear and credibility methods. In this method, we still use the credibility equation, but with a defined start and end game, as in the linear method. While testing this method, we used the same range of start and end values as for the linear method. Shown below are two examples of the mixed method: on the left is the credibility equation using variance, active from game 45 to game 125; on the right is the version using standard deviation, with a start game of 35 and an end game of 115.

CHAPTER 7. COMBINATION OF THE TWO MODELS 54

Figure 7.3 shows two examples of the mixed-method crossover functions, with the same starting and ending games as the linear crossover examples.
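To make the difference between the crossover shapes concrete, here is a minimal sketch in Python. It assumes the credibility weight Z takes the Bühlmann form n/(n + k), where `k` is a hypothetical stand-in for the variance or standard-deviation term, and that the mixed method counts n from the start game and rescales Z so it reaches exactly 1 at the end game. The function names and the rescaling are illustrative assumptions, not the exact equations used in this thesis.

```python
def linear_z(game: int, start: int, end: int) -> float:
    """Linear crossover: weight on in-season data is 0 up to `start`,
    1 from `end` on, and rises in a straight line in between."""
    if game <= start:
        return 0.0
    if game >= end:
        return 1.0
    return (game - start) / (end - start)


def mixed_z(game: int, start: int, end: int, k: float) -> float:
    """HYPOTHETICAL mixed crossover: a credibility curve n / (n + k),
    with n counted from `start`, rescaled so it equals 1 at `end`."""
    if game <= start:
        return 0.0
    if game >= end:
        return 1.0
    n = game - start
    span = end - start
    z_at_end = span / (span + k)  # unscaled value the curve would reach at `end`
    return (n / (n + k)) / z_at_end


def blend(preseason_pct: float, in_season_pct: float, z: float) -> float:
    """Credibility-weighted winning percentage: Z * current + (1 - Z) * prior."""
    return z * in_season_pct + (1.0 - z) * preseason_pct
```

Because n/(n + k) is concave, the rescaled mixed curve shifts weight toward in-season data earlier than the linear ramp over the same window; a smaller `k` makes that shift faster, and a very large `k` makes the curve approach the linear ramp.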

7.4 Results

After developing all the methods, we tested them against data from 2015 to 2018 to see which yielded the lowest average residual. We tested these methods with the regular, non-adapted in-season model, since it has the higher residuals and therefore more room for improvement. To calculate the average residual, we took the absolute residual at every game step for every team and averaged over all games, teams, and years tested together. The table below shows the results of each method, with the optimal start and end games for the methods that require them.
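The pooled average described above can be sketched as follows. This is a minimal illustration that assumes each prediction has already been expressed as projected season wins (so residuals are in games) and that the data sit in nested dictionaries keyed by season and team; that layout and the function name are hypothetical.

```python
from statistics import mean


def average_abs_residual(predictions, actual_wins):
    """Average |predicted wins - actual wins| pooled over every game step,
    every team, and every season.

    predictions: {season: {team: [projected wins after game 1, game 2, ...]}}
    actual_wins: {season: {team: final season wins}}
    """
    residuals = [
        abs(pred - actual_wins[season][team])
        for season, teams in predictions.items()
        for team, preds in teams.items()
        for pred in preds
    ]
    return mean(residuals)
```

Pooling every game step equally means teams and seasons contribute in proportion to how many predictions they have, which matches averaging "all the games for all the teams for all the years tested together."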

Method                    Start, End Game   Avg. Residual (games)
Linear                    35, 115           6.003
Credibility (Var)         N/A               5.915
Credibility (Std. Dev.)   N/A               5.522
Mixed (Var)               35, 125           5.973
Mixed (Std. Dev.)         35, 125           5.778

Table 7.4 shows the average residual of the different combination methods.

As the results show, the classic credibility equation with standard deviation had the lowest average residual, roughly 5.5 games, a 3.4% difference. The mixed standard-deviation method follows closely behind with a residual of about 5.8 games, suggesting that standard deviation is the better statistic in this application. This result is not surprising, given both the previously discussed Z graph and the fact that standard deviation, unlike variance, is not squared: its units are games rather than games squared, and its magnitude is smaller overall. Now that we have a combined model that updates after every game, we may be able to convert our season-prediction model into a more granular model that can bet on individual games. A game-by-game model will give us more opportunities to exploit oddsmakers.

Chapter 8

Single Game Betting

We have created a model that outputs a seasonal winning percentage for every team at any point in the season. Using credibility theory, we were able to optimally combine preseason data with current-season data, weighting each according to where in the season we are predicting. Since this model updates after every game, we now want to make it more granular so that it predicts individual games rather than total season wins. To do that, we need to make some adjustments that convert the model's season winning percentage into a single-game winning percentage.

8.1 Starting Pitcher, Relief Pitcher, and Lineup

For our full-season model, we use various ways of adjusting the roster's statistics to represent the team. Over a full season it is important to account for everyone on the roster, because eventually everyone on the roster should play. When predicting an individual game, however, we care less about the overall roster and more about the players in that game. In baseball, teams announce the starting pitcher before the game, so that is the information we want to use. The starting pitcher usually pitches about five to seven innings. Teams rotate through the five pitchers in their starting rotation, but for a single game we look only at that game's starting pitcher.

To calculate his impact, we use the WAR statistic to adjust the team's runs allowed, which is used to calculate the winning percentage. This is very similar to the conversion for offseason transactions described in the preseason section. Because the winning percentage describes a full season, we scale the pitcher's contribution as if he had pitched in every game: we take the WAR he earned over the appearances he made and scale it up to 162 games. We then convert that number from wins into the runs allowed we use for that game. Next, we calculate hitting statistics such as BA, OBP, and SLG for that day's lineup and apply the regression to those statistics. This method gives what we believe to be the most accurate winning percentage for the players a team fields that day.
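The per-game adjustments above can be sketched in Python as follows. Several pieces are assumptions rather than details from this thesis: the conversion of roughly 10 runs per win (a common sabermetric rule of thumb), the Pythagorean expectation with exponent 2 for turning runs into a winning percentage, the choice to pool the lineup's counting stats before computing rates, and the simple "subtract the scaled value" baseline for runs allowed. The regression that maps BA, OBP, and SLG to runs scored comes from earlier chapters and is not reproduced here.

```python
from dataclasses import dataclass

SEASON_GAMES = 162
RUNS_PER_WIN = 10.0  # ASSUMPTION: rule-of-thumb runs per win, not from the thesis


@dataclass
class BattingLine:
    """Counting stats for one hitter in the day's lineup."""
    ab: int   # at-bats
    h: int    # hits
    bb: int   # walks
    hbp: int  # times hit by pitch
    sf: int   # sacrifice flies
    tb: int   # total bases


def lineup_rates(lineup: list[BattingLine]) -> tuple[float, float, float]:
    """Pool the lineup's counting stats, then compute BA, OBP, and SLG."""
    ab = sum(p.ab for p in lineup)
    h = sum(p.h for p in lineup)
    bb = sum(p.bb for p in lineup)
    hbp = sum(p.hbp for p in lineup)
    sf = sum(p.sf for p in lineup)
    tb = sum(p.tb for p in lineup)
    return h / ab, (h + bb + hbp) / (ab + bb + hbp + sf), tb / ab


def scaled_war(war: float, appearances: int) -> float:
    """Scale the starter's WAR from his actual appearances to all 162 games."""
    return war * SEASON_GAMES / appearances


def adjusted_runs_allowed(team_ra: float, war: float, appearances: int) -> float:
    """Convert scaled WAR (wins) into runs and lower the team's runs allowed.
    HYPOTHETICAL baseline: subtracts the full scaled value from team_ra."""
    return team_ra - scaled_war(war, appearances) * RUNS_PER_WIN


def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """ASSUMPTION: Pythagorean expectation turns runs into a winning percentage."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)
```

In use, `lineup_rates` would feed the runs-scored regression, and its output would be paired with `adjusted_runs_allowed` for that day's starter inside `pythagorean_pct`.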

The last adjustment we need to make is for home-field advantage. There are several logical reasons to include it in our single-game model. Teams and players are more comfortable playing in their home ballpark, where they play 81 of their 162 games each year.

Additionally, when players travel for games, their play can suffer from the toll of transportation and from not staying in their own homes. Using empirical probability, we found that the home team has won 54% of games since 1920. We also checked that this still holds recently: since 2000, the home team's winning percentage is still 54%. So for single games, we add 4 percentage points (54% minus the 50% of an even matchup) to the home team's winning percentage and subtract 4 percentage points from the away team's.
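The home-field step can be sketched in Python as below. This is a minimal illustration: the list-of-booleans input for game outcomes is a hypothetical format, the 0.54 default is the rate reported in the text, and the sketch does not clamp the adjusted percentages to the range 0 to 1.

```python
def empirical_home_win_rate(home_team_won: list[bool]) -> float:
    """Empirical probability that the home team wins: home wins / games played."""
    return sum(home_team_won) / len(home_team_won)


def apply_home_field(home_pct: float, away_pct: float,
                     home_win_rate: float = 0.54) -> tuple[float, float]:
    """Shift both single-game percentages by the home edge over an even game.
    With the 54% rate from the text, the edge is 0.54 - 0.50 = 0.04."""
    edge = home_win_rate - 0.5
    return home_pct + edge, away_pct - edge
```

Deriving the edge from the empirical rate, rather than hard-coding 4 points, makes it easy to recompute if a different span of seasons is used.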
