ISyE 2028 Basic Statistical Methods - Fall 2015 Analysis of NFL Team Performance Against the Spread Betting Line

(1)

ISyE 2028 Basic Statistical Methods - Fall 2015

Analysis of NFL Team Performance Against the Spread Betting

Line

Author: Jonathan Pang

Introduction

Sports betting, although controversial, is an extremely popular form of gambling in the United States. Although many different types of bets can be placed on popular websites or in casinos, the most popular type of bet that is placed on the results of major American sports is the spread bet.

A spread bet is a wager that the difference between the scores of two teams will be less than or greater than a certain value, which is specified by the bookmaker. For example, if the spread betting line for the Detroit Lions at Kansas City Chiefs game in Week 8 is +3.5 for the Detroit Lions, a spread bet placed on the Detroit Lion would be won if the score for the Detroit Lions (D) + the spread betting line (+3.5) is greater than the score for the Kansas City Chiefs (K).

D + 3.5 > K

If the score for the Detroit Lions (D) + the spread betting line (+3.5) is less than the score for the Kansas City Chiefs, the wager will be lost.

D + 3.5 < K

In other words, the bettors are wagering on whether or not the Kansas City Chiefs will outscore the Detroit Lions by +3.5 points. It is also noted that most sports scores are only represented by integers. Thus, oftentimes the bookmakers will set the betting line at half a point intervals, making it impossible for neither side to win the bet. Otherwise, if the Kansas City Chiefs outscored the Detroit Lions by exactly the spread betting line, a push would occur and neither side would win (or lose).

In order to maximize the odds of winning a spread bet, historical data can be utilized to determine patterns in the final spread and the actual results in games. In this study, trends will be analyzed to determine whether or not any teams in the NFL have historically consistently “beat the spread”. This will be answered by hypothesis testing, comparing the spread betting lines and score differentials for each game a team plays. If the difference is significant, it is considered a safer bet to wager money on the team. However, it is expected that there is no significant difference between historical spread betting lines and score differentials for each game a team played. In addition, it is expected that historically no team has outperformed or underperformed the spread betting line on a consistent basis.

An aggregate of historical and current sports betting information was found at the Odds Sharks website, on which the final point spreads and over under betting lines are recorded for professional sports such as the NFL (National Football League), MLB (Major League Baseball), NBA (National Basketball Association), and NHL (National Hockey League) as well as college sports such as NCAAF (National Collegiate Athletic Association Football) and NCAAB (National Collegiate Athletic Association Basketball). A web scraper was utilized in order to extract past sports betting lines as well as actual scores of games.

Data Collection

Data was collected from the Odds Sharks website using a CasperJS web scraper and processed by NodeJS. Relevant information was then inserted into a CSV document. Source code is attached in Appendix B. Data was collected for all regular season games played in the NFL since the 1995-1996 season. This resulted in approximately 20 years of data (including the present 2015-2016 season which has not yet been completed).

(2)

DATA PROCESSING

This data was then processed using R to determine the average amount by which each team scored above the betting line as well as how often each team was able to beat the betting line.

Data Processing

The data collected from the Odds Shark website was first analyzed to determine the amount on average that a team overperformed the spread betting line in the past 20 seasons. This information is summarized in Appendix B in Table 1 and shown in graphical form in Figure 1.

As can be seen in Table 1, it is evident that the team that most outperformed the spread betting line on average was the New England Patriots, which scored on average, 2.798761610 points above the spread. Other teams such as the Green Bay Packers and the Pittsburgh Steelers also performed well against the spread betting line. On the other hand, it is evident that the team that on average most underperformed the spread betting line was the Arizona Cardinals, which scored on average, −1.743710692, points below the spread. Likewise, the Cleveland Browns and St. Louis Rams also underperformed against the spread betting line.

A two-tailed t-test with an alpha value of 0.05 was then constructed to determine if any teams in the past 20 years significantly outperformed or underperformed the spread betting lines. The t-tests were performed using R. It was determined that the following teams have historically performed significantly different than the spread betting lines: the Arizona Cardinals, the Cleveland Browns, the Green Bay Packers, and the New England Patriots. Both the Arizona Cardinals and the Cleveland Browns historically underperformed the spread betting lines and the Green Bay Packers and the New England Patriots historically outperformed the spread betting lines. The results of the t-tests are printed below.

ARI One Sample t−t e s t d a t a : team games$pointsATS t = −2.3237 , d f = 3 1 7 , p−v a l u e = 0 . 0 2 0 7 7 a l t e r n a t i v e h y p o t h e s i s : t r u e mean i s n o t e q u a l t o 0 95 p e r c e n t c o n f i d e n c e i n t e r v a l : −3.2200848 −0.2673366 sample e s t i m a t e s : mean o f x −1.743711 CLE One Sample t−t e s t d a t a : team games$pointsATS t = −2.154 , d f = 2 7 7 , p−v a l u e = 0 . 0 3 2 1 a l t e r n a t i v e h y p o t h e s i s : t r u e mean i s n o t e q u a l t o 0 95 p e r c e n t c o n f i d e n c e i n t e r v a l : −2.9534503 −0.1328806 sample e s t i m a t e s : mean o f x −1.543165 GB One Sample t−t e s t d a t a : team games$pointsATS

(3)

CONCLUSION t = 2 . 4 4 5 6 , d f = 3 3 0 , p−v a l u e = 0 . 0 1 4 9 8 a l t e r n a t i v e h y p o t h e s i s : t r u e mean i s n o t e q u a l t o 0 95 p e r c e n t c o n f i d e n c e i n t e r v a l : 0 . 3 5 2 8 6 3 4 3 . 2 5 4 3 8 7 4 sample e s t i m a t e s : mean o f x 1 . 8 0 3 6 2 5 NE One Sample t−t e s t d a t a : team games$pointsATS t = 3 . 4 7 6 1 , d f = 3 2 2 , p−v a l u e = 0 . 0 0 0 5 7 8 5 a l t e r n a t i v e h y p o t h e s i s : t r u e mean i s n o t e q u a l t o 0 95 p e r c e n t c o n f i d e n c e i n t e r v a l : 1 . 2 1 4 7 5 7 4 . 3 8 2 7 6 6 sample e s t i m a t e s : mean o f x 2 . 7 9 8 7 6 2

Thus, the data support that the Arizona Cardinals and Cleveland Browns were historically underper-formed the spread betting line and the Green Bay Packers and New England Patriots historically outper-formed the spread betting line.

Though the previous observations can be interesting to bettors, a more relevant observation would not be how much a team scores above the spread betting line but rather how often a team scores above the spread betting line. Using R, the proportion of games in which a team was able to outperform the spread betting line was calculated and summarized in Table 1 and shown in graphical form in Figure 2.

This time, it is noted that the team that most consistently outperformed the spread betting line was the Carolina Panthers while the New England Patriots again fared well as the second most consistent with the Philadelphia Eagles and Green Bay Packers close behind. On the other end, we find that the St. Louis Rams was the most consistent underperformer while the San Francisco 49ers and the Washington Redskins were marginally better.

Furthermore, in order to determine if any team consistently outperformed or underperformed the spread betting line in the past 20 years, a z-test for proportions was performed on each of the sets of historical data. The two-tailed z-tests were constructed to compare the percentage of times that a team “beat the spread” with the null hypothesis that each team “beat the spread” 50% of the time. This was equivalent to stating that the probability of “beating the spread” was random. An alpha value of 0.05 was utilized and the tests were processed using R. It was determined that no team performed significantly different from the null hypothesis percentage of 50%. Thus, the hypothesis that no team has historically consistently outperformed or underperformed against the spread is supported.

Conclusion

The purpose of this study was to determine if any team significantly and consistently beat the spread betting lines placed on games in casinos and on popular betting websites. The spread betting lines are essentially a prediction of how much one team will beat another team and are moved throughout the week in order to maintain a 50% balance on both sides of the spread (50% betting under the spread and 50% betting over the spread). Since a team has no incentive to “beat the spread” on their own as players and team personnel are prohibited from betting on games, a team that is able to consistently “beat the spread” can be interpreted as a team that is consistently underrated by the bookmakers and by the general public. Likewise, a team that consistently underperforms against the spread betting line can be interpreted as a team that is consistently overrated by the bookmakers and by the general public. From this study, it was determined that four teams on average perform significantly different from the spread betting lines. These

(4)

CONCLUSION

teams are the Arizona Cardinals and the Cleveland Browns, both of which on average underperform against the spread betting line and the Green Bay Packers and the New England Patriots both of which on average outperform against the spread betting line. However, of these four teams, none consistently outperform or underperform against the spread. This would mean that on the occassions that the teams do outperform or underperform the spread, the teams greatly outperform or underperform. However, these teams do not outperform or underperform the spread on a more consistent basis than do any other teams. In fact, the percentage of beating the spread is, for all these teams, very close to 50% which corresponds to random chance. In conclusion, no team was found to have historically outperformed or underperformed significantly against the spread and the proportion of beating the spread for all teams was found to be 50%.

The data collected is not consistent with the expectation that there is no difference between score differ-entials and the spread betting line. The data collected, however, is consistent with the expectation that no team consistently outperformed or underperformed the spread. The sample size utilized was certainly large enough with 325 game results and spread betting lines collected for each team. The sample is representative of the true population and can be utilized by bettors in order to determine whether to bet above or below the spread when looking at certain teams when placing a spread bet.

(5)

APPENDIX A

Appendix A

Table 1: Points Scored Above the Spread for All Teams Team Mean Percent

ARI -1.743710692 0.4779874 CLE -1.543165468 0.4580645 STL -1.312703583 0.4462541 OAK -1.117283951 0.4444444 DET -0.938906752 0.4533762 DAL -0.923870968 0.4580645 TEN -0.848056537 0.4558304 SF -0.838762215 0.4495114 WAS -0.656146179 0.4518272 CIN -0.492354740 0.4709480 NO -0.400974026 0.4837662 BUF -0.392857143 0.5031056 TB -0.374137931 0.4827586 JAC -0.360856269 0.4831804 ATL -0.335889571 0.4938650 HOU -0.239669421 0.5082645 NYG -0.112804878 0.4939024 MIN -0.022865854 0.5030488 MIA 0.007692308 0.4676923 CHI 0.253105590 0.4689441 SD 0.285493827 0.4845679 NYJ 0.328703704 0.4938272 DEN 0.387537994 0.4893617 PHI 0.419558360 0.5236593 IND 0.435975610 0.5030488 SEA 0.641033435 0.4924012 CAR 0.847692308 0.5292308 BAL 1.183121019 0.4904459 KC 1.290123457 0.5123457 PIT 1.307692308 0.5032051 GB 1.803625378 0.5226586 NE 2.798761610 0.5263158

(6)

APPENDIX A

Figure 1: Mean Points Above Spread for Teams

(7)

APPENDIX B

Appendix B

main.js

v a r path = r e q u i r e ( ’ path ’ ) ; v a r c h i l d P r o c e s s = r e q u i r e ( ’ c h i l d p r o c e s s ’ ) ; v a r c h e e r i o = r e q u i r e ( ’ c h e e r i o ’ ) ; v a r f s = r e q u i r e ( ’ f s ’ ) ; v a r c s v W r i t e r = r e q u i r e ( ’ csv−w r i t e −stream ’ ) ; v a r s e p a r a t o r = ’ , ’ ; v a r n e w l i n e = ’ \ n ’ ; f s . w r i t e F i l e ( ’ o u t p u t . csv ’ , ” d a t e ” + s e p a r a t o r + ”away” + s e p a r a t o r + ” a w a y s c o r e ” + s e p a r a t o r + ”home” + s e p a r a t o r + ” h o m e s c o r e ” + s e p a r a t o r + ” r e s u l t ” + s e p a r a t o r + ” h om e s p re a d ” + s e p a r a t o r + ” a t s ” + s e p a r a t o r + ” t o t a l ” + s e p a r a t o r + ” ou ” + n e w l i n e , f u n c t i o n ( e r r , d a t a ) { } ) ; v a r teams = [ ” A r i z o n a ” , ” A t l a n t a ” , ” B a l t i m o r e ” , ” B u f f a l o ” , ” C a r o l i n a ” , ” Chicago ” , ” C i n c i n n a t i ” , ” C l e v e l a n d ” , ” D a l l a s ” , ” Denver ” , ” D e t r o i t ” , ” Green ” , ” Houston ” , ” I n d i a n a p o l i s ” , ” J a c k s o n v i l l e ” , ” Kansas ” , ”Miami ” , ” Minnesota ” , ” G i a n t s ” , ” J e t s ” , ” England ” , ” O r l e a n s ” , ” Oakland ” , ” P h i l a d e l p h i a ” , ” P i t t s b u r g h ” , ” Diego ” , ” F r a n c i s c o ” , ” S e a t t l e ” ,

(8)

main.js APPENDIX B ” L o u i s ” , ”Tampa” , ” T e n n e s s e e ” , ” Washington ” ] ; t e a m i n d e x = 1 ; w h i l e ( t e a m i n d e x <= 3 2 ) { f o r ( i = 1 ; i < teams . l e n g t h ; i ++) { c o n s o l e . l o g ( i ) c o n s o l e . l o g ( teams [ 0 ] ) ; c o n s o l e . l o g ( teams [ i ] ) ; v a r r e s u l t = c h i l d P r o c e s s . e x e c S y n c ( ’ c a s p e r j s t e s t . j s ’ + teams [ 0 ] + ” ” + teams [ i ] ) ; v a r $ = c h e e r i o . l o a d ( r e s u l t ) ;

v a r date , away , a w a y s c o r e , home , h o m e s c o r e , r e s u l t , home spread , a t s , t o t a l , ou ; v a r j s o n = { d a t e : ” ” , away : ” ” , a w a y s c o r e : ” ” , home : ” ” , h o m e s c o r e : ” ” , r e s u l t : ” ” , h o m e s pr e a d : ” ” , a t s : ” ” , t o t a l : ” ” , ou : ” ” } ; v a r n o e n t r i e s = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 ) tbody ” ) . f i n d ( ’ t r ’ ) . l e n g t h ; v a r n o c o l u m n s = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 ) tbody t r ” ) . eq ( 1 ) . f i n d ( ’ td ’ ) . l e n g t h ; f o r ( j = 0 ; j < n o e n t r i e s ; j ++){ d a t e = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 ) tbody t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 0 ) . t e x t ( ) ; away = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 )

tbody t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 1 ) . t e x t ( ) ; a w a y s c o r e = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e

( 2 ) tbody t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 2 ) . t e x t ( ) ; home = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 )

tbody t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 3 ) . t e x t ( ) ; h o m e s c o r e = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 ) tbody t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 4 ) . t e x t ( ) ; r e s u l t = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 ) tbody t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 5 ) . t e x t ( ) ; h om e s pr e a d = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 ) tbody t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 6 ) . t e x t ( ) ; a t s = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 ) tbody

t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 7 ) . t e x t ( ) ;

t o t a l = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 ) tbody t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 8 ) . t e x t ( ) ; ou = $(”# b l o c k −system−main t a b l e : nth−o f −t y p e ( 2 ) tbody

t r ” ) . eq ( j ) . c h i l d r e n ( ’ td ’ ) . eq ( 9 ) . t e x t ( ) ; f s . a p p e n d F i l e ( ’ o u t p u t . csv ’ , ’ ” ’ + d a t e + ’ ” ’ +

s e p a r a t o r +

(9)

main.js APPENDIX B a w a y s c o r e + s e p a r a t o r + home + s e p a r a t o r + h o m e s c o r e + s e p a r a t o r + r e s u l t + s e p a r a t o r + h om e s pr e a d + s e p a r a t o r + a t s + s e p a r a t o r + t o t a l + s e p a r a t o r + ou + n e w l i n e , f u n c t i o n ( e r r ) { c o n s o l e . l o g ( ’ F i l e s u c c e s s f u l l y w r i t t e n ! ’ ) } ) ; } ; } ; v a r f i r s t = teams . s h i f t ( ) ; teams . push ( f i r s t ) ; t e a m i n d e x = t e a m i n d e x + 1 ; } ;

(10)

test.js APPENDIX B

test.js

v a r c a s p e r = r e q u i r e ( ’ c a s p e r ’ ) . c r e a t e ( { v e r b o s e : t r u e , l o g L e v e l : ” debug ” } ) ; c a s p e r . s t a r t ( ” h t t p : / /www. o d d s s h a r k . com/ n f l / d a t a b a s e ” ) ; c a s p e r . t h e n E v a l u a t e ( f u n c t i o n ( ) { // Number o f Games

document . getElementById ( ’ games−30−h2h ’ ) . c h e c k e d = ’ checked ’ ; document . getElementById ( ’ l o c a t i o n −home−h2h ’ ) . c h e c k e d = ’ checked ’ ; // Month

document . getElementById ( ’ c h a l k −s e l e c t −month−h2h ’ ) . getElementsByTagName ( ’ o p t i o n ’ ) [ 0 ] . s e l e c t e d = ’ s e l e c t e d ’ ;

// R e g u l a r S e a s o n o r P o s t s e a s o n

document . getElementById ( ’ c h a l k −s e l e c t −game−type−h2h ’ ) . getElementsByTagName ( ’ o p t i o n ’ ) [ 1 ] . s e l e c t e d = ’ s e l e c t e d ’ ;

// F a v o r i t e o r Underdog

document . getElementById ( ’ c h a l k −s e l e c t −odds−h2h ’ ) . getElementsByTagName ( ’ o p t i o n ’ ) [ 4 ] . s e l e c t e d = ’ s e l e c t e d ’ ;

// L i n e Range

document . getElementById ( ’ t x t −min−h2h ’ ) . v a l u e = ’ − 1 0 0 ’ ; document . getElementById ( ’ t x t −max−h2h ’ ) . v a l u e = ’ 1 0 0 ’ ; } ) ;

c a s p e r . t h e n ( f u n c t i o n ( ) { // Team Name

t h i s . sendKeys ( ’ i n p u t#team−s e a r c h −h2h ’ , c a s p e r . c l i . g e t ( 0 ) , { keepFocu s : t r u e } ) ;

c a s p e r . page . sendEvent ( ” k e y p r e s s ” , c a s p e r . page . e v e n t . key . Down) ; t h i s . c l i c k ( ’ u l . u i −a u t o c o m p l e t e l i . u i −menu−i t e m : f i r s t −c h i l d a ’ ) ; // Opponent Name

t h i s . sendKeys ( ’ i n p u t#opponent−s e a r c h −h2h ’ , c a s p e r . c l i . g e t ( 1 ) , { keepFocus : t r u e } ) ;

c a s p e r . page . sendEvent ( ” k e y p r e s s ” , c a s p e r . page . e v e n t . key . Down) ; c a s p e r . page . sendEvent ( ” k e y p r e s s ” , c a s p e r . page . e v e n t . key . Down) ; c a s p e r . page . sendEvent ( ” k e y p r e s s ” , c a s p e r . page . e v e n t . key . E n t e r ) ; } ) ; c a s p e r . t h e n ( f u n c t i o n ( ) { // C l i c k Submit t h i s . c l i c k ( ’# submit−h2h ’ ) ; } ) ; c a s p e r . t h e n ( f u n c t i o n ( ) { c o n s o l e . l o g ( c a s p e r . g e t P a g e C o n t e n t ( ) ) ; } ) ; c a s p e r . run ( ) ;

(11)

plots.R APPENDIX B

plots.R

l i b r a r y ( g g p l o t 2 ) ### Read CSV n f l d a t a <− r e a d . c s v ( f i l e = ’ . . / d a t a c o l l e c t i o n / o u t p u t . csv ’ , s e p = ’ , ’ , h e a d e r = TRUE)

### S e t Date a s Date Type

n f l d a t a $ d a t e <− a s . Date ( n f l d a t a $ d a t e , ”%b %d , %Y” ) ### Take 20 Year S u b s e t o f Data

n f l d a t a <− s u b s e t ( n f l d a t a , s u b s e t = n f l d a t a $ d a t e >= ’1995 −07 −01 ’) ### P r o c e s s Home Data Frame t o I n c l u d e Only R e l e v a n t F i e l d s ###

n f l h o m e d f <− d a t a . frame ( home = n f l d a t a $ h o m e , pointsATS =

n f l d a t a $ h o m e s c o r e + n f l d a t a $ h o m e s p r e a d − n f l d a t a $ a w a y s c o r e , a t s = n f l d a t a $ a t s )

### S e t ATS a s B i n a r y

n f l h o m e d f $ a t s <− i f e l s e ( n f l h o m e d f $ a t s == ”W” , 1 , 0 ) ### A g g r e g a t e Data Frame A c c o r d i n g t o Means

n f l h o m e m e a n d f <− setNames ( a g g r e g a t e ( n f l h o m e d f $ p o i n t s A T S , by = l i s t ( n f l h o m e d f $ h o m e ) , FUN = mean ) , c ( ” team ” , ”mean ” ) )

### Make Bar Graph o f Means

g g p l o t ( d a t a = n f l h o m e m e a n d f , a e s ( x = r e o r d e r ( team , −mean ) , y = mean ) ) + geom bar ( s t a t = ” i d e n t i t y ” ) +

x l a b ( ”Home Team” ) +

y l a b ( ” Mean P o i n t s ATS” ) +

#g g t i t l e ( ” Mean P o i n t s Above Spread f o r Home Teams ” ) + c o o r d f l i p ( )

g g s a v e ( ’ i m a g e s / home team mean . png ’ )

### A g g r e g a t e Data Frame A c c o r d i n g t o P e r c e n t s

n f l h o m e p e r c e n t d f <− setNames ( a g g r e g a t e ( n f l h o m e d f $ a t s , by = l i s t ( n f l h o m e d f $ h o m e ) , FUN = mean ) , c ( ” team ” , ” p e r c e n t ” ) )

### Make Bar Graph o f P e r c e n t s

g g p l o t ( d a t a = n f l h o m e p e r c e n t d f , a e s ( x = r e o r d e r ( team , −p e r c e n t ) , y = p e r c e n t ) ) + geom bar ( s t a t = ” i d e n t i t y ” ) + x l a b ( ”Home Team” ) + y l a b ( ” P e r c e n t ATS” ) + y l i m ( 0 , 1 ) +

#g g t i t l e ( ” P e r c e n t Above Spread f o r Home Teams ” ) + c o o r d f l i p ( )

g g s a v e ( ’ i m a g e s / h o m e t e a m p e r c e n t . png ’ )

### P r o c e s s Away Data Frame t o I n c l u d e Only R e l e v a n t F i e l d s ### n f l a w a y d f <− d a t a . frame ( away = n f l d a t a $ a w a y , pointsATS =

n f l d a t a $ a w a y s c o r e − n f l d a t a $ h o m e s p r e a d − n f l d a t a $ h o m e s c o r e , a t s = n f l d a t a $ a t s )

(12)

plots.R APPENDIX B

n f l a w a y d f $ a t s <− i f e l s e ( n f l a w a y d f $ a t s == ”L ” , 1 , 0 ) ### A g g r e g a t e Data Frame A c c o r d i n g t o Means

n f l a w a y m e a n d f <− setNames ( a g g r e g a t e ( n f l a w a y d f $ p o i n t s A T S , by = l i s t ( n f l a w a y d f $ a w a y ) , FUN = mean ) , c ( ” team ” , ”mean ” ) )

### Make Bar Graph o f Means

g g p l o t ( d a t a = n f l a w a y m e a n d f , a e s ( x = r e o r d e r ( team , −mean ) , y = mean ) ) + geom bar ( s t a t = ” i d e n t i t y ” ) +

x l a b ( ”Away Team” ) +

#g g t i t l e ( ” Mean P o i n t s Above Spread f o r Away Teams ” ) + c o o r d f l i p ( )

g g s a v e ( ’ i m a g e s / away team mean . png ’ )

### A g g r e g a t e Data Frame A c c o r d i n g t o P e r c e n t s

n f l a w a y p e r c e n t d f <− setNames ( a g g r e g a t e ( n f l a w a y d f $ a t s , by = l i s t ( n f l a w a y d f $ a w a y ) , FUN = mean ) , c ( ” team ” , ” p e r c e n t ” ) )

g g p l o t ( d a t a = n f l a w a y p e r c e n t d f , a e s ( x = r e o r d e r ( team , −p e r c e n t ) , y = p e r c e n t ) ) + geom bar ( s t a t = ” i d e n t i t y ” ) + x l a b ( ”Away Team” ) + y l a b ( ” P e r c e n t ATS” ) + y l i m ( 0 , 1 ) +

#g g t i t l e ( ” P e r c e n t Above Spread f o r Away Teams ” ) + c o o r d f l i p ( )

g g s a v e ( ’ i m a g e s / a w a y t e a m p e r c e n t . png ’ )

### P r o c e s s Home and Away Data Frames f o r Means

n f l h o m e s u m m e a n d f <− setNames ( a g g r e g a t e ( n f l h o m e d f $ p o i n t s A T S , by = l i s t ( n f l h o m e d f $ h o m e ) , FUN = sum ) , c ( ” team ” , ”sum ” ) )

n f l a w a y s u m m e a n d f <− setNames ( a g g r e g a t e ( n f l a w a y d f $ p o i n t s A T S , by = l i s t ( n f l a w a y d f $ a w a y ) , FUN = sum ) , c ( ” team ” , ”sum ” ) )

n f l h o m e l e n g t h d f <− setNames ( a g g r e g a t e ( n f l h o m e d f $ p o i n t s A T S , by = l i s t ( n f l h o m e d f $ h o m e ) , FUN = l e n g t h ) , c ( ” team ” , ” l e n g t h ” ) )

n f l a w a y l e n g t h d f <− setNames ( a g g r e g a t e ( n f l a w a y d f $ p o i n t s A T S , by = l i s t ( n f l a w a y d f $ a w a y ) , FUN = l e n g t h ) , c ( ” team ” , ” l e n g t h ” ) )

n f l t o t a l m e a n d f <− d a t a . frame ( team = n f l h o m e s u m m e a n d f $ t e a m , mean = ( n f l h o m e s u m m e a n d f $ s u m + n f l a w a y s u m m e a n d f $ s u m ) / (

n f l h o m e l e n g t h d f $ l e n g t h + n f l a w a y l e n g t h d f $ l e n g t h ) )

n f l t o t a l m e a n d f <− n f l t o t a l m e a n d f [ o r d e r ( n f l t o t a l m e a n d f $ m e a n ) , ] ### Make Bar Graph o f Means

g g p l o t ( d a t a = n f l t o t a l m e a n d f , a e s ( x = r e o r d e r ( team , −mean ) , y = mean ) ) + geom bar ( s t a t = ” i d e n t i t y ” ) +

x l a b ( ”Team” ) +

#g g t i t l e ( ” Mean P o i n t s Above Spread f o r Teams ” ) + c o o r d f l i p ( )

(13)

plots.R APPENDIX B

g g s a v e ( ’ i m a g e s / team mean . png ’ )

### P r o c e s s Home and Away Data Frames f o r P e r c e n t s

n f l h o m e s u m p e r c e n t d f <− setNames ( a g g r e g a t e ( n f l h o m e d f $ a t s , by = l i s t ( n f l h o m e d f $ h o m e ) , FUN = sum ) , c ( ” team ” , ”sum ” ) )

n f l a w a y s u m p e r c e n t d f <− setNames ( a g g r e g a t e ( n f l a w a y d f $ a t s , by = l i s t ( n f l a w a y d f $ a w a y ) , FUN = sum ) , c ( ” team ” , ”sum ” ) )

n f l t o t a l p e r c e n t d f <− d a t a . frame ( team = n f l h o m e s u m p e r c e n t d f $ t e a m , p e r c e n t = ( n f l h o m e s u m p e r c e n t d f $ s u m + n f l a w a y s u m p e r c e n t d f $ s u m ) / ( n f l h o m e l e n g t h d f $ l e n g t h + n f l a w a y l e n g t h d f $ l e n g t h ) )

n f l t o t a l p e r c e n t d f <− n f l t o t a l p e r c e n t d f [ o r d e r ( n f l t o t a l p e r c e n t d f $ p e r c e n t ) , ]

g g p l o t ( d a t a = n f l t o t a l p e r c e n t d f , a e s ( x = r e o r d e r ( team , −p e r c e n t ) , y = p e r c e n t ) ) + geom bar ( s t a t = ” i d e n t i t y ” ) + x l a b ( ”Team” ) + y l a b ( ” P e r c e n t ATS” ) + y l i m ( 0 , 1 ) +

#g g t i t l e ( ” P e r c e n t Above Spread f o r Teams ” ) + c o o r d f l i p ( )

(14)

t-test.R APPENDIX B

t-test.R

### Read CSV

n f l d a t a <− r e a d . c s v ( f i l e = ’ . . / d a t a c o l l e c t i o n / o u t p u t . csv ’ , s e p = ’ , ’ , h e a d e r = TRUE)

### S e t Date a s Date Type

n f l d a t a $ d a t e <− a s . Date ( n f l d a t a $ d a t e , ”%b %d , %Y” ) ### Take 20 Year S u b s e t o f Data

n f l d a t a <− s u b s e t ( n f l d a t a , s u b s e t = n f l d a t a $ d a t e >= ’1995 −07 −01 ’) ### P r o c e s s Home Data Frame t o I n c l u d e Only R e l e v a n t F i e l d s ###

n f l h o m e d f <− d a t a . frame ( team = n f l d a t a $ h o m e , pointsATS =

n f l d a t a $ h o m e s c o r e + n f l d a t a $ h o m e s p r e a d − n f l d a t a $ a w a y s c o r e , a t s = n f l d a t a $ a t s )

### S e t ATS a s B i n a r y

n f l h o m e d f $ a t s <− i f e l s e ( n f l h o m e d f $ a t s == ”W” , 1 , 0 )

### P r o c e s s Away Data Frame t o I n c l u d e Only R e l e v a n t F i e l d s ### n f l a w a y d f <− d a t a . frame ( team = n f l d a t a $ a w a y , pointsATS =

n f l d a t a $ a w a y s c o r e − n f l d a t a $ h o m e s p r e a d − n f l d a t a $ h o m e s c o r e , a t s = n f l d a t a $ a t s ) ### S e t ATS a s B i n a r y n f l a w a y d f $ a t s <− i f e l s e ( n f l a w a y d f $ a t s == ”L ” , 1 , 0 ) ### L i s t o f Teams teams <− d a t a . frame ( u n i q u e ( n f l d a t a $ h o m e ) ) ; ### T−t e s t s f o r Mean P o i n t s Above t h e Spread

f o r ( i i n s e q ( 1 , nrow ( teams ) , by = 1 ) ) {

h o m e s u b s e t <− s u b s e t ( n f l h o m e d f , s u b s e t = n f l h o m e d f $ t e a m == teams [ i , 1 ] ) a w a y s u b s e t <− s u b s e t ( n f l a w a y d f , s u b s e t = n f l a w a y d f $ t e a m == teams [ i , 1 ] ) team games <− r b i n d ( h o m e s u b s e t , a w a y s u b s e t )

t e s t <− t . t e s t ( x = team games$pointsATS , mu = 0 , a l t e r n a t i v e = c ( ” two . s i d e d ” ) , c o n f . l e v e l = 0 . 9 5 ) i f ( t e s t $ p . v a l u e < 0 . 0 5 ) { p r i n t ( teams [ i , 1 ] ) p r i n t ( t e s t ) } } ### Z T e s t f o r P r o p o r t i o n o f Time B e a t i n g t h e Spread n f l h o m e l e n g t h d f <− setNames ( a g g r e g a t e ( n f l h o m e d f $ p o i n t s A T S , by = l i s t ( n f l h o m e d f $ h o m e ) , FUN = l e n g t h ) , c ( ” team ” , ” l e n g t h ” ) ) n f l a w a y l e n g t h d f <− setNames ( a g g r e g a t e ( n f l a w a y d f $ p o i n t s A T S , by = l i s t ( n f l a w a y d f $ a w a y ) , FUN = l e n g t h ) , c ( ” team ” , ” l e n g t h ” ) ) n f l h o m e s u m p e r c e n t d f <− setNames ( a g g r e g a t e ( n f l h o m e d f $ a t s , by = l i s t ( n f l h o m e d f $ h o m e ) , FUN = sum ) , c ( ” team ” , ”sum ” ) )

n f l a w a y s u m p e r c e n t d f <− setNames ( a g g r e g a t e ( n f l a w a y d f $ a t s , by = l i s t ( n f l a w a y d f $ a w a y ) , FUN = sum ) , c ( ” team ” , ”sum ” ) )

(15)

t-test.R APPENDIX B f o r ( i i n s e q ( 1 , nrow ( teams ) , by = 1 ) ) { t e s t <− prop . t e s t ( x = n f l h o m e s u m p e r c e n t d f $ s u m [ i ] + n f l a w a y s u m p e r c e n t d f $ s u m [ i ] , n = n f l h o m e l e n g t h d f $ l e n g t h [ i ] + n f l a w a y l e n g t h d f $ l e n g t h [ i ] , p = r e p ( 0 . 5 , t i m e s = 3 2 ) [ i ] , a l t e r n a t i v e = ” two . s i d e d ” , c o n f . l e v e l = 0 . 9 5 ) i f ( t e s t $ p . v a l u e > 0 . 0 5 ) { p r i n t ( teams [ i , 1 ] ) p r i n t ( t e s t ) } }