NBA Playoff Prediction Model
Vijay Menon
April 22, 2015
1 Abstract
At the behest of the Sacramento Kings front office, we construct statistical models for NBA playoff prediction. Predictors include available and well-tracked statistics for rebounding, passing, shooting, and defending, among others. We select the most explanatory covariates via a novel approach to variable selection. We identify and quantify the impact of the most predictive global metrics for playoff qualification, while also accounting for team-level differences in response to certain statistics. Finally, we build in conference and division level dependency through a pseudo-likelihood approach. Our best model is more than 95 percent accurate in classification of NBA playoff teams over the last decade.
2 Introduction
In this paper, we use widely available team statistics to predict NBA team playoff probabilities. More than the predictive aspect, we are interested in prescriptive analytics – finding the drivers of playoff qualification so that teams can pull the right levers to find success. This work is being done in collaboration with the Sacramento Kings general management and analytics division.
Pioneered by visionary GMs such as Daryl Morey of the Houston Rockets and Sam Hinkie of the Philadelphia 76ers, the field of analytics has blossomed across the NBA in the last decade. As of 2014, all 30 NBA teams had set up stand-alone analytics divisions to help them gain the competitive edge necessary to win games in a league of increasing parity. Despite these advances, however, basketball analytics remains a nascent industry in terms of the sophistication of techniques currently used and applied.
With the recent transition to a new ownership group led by Vivek Ranadive, and the hiring of a legend in the basketball analytics community – Dean Oliver – the Sacramento Kings hope to return to the forefront of basketball success by becoming a leader in harnessing the power of analytics (Fig. 1). Though I am working on several projects with the team, my goal in this specific project is to help the Kings identify the levers they should be pulling in order to optimize their chances of returning to the playoffs – and, more generally, to identify the statistics that best predict playoff qualification rates across the NBA.
There is good reason to explore this topic in depth. Currently, the Kings’ approach is to calculate league-wide averages across many different statistics and attempt to “beat the average.” Such an approach is quite elementary, and perhaps not best suited to the team’s particular style of play. We propose to improve on this approach.
There have been other attempts to tackle this problem, or similar ones. Most notably, John Hollinger of ESPN Insider has published team playoff odds for the last several years (Hollinger, 2014). More recently, Nate Silver’s FiveThirtyEight blog has published Elo ratings for NFL team playoff odds (Silver, 2014). Distinctly academic approaches to playoff prediction have also been attempted, with limited explanatory success (Yang, 2012).
These approaches are distinct from ours insofar as they are based on in-season, ever-shifting effects, whereas our approach depends not upon nightly wins or losses but upon overall seasonal team performance. Hence, it is less likely to be affected by unlucky play or by quirky results (for instance, the Cavaliers losing 2 of their first 3 games this season). In other words, the scope of this project goes beyond simple in-season playoff prediction; rather, we intend to illuminate some of the overall drivers of playoff qualification throughout the years.
Basketball data is extensive and widely documented on a variety of sites. This is a double-edged sword: obtaining data is less problematic, but the ways in which we can discover interpretable meaning within the data become increasingly complex. In this case, I use basketball-reference.com as the definitive source for NBA record-keeping and statistics.
Though we do not expect this analysis to be the definitive work on this subject, we do intend for it to be illustrative and informative through the application of a variety of methodological approaches including logistic regression and mixed random effects modeling. Certainly, it will serve as an upgrade over the current status quo of averaging and should provide a useful reference for teams interested in optimizing their playoff odds.
3 Literature Review
As the field is relatively nascent, there is not extensive literature on the subject of NBA playoff prediction. Nonetheless, there have been some attempts to broach the subject via statistical analyses. These approaches have in large part been based upon in-season prediction based on binary win-loss outcomes rather than on team performance as measured through statistics.
One approach frequently used in calculating in-season playoff odds is the Elo rating system. The system was originally designed to measure performance of chess players. In this context, each player was assigned an Elo rating that predicted his odds of winning against an opponent with a different Elo rating. These ratings varied depending upon the actual outcome of the match.
Figure 1: The Kings signaled their commitment to analytics by crowdsourcing data for the 2014 NBA draft
In the basketball context, teams are assigned Elo ratings that shift up or down depending upon how they perform against other teams over the course of the season. In this sense, the system is self-correcting. The Elo rating has gained popularity in recent years due to its application by Nate Silver to both MLB and NFL playoff prediction (though not yet NBA prediction). Similarly, forecasters often apply the Bradley-Terry-Luce model as a method for pairwise comparison (Hu, 2004). This model compares teams and assigns a more probable winner or loser for each game on a team’s schedule, updating as more information is collected over the course of the season. Though it is unclear precisely what approach Hollinger uses in his playoff odds calculations, it is clear that at a high level, he follows procedures similar to those described above.
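To make the self-correcting behavior of Elo ratings concrete, the following is a minimal sketch of the standard Elo expected-score and update formulas. The step size `k=20.0` and the starting ratings of 1500 are illustrative assumptions, not values used by any of the forecasters cited above.

```python
def elo_expected(rating_a, rating_b):
    """Expected score (win probability) of team A against team B under standard Elo."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, a_won, k=20.0):
    """Return both ratings after one game. k is an assumed step size."""
    expected_a = elo_expected(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    # The winner gains exactly what the loser gives up.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Two evenly rated (hypothetical) teams: the winner gains half of k.
new_a, new_b = elo_update(1500.0, 1500.0, a_won=True)
```

An upset moves ratings further than an expected result, which is why the ratings track nightly outcomes so closely.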
There is nothing inherently wrong with these approaches. However, they are geared primarily for consumer audiences and not for general management. Because these models are shaped so heavily by wins and losses, rather than by the proxies that determine them, they are not particularly useful in helping teams to identify what they specifically need to effect in order to become more successful in boosting their odds of playoff qualification.
In this project, we intend to move beyond the existing literature in order to identify novel approaches to quantifying playoff odds that are more well-suited for helping teams improve. Our modeling will specifically emphasize seasonal level performance and the levers that can be pulled to enhance the likelihood of success within the framework of the NBA.
4 Data
Our dataset extends over the previous eleven seasons, beginning with the 2003-2004 season. Though we have access to data before this time, there are a couple of prima facie reasons why it makes sense to limit our analysis in this way.
First, moving significantly prior to the previous decade and into the 1990s might complicate our results because of rule and stylistic changes between different eras of basketball. The modern game has not changed much since 2003. However, basketball in the 1990s was quite different. Hand-checking was frequent, lower scoring was the norm, and the pace of the game was slower (some of these factors contribute to why we remember Michael Jordan as a particularly special player, even though players like Kobe Bryant and Allen Iverson averaged similar point totals in their careers).
Second, and more practically, playoff odds were complicated prior to 2003 by the fact that there was an uneven number of teams between conferences. Namely, there were only 14 teams in the Eastern Conference prior to the expansion of the Charlotte Bobcats (currently Hornets) while there had always been 15 teams in the Western Conference. This complicates our odds calculations.
This truncation of data is not without its costs, however. As we will see in the methodology section, it creates unavoidable data sparsity issues that we deal with later.
Initially, we collected data on a team average per game basis over the aforementioned eleven seasons on 27 variables including but not limited to shooting (field goals made and attempted), rebounding (offensive and defensive), passing (assists and turnovers), and defense (points allowed, turnovers forced, opponent field goal percent). Due to data sparsity, and for the sake of clarity, we use variable selection techniques to limit the amount of data entering our model.
Figure 2: Glmulti analysis cycles through all potential models to return the best model including seven or fewer covariates
4.1 Variable Selection Process
Through a process of variable selection, we significantly whittle down the number of variables. As previously explained, our data repository only extends back to the 2003-2004 season when the number of teams per conference was equalized. Therefore, we only have 11 instances of success or failure data for a given team, where success is defined by postseason qualification. Insofar as we are faced with sparse data, we find it essential to reduce the number of covariates to single digits in order to induce explanatory results.
We began the process by performing principal components analysis with an emphasis on factor loading scores. In this approach, we hoped to identify the clear formation of a small, discrete number of clusters. However, this approach did not work particularly well as it broke into many clusters with low levels of explanatory value.
As a result, we pivoted towards using normal forward selection with Hosmer-Lemeshow and drop-in-deviance testing. This approach is generally ideal with a limited number of covariates. However, in this case, extensive model testing and comparison proved time-intensive and largely fruitless.
Figure 3: A visual analysis of model desirability ranked by AIC weights against a number of distinct, potential models
In light of these circumstances, we turned to an R package called glmulti. Created by Vincent Calcagno, this package is an “efficient enumerator” for model selection and averaging that creates all possible candidate formulas while also factoring in marginality and other constraints. After analyzing every possible permutation of logistic models (Fig. 2), the package then returns the top 50 models based upon a selection criterion. In this case, we use AIC weights as a proxy selection criterion (Fig. 3).
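As an illustration of how AIC scores translate into the weights used for ranking, here is a minimal sketch of the standard Akaike-weight formula. The AIC values are made up for the example, and the helper `akaike_weights` is hypothetical; whether glmulti computes its weights in exactly this way is an assumption.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into weights that sum to 1 (lower AIC gives higher weight)."""
    best = min(aic_values)
    # Relative likelihood of each model compared with the best-scoring one.
    rel_likelihoods = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel_likelihoods)
    return [r / total for r in rel_likelihoods]


# Made-up AIC scores for three candidate models (illustration only).
example_aics = [100.0, 102.0, 110.0]
weights = akaike_weights(example_aics)
```

Because only differences in AIC enter the formula, a model two AIC units behind the best receives a visibly smaller but still non-trivial weight, while one ten units behind is effectively discarded.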
Beyond its simplicity and speed, glmulti proved to be a particularly useful tool for two additional reasons. First, the package allows for “force-feeding” of particular variables. We strive to find a balance between building the most explanatory model for playoff success while also constructing something that is interpretable and actionable. If the top factors we identify are immovable ones such as years of coaching experience, that is less valuable to a team than if it is something like number of steals.
Second, the package allows for setting max limits on the number of variables that can be included in the model. This is particularly important in some of our later approaches where overfitting parameters can be a particularly significant concern with only eleven seasons of data.
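The two constraints described above (forced-in variables and a cap on model size) can be sketched as a plain enumeration of candidate covariate sets. This is only an illustration of the search space, not glmulti's implementation; the function name `candidate_subsets` and the abbreviated variable names are hypothetical.

```python
from itertools import combinations


def candidate_subsets(variables, forced=(), max_size=7):
    """Yield every covariate set that contains all forced variables
    and has at most max_size members."""
    forced = tuple(forced)
    optional = [v for v in variables if v not in forced]
    room = max_size - len(forced)
    if room < 0:
        raise ValueError("max_size is smaller than the number of forced variables")
    for k in range(min(room, len(optional)) + 1):
        for extra in combinations(optional, k):
            yield forced + extra


# Hypothetical short names: free throws attempted, fouls, 3P%, defensive rebounds.
pool = ["FTA", "PF", "3P%", "DRB"]
subsets = list(candidate_subsets(pool, forced=["DRB"], max_size=2))
```

Each yielded set would then be fit as its own logistic model and scored, so the cap on size directly limits both the run time and the risk of overfitting.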
Via this process, I identified the following combination of six covariates as particularly explanatory, both from a statistical and interpretive perspective:
• Free Throws Attempted: A measure of the average number of free throws a team attempts over the course of a game
• Personal Fouls: A measure of the average number of fouls committed by a team over the course of a game
• Three Point Shooting Percentage: A measure of the average shooting percentage on shots taken beyond the three point arc per game
• Defensive Rebounds: A measure of the average number of missed shots collected by the defending team per game
• Blocks: A measure of the average number of attempted shots that a team prevents from reaching the basket per game
• Turnovers Forced: A measure of the average number of turnovers a team forces on defense per game, each ending the opponent’s possession without a shot
5 Modeling Approaches
We begin our overview of the modeling approaches from a high-level perspective, in order of increasing complexity. Through each approach, we intend to tell a different story connecting to the larger overall theme.
5.1 Approach One: A Global Model
In the most straightforward approach, we model a logistic regression with a common intercept where we select about 6 parameters. With fifteen teams per conference and eleven seasons, this gives us 165 trials within conference from which to learn about the parameters (Eq. 1).
In this model, we hope to learn about the general value of a marginal statistical increase. Learning which statistics most markedly contribute to increased playoff odds can help inform coach opinions on general basketball philosophies – for instance, whether it seems statistically optimal to crash the boards for an extra rebound, knowing that this will compromise the defense’s ability to set itself and force a turnover or block.
This modeling approach affords us a good deal of flexibility in terms of number of parameters, with 165 observations per conference (15 teams and 11 seasons).
$$\log\left(\frac{p_{it}}{1 - p_{it}}\right) = X_{it}^{T}\beta \qquad (1)$$
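The logit relationship in Eq. 1 can be sketched in a few lines: the linear predictor is mapped to a playoff probability, and the Bernoulli log-likelihood over team-seasons is what a fitting routine would maximize. The function names below are hypothetical, and any coefficients passed in are placeholders rather than estimates from this study.

```python
import math


def inverse_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def playoff_probability(x, beta):
    """Probability implied by Eq. 1 for covariate vector x
    (with a leading 1.0 for the common intercept)."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return inverse_logit(eta)


def log_likelihood(rows, outcomes, beta):
    """Bernoulli log-likelihood over team-seasons; outcomes are 1 (playoffs) or 0."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        p = playoff_probability(x, beta)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total
```

With 165 team-seasons per conference, `rows` would hold 165 covariate vectors and `outcomes` the matching playoff indicators.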
5.2 Approach Two: A Relative Boost Model
Model 1 suggests a global approach to playoff qualification that may not carry over between teams. In Model 2, we expand upon the previous model by incorporating a random effect for teams. This results in a common coefficient, saying that all teams react to the respective covariates in generally the same manner, but that there is a slight adjustment for each team (Eq. 2).
The purpose of this model is to capture differences between teams that we may have missed in the first approach. In other words, we use it as a surrogate for including a whole bunch of other variables in the model. We can then observe which teams are boosted or pulled down relative to the overall mean. Not all teams with the same characteristics have the same odds of making the playoffs, and we account for this fact here.
An additional benefit of this approach (as with Model 1) is that we can work with all the data, meaning that we are able to play with 165 Bernoulli trials. The difference is that we expand from approximately 5 parameters to 20 parameters (one for each of the 15 teams per conference).
This type of modeling approach is important because it recognizes that two teams with exactly the same characteristics may not actually have the same odds of playoff qualification. This might be due to any variety of factors – variables that we did not include in the model, team and coaching philosophies, or dependency factors such as divisional and conference strength. By identifying which teams are boosted or deflated relative to the league-wide mean, we can see who is benefiting by virtue of some odd factors and quantify the percentage bump to playoff chances between two teams with identical characteristics.
$$\log\left(\frac{p_{it}}{1 - p_{it}}\right) = X_{it}^{T}\beta + \alpha_i \qquad (2)$$
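A short worked step shows how Eq. 2 quantifies the "bump" between two teams with identical characteristics. Assuming teams $i$ and $j$ share the same covariate vector $X$ in season $t$ (the subscript $j$ is introduced here for illustration), the common term cancels when the two log-odds are subtracted:

```latex
\log\left(\frac{p_{it}}{1 - p_{it}}\right) - \log\left(\frac{p_{jt}}{1 - p_{jt}}\right)
  = \left(X^{T}\beta + \alpha_i\right) - \left(X^{T}\beta + \alpha_j\right)
  = \alpha_i - \alpha_j
\quad\Longrightarrow\quad
\frac{p_{it} / (1 - p_{it})}{p_{jt} / (1 - p_{jt})} = e^{\alpha_i - \alpha_j}
```

Under this reading, the difference in team effects is an odds multiplier: a boosted team ($\alpha_i > \alpha_j$) has its odds scaled up by $e^{\alpha_i - \alpha_j}$ relative to the deflated team, regardless of the shared statistics.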
5.3 Approach Three: Individual Team Modeling
Now, we abandon the assumption that all teams respond to covariates in the same manner (Approaches 1 and 2 implicitly made this assumption, to varying degrees). That is, the Sacramento Kings might respond to an assist differently than a team built specifically around ball movement and passing. In reality, this is likely a more realistic assumption than the ones we have made above.
We feel that creating a logistic model for each team is quite useful in calling out individual team strengths. However, we must note that in this case, we only have eleven Bernoulli trials to play with (one for each season).
In the presence of this sparse data, there is an epistemic question as to how many parameters we can actually estimate well. Almost certainly, the answer is fewer than five. Nonetheless, we are willing to play this type of game and see where it leads us in this scenario.
For purposes of comparison, we choose the same three variables to examine for each team in the NBA. The selection process is similar to that described in section four of this paper. The value in this approach is figuring out which types of metrics have the most impact on which teams and learning more about individual team philosophies.
5.4 Approach Four: Team Modeling with Dependency
In this approach, we are trying to look at the event that a given team made the playoffs. We then desire to build some dependence on the outcomes of the other teams through a joint modeling approach (Eq. 3). The problem that arises with putting in a coefficient for each team, however, is that we would have more parameters than data points.
As a workaround, we simply add one parameter that indicates the number of teams from a given division that made the playoffs. The end result is a pseudo-likelihood approach to solving the dependency problem – one that is magnified more than ever this season in the NBA with the discrepancy in talent between the Western and Eastern Conference. This is a reasonable compromise, because it makes the situation more manageable from the standpoint of data sparsity.
Moreover, this approach accounts for the fact that playoff qualification rates for a given team are not independent of other teams. This is a critical fact that is often overlooked in other playoff prediction models.
$$X_{it} = 1 \mid X_{jt},\ j \neq i \qquad (3)$$
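The single dependency parameter described above can be built directly from season records. The sketch below counts, for each team-season, how many other teams in the same division qualified; the record layout, the function name `division_rival_counts`, and the placeholder team and division labels are assumptions made for illustration.

```python
from collections import defaultdict


def division_rival_counts(records):
    """records: iterable of (season, team, division, made_playoffs) tuples.
    Returns {(season, team): number of OTHER division teams that qualified}."""
    qualified = defaultdict(int)
    for season, _team, division, made in records:
        if made:
            qualified[(season, division)] += 1
    counts = {}
    for season, team, division, made in records:
        # Exclude the team itself so the covariate describes only its rivals.
        counts[(season, team)] = qualified[(season, division)] - (1 if made else 0)
    return counts


# Placeholder data: three teams in one hypothetical division, one season.
records = [
    (2015, "A", "X", True),
    (2015, "B", "X", True),
    (2015, "C", "X", False),
]
rival_counts = division_rival_counts(records)
```

The resulting count enters each team's logistic model as one extra covariate, which keeps the parameter total manageable with only eleven seasons per team.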
5.5 Approach Five: Bayesian Mixed Modeling
The last step that we may wish to incorporate is to make all the random effects have priors on them and to fit a Bayesian mixed model logistic regression (Eq. 4). This way, we get a predictive distribution for all the probabilities rather than a single point estimate.
Such an approach generally helps with interpretability, as we can check if the distribution looks close to zero or one for each team and for each model.
$$\alpha_i \sim N(0, \sigma_{\alpha}^{2}) \qquad (4)$$
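To illustrate why a prior on the team effect yields a distribution of probabilities rather than a point estimate, the sketch below draws team effects from the normal prior in Eq. 4 and pushes each draw through the logit link. It simulates from the prior only and does not fit a posterior; the function name, the fixed linear predictor, and the value of `sigma_alpha` are all illustrative assumptions.

```python
import math
import random


def prior_predictive_probs(linear_predictor, sigma_alpha, n_draws=1000, seed=0):
    """Draw alpha ~ N(0, sigma_alpha^2) and return the implied playoff probabilities."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_draws):
        alpha = rng.gauss(0.0, sigma_alpha)
        probs.append(1.0 / (1.0 + math.exp(-(linear_predictor + alpha))))
    return probs


# A team whose shared covariates imply even odds, with an assumed prior scale of 1.
draws = prior_predictive_probs(0.0, 1.0)
```

In a full Bayesian fit the draws would come from the posterior instead, and the spread of the resulting probabilities for each team would indicate how far it sits from zero or one.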
6 Results
6.1 Approach One (Global Modeling) Results
The goal of this first pass attempt was to glean which actionable variables may play the biggest role in contributing to playoff qualification odds. To that end, the results indicate that the most important variable among playoff teams is defensive rebounding. This result is robust across both the Western and Eastern conferences.
In the East, one additional marginal defensive rebound per game affords a team nearly five times greater odds of making the playoffs (Fig. 4). To put that in perspective, let’s take a look at the league this season. The Chicago Bulls average 34 defensive boards a game, while the Boston Celtics clear 33 rebounds off the glass per game. This model suggests that, assuming all other factors are constant, the Bulls’ odds of making the playoffs are five times greater than those of the Celtics. This odds multiplier is slightly diminished in the West at close to three, but it is still the most significant individual factor in determining playoff qualification.
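Because these multipliers act on odds rather than on probabilities, a short conversion helps interpret them. The sketch below applies an odds multiplier to a baseline probability; the baseline of 0.5 is an arbitrary illustration, not an estimate for any team.

```python
def apply_odds_multiplier(baseline_prob, multiplier):
    """Scale the odds p / (1 - p) by multiplier and convert back to a probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * multiplier
    return new_odds / (1.0 + new_odds)


# Even odds (p = 0.5) multiplied by five gives odds of 5 to 1, i.e. p = 5/6.
boosted = apply_odds_multiplier(0.5, 5.0)
```

A fivefold increase in odds therefore does not mean a fivefold increase in probability: a team at even odds moves to roughly an 83 percent chance, and the change shrinks as the baseline approaches zero or one.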
Indeed, a check of season-end team statistics shows that the two worst defensive rebounding teams in the league – the New York Knicks and the Minnesota Timberwolves – happened to also be the two worst teams by record in the Eastern and Western Conferences, respectively. Meanwhile, 9 of the top 10 defensive rebounding teams in the league qualified for the playoffs, including the league-leading Golden State Warriors who ranked third in the category.
The next two variables in order of significance are turnovers forced and free throws attempted. In both the East and the West, the playoff odds multiplier associated with an additional turnover forced was slightly more than three. Of interest to note is that the value of a turnover forced in the Western Conference was virtually identical to that of an additional defensive rebound – however, in the Eastern Conference, the value of a defensive rebound was clearly the most valuable factor.
Finally, shooting an additional free throw per game resulted in a three times odds multiplier in the Eastern Conference, but only a two times odds multiplier in the Western Conference. Nevertheless, the ordinal value of these three factors was consistent across both the Eastern and Western Conferences, with differences only in magnitude.
Figure 4: An overview of the results from Approach One. The table shows the enhanced playoff odds for a team that averages an increase of one marginal statistical unit as opposed to another that does not, by conference. For instance, a team in the West that averages 20 defensive rebounds a game has, ceteris paribus, about three times the odds of making the playoffs as a team that averages 19. Bolded numbers are statistically significant.
It is also worth noting that an additional personal foul committed per game was associated with approximately five times lower odds of playoff qualification across both conferences. For a more detailed description of the specific odds associated with a marginal increase in each statistical category analyzed, refer to Figure 4.
6.1.1 Approach One: Insights
The main insight gleaned from the first approach is that physical, disciplined, defensive-oriented teams seem to have the most success in terms of playoff qualification in the NBA. Of course there are exceptions – and we explore these in the following approaches – but a global model of playoff prediction suggests that the prototypical path to success revolves around physicality and defense. This is evident from the importance of an additional defensive rebound, forced turnover, and drawn foul.
Our research, then, suggests that there is some credence to the old heuristic that “defense wins championships.” We can now point to this not only anecdotally – with the dramatic turnaround of the Golden State Warriors into the league’s best team due to a transformation in identity from an offensive-minded team to a team that leads the league in rebounding and defensive efficiency – but also empirically through the data.
6.2 Approach Two (Relative Boost Modeling) Results
In the second approach, we wanted to challenge the assumption that a global model can apply to each team. Instead, we tweaked the assumption to suggest that each team in the NBA reacts fairly similarly to changes in statistical categories, but that the playoff odds associated with those changes are slightly enhanced for some teams while they are slightly deflated for others.
The results of this approach are quite interesting. In the Western Conference, the two teams experiencing the greatest relative boost were the Los Angeles Lakers and the San Antonio Spurs. Meanwhile, the Golden State Warriors and the Minnesota Timberwolves were quite deflated against the league-wide average (Fig. 5). In the East, the Boston Celtics and Miami Heat experienced levels of relative boost while the Charlotte Bobcats and Philadelphia 76ers lagged behind (Fig. 6).
Figure 5: An overview of the relative boost model for the Western conference, visually displayed ordinally and magnitudinally. “Boosted” teams enjoy enhanced playoff odds for identical stat increases against “deflated” teams.
6.2.1 Approach Two: Insights
The interpretation of these results lends itself to a couple of postulations. First, it suggests that some teams can be successful even if they don’t hang their hat on strong, physical defense, rebounding, and outside shooting. Likewise, it implies that some teams can possess these very qualities and still not be successful in terms of playoff qualification. Each team will react slightly differently to the same statistics – and this serves as a surrogate for some of the other factors we may have failed to capture in our first-pass approach.
The results of approach two do not invalidate those of the first approach, but they do suggest we need to dive deeper into the drivers of playoff success for each team. It is also interesting to note that the teams that tend to be boosted the most – LA, San Antonio, Boston, and Miami – are all teams that possessed Hall of Fame coaches during the period of study. Between Phil Jackson, Gregg Popovich, Doc Rivers, and Pat Riley, it is intuitive that these coaches may be a defining factor in helping their respective teams extract the most from their respective statistics.
Moreover, it is also apparent that divisional strength matters – something we address in Approach 4. This may explain why a team like the Warriors – which played with and was dominated by championship caliber Lakers and Kings teams in the early 2000s – might have been deflated relative to the model. In addition, the Warriors and 76ers both led the league in offensive categories and had lackluster defenses during their few playoff years during the stretch of observation. All of these factors – and probably many more – play a role in why certain teams were boosted and others were deflated. We continue exploring these reasons more deeply in our next approach.
6.3 Approach Three (Team Modeling) Results
Our second approach validated that there are team-level differences in playoff qualification odds across the league. In order to test those differences, we felt it appropriate to run individual logistic regressions for all thirty NBA teams.
Figure 6: An overview of the relative boost model for the Eastern conference, visually displayed ordinally and magnitudinally.
Two caveats should be noted. First, this approach limited the number of observations we could work with from 165 (per conference model) to just 11 per model (one for each season that a team played over the stretch of observation). This, of course, limited the number of parameters we could use in our model.
We decided that due to concerns about overfitting, we would limit the number of parameters we used in each model to three. Our selection process was again aided by the use of glmulti with a maximum size constraint of three, to which more consideration is given in Section 4.1.
The second caveat is that we began the analysis by running the same three variables for each team. This is not ideal because model selection, aided by glmulti, informs our intuition that not every team is going to have their playoff qualification odds affected most significantly by the same three variables.
However, picking different variables for each team does not lend itself easily to magnitudinal comparisons – indeed, it would be like comparing apples to oranges. For this reason, we chose to examine three significant variables suggested by global modeling: defensive rebounds, steals, and blocks.
The results of this approach were fascinating. We found a great degree of diversity in terms of the ordinal significance of these three variables across teams (Fig. 7). The most common ordering of importance was defensive rebounding, steals, and blocks with seven teams across the association exhibiting this quality. However, an additional seven teams were ordered by steals, defensive rebounds, and blocks. Moreover, six teams were ordered in terms of steals, blocks, and defensive rebounds. The remaining eleven teams were split quite evenly across the other three possible permutations of those three variables.
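Three variables admit 3! = 6 possible orderings, which is why the teams fall into six groups. The sketch below labels each team by the ordering of its three effects and tallies the share of teams per label. Ranking by absolute coefficient size is an assumption about how ordinal importance is determined, the function names are hypothetical, and the example coefficients are placeholders rather than fitted values.

```python
from collections import Counter
from itertools import permutations

LABELS = {"defensive_rebounds": "R", "steals": "S", "blocks": "B"}
ALL_ORDERINGS = ["".join(p) for p in permutations("RSB")]  # the six possible labels


def importance_ordering(effects):
    """effects: {variable: coefficient}. Rank by absolute size, largest first
    (an assumed ranking rule)."""
    ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
    return "".join(LABELS[name] for name in ranked)


def ordering_shares(team_effects):
    """Share of teams falling under each of the six orderings."""
    counts = Counter(importance_ordering(e) for e in team_effects.values())
    n = len(team_effects)
    return {label: counts.get(label, 0) / n for label in ALL_ORDERINGS}


# Placeholder coefficients for two hypothetical teams.
example = {
    "Team A": {"defensive_rebounds": 1.2, "steals": 0.4, "blocks": -0.9},
    "Team B": {"defensive_rebounds": 0.3, "steals": 1.1, "blocks": 0.5},
}
shares = ordering_shares(example)
```

Applied to all thirty per-team models, the resulting shares are the quantities a chart of orderings would display.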
Given these results, we recognized that the global model – while useful for predictive purposes and for describing many teams – was not appropriate for all teams. As a result, we decided to dive even deeper and create a customized, best fitting model for all thirty NBA teams.
Again, the use of glmulti along with traditional variable selection methods discussed in Section 4.1 aided with the creation of these thirty models. Given the diversity of ordinal importance of the three variables tested initially, we wanted to see what greater diversity we would find if we picked from all possible variables for each team.
It is impossible to discuss the results of all thirty models. However, we will note some general trends of interest here. We found that each team played to its respective strengths differently. The San Antonio Spurs and Cleveland Cavaliers – teams that won multiple championships over the period of observation – found their playoff odds bolstered mostly by blocks and defensive rebounds. The Boston Celtics, another championship winning team in our observational era, predicated playoff success largely on forcing steals.
Figure 7: This pie chart shows the percentage of teams in the NBA with the same ordinal ranking of importance for blocks, steals, and rebounds in terms of playoff qualification odds. For instance, the RBS section indicates the share of teams in the NBA that value defensive rebounds, blocks, and steals in that order as the key to playoff qualification. There is a great deal of diversity in ordinal ranking, but these three statistics seem to be the most globally significant across all teams.
However, not all teams focused on defensive statistics to bolster their playoff odds. In particular, many free-wheeling Pacific division teams – most notably, Golden State and Sacramento – found optimal success when they moved the ball well. These teams enhanced their playoff odds the most by marginal increases in assists.
6.3.1 Approach Three: Insights
There are two key takeaways. First, every team plays differently according to its own strengths. There is no magical, one-size-fits-all formula for playoff success. Many teams have success on defense, but others can do it by moving the ball and crashing the offensive boards to compensate for poor transition defense.
A second takeaway, though, is that championship-caliber teams did tend to follow a very similar blueprint. Over our observation period, only a limited number of teams won the championship – namely, the Lakers, Spurs, Mavericks, Heat, and Celtics. A common trend is that these teams mostly hung their hat on a solid defensive strategy – steals, blocks, or defensive boards. So while one could make the case that you don’t necessarily need to be solid defensively to qualify for the playoffs, it also seems that optimizing championship odds might require special attention to the areas we have highlighted.
Thus far, we have thoroughly modeled playoff odds optimization for all thirty teams in the league. We conclude by recognizing that these odds are not simply dependent on team statistics – rather, they are dependent on other teams. That is, the chances of whether the Kings make the playoffs are dependent on whether other teams – particularly those within their division – make the playoffs as well. That is what we address in the next subsection.
6.4 Approach Four (Dependency Modeling) Results
In the previous three approaches, we learned quite a bit about the ordinal – and to a lesser degree, magnitudinal – importance of a variety of team statistics on playoff qualification odds in the NBA. In this final approach, we wanted to build – and to the extent possible, quantify – the dependency effect.
With limited data, we are constrained in how well we can estimate these conditional distributions. As a workaround, we decide to introduce an additional parameter representing the number of other teams in a given team’s division that made the playoffs in that particular year. With this “pseudo-likelihood” approach, we strive to model how playoff odds are affected in the presence of competition.
Again, it is not feasible to discuss team-by-team results in any great depth here. Generally, we find that competition matters significantly for some teams and not at all for others. In particular, we find that the Western Conference has been brutal over the last decade. The biggest factor in playoff qualification for the Rockets, Grizzlies, and Mavericks – more so than any individual statistical category – was the number of teams from their division that made the playoffs.
6.4.1 Approach Four: Insights
Prima facie, this is not at all unexpected. The Southwest division has long been viewed as one of the toughest in all of basketball with the San Antonio Spurs perennially competing for the division title and being challenged by the rest of the teams. Indeed, this season, all five teams from the Southwest qualified for the playoffs – a first in NBA history – leaving only three slots for the remaining two divisions in the West.
Presently, there is a huge talent discrepancy between East and West; the eleventh seeded team in the West would have qualified for the playoffs in the East this season. Many, such as Mavericks owner Mark Cuban, have called for either the re-alignment of divisions, the abolition of divisions, or the seeding of teams overall irrespective of conference. However, until such changes are implemented, divisional strength is likely to be a key factor in any model of playoff prediction.
In the Eastern Conference, by contrast, we find no noticeable dependency effect. This is particularly true of the Atlantic Division. According to the model, divisional strength had zero quantifiable impact on whether the Knicks or the Nets made the playoffs over the period of observation. Indeed, New York basketball fans have only their teams to blame when it comes to their lack of success over the years.
Figure 8: This image demonstrates the results of a Bradley Terry Power ranking model. The Warriors are rated the best team in the Pacific Division while the Lakers are ranked the worst for the 2014-15 NBA season. The uneven spacing between teams represents the magnitude of the differential in strength between teams predicted by the BT model.
It must be noted that our approach to building dependency is somewhat ad hoc. It is indeed possible to build dependency from the ground up. This can be accomplished quite simply by Bradley Terry Luce methodology, which creates a probability model that can predict the outcome of pairwise comparisons. For completeness, we build such a model below. However, we note that it is less useful for our purposes since we prioritize actionability over prediction in our work.
6.5
Bradley Terry Results
The Bradley Terry model predicts the outcome of a comparison – in this case, the outcome of a game between two specific teams. It is a widely used model in playoff prediction because it can be constantly updated with new information about the outcomes of games from a given night, and then fed in to update power rankings through the league.
Using the Pacific Division as an example, we used the model to power rank teams based upon games played within division during the season. The model ranked the teams as follows: Golden State Warriors, LA Clippers, Phoenix Suns, Sacramento Kings, and LA Lakers (Fig. 8).
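To make the mechanics concrete, the following is a minimal sketch of one standard way to fit a Bradley Terry model: the iterative minorization-maximization update, in which each team’s strength is its win total divided by a strength-weighted count of its games. We do not claim this is the fitting routine used for Figure 8; the function names and the win counts below are hypothetical.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley Terry strengths from head-to-head results.

    wins[i][j] = number of games in which team i beat team j.
    Assumes every team has at least one win and one loss, so that the
    estimates are finite. Returns a dict of strengths that sum to 1.
    """
    teams = list(wins)
    strength = {t: 1.0 for t in teams}
    for _ in range(n_iter):
        updated = {}
        for i in teams:
            total_wins = sum(wins[i].get(j, 0) for j in teams if j != i)
            # Games played against each opponent, weighted by current strengths.
            denom = sum(
                (wins[i].get(j, 0) + wins[j].get(i, 0))
                / (strength[i] + strength[j])
                for j in teams
                if j != i
            )
            updated[i] = total_wins / denom
        # Normalize so the strengths are identifiable.
        norm = sum(updated.values())
        strength = {t: s / norm for t, s in updated.items()}
    return strength


def win_probability(strength, i, j):
    """Model probability that team i beats team j."""
    return strength[i] / (strength[i] + strength[j])


# Hypothetical within-division results (not real game data).
toy_wins = {
    "A": {"B": 3, "C": 3},
    "B": {"A": 1, "C": 3},
    "C": {"A": 1, "B": 1},
}
strength = fit_bradley_terry(toy_wins)
ranking = sorted(strength, key=strength.get, reverse=True)
```

Sorting teams by fitted strength gives the power ranking, and the gaps between fitted strengths correspond to the uneven spacing described in the caption of Figure 8.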
6.5.1 BT Model: Insights
From an accuracy standpoint, the model proved faithful and reliable. The ordering given matched the final regular season conference standings of those five teams. We use the Bradley Terry results to aid in our playoff prediction models. However, for the purposes of our work with the Kings, we consider this work largely tangential – we prioritize being prescriptive over being predictive.
6.6
Approach Five (Bayesian Mixed Modeling) Results
We began fitting priors on our random effects and performing Bayesian mixed model logistic regressions using MCMCglmm. Ultimately, we found that these approaches added little to the story beyond what we had discovered in approaches one through four – both from a predictive and from a prescriptive perspective.
Moreover, scalability and interpretability were big obstacles from the standpoint of delivering something actionable to the Kings. As a result, we decided to scrap this approach and focus on getting the most out of the initial four approaches.
Figure 9: Our overall predictive accuracy rose with each successively more sophisticated modeling approach until we ultimately exceeded 95 percent. This represents validation of our approaches.
6.7
Overall Prediction Results
Finally, we wanted to capture the predictive capabilities of our modeling approaches (Fig. 9). To accomplish this, we tested each model’s outputs against real NBA playoff qualification data. This served as validation for our statistical techniques. If our approaches have low predictive accuracy, then any prescriptive advice we offer would be based on faulty assumptions. However, if the model predicts playoff qualification well, then there is more compelling evidence to trust our preceding commentaries.
We began by analyzing the first approach. We did not expect this model to be particularly accurate given its generalizability. To our surprise, though, the model fared quite well. It predicted playoff teams over the last eleven years with nearly 80 percent accuracy – 79.4, to be exact – in both the East and the West.
This demonstrates that, independent of other factors, simply knowing the average per game stats in six basic stat categories can allow you to correctly predict whether an NBA team will qualify for the playoffs in 4 out of 5 cases. This is a somewhat astonishing revelation. Of course, we expected this to be a baseline and for our following models to perform even better – and they did.
In the second model – accounting for relative boost and deflation among teams – prediction was even more precise. We were 82.9 percent accurate in the East and 85.5 percent in the West, totaling 84.2 percent accuracy across the conferences. By accounting for relative team differences, but still assuming identical coefficients across teams, we boosted our predictive accuracy by almost 5 percentage points.
We were initially skeptical about these models because of the data sparsity issues we faced – only eleven seasons, and thus, eleven observations – but we were happily surprised with their predictive success.
Of course, modeling accuracy varied heavily on a team to team basis. However, Approach 3 was able to predict the last eleven seasons with 100 percent accuracy for 14 of the 30 teams in the league. Overall, this approach was 87.3 percent accurate in playoff prediction across the NBA.
Finally, Approach 4 proved to be the most successful in terms of prediction. In this approach, we introduced a coefficient for divisional strength to account for dependency in a pseudo-likelihood manner. Though we admit the ad hoc nature of this approach, we are pleased to report that it resulted in 100 percent predictive accuracy for 22 out of the 30 teams in the NBA over the last eleven seasons.
Moreover, it was 95.2 percent accurate in classifying playoff teams for the last eleven seasons of NBA play – 314 out of 330 (Fig. 10). Knowing a team’s average defensive rebounds, steals, and blocks – along with divisional strength – allows one to predict playoff qualification with almost perfect precision.
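The headline figures can be checked with simple arithmetic: 30 teams over eleven seasons give 330 team-seasons, and 314 correct classifications out of 330 is roughly 95.2 percent. The sketch below also shows one way the overall accuracy and the set of perfectly predicted teams could be computed from a list of predictions. The function name, the record layout, and the toy records are assumptions for illustration only.

```python
def accuracy_summary(records):
    """Summarize classification results.

    records: list of (team, predicted, actual) tuples, one per team-season.
    Returns (overall accuracy, sorted list of teams predicted perfectly).
    """
    correct = sum(pred == act for _, pred, act in records)
    per_team = {}
    for team, pred, act in records:
        hits, total = per_team.get(team, (0, 0))
        per_team[team] = (hits + int(pred == act), total + 1)
    perfect = sorted(t for t, (h, n) in per_team.items() if h == n)
    return correct / len(records), perfect


# Arithmetic check of the figures reported in the text.
team_seasons = 30 * 11                      # 330 team-seasons
reported_accuracy = 314 / team_seasons      # about 0.9515
misclassified = team_seasons - 314          # 16 team-seasons

# Hypothetical toy records (not real predictions).
toy_records = [
    ("A", True, True),
    ("A", False, False),
    ("B", True, False),
    ("B", True, True),
]
toy_accuracy, toy_perfect = accuracy_summary(toy_records)
```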
6.7.1 Predictive Accuracy: Insights
In total, we are quite happy with the predictive success that we found in our modeling approaches. We included four approaches mostly to aid our prescriptive storytelling and because we thought each could stand alone as valuable. Nonetheless, we are pleased that the increasingly sophisticated approaches correspondingly resulted in greater predictive accuracy (Fig. 9).
Building a model that forecasts playoff qualification with fewer than 1 in 20 misclassifications is no small feat even with abundant data – it is tougher still in the presence of data sparsity.
7
Limitations
The primary limitation we face is a sparsity of data (the rationale for which we mention in section 4). We only have eleven years’ worth of data and hence cannot build the most robust explanatory model.
Only modeling approaches one and two – where we assume that all teams share the same coefficients – are conducive to rich exploration, allowing for 165 trials and more covariates. However, we would not be able to fit these models to other situations and hence, we would sacrifice some of the comparability between different models by resorting to these tactics.
Moreover, there are some issues of endogeneity that cannot be fully addressed. For instance, a team that rebounds more might get this opportunity because they play better defense and force more missed shots.
Then, the question is whether to attribute playoff qualification odds to rebounding or to some other defensive metric (shots altered, etc). There is no way to fully get around this issue, and, as such, it should be discussed intelligently on a team by team basis.
Figure 10: The confusion matrix demonstrates a retrospective team playoff misclassification rate of less than 5 percent. This means our model predicted 314 out of 330 team-season playoff statuses correctly over the last eleven years.
8
Conclusion
It is difficult to distill these varied methodologies and modeling approaches into quick takeaways, but we do our best to paint broad strokes here.
First, the results seem to quite robustly back up the old adage that “defense wins championships.” Indeed, teams that are able to average one more steal, block, or defensive rebound per game tend to have greater marginal playoff odds than teams that average one more three point basket, assist, or offensive rebound.
Of course, not all teams respond equally to these factors. Individual team analyses (Approach Three) seem to reveal that an assist, for instance, is valuable to teams predicated on ball movement, but less so to teams built around superstars and isolation plays.
However, a GM who is looking for ways to re-tool his squad into a playoff contender can take note that generally, building a physical and defensively disciplined team pays dividends.
some teams just can’t catch a break. Nonetheless, divisional strength tends to be cyclical, and the teams that are complaining now will likely be beneficiaries in the future.
We point to over 95 percent predictive accuracy in playoff prediction as proof of validation for our modeling work. Having built team-specific models for all thirty NBA squads, we are confident that we have pinpointed the primary drivers of playoff qualification across the league for this era.
9
Acknowledgements
Many thanks to my advisor Alan Gelfand for his help and support in formulating models and thinking about approaches. Moreover, thanks for his insight, guidance, and passion for the sport of basketball.
Also, thanks to the Sacramento Kings organization for having me on the team and opening a door into the world of basketball analytics. It has been a fun ride.
10
References
The following papers, which are cited primarily in sections 1-3, have provided great aid in guiding this research:
1. Baghal, Tarek. (2012). “Are the ‘Four Factors’ Indicators of One Factor? An Application of Structural Equation Modeling Methodology to NBA Data in Prediction of Winning Percentage.”
2. Hollinger, John. (2014). “Hollinger’s 2014-15 NBA Playoff Odds.”
3. Hu, Feifang. (2004). “Forecasting NBA basketball playoff outcomes using the weighted likelihood.”
4. Martinez-Cruz, Armando. (2002). “A Stochastic Tip-Off: Simulating the NBA Playoffs with a Graphing Calculator.”
5. Paruchuri, Vic. (2012). “Predicting the NBA Finals with R.”
6. Silver, Nate. (2014). “Introducing NFL Elo Ratings.”
7. Stanke, Luke. (2012). “Can statistical models outpredict human judgment? Comparing statistical models to the NCAA selection committee.”
8. Teramoto, Masaru. (2010). “Relative Importance of Performance Factors in Winning NBA Games in Regular Season versus Playoffs.”
9. Wei, Na. (2008). “Predicting the Outcome of NBA Playoffs Using Naive Bayes Algorithm.”
10. Yang, Jackie. (2012). “Predicting NBA Championship by learning from history data.”