Chapter 5 Discussion

Though the model performs well, it is not without its limitations. We model change in score using:

\(\Delta score_i \sim \mathcal{N}(\hat{\mu}_i,\hat{\sigma}^{2}_i)\) where \(\hat{\mu}_i\) and \(\hat{\sigma}^{2}_i\) are defined in section 2.1.
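To make the role of this distribution concrete, the following is a minimal sketch of how a win probability can be read off the normal CDF once \(\hat{\mu}_i\) and \(\hat{\sigma}_i\) are known. It assumes that \(\Delta score_i\) is the change in score differential from the current state to the end of the game, measured from the perspective of the team whose lead is given, and it ignores ties. The function names and the example numbers are hypothetical and are not taken from the models in this work.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def win_probability(lead, mu_hat, sigma_hat):
    """P(lead + delta_score > 0) when delta_score ~ N(mu_hat, sigma_hat^2).

    `lead` is the current score differential (assumption: positive means
    the team of interest is ahead). Ties are ignored in this sketch.
    """
    if sigma_hat <= 0:
        raise ValueError("sigma_hat must be positive")
    return 1.0 - normal_cdf(-lead, mu_hat, sigma_hat)


# Hypothetical inputs: a 3-point lead, no expected drift, sd of 10 points.
p = win_probability(lead=3, mu_hat=0.0, sigma_hat=10.0)
print(round(p, 3))
```

With these made-up inputs the result is about 0.62. A tied game with \(\hat{\mu}_i = 0\) gives exactly 0.5, which is a quick sanity check on the sign conventions.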

But modeling \(\mu\) with a linear model assumes homoscedasticity, an assumption that is almost certainly incorrect. Additionally, none of the models accounts for the differing levels of uncertainty in the variables that describe team strength. The distribution of a team’s possible DVOA ratings is likely much narrower at the end of the season than at the beginning, and ideally the model would reflect that. The model is also probably slightly off in situations where a specific outcome never occurs in the training data. For example, no safeties occurred in overtime from 2010 to 2016, so the models used to predict the next scoring event or the outcome of a given set of downs assign a probability of zero to a safety. While the true probability of a safety in overtime is very low, it is not zero. A further limitation lies in the handling of field goal attempts, extra point attempts, and two-point conversion attempts: the probability of success is set to the league-average success rate for each event rather than estimated by a model that takes the teams in question into account.
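The zero-probability problem described above can be illustrated with a small sketch. Additive (Laplace-style) smoothing is one common remedy for unseen outcomes; it is shown here only as an illustration and is not how the models in this work are fit. The outcome labels, the counts, and the value of `alpha` are all made up for the example.

```python
def smoothed_probabilities(counts, alpha=0.5):
    """Estimate outcome probabilities with additive smoothing.

    counts: dict mapping each possible outcome to its observed count.
    alpha:  pseudo-count added to every outcome (alpha=0 gives raw
            frequencies, so unseen outcomes get probability zero).
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    total = sum(counts.values()) + alpha * len(counts)
    if total == 0:
        raise ValueError("no observations and no pseudo-counts")
    return {outcome: (n + alpha) / total for outcome, n in counts.items()}


# Toy counts (not real data): an outcome that never appears in training.
toy_counts = {"touchdown": 60, "field_goal": 40, "safety": 0}

raw = smoothed_probabilities(toy_counts, alpha=0.0)
smoothed = smoothed_probabilities(toy_counts, alpha=0.5)

print(raw["safety"], smoothed["safety"])
```

The raw estimate assigns the unseen outcome a probability of exactly zero, while the smoothed estimate assigns it a small positive value and still sums to one across outcomes.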

The model is also limited in the data it takes into account. Injuries to key players are not considered. Though the pre-game point spread is thought to take injuries into account, this still leaves the model blind both to in-game injuries and to the effects that injuries may have on the composition of a given team’s strengths. The model also does not take weather into account. In most games this probably does not affect results much, but it would be useful to flag extreme conditions in the dataset, such as temperatures far below zero or snowstorms that verge on blizzards. The model also does not currently consider the strategic acumen of either coach, a factor that would seem to be important in close games. One last blind spot concerns timeouts, which the parameters of the normal distribution from which \(\Delta score_i\) is drawn do not consider. Timeouts seem to help teams less by increasing the raw expectation of score differential and more by letting teams shift the distribution of discrete score differentials in their favor at the end of games. For that reason, timeouts were left out of the linear models from which \(\hat{\mu}_i\) and \(\hat{\sigma}^{2}_i\) are taken.

5.1 Conclusion

Our modeling process achieves robust predictive results by combining three components: a series of models that compute a distribution of EPA by sampling future game states; a variation of the trick PFR uses to derive win probability from the normal CDF; and Generalized Boosting Models that handle win probability predictions when normality assumptions break down. Future versions of this model will ideally implement a form of heteroscedastic regression to better model the \(\mu\) parameter of the normal distribution, as well as measures of uncertainty for predictors that stabilize over the course of a season. Future work may also measure the effects of injuries on win probability. Horowitz, Ventura, and Yurko (2018) detail a mechanism for applying the idea of “wins above replacement” (WAR), first developed in baseball, to the NFL. Such a measurement for NFL players would be very helpful for quantifying the impact that injuries might have on win probability.
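As a rough illustration of what heteroscedastic regression changes, the sketch below evaluates a Gaussian negative log-likelihood in which both the mean and the standard deviation depend on a predictor. It is not an implementation from this work: the parameterization (\(\mu_i = a + b x_i\), \(\log \sigma_i = c + d x_i\)), the function name, and the toy data are all assumptions made for the example, and no fitting routine is included.

```python
import math


def gaussian_nll(xs, ys, a, b, c, d):
    """Negative log-likelihood of ys under y_i ~ N(mu_i, sigma_i^2),
    with mu_i = a + b * x_i and log(sigma_i) = c + d * x_i.

    Setting d = 0 recovers the homoscedastic (constant variance) case.
    """
    nll = 0.0
    for x, y in zip(xs, ys):
        mu = a + b * x
        log_sigma = c + d * x
        variance = math.exp(2.0 * log_sigma)
        nll += (0.5 * math.log(2.0 * math.pi)
                + log_sigma
                + (y - mu) ** 2 / (2.0 * variance))
    return nll


# Toy data (not real): the spread of y grows with x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, -2.0, 3.0, -4.0]

# Constant variance, set to the mean squared value of ys (7.5).
homo = gaussian_nll(xs, ys, a=0.0, b=0.0, c=0.5 * math.log(7.5), d=0.0)
# Variance allowed to grow with x (hand-picked, not optimized).
hetero = gaussian_nll(xs, ys, a=0.0, b=0.0, c=0.0, d=0.35)

print(hetero < homo)
```

On this toy data the variance-varying parameters achieve a lower negative log-likelihood than the best constant-variance choice, which is the kind of improvement a heteroscedastic model would aim for when the spread of \(\Delta score_i\) changes with the game state.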