Introduction

Predicting the winner of a given National Football League (NFL) game has long been of interest to fans, pundits and gamblers alike. Many factors can influence the outcome of a game. Some more obvious factors might include the current score differential, the amount of time left, the down, the distance to go to pick up a first down, the current yard line, and the strength of each team’s offensive, defensive and special teams units. Some less obvious factors might include the number of timeouts each team has left, the stadium in which the teams are playing, the weather, which team will receive the second half kickoff, and the in-game decision making capabilities of the respective coaches.

Others have developed in-game win probability models using some of these factors. Lock & Nettleton (2014) used a Random Forest that takes in a number of variables aimed at describing the current game state1, a variable—total points scored—that might offer insight into the level of variance associated with the game environment, and the pre-game point spread as a measure of the relative team strengths. Their model outputs the probability the team with possession of the football will win the game. Pro-Football-Reference (PFR) has published an in-game win probability model that expands on previous research by Hal Stern and Wayne Winston and treats the change in score from a given point until the end of the game as a normal distribution, calculating win probability by finding the proportion of the normal probability density function that corresponds to a final score differential that is greater than zero. A different unspecified model is used for game states where less than five minutes remain. The mean of the PFR normal distribution is taken to be the current score differential plus the expected points added (EPA), a feature designed by PFR to find the value of a given combination of down, distance to first down, yard line and time remaining in the half. The variance is taken to be 13.452 multiplied by the fraction of game time that remains (Paine, n.d.). Maksim Horowitz, Samuel Ventura and Ronald Yurko developed a win probability model (and the wonderful nflscrapR package that was used to load in the play by play data used in this analysis) that uses a multinomial logistic regression to evaluate the value of field position and a Generalized Additive Model (GAM) to output a win probability. ESPN also boasts a win probability model. A description of ESPN’s methods was not listed on its site, but in 2017 Michael Lopez, then an assistant professor at Skidmore College, now the NFL’s Director of Data and Analytics, described the model as “derived from an ensemble of machine learning models.”

Unfortunately, of these models, only the results from the model developed by Horowitz, Ventura and Yurko (2018) are available online, though Lock and Nettleton (2014) give a summary of their model’s effectiveness when tested on the 2013 NFL season and were kind enough to share the results of an updated model for games during the 2017 season. This makes it difficult to evaluate of the effectiveness of different methods or set a benchmark for what level of performance constitutes an effective model.

The aim of this work is to create a win probability model that improves upon shortcomings in existing models. Some models—Lock and Nettleton (2014), seemingly ESPN—draw upon ensemble learning methods that allow for nonlinear modeling of complex interactions, while Horowitz, Ventura and Yurko (2018) use a GAM to a similar end. Others, like PFR, draw upon the link between change in score differential and the normal distribution to utilize a known cumulative distribution function (CDF) that allows for an easy transformation to win probability space. However, no model has advertised itself as doing both3, a modeling process that could be carried out by modeling more granular events—the probability distribution for the outcome of a set of downs or the next scoring event, for example—in a nonlinear manner, drawing from a probability distribution that models these events and feeding the result to a normal distribution. The process of Horowitz, Ventura, and Yurko (2018) is most similar, but they substitute a GAM for a normal distribution and feed the expectation of the value of the current game state to the GAM instead of feeding samples from the distribution of future game states that might exist. Additionally, though some models may implicitly account for the possibility that games featuring different game states may feature different variances even when time remaining is held equal, no existing model appears to account for differences in variance that occur as a result of differences in the types of teams playing in one game versus another. The intuition behind this is that all else held equal, a game between two high-powered offenses would seem to feature greater variance in certain situations than one between two high powered defenses. It also stands to reason that if variance is assumed to depend on the game state and aspects of team make-up, then \(\mu\) might also. In short, this paper seeks both to flesh out some of the nuances in team composition that might affect win probability and to measure the effect of certain variables—mainly those related to field position and down and distance—indirectly through the effect that differences in distributions of future game states has on win probability.


  1. They used down, yards to go for a first down, yard line, score differential, seconds remaining, timeouts remaining for each team, and adjusted score—score divided by the square root of seconds remaining + 1—to describe the current game state

  2. The variance of the final score from the pre-game point spread over all NFL games from 1978-2012

  3. Although the lack of transparency exhibited by many of these models makes this difficult to evaluate