Tuesday, October 29, 2013

What's the matter with Win Probability?

Looks like I got this post up just in time. Brian has introduced a series of updates to the Win Probability calculator that largely address the WP overconfidence noted here. He also added a way to adjust the model for pregame expectations of team strength.

Brian Burke’s Win Probability (WP) stat (see here for the explanation and here for the weekly game visualizations) is a must for fans of NFL analytics. The individual game visualizations might as well be a box score: they track the ebb and flow of a game in far greater detail than quarter-by-quarter score totals or drive descriptions, in about the same amount of real estate.

Watching the WP metrics this season, however, I can’t help but feel that there are more wild swings than in previous years. After Detroit’s comeback against the Cowboys this weekend, I decided I needed to take a look, despite having been burned by my biases on previous hunts initiated by anecdotal evidence.

WP Background

The stat is created by looking at previous seasons and evaluating whether teams in similar circumstances (down & distance, field position, time left, score differential) ended up winning their game. The resulting stat is not so much a forward-looking projection as the proportion of teams that went on to win from that state. Herein lies the potential problem. If the “previous seasons” in question stretch back too far, the game may have been different enough that a 7-point lead didn’t mean the same thing. If they don’t go back far enough, there will not be enough data to cover the myriad configurations of the variables.
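
To make that trade-off concrete, here is a minimal sketch of an "empirical proportion" WP lookup. It is not Brian's actual model, whose details I don't know; the bucket sizes, the function names and the tiny history below are all made up for illustration. The point is only that WP for a state is the share of past teams in a similar state that won, and that a state nobody has been in before returns nothing at all.

```python
from collections import defaultdict

def bucket(state):
    """Coarsen a raw game state so that similar situations share a key.

    state = (down, yards_to_go, yardline, seconds_left, score_diff).
    The bucket widths are arbitrary assumptions, not the real model's.
    """
    down, distance, yardline, seconds_left, score_diff = state
    return (
        down,
        min(distance, 10) // 5,  # 0-4, 5-9, or 10+ yards to go
        yardline // 20,          # 20-yard slices of the field
        seconds_left // 300,     # 5-minute slices of game clock
        score_diff,
    )

def build_wp_table(history):
    """history: iterable of (state, won) pairs taken from past games."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [wins, total]
    for state, won in history:
        key = bucket(state)
        counts[key][0] += int(won)
        counts[key][1] += 1
    return counts

def win_probability(table, state):
    """Share of past teams in this bucket that won, or None if never seen."""
    wins, total = table.get(bucket(state), (0, 0))
    return wins / total if total else None

# Made-up history: three teams up 7 on 1st-and-10 late in the game.
history = [
    ((1, 10, 35, 250, 7), True),
    ((1, 10, 30, 200, 7), True),
    ((1, 10, 38, 280, 7), False),
]
table = build_wp_table(history)
print(win_probability(table, (1, 10, 33, 100, 7)))  # a state seen before
print(win_probability(table, (4, 2, 95, 10, -3)))   # a state never seen
```

Widening the buckets or adding more seasons fills in more states but blurs together situations (or eras) that may not really be alike; narrowing them does the opposite.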


Using my trusty, homemade Monte Carlo simulator I’ll start by looking at the number of teams to win after being the first in their game to reach a WP of 90% (WP90).

In the 120 games played through Week 8 of the NFL season, 106 have ended with the first team to reach WP90 as the winner[1]. This doesn’t account for how far above WP90 a team climbed before losing, but it does give us a nice, conservative place to start.

Across 1000 simulations of the 120 games this season, the average number of games won by the WP90 team is 108 (go figure), with a sample StDev of 3.2. Of the simulated seasons, 10.3% ended with exactly 106 WP90 winners and 18.5% ended with even fewer. This implies the chance of seeing 106 or fewer out of 120 is roughly 28.8%: slightly uncommon but not really rare.
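
For anyone who wants to try this at home, a rough sketch of that kind of simulation follows. It is not my actual simulator; it simply assumes that the first team to WP90 in each of the 120 games wins independently with probability 0.90, and the seed and function name are arbitrary. Exact figures will wobble from run to run and will not match the numbers above to the decimal.

```python
import random
import statistics

def simulate_seasons(n_seasons=1000, n_games=120, wp=0.90, seed=2013):
    """Return the number of WP90 teams that win in each simulated season.

    Assumes every game is an independent coin flip weighted at `wp`.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < wp for _ in range(n_games))
        for _ in range(n_seasons)
    ]

wins = simulate_seasons()
mean_wins = statistics.mean(wins)
sd_wins = statistics.stdev(wins)  # sample standard deviation
share_at_most_106 = sum(w <= 106 for w in wins) / len(wins)

print(f"average WP90 winners: {mean_wins:.1f}")
print(f"sample StDev:         {sd_wins:.1f}")
print(f"share with <= 106:    {share_at_most_106:.1%}")
```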

Digging into the data a bit further, there is an interesting revelation. The 14 teams (actually 12, with Tampa Bay and Dallas both appearing on the list twice) that went on to lose after reaching WP90 tended to go well beyond WP90 before losing. Four of them reached WP99 and two reached WP98. If we are generous and give all six WP98, we would still expect only a 3.4% chance of seeing that many losses according to the simulator. If we are less generous and give them WP98.5, we see only about a 1% chance of that many losses.
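
The same question can be checked without a simulator using an exact binomial tail. The sketch below assumes that "that many losses" means the six teams that reached WP98 or better, and that a team reached that level in every one of the 120 games (almost certainly an overcount, which only makes six losses look more likely than they should). The helper name is mine.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance of six or more blown leads in 120 games if each is a 2% event
# (WP98), then if each is a 1.5% event (WP98.5).
tail_wp98 = prob_at_least(6, 120, 0.02)
tail_wp985 = prob_at_least(6, 120, 0.015)

print(f"WP98:   {tail_wp98:.1%}")
print(f"WP98.5: {tail_wp985:.1%}")
```

Under those assumptions the two tails land close to the 3.4% and 1% figures quoted above.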


There’s something going on at the far end of Win Probability and I suspect a combination of increased pace of offenses, more efficient offenses and out-of-sample results.
  • Increased Pace – I wrote about this last week but it goes beyond the overall pace in a neutral situation. With pace being a focus this offseason after New England's success and Chip Kelly's hiring, many more teams than usual came in armed with a no-huddle offense that can be quite handy during two-minute drills or against big fourth-quarter deficits.
  • More Efficient Offenses – Regardless of how far back Brian’s data goes, this season is out of sample so far in terms of offensive performance compared with 1, 5 or 10 years ago. This season's 23.1 points per game compares with 22.8 in 2012, 22.0 in 2008 and 20.8 in 2003. Plays per game are up only 5% over the same period while pass yards per game are up over 20%, favoring comebacks from big deficits.
  • Out-of-Sample Results – Some coaches appear to be very creative at coming up with new ways to win or lose (especially lose, very creative there) games. Until a game has been blown in a certain way, and really until a game has been blown from a specific point on the field with a specific amount of time left, the model won’t acknowledge the possibility that it can happen.
For all that, Win Probability remains my preferred way to evaluate in-game decisions and catch up quickly on a game I missed. As the league finds equilibrium in the offense/defense continuum these results should become more in line with the stated percentages.  

[1] This saves me from having to do any quirky math to account for the fact that 2 teams (New Orleans in Week 2 against Tampa Bay and Cincinnati in Week 3 against Green Bay) were first to WP90 before dropping below WP10 and then rallying to win the game. Happily enough, if we consider that there are 16 teams (the 14 losers plus Cincy and New Orleans) to go above WP90 and below WP10 in the same game, it is reasonable to believe that roughly 10% of them would recover to win.


  1. I have thought this was a possibility for a while now, but what are you going to do?

    I suppose you could look at the trendline of performances in late, close games and, instead of taking a straight average of 1999-2012, use a weighted average or project teams playing more aggressively/efficiently going forward, but you risk overemphasizing noise in the data.


  2. James,

    I think that's the right approach. It's more about understanding the points where WP might be weak than trying to correct and risk creating other problems.