To the growing hockey analytics movement, the notion that puck possession is a critical predictor of winning is largely taken as an article of faith. Earlier this year, Chris Boyle published a fascinating infographic illustrating the linkage between strong possession play and playoff success, and some work I’ve done has shown that Fenwick Close differential (i.e., net shots on goal and missed shots in close-game even-strength situations) is a significant predictor of regular-season performance.

Nevertheless, there are inconsistencies in the association between possession and winning that bother me. Across the eight seasons of single-game data I compiled to create the Puck Prediction model, shot differential (which correlates strongly with Fenwick Close) correctly predicted only 54.7% of game results. To put this in perspective, you’re more likely to pick the winner of a game correctly by simply choosing the home team than you are by choosing the team with the better shot differential entering the game. (A t-test comparing shot-differential predictions to random predictions is statistically significant, but the correct interpretation of this analysis is less “this relationship is really meaningful” than “a sample of 10,000+ games allows us to estimate a weak association with a high degree of precision”.) What’s more, until the launch of the excellent Extra Skater website this year, game-level Fenwick Close data haven’t been available, so the utility of advanced possession metrics for explaining game outcomes has been unknown. This is important insofar as the adoption of hockey “fancy stats” by teams may well require a demonstrable linkage between specific systems of play and the probability of winning, which in turn requires us to be able to point to factors within games that explain how teams win.
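To see why a result can be “significant” and still represent only a weak edge, here’s a back-of-the-envelope sketch. It uses a simple normal-approximation z-test on a proportion rather than the t-test from the analysis, and a round 10,000 games, so the numbers are purely illustrative:

```python
import math

def z_test_accuracy(p_hat, n, p0=0.5):
    """One-sample z-test: is an observed prediction accuracy p_hat
    distinguishable from chance-level accuracy p0, given n games?"""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

# Shot-differential picks: 54.7% correct over an illustrative 10,000 games.
# The z-statistic is enormous, yet the edge over a coin flip is under 5 points.
print(round(z_test_accuracy(0.547, 10_000), 1))  # z = 9.4
```

The point: with enough games, even a 4.7-point edge over chance produces an overwhelming test statistic, which is precisely why significance alone says little about practical predictive value here.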

Fortunately, the proprietor of Extra Skater was kind enough to assemble a dataset for me that included single-game advanced statistics for all 720 games of the 2013 regular season. (Thanks very much, Darryl!) After some manipulation, I calculated single-game Fenwick Close, even-strength shooting and save percentages, and PDO for each team in every game. In order to explore the consistency and predictive utility of each of these measures, I also calculated 1-game, 3-game, 5-game, 10-game, and 20-game “lagged” values prior to each game in the dataset. A one-game lagged Fenwick Close, for example, would be the single-game Fenwick Close from the team’s last game. A three-game lagged Fenwick Close for game *i* would be the total Fenwick Close for games *i* – 1, *i* – 2, and *i* – 3.
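For illustration, here is one way a lagged value might be computed. This is a sketch in Python (the analysis itself wasn’t done this way), and it assumes the k-game Fenwick Close is pooled over the raw events in the window rather than averaged per game; the event counts below are made up:

```python
def lagged_fenwick_close(fenwick_for, fenwick_against, i, k):
    """k-game lagged Fenwick Close % entering game i (0-indexed):
    pooled over games i-k .. i-1. Returns None when a team has
    played fewer than k prior games."""
    if i < k:
        return None
    ff = sum(fenwick_for[i - k:i])
    fa = sum(fenwick_against[i - k:i])
    return ff / (ff + fa)

# Hypothetical close-situation Fenwick events (for/against) for one team
ff = [12, 15, 9, 14, 11]
fa = [10, 11, 13, 9, 12]
print(lagged_fenwick_close(ff, fa, 3, 3))  # pools games 0-2: 36 / 70 ≈ 0.514
```

Returning `None` for early-season games mirrors the sample-size limitation noted later in the post: longer lags necessarily exclude each team’s first games.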

My analysis of the consistency of Fenwick Close looked at the autocorrelation of 1-game, 3-game, 5-game, 10-game, and 20-game Fenwick Close in game *i* against the same measure in game *i* – 1. In order to put the correlation coefficients in context, I repeated the analyses for shooting percentages, save percentages, and PDO. The results are below. As you can see, the autocorrelation of Fenwick Close is much stronger than that of shooting and save percentages. Not only is single-game Fenwick Close more consistent than single-game Sh% or Sv%, it has a stronger autocorrelation than 3-game and 5-game moving averages of these variables, in which some of the noise has presumably filtered out. A 20-game moving average of puck possession has an autocorrelation of 0.77, while a similar average for Sh% has an autocorrelation of just 0.41, and a 20-game Sv% has a 0.52 autocorrelation. So, if you need more evidence that puck possession is a much more consistent and repeatable measure of team performance than shooting or goaltending, there you go.
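To make the autocorrelation idea concrete, here’s a minimal sketch of the single-game (lag-1) case, using two invented series: a persistent metric drifts slowly, so game *i* resembles game *i* – 1, while a volatile one bounces around and carries little game-to-game information:

```python
def pearson(x, y):
    """Plain Pearson correlation, no external libraries."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def lag1_autocorrelation(series):
    """Correlate the metric in game i with the same metric in game i - 1."""
    return pearson(series[1:], series[:-1])

# Hypothetical possession-like series: changes slowly from game to game
persistent = [0.45, 0.46, 0.47, 0.48, 0.52, 0.53, 0.54, 0.55]
# Hypothetical Sh%-like series: swings wildly from game to game
volatile = [0.70, 0.40, 0.95, 0.30, 0.85, 0.45, 0.90, 0.35]

print(lag1_autocorrelation(persistent) > lag1_autocorrelation(volatile))
```

The k-game versions in the post work the same way, with the windowed value at game *i* correlated against the windowed value at game *i* – 1.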

But what about the relationship between these variables and winning? To look at this, I loaded the data into R, and performed a series of logistic regressions using generalized linear models and standard errors adjusted to reflect the serial correlation in the data. One set of models estimated single-game win probability as a function of 1-game, 3-game, 5-game, 10-game, or 20-game lagged Fenwick Close. A second set included lagged PDO (in a similar fashion) as a second independent variable. Finally, a third set of models replaced PDO with lagged versions of its components. The results are depicted below.
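As a rough sketch of the modeling setup: the original analysis was done in R with serial-correlation-adjusted standard errors, but the core idea — regress a win/loss outcome on a lagged possession measure and read the slope as a (log) odds ratio — can be shown with a minimal, dependency-free fit on synthetic data. Everything below is illustrative, not the actual model:

```python
import math
import random

def fit_logistic(x, y, lr=1.0, epochs=4000):
    """Minimal one-predictor logistic regression via gradient ascent on the
    log-likelihood. Returns (intercept, slope); exp(slope) is the odds ratio
    for a full 0 -> 1 change in the predictor."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Synthetic data: win probability rises mildly with lagged Fenwick Close
random.seed(1)
x = [random.uniform(0.35, 0.65) for _ in range(500)]             # lagged FC%
y = [1 if random.random() < 0.2 + 0.6 * xi else 0 for xi in x]   # 1 = win

b0, b1 = fit_logistic(x, y)
print(b1 > 0, math.exp(b1) > 1)  # positive slope -> odds ratio above 1
```

A real version would also need the adjusted standard errors the post describes, since each team contributes many serially correlated observations.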

A few points to note here:

- When the number of observations in your data gets into the hundreds, it’s wise to be somewhat skeptical of statistical significance. Any effect can appear significant if you have enough power to throw at the model. For example, while it’s interesting that the prior game’s PDO is a significant predictor of win probability, the tiny effect size makes this a tough result to interpret.
- When looking at the large odds ratios that are frequently associated with Fenwick Close, Sh%, and Sv%, keep in mind that these variables only take values between 0 and 1. As such, the effect might not be as dramatic as it looks. For example, the odds ratio for the 5-game lagged Fenwick Close should be interpreted as follows: a team with a 5-game Fenwick Close of 100% has 17.47 times the odds of winning relative to a team with a Fenwick Close of 0%.
- PDO may be useful as shorthand when discussing luck in team performance, but these results suggest that its use can obscure the predictive value of its component parts.
- The consistency of Fenwick Close may actually work against its utility as a predictor of winning. That is, the autocorrelation results suggest that a team’s Fenwick Close converges to a steady-state value fairly quickly, which implies that it doesn’t vary much from game to game. Unfortunately, in a regression-based analysis, a measure that doesn’t vary much isn’t going to be able to explain variation in the outcome of interest.
- On the other hand, Sh% and Sv% are much less consistent variables, yet their greater variability doesn’t translate to a stronger correlation with win probability. This suggests a weaker underlying connection to wins.
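To put the odds-ratio point in more practical terms, a ratio reported for a full 0-to-1 swing in a proportion-valued predictor can be rescaled to a realistic change. A short sketch (the 45%-to-55% shift is a hypothetical example, not a figure from the models):

```python
def rescaled_odds_ratio(or_full_range, delta):
    """Convert an odds ratio reported for a full 0 -> 1 change in a
    proportion-valued predictor into the odds ratio for a change of
    size delta (e.g. delta=0.10 for a 10-point swing)."""
    return or_full_range ** delta

# Reported 5-game lagged Fenwick Close odds ratio: 17.47 per full 0 -> 1 swing.
# A realistic 10-point shift (say, 45% to 55% possession) multiplies the odds by:
print(round(rescaled_odds_ratio(17.47, 0.10), 2))  # 1.33
```

So even the headline 17.47 corresponds to only about a one-third increase in the odds of winning for a sizeable real-world improvement in possession.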

Obviously, the above shouldn’t be viewed as definitive. Between the lack of inter-conference play and other factors, it’s entirely possible that idiosyncratic features of the 2013 season contributed to these results. I plan to update this analysis as game-level data from other NHL seasons become available. Another important limitation: my sample sizes dropped off as the length of the lag in my analysis increased. For example, analyses featuring variables with 10-game lags necessarily excluded the first 10 games of each team’s season.

Still, as Josh Weissbock or I can tell you, using statistics to predict single-game outcomes is very, very challenging. It’s likely the case that stronger possession play leads to a marginal increase in win probability, and over the course of a long season, this increase translates into additional wins and points. The same, of course, can be said of goaltending and team shooting, but these are less reliable from game to game than controlling the puck. But when it comes to single games, or small numbers of games (i.e., playoff series), no variable, even Fenwick Close, is as predictive as you might expect.

Great post. And big shout-out to @extraSkater for making data available so we can do these investigations.

One point.

“But when it comes to single games, or small numbers of games (i.e., playoff series), no variable, even Fenwick Close, is as predictive as you might expect.”

I disagree with this comment.

We know that luck and random factors play a huge part in hockey.

Therefore it actually should be expected that predicting wins in small game samples is not possible.

Well, Josh Weissbock’s work with machine learning has identified ways to make single games in non-NHL leagues pretty predictable. And I think new data sources and more complex algorithms could make single games much more than a coin flip. But, in general, yes, there is a ton of randomness in hockey that plays a big role in deciding games. Even if most hockey writing tends to forget that.

Considering the nature of the 2013 season, like you said, I don’t trust any of this data. I think it’ll take a few years of data to come up with any real hypotheses.

Yeah. Like I said, I’ll update this as soon as I can with additional seasons. Whether that’s at the end of 2013-14, or sooner (if, say, 2011-12 becomes available on Extra Skater), we’ll see.

“…machine learning has identified ways to make single games in non-NHL leagues pretty predictable”

Actually, by my calculation this is not accurate.

With home ice factored in, a coin flip gets you ~55%.

The best program can pick ~60-61%.

That leaves a ~6% edge at the optimum, if Josh’s program can achieve this moving forward.

Or, ‘one in 20 games’ better than a monkey.

My research suggests that no improvement in stats will matter for small-sample predictions.

However, the NHL could lower the impact of randomness by 1) changing penalties (have them correlate strongly with skill infractions), 2) making the nets slightly larger to reward skill, and 3) getting rid of the loser point. But the league doesn’t want this; they love the artificial parity.


TERRIFIC WORK! A couple of q’s: I’m probably missing something, but doesn’t it seem odd that PDO has identical coefficients across those models? I am always reluctant to suggest someone do more work when I don’t have to do it, but how about a robustness check: fit the models on about half the data from each conference and see how closely the results compare to those from the remaining, out-of-conference data. If the findings are similar, then maybe the shortened season is less likely to be an issue? Finally, did you run the models with an indicator for home team as a control? It would be interesting to see whether and how much the coefficients change. Okay, one last point, really: the pattern of the size of the coefficients across lags is really interesting, going from smaller to larger to smaller. Any ideas about what is going on there?

Hey! Thanks for the kind words.

1. The PDO coefficients aren’t actually identical; they just look like it because I only reported out to 2 decimal places. That is, they vary, but only between 1.001 and 1.004.

2. I hope to do the validation steps you suggest at some point, but only after I have more data. I’m worried about sample size issues if I slice the 2013 data too finely.

3. Adding an indicator for home/road is a good idea. Again, when I have more data, I’ll look into a fuller model rather than one limited to #fancystat predictors.

As far as the pattern of effect sizes across lags, it is interesting, but very noisy. I could throw out ideas about them, but I’m worried the patterns will change when more data are available.
