Whenever hockey analysts attempt to forecast future performance from past data, it’s a given that regression to the mean needs to be taken into account. A great season or a strong run of play often reflects natural, random variation rather than underlying talent or strategy, particularly in a sport like hockey, where so much hinges on fortunate bounces and 50/50 plays. It’s generally safe to assume that exceptional results (whether good or bad) will regress toward the average. Eric Tulsky describes an analytic technique for regressing statistics to the mean in this article, and it has been used by other hockey statisticians and even the Puck Prediction Playoff Forecast model. Essentially, the idea is to use the year-on-year autocorrelation r as a measure of a statistic’s repeatability: the regressed estimate weights the observed value by r and the league average by (1 – r).
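To make the mechanics explicit, here’s a minimal sketch of that regression step in Python; the function name and the r = 0.3 repeatability value are illustrative assumptions of mine, not figures from Tulsky’s article.

```python
def regress_to_mean(observed, league_mean, r):
    """Shrink an observed statistic toward the league mean, weighting
    the observation by its year-on-year autocorrelation r."""
    return r * observed + (1 - r) * league_mean

# Illustrative only: a team shooting 9.9% against a league average of
# 8.0%, with an assumed repeatability of r = 0.3 for shooting percentage.
print(regress_to_mean(9.9, 8.0, 0.3))  # 8.57 -- most of the excess washes out
```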
When we’re talking about forecasting full-season performance, I don’t have a problem with this method. But as someone who’s using it to regress partial-season measures into full-season estimates, I have a few concerns. The correlation r is generally derived from complete seasons of data, but as I’ve found, most of the critical measures of team performance are extremely volatile in partial-season samples. The implication is that, in small samples, we know almost nothing empirical about team performance, and if we’re doing things right we should regress nearly all the variability out of each team’s statistics. But, clearly, this becomes less true as the season goes along. As a team gets closer to 82 games played, we should become more confident that their performance is sustainable, for the simple reason that it has fewer opportunities to regress back to average. If, for example, a team has a 10% even-strength shooting percentage at the 70-game mark (unlikely, but not impossible, as the 2009-10 Capitals demonstrated), it’s incredibly unlikely to regress back to 8% by season’s end. What we need is a method that allows us to adjust our uncertainty about team performance as the season moves along. Fortunately for all of you, I’ve put one together.
The starting point of my approach is the same as the original method: use the year-on-year autocorrelation in our measures of interest to estimate repeatability. Where my approach differs slightly is that I’m calculating the repeatability of event rates rather than the percentages we usually talk about. Using data from the five most recent 82-game NHL seasons (2007-08 through 2011-12, for n=150 team-seasons), I estimated year-on-year correlations for even-strength GF, GA, SF, and SA (adjusting the numbers for differences in 5-on-5 TOI), close-score even-strength Fenwick For and Fenwick Against (again, adjusting for differences in TOI spent in such situations), PP goals, PP opportunities, penalty kills, and PK chances. I then took 2013 data and, after applying the same TOI adjustments and extrapolating the 48-game numbers to an 82-game season, estimated expected values for these measures, by team, for the 2013-14 season, regressing them to their 2007-12 means using the correlation coefficients. Once these expected season totals are calculated, they can be scaled according to the fraction of the 2013-14 schedule remaining, which means they can be added to a team’s in-season totals to provide an estimate of what the 82-game numbers might look like.
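In code, the two steps might look something like the sketch below. The function names are mine, and I’m glossing over the TOI adjustments by assuming the inputs have already been adjusted.

```python
import numpy as np

def year_over_year_r(rates_year1, rates_year2):
    """Pearson correlation between the same teams' TOI-adjusted event
    rates in consecutive seasons -- the repeatability estimate."""
    return np.corrcoef(rates_year1, rates_year2)[0, 1]

def expected_82_game_total(total_48gm, league_mean_82gm, r):
    """Extrapolate a 48-game total to an 82-game pace, then regress it
    toward the 2007-12 league mean in proportion to (1 - r)."""
    extrapolated = total_48gm * 82.0 / 48.0
    return r * extrapolated + (1 - r) * league_mean_82gm
```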
To provide an example, let’s say a team has played 43 games, and we want to gauge how sustainable their 9.9% shooting is likely to be. After 43 games, they’ve scored 104 goals on 1,052 shots, and based on my regressed estimate of their shooting performance using 2013 data, we would have expected them to score 145 goals on 1,778 shots over the full season. If we assume that the team (OK, you might have guessed I’m talking about Anaheim) will shoot and score at the pace we estimated, we’d simply multiply 145 and 1,778 by (39/82): this tells us to expect the Ducks to score 69 more goals on 845 more shots over the remainder of their schedule. At that point, it’s just a matter of adding the even-strength goals and shots they’ve accumulated through 43 games to these totals. This gives us 173 goals on 1,897 shots, or a shooting percentage of 9.1%. To update the analysis later in the season (after, say, 70 games), you would multiply our estimated 2013-14 goals and shots for Anaheim by (12/82) and add them to the observed totals. The implication, obviously, is that we’d rely more heavily on our expectation of regression to the mean early in the season, and trust the data more late in the campaign.
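The Anaheim arithmetic is easy to reproduce with a short helper (again, the function name is mine); note how the weight on the regressed expectation shrinks as games played grows, which is exactly the property we were after.

```python
def projected_82_game_sh_pct(goals_obs, shots_obs, games_played,
                             exp_goals_82, exp_shots_82):
    """Blend observed totals with the regressed full-season expectation,
    weighting the expectation by the fraction of the schedule remaining."""
    frac_remaining = (82.0 - games_played) / 82.0
    goals = goals_obs + exp_goals_82 * frac_remaining
    shots = shots_obs + exp_shots_82 * frac_remaining
    return goals / shots

# Anaheim through 43 games: 104 goals on 1,052 shots (9.9%), with a
# regressed full-season expectation of 145 goals on 1,778 shots.
print(projected_82_game_sh_pct(104, 1052, 43, 145, 1778))  # ~0.091, i.e. 9.1%
```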
The tables below show the estimated 82-game values of team Sh%, Sv%, Fenwick Close %, PP%, and PK%, by division, regressed using this method. The observed data were pulled from Extra Skater and nhl.com on January 4, so they represent between 40 and 44 games played per team.