Recently, I wrote a pair of posts under my SB Nation pseudonym over at the Sharks blog Fear the Fin. Both used game-log data to look at variability in goaltender performance. The first of these was more a messing-around exercise; the second, which is below, was much more rigorous. Enjoy.
When evaluating goalie performance, measures like wins and GAA can be problematic, insofar as they reflect team quality as much as goaltender quality. While save percentage is usually the better choice of statistic, it often doesn’t tell you what you’d like to know either. For one thing, you need to be capable of a very high Sv% to end up playing regularly in the NHL, so the netminders we care about don’t differ a lot in Sv% (e.g., how much better is a 0.915 goalie than a 0.912 goalie?). But more to the point, the raw proportion of shots stopped doesn’t tell you how often a goalie plays well enough to steal games (or, conversely, how often he plays badly enough to inspire beach-ball memes from his team’s fans).
And that variation does matter. After grabbing game log data on team Sv% going all the way back to the 2005 lockout, I calculated the conditional probabilities of winning associated with various save percentages. Long story short: below about 0.825, win probability hovers below 10%; above 0.825, it increases monotonically and steadily until it peaks at 97.6% when the goalie pitches a shutout (a handful of scoreless ties going to shootouts explains the other 2.4%). The single-game save percentage associated with a win probability of 50%? 0.9231. (Interesting trivia note: the single worst team goaltending performance in these data belongs to the Ducks, who gave up 7 goals on 16 shots against the Flyers in November 2006. That Jean-Sebastien Giguere was riding a 0.922 Sv% to the Stanley Cup a few months later only underscores how much single-game performance can vary.)
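Conditional win probabilities like these come down to grouping team games by single-game Sv% and taking the share of wins in each group. Here is a minimal Python sketch, assuming a hypothetical list of `(save_pct, won)` pairs and 0.025-wide bins starting at 0.825 (the exact binning behind the figures above isn’t stated, so the bins here are an assumption):

```python
from bisect import bisect_right
from collections import Counter

# Assumed bins: 0.025 wide from 0.825 up, with everything below 0.825 pooled.
EDGES = [0.825, 0.850, 0.875, 0.900, 0.925, 0.950, 0.975]
LABELS = ["<0.825", "0.825-0.850", "0.850-0.875", "0.875-0.900",
          "0.900-0.925", "0.925-0.950", "0.950-0.975", "0.975-1.000"]

def win_prob_by_bin(games):
    """Estimate P(win | single-game Sv% bin) from (save_pct, won) pairs."""
    totals, wins = Counter(), Counter()
    for save_pct, won in games:
        label = LABELS[bisect_right(EDGES, save_pct)]
        totals[label] += 1
        wins[label] += int(won)
    # Only report bins that actually contain games.
    return {label: wins[label] / totals[label] for label in LABELS if totals[label]}

# Made-up games, for illustration only.
games = [(0.80, False), (0.80, False), (0.93, True), (0.93, False), (1.0, True)]
print(win_prob_by_bin(games))
# {'<0.825': 0.0, '0.925-0.950': 0.5, '0.975-1.000': 1.0}
```

Pinning down a single crossover value like 0.9231 would take finer bins or a smoothed fit; this sketch doesn’t attempt that.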
Rather than focus on a handful of goalies as before, I decided to look at the last six seasons (beginning with 2007-2008), and grabbed all goaltender game log data in that period from nhl.com, encompassing 15,956 appearances from 159 goalies. Partial game appearances and playoffs were included. I used the full database to estimate the probability distribution of single-game save percentages among all NHL goalies, grouping them into 8 bins (<0.825, 0.825-0.850, 0.850-0.875, 0.875-0.900, 0.900-0.925, 0.925-0.950, 0.950-0.975, 0.975-1.000). I also calculated what I define as “Quality Appearances”: the proportion of goalie appearances with a save percentage of 0.9231 or better (i.e., the proportion of appearances in which the goalie plays well enough to give his team at least an even chance of winning). In addition to presenting the overall results, I’ve calculated them for every active goaltender with at least 50 NHL appearances since 2007-2008. These results are below.
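Both the binned distribution and the Quality Appearance rate are simple counts. A minimal Python sketch, assuming single-game save percentages are already available as floats and appearances as hypothetical `(goalie, save_pct)` pairs; the function names are illustrative, not taken from the original analysis:

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Edges separating the 8 bins described in the post.
EDGES = [0.825, 0.850, 0.875, 0.900, 0.925, 0.950, 0.975]
LABELS = ["<0.825", "0.825-0.850", "0.850-0.875", "0.875-0.900",
          "0.900-0.925", "0.925-0.950", "0.950-0.975", "0.975-1.000"]
QUALITY_THRESHOLD = 0.9231  # single-game Sv% at which win probability is ~50%

def sv_distribution(save_pcts):
    """Share of appearances falling in each Sv% bin."""
    counts = Counter(LABELS[bisect_right(EDGES, sv)] for sv in save_pcts)
    return {label: counts[label] / len(save_pcts) for label in LABELS}

def quality_appearance_rate(save_pcts):
    """Share of appearances with Sv% >= QUALITY_THRESHOLD."""
    return sum(sv >= QUALITY_THRESHOLD for sv in save_pcts) / len(save_pcts)

def quality_by_goalie(appearances, min_apps=50):
    """Quality Appearance rate for each goalie with at least `min_apps` appearances."""
    by_goalie = defaultdict(list)
    for goalie, sv in appearances:
        by_goalie[goalie].append(sv)
    return {g: quality_appearance_rate(svs)
            for g, svs in by_goalie.items() if len(svs) >= min_apps}
```

Using `bisect_right` puts a game that lands exactly on a boundary (say 0.950) into the higher bin, which matches how the bin labels read.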
A few things to note here:
- Disastrously bad goaltending is more common than one might think. 11% of NHL appearances come with a save percentage below 0.825, which (assuming the league average of roughly 30 shots against per game) translates to more than 5 goals allowed. In contrast, about 10% of all appearances have a Sv% of 0.975 or higher (in practice, nearly all of these are shutouts, since allowing even one goal keeps a goalie below 0.975 unless he faces 40 or more shots). In other words, single-game Sv% is frequently pretty far from a goalie’s average.
- Slightly technical statistical note: this is why looking at the standard deviation of Sv% can be misleading and confusing. When you calculate the SD of Sv% the way it’s usually done for a proportion, you’re not looking at the variation of single-game Sv%s around the mean Sv%: you’re treating every shot as a 1 (save) or a 0 (goal) and measuring how precisely their average is pinned down, which works out to sqrt(p(1 - p)/n) for a save percentage p on n shots. Once a goalie’s faced a few thousand shots, Sv% becomes harder to move, and that number gets smaller*. At that point, however, it tells you nothing about the probability of a bad (or great) game. To drive it home: the SD for Sv% in my all-goalies dataset is 0.000428; almost 100% of the single-game save percentages in the data fall more than 2 SDs from the average.
- The “Quality Appearances” percentage suggests how tough it is to be an elite NHL goaltender. Only 5 goalies have given their teams a win probability over 50% in more than half their appearances; they include Henrik Lundqvist (widely considered the best goalie in hockey), multiple Vezina winner and (bafflingly) current free agent Tim Thomas, and three of the best young goalies in the sport in Tuukka Rask, Cory Schneider, and Braden Holtby. Just missing that 50% threshold are two more goaltenders generally regarded as outstanding: Tomas Vokoun (49.4%) and Roberto Luongo (49.2%).
- Andrew Raycroft and Rick DiPietro are as bad as you remember, though Jonas Gustavsson still collects an NHL paycheck despite being just as bad.
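The gap between the two kinds of spread in the notes above is easy to see in a few lines of Python. A minimal sketch, assuming the binomial formula sqrt(p(1 - p)/n) for the SD of a save percentage and a hypothetical 0.912 goalie with 10,000 career shots; the single-game values are made up for illustration, and `goals_allowed` just reproduces the 30-shot arithmetic:

```python
import math

def goals_allowed(save_pct, shots=30):
    """Goals allowed on `shots` shots at a given single-game Sv%."""
    return shots * (1 - save_pct)

def sv_standard_error(save_pct, shots):
    """Binomial SD of a save percentage over `shots` shots: sqrt(p(1-p)/n)."""
    return math.sqrt(save_pct * (1 - save_pct) / shots)

def share_outside(single_game_svs, center, width):
    """Fraction of single-game Sv%s lying more than `width` from `center`."""
    return sum(abs(sv - center) > width for sv in single_game_svs) / len(single_game_svs)

# Illustrative numbers only: a 0.912 goalie with 10,000 career shots.
se = sv_standard_error(0.912, 10_000)           # about 0.0028
nights = [0.875, 0.950, 1.000, 0.800, 0.920]    # made-up single games
print(round(goals_allowed(0.825), 2))           # 5.25
print(share_outside(nights, 0.912, 2 * se))     # 1.0
```

Even a perfectly ordinary 0.920 night sits more than two of these "SDs" from the goalie’s average, which is the point: the number measures how well the career average is pinned down, not how much individual games swing.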
The first obvious question to ask about the table above is how stable these save percentage distributions are. I’ll give the short answer here and leave the long answer for a footnote.** Short answer: they’re reasonably stable, though the threshold for “reasonably” is entirely subjective. That shouldn’t be all that surprising: again, the margin separating NHL goalies in terms of Sv% is pretty thin, so even randomly generated distributions of Sv% aren’t going to vary that much.
So, wrapping up:
- Although the quality of NHL goaltenders limits game-to-game variation in Sv%, single-game Sv% still varies a lot more than the standard deviation might suggest.
- Goalies who consistently give their teams a better-than-even chance of winning are rare.
- The distribution of single-game save percentages is fairly stable, though a prospective study of the subject is clearly superior to anything one can do with game log information.
* Quick thought experiment: say we’ve got a league-average goalie (Sv% = 0.912) who’s faced 10,000 career shots. Now say he gets lit up and gives up 5 goals on 15 shots one night before being pulled. His new Sv% is . . . still 0.912.
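The footnote’s arithmetic is easy to check. A short Python sketch, assuming the 10,000 career shots were stopped at exactly 0.912 (9,120 saves); `updated_sv` is a hypothetical helper:

```python
def updated_sv(career_saves, career_shots, game_saves, game_shots):
    """Career Sv% after adding one game's saves and shots."""
    return (career_saves + game_saves) / (career_shots + game_shots)

career_shots = 10_000
career_saves = round(0.912 * career_shots)              # 9,120 saves
new_sv = updated_sv(career_saves, career_shots, 10, 15)  # 5 goals on 15 shots
print(f"{new_sv:.3f}")  # 0.912
```

Unrounded, the new figure is about 0.9116, so the disaster moves the career number by less than half a thousandth.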
** My general approach was to bootstrap single-game Sv%s from the full dataset and see whether the distribution of one set of games predicted that of another set. Specifically, for each of three sample sizes (25, 50, and 100 games), I generated two sets of 1,000 samples and looked at whether one set’s distribution predicted the other’s. One tricky question is how you define “prediction”: what’s the threshold for deciding that one distribution is significantly different from another? The problem gets even worse when you’re bootstrapping the samples: by construction, the answer is already known (you’re testing whether two samples drawn from the same distribution come from the same distribution). So I took a different approach: I assumed each sample’s distribution would predict that of its comparator in aggregate, and instead counted how frequently they differed according to some test statistic. In other words, you expect a goalie’s performance to have some consistency over time, but how often is it not consistent? My second problem: distributions of single-game Sv% are heavily skewed (bunched up near the 1.000 ceiling, with a long tail of bad games), meaning that traditional statistics assuming symmetry (e.g., arithmetic means and standard deviations, and related tests like t-tests) will be misleading if applied to them. So I used Mann-Whitney U tests (a non-parametric method not reliant on symmetry) rather than t-tests. In the 25-game samples, the U statistics suggested some difference (z < -1) in 32% of samples, and a big difference (z < -2) in 4% of samples. Results in the 50-game samples (28% and 4%, respectively) and the 100-game samples (34% and 4%, respectively) were similar.
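The bootstrap comparison can be sketched in Python with only the standard library. This is a hedged reconstruction, not the original code: it assumes each sample in one set is paired with the sample at the same position in the other set, draws games with replacement from a pooled list of single-game Sv%s, and computes the Mann-Whitney z-score by normal approximation with a tie correction (ties are common, since many games end at 1.000). It also assumes a convention some statistics packages use, computing z from the smaller of the two U statistics so that z is never positive and “z < -1” flags a difference in either direction; whether the original analysis did the same is an assumption.

```python
import math
import random
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_z(x, y):
    """Normal-approximation z for the Mann-Whitney U test, tie-corrected.

    Uses min(U1, U2), so z <= 0 and a larger |z| means a bigger difference.
    """
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value identical: no evidence of a difference
        return 0.0
    return (u - n1 * n2 / 2) / math.sqrt(var_u)

def bootstrap_disagreement(all_svs, games_per_sample, n_samples=1000, seed=0):
    """Share of paired bootstrap samples with z < -1 and with z < -2."""
    rng = random.Random(seed)
    zs = []
    for _ in range(n_samples):
        a = rng.choices(all_svs, k=games_per_sample)  # resample with replacement
        b = rng.choices(all_svs, k=games_per_sample)
        zs.append(mann_whitney_z(a, b))
    some = sum(z < -1 for z in zs) / n_samples
    big = sum(z < -2 for z in zs) / n_samples
    return some, big
```

Usage would look like `bootstrap_disagreement(pooled_svs, 25)`, where `pooled_svs` is a hypothetical list of every single-game Sv% in the dataset. Because both samples in each pair come from the same pool, the z-scores should be roughly standard normal, so |z| > 1 and |z| > 2 turn up by chance about 32% and 5% of the time.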