Image: “If I was an alien, visiting Earth, I'd disguise myself as a cactus too.” by Carl Jones, used under license CC BY-NC-ND
I’ve decided to upload my projections for NHL skaters and goalies this year. These are derived from basic statistical models. They cover many players and a variety of stats, but of course come with some caveats. Mainly, they’re wrong.
There aren’t many sources that share NHL projections. The ones that do suffer from a few limitations:
- They often cover limited statistics, not always covering blocks, hits, or faceoffs
- They might be shared in a presentable format (e.g. a pretty website), but not parser-friendly (e.g. CSV)
- They are often opaque about their methodology. I don’t know if they use models or how they arrive at their conclusions
- Many charge a fee
Of course, mine also suffer from their own set of limitations, so I’ll share a few links to others.
I haven’t evaluated any of those quantitatively, nor purchased the ones with fees (not that I think fees are unfair, $5 isn’t a lot of money for someone’s time). So this isn’t an endorsement. But I would hope that the effort level plus domain knowledge of those writers would result in better predictions than my lazy stats.
These formed the basis of my fantasy draft last year, and the methodology is mostly unchanged from how I described it previously. I use LASSO models with lags, averages, position, and recent experience. What this means is that any given stat prediction ends up (mostly) being a combination of last year’s value for that player, and their career average. There’s not much to it.
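To make the “last year plus career average” idea concrete, here is a minimal sketch. It is not the actual model: the function names, the feature names, the toy goal totals, and especially the weights are all made up for illustration. In a real fit, LASSO would estimate the weights from historical data, and its L1 penalty can push some of them to exactly zero (the `experience` weight below is set to zero to mimic that).

```python
from statistics import mean

def build_features(past_seasons):
    """Build features for one player from one stat's history, oldest season first."""
    return {
        "lag1": past_seasons[-1],          # last season's value
        "career_avg": mean(past_seasons),  # average over all prior seasons
        "experience": len(past_seasons),   # number of seasons played so far
    }

# Hypothetical weights, NOT fitted values. A LASSO fit would estimate these
# from data and may shrink some of them to exactly zero.
WEIGHTS = {"intercept": 2.0, "lag1": 0.5, "career_avg": 0.4, "experience": 0.0}

def predict(features, weights=WEIGHTS):
    """Linear prediction: intercept plus a weighted sum of the features."""
    return weights["intercept"] + sum(
        weights[name] * value for name, value in features.items()
    )

# Made-up goal totals for an imaginary player's last three seasons.
goals_by_season = [20, 30, 25]
features = build_features(goals_by_season)
projection = predict(features)
print(round(projection, 1))
```

With weights like these, the projection lands between last season's value and the career average, shifted a little by the intercept.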
This is data in, data out. No manual review.
Some of the things my skater model does a poor job with:
- It doesn’t (directly) know about injuries
- It doesn’t know about trades, whether of a player or of the teammates they will play with
- It doesn’t know about movements up and down the depth charts
- For that matter, it doesn’t know about retirement
Those are harder things to put into the model, because they require a lot of manual data creation. And not just for this year: imagine going back and filling in 15 years of data, using not after-the-fact knowledge but what we knew as of September in each of those years.
Here are the top 20 players, ranked by one of the metrics I described in that earlier piece. My predictions come with decimal precision, because round numbers are for jocks.
Yeah, I didn’t expect to see Dubinsky in there either.
My goalie models are a little different and a touch more complicated than my skater models. I used a slightly more thorough feature set, and instead of LASSO I used gradient tree boosting. I’m planning on writing a follow-up piece with more details on that.
I couldn’t agree more.
I have an academic background in economics (I know, I know…) and I’ve heard the following nerdy joke a few times.
Three econometricians went out hunting, and came across a large deer. The first econometrician fired, but missed, by a meter to the left. The second econometrician fired, but also missed, by a meter to the right. The third econometrician didn’t fire, but shouted in triumph, “We got it! We got it!”
What happens when you predict something that has a lot of noise in it? You (using most common loss functions) end up with predictions that are highly regressed to the league average. Because that actually makes sense. It’s totally possible that Braden Holtby puts up even better numbers next year than his Vezina year last season, but it’s also totally possible that his save percentage and goals against average drop 15 points and jump by 0.3, respectively. So we end up with something in between.
So in the end, we end up with no extreme values in the predictions. Last year, 8 goalies got 35 or more wins. The highest in my predictions? 34 (Braden Holtby). 8 goalies (minimum 10 games) ended up with a GAA below 2.20. Lowest in my dataset? 2.32 (Carey Price). For save percentage, that most fluctuating of major stats, the highest I have is 0.918 (Matt Murray). A third of the league did better than that last year!
So while I know many goalies will put up better numbers (or worse numbers) than my extremes, for no one goalie do I predict extreme values.
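The pull toward the middle can be shown with a toy calculation. This sketch assumes squared error as the loss and uses two made-up, equally likely save percentages for an imaginary goalie; none of these numbers come from the actual projections. Like the third econometrician, the prediction that minimizes expected loss is the one in between, even though it matches neither outcome.

```python
def expected_squared_loss(prediction, outcomes):
    """Average squared error of one fixed prediction over equally likely outcomes."""
    return sum((prediction - y) ** 2 for y in outcomes) / len(outcomes)

# Two made-up, equally likely seasons for one goalie: a poor year and a great year.
outcomes = [0.905, 0.935]

# Candidate predictions: either extreme, or the midpoint between them.
candidates = [0.905, 0.920, 0.935]

losses = {p: expected_squared_loss(p, outcomes) for p in candidates}
best = min(losses, key=losses.get)
print(best)
```

Betting on either extreme is exactly right half the time and badly wrong the other half, which costs more on average than always being a little off.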
These also suffer from the same issues as the skater projections, which are probably more damaging in the case of goalies. The models don’t know who the starters and the reserves are. They aren’t constrained to make them consistent by team. You can manually scale the wins and shutout projections to match the number of games you expect.
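That manual scaling could look something like the sketch below. It assumes each projection comes with a projected games figure to scale from; the field names (`games`, `wins`, `shutouts`, `save_pct`), the function name, and the example numbers are all hypothetical and may not match the actual tables. Only counting stats are rescaled, while rate stats such as save percentage are left alone.

```python
def rescale_counting_stats(projection, expected_games, stats=("wins", "shutouts")):
    """Scale counting stats in proportion to the games you expect a goalie to play."""
    factor = expected_games / projection["games"]
    scaled = dict(projection)          # copy, so the original projection is untouched
    scaled["games"] = expected_games
    for stat in stats:
        scaled[stat] = projection[stat] * factor
    return scaled

# Made-up projection for a goalie the model treats as a part-time starter.
goalie = {"games": 40, "wins": 20.0, "shutouts": 2.0, "save_pct": 0.915}

# Suppose you expect this goalie to start 60 games instead.
adjusted = rescale_counting_stats(goalie, expected_games=60)
print(adjusted)
```

This is a straight proportional adjustment, so it takes for granted that a goalie's per-game results stay the same as their workload changes.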
Without further ado, here are the top 20.
So there you have it, some no-holds-barred nothing-to-hide projections. Full tables are in my GitHub link. I’ll use them as a starting point, but by no means an ending point, for my opinions and pool draft this year.
Remember, they’re horribly wrong.