We can fit the data perfectly, but we shouldn’t. This is what’s known as the bias-variance trade-off. The more precisely we try to fit the data at hand, the more likely we are to memorize effects that come from random variation in the training data and do not generalize to other datasets.
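To make the trade-off concrete, here is a minimal sketch rather than the model from the post. It assumes a made-up dataset (a sine curve plus Gaussian noise) and compares two nearest-neighbour predictors: `k=1`, which reproduces every training point exactly, and `k=15`, which averages over neighbours. The names `knn_predict`, `mse`, and `make_data` are hypothetical helpers invented for this illustration.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-nearest-neighbour predictor on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def make_data(rng, n, noise=0.5):
    """Assumed toy data: y = sin(x) plus Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 2 * math.pi)
        data.append((x, math.sin(x) + rng.gauss(0, noise)))
    return data


rng = random.Random(0)        # fixed seed so the run is repeatable
train = make_data(rng, 400)
test = make_data(rng, 400)

# k=1 memorizes the training set: each point is its own nearest neighbour.
perfect_train = mse(train, train, k=1)
perfect_test = mse(train, test, k=1)

# k=15 smooths over neighbours: worse on the training set, better on new data.
smooth_train = mse(train, train, k=15)
smooth_test = mse(train, test, k=15)

print(f"k=1   train={perfect_train:.3f}  test={perfect_test:.3f}")
print(f"k=15  train={smooth_train:.3f}  test={smooth_test:.3f}")
```

The `k=1` predictor has zero training error because it has memorized the noise, and that same noise is what hurts it on the held-out points.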
I love lists, and I love arguing, so here are five players who should be on the top 100, and who they could take off.
All this talk about the NHL's Greatest 100 got me thinking about what goes into hockey greatness. I always lamented how there wasn't a Hockey Hall of Fame model like basketball-reference's model. So I thought I would go about predicting both things.
There are two primary challenges we face when judging goalies: the number of goals, especially as we try to account for confounding factors, is relatively small, and we have trouble isolating goalie performance from team effects.
If we wanted to test the impact of unemployment and expectations on inflation, we could study data from a regime where inflation is not a goal of policymakers.
Some data rarely changes, such as old boxscores, while some data changes every day, such as game results, and other data may change every time I view it, such as gamelogs for a game in progress. I want to cache my data, but with a concept of staleness and control over data expiry.
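One way to picture a cache with staleness is to let each lookup say how old an entry is allowed to be. The sketch below is an assumption about the shape of such a cache, not the implementation from the post: `StaleCache`, its `get` method, and the `max_age` parameter are all hypothetical names, and the store is a plain in-memory dict.

```python
import time


class StaleCache:
    """In-memory cache where each lookup states how stale a value may be."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._store = {}      # key -> (value, fetched_at)

    def get(self, key, fetch, max_age):
        """Return the cached value if it is younger than max_age seconds.

        max_age=None means the entry never expires (old boxscores);
        max_age=0 means always refetch (a game in progress).
        """
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None:
            value, fetched_at = hit
            if max_age is None or now - fetched_at < max_age:
                return value
        value = fetch()                    # missing or too old: fetch again
        self._store[key] = (value, now)
        return value
```

A caller would then pick the expiry per kind of data, for example `cache.get("boxscore", fetch_boxscore, None)` for old boxscores and `cache.get("gamelog", fetch_gamelog, 0)` for a live game, where the two fetch functions are placeholders for whatever actually downloads the data.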
Where will the Warriors land? Will the Heat flame out? Are the Lakers back?!?
sqloose is a little bit of syntactic sugar to make SQL coding easier. I’ve added ranges and negative indices to GROUP BY and ORDER BY clauses, and added GROUP TO and GROUP THROUGH clauses to make it even easier to write SQL queries.
Predicting goaltender performance is hard. Goalies are subject to the whims of the gods and the players in front of them, whichever is less merciful.
So there you have it, some no-holds-barred nothing-to-hide projections. Full tables are at my GitHub link. I'll use them as a starting point, but by no means an ending point, for my opinions and pool draft this year.