Sports analytics from Laura Albert's team at the University of Wisconsin-Madison College of Engineering

final thoughts on our NCAA football playoff forecasting model

Here are some final thoughts on our model and methodology. First, the pros.

  1. Our models worked really well considering how little information they use. Really well. I thought 12-13 games would not be enough to provide a reasonable forecast, but we ended up being very close to the committee rankings.
  2. All three versions of the model converged on the same results, which wasn’t too surprising given that they all use the same underlying network to gauge quality. But not having agreement would have pointed to a problem.

Now the cons.

  1. There is always that team that clearly doesn’t belong in the rankings, especially early on (I’m looking at you, Minnesota!)
  2. Next year I would like to use additional information, giving teams a fraction of a win based on the point differential in played games instead of a full win or loss. This could help connect our network and provide more refined forecasts.
  3. We don’t use any human and/or crowdsourced information, like point spreads or some of the polls. That information can help make sense of games that have already been played. There is no need to rely entirely on math models as we did this past year, but I do want to give the math a chance to add value over more traditional models that rely only on human evaluation (like the polls).
  4. It looks like we will need to do some manual tweaking of the results to account for teams from non-major conferences like Marshall. We weren’t sure what to do with them, but it seems like a team like Boise State or Northern Illinois could get invited so long as their schedule passes the sniff test for difficulty (Marshall’s did not) and the team goes undefeated.
  5. We need a better testing and validation plan for selecting model parameters. What we did didn’t work well. We tried to choose the parameters that forecasted the most games “correctly,” but this ended up yielding nonsensical parameters. In the paper we used, the authors selected the parameters that predicted the outcomes of the most bowl games, and those parameters were pretty nonsensical too. For example, home games counted 3x as much as away games, meaning that a team with 6 away wins would be ranked about the same as another team with 2 home wins. Instead, I selected parameters that made the most sense and “worked.” For example, home wins/losses counted 15-20% more than away wins/losses to account for the higher expectations associated with home field advantage. But I think we can improve this.
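
To make the weighting arithmetic in the cons above concrete, here is a minimal sketch in Python. It is not our model code: the function names, the logistic form of the fractional win, and the `scale` value are all assumptions made up for illustration. It only shows how a home-game weight changes the credit a team gets, and one possible way a point differential could be turned into a fraction of a win.

```python
import math


def win_credit(home_wins, away_wins, home_weight):
    """Total win credit when each home win counts home_weight times an away win.

    Hypothetical helper for illustration; not the scoring used in our model.
    """
    return home_weight * home_wins + away_wins


def fractional_win(point_diff, scale=10.0):
    """Map a point differential to a fraction of a win between 0 and 1.

    The logistic form and the scale of 10 points are assumptions, chosen
    only so that a tie maps to 0.5 and blowouts approach 0 or 1.
    """
    return 1.0 / (1.0 + math.exp(-point_diff / scale))


if __name__ == "__main__":
    # With a 3x home weight, 2 home wins earn the same credit as 6 away wins.
    print(win_credit(2, 0, 3.0), win_credit(0, 6, 3.0))

    # With a 15% home weight, the gap between home and away wins is much smaller.
    print(win_credit(2, 0, 1.15), win_credit(0, 2, 1.15))

    # A 3-point win earns less credit than a 14-point win.
    print(fractional_win(3), fractional_win(14))
```

With a weight of 3, the first line prints equal credit for the two teams, which is the nonsensical ranking described above. A weight in the 1.15 to 1.20 range keeps home and away wins much closer together.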

In sum, there are a few clear directions for improving the basic model. We will try to clean up our code and post it in the relatively near future.

Nate Silver wrote a nice post summarizing final thoughts for FiveThirtyEight’s college football model. Their model was based on the previous week’s polls, which injected the human element into the rankings and, with it, a bias. Michael Lopez has a nice post on voter bias and how it affects rankings. I’m not saying that information from humans shouldn’t play a role, but I think math deserves some credit.