NCAA Football Playoff

2014 is the first year for the four-team NCAA football playoff. We are interested in identifying the four teams that will be invited to the playoff, the winner of which will be named the National Champion. We developed a tool that uses Monte Carlo simulation and ranking methodologies to identify who the top four teams will be at the end of the season.

Here are more details on the methodology.

Consider the following two questions:

Will the Badgers beat the Gophers on November 29th?

Who will be the B1G champion this year?

On the surface both of the questions seem similar – the simple task of picking the team most likely to reign supreme after a set of matchups. We can approach these questions in similar ways, since prior to the games actually being played we know a bit of information about the teams involved in the competition. Using this prior knowledge we can build, train, and evaluate models, and once we are confident in their performance we can then use the models to answer important questions – like “Are the Badgers going to beat the Gophers?” (yes*)

While the general approach of using a model is the same for both questions, the first question is significantly easier to answer. In fact, forecasting outcomes of games is relatively routine. Many methods can be used to forecast the winner of different types of sports games, and these methods often come with measures of how certain a win is or of the expected point differential. While a lot of the ideas behind single-game forecasts carry over into whole-season forecasts, one needs to introduce a couple more steps to the modeling process.

The whole-season forecasting method can be thought of as a series of single-game forecasts, especially if you don’t know the games in advance (as is the case with conference championships). That is, if you can effectively forecast the outcomes of games played in Week 8 by using the outcomes of previously played games, you can iteratively forecast Week 9, Week 10, … , all the way to the playoffs & bowl games. Our models use this iterative approach and involve two closely related main steps: team ranking and game play simulation. There is a high degree of correlation between game outcomes in subsequent weeks.

To accurately forecast the outcome of a game that hasn’t occurred yet, we need access to the information required to simulate it. In a simulation, this means that our method can’t rely on data that we cannot collect weeks before a game or generate ourselves on the fly. Team ratings along with game location are the major factors used in forecasting each game’s outcome. In order to effectively iterate weekly forecasting, we needed a ranking method that could be computed from both played games and simulated games. This kept us from using rankings that include factors like team health or expert opinion, as such things can’t be simulated effectively.

We used a simple Markov chain model for rating and ranking the teams. The Markov chain model is a ranking method similar to the PageRank algorithm developed by Google founders L. Page and S. Brin. The model creates a “voting” system, where for every game previously played the team that loses votes for the team that won. We use logistic regression to determine how much of a win/vote a team can take credit for based on the location of the game (home/away/neutral) and the score differential. We estimate the limiting distribution of the Markov chain, which is then used as the relative rating for every team. These ratings are sorted to find the rankings; however, the ratings themselves are used to forecast which teams will win in games not yet played (again using logistic regression).
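The voting system and its limiting distribution can be illustrated with a minimal sketch. This is not the exact mLRMC implementation: the function name `markov_ratings` is hypothetical, and the `credit` value attached to each game (the share of a vote the winner earns) is assumed to be supplied by the caller, whereas the text derives it from logistic regression on location and score differential. The limiting distribution is approximated here by simple power iteration.

```python
def markov_ratings(n_teams, games, max_iters=10_000, tol=1e-12):
    """Approximate the limiting distribution of a game-based voting chain.

    games: list of (winner, loser, credit) with team indices 0..n_teams-1.
    credit (assumed input, 0.5 < credit < 1) is the share of the vote the
    winner earns; the remainder stays with the loser.
    """
    T = [[0.0] * n_teams for _ in range(n_teams)]
    played = [0] * n_teams
    for winner, loser, credit in games:
        # The loser sends `credit` of its vote to the winner and keeps the rest.
        T[loser][winner] += credit
        T[loser][loser] += 1.0 - credit
        # The winner keeps `credit` of its vote and sends the rest to the loser.
        T[winner][winner] += credit
        T[winner][loser] += 1.0 - credit
        played[winner] += 1
        played[loser] += 1

    # Average over games so each row is a probability distribution.
    for i in range(n_teams):
        if played[i] == 0:
            T[i][i] = 1.0  # a team with no games keeps its own vote
        else:
            T[i] = [v / played[i] for v in T[i]]

    # Power iteration: repeatedly redistribute the votes until they settle.
    pi = [1.0 / n_teams] * n_teams
    for _ in range(max_iters):
        nxt = [sum(pi[i] * T[i][j] for i in range(n_teams))
               for j in range(n_teams)]
        converged = sum(abs(a - b) for a, b in zip(nxt, pi)) < tol
        pi = nxt
        if converged:
            break
    return pi


# Example: team 0 beat teams 1 and 2, and team 1 beat team 2.
ratings = markov_ratings(3, [(0, 1, 0.8), (1, 2, 0.8), (0, 2, 0.8)])
ranking = sorted(range(3), key=lambda t: ratings[t], reverse=True)
```

In this small example the ratings come out to roughly 2/3, 2/9, and 1/9, so the ranking is team 0, team 1, team 2. Keeping `credit` strictly below 1 leaves every team a path to every opponent it has played, which is what lets the iteration settle on a single distribution when the schedule graph is connected.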

We consider two model variants:

  1. A modified Logistic Regression Markov Chain model (mLRMC), and
  2. A log variant of the modified Logistic Regression Markov Chain model, ln(mLRMC), that takes the natural log of point differentials.

We also compare mLRMC and ln(mLRMC) to a method that uses the Colley Rankings to rate and rank the teams. The Colley ranking method is simpler than LRMC in that it only considers a team’s wins and losses and strength of schedule, not score differentials. Additionally, the Colley rankings perform significantly better in our forecasting model than other ranking methods.
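For comparison, here is a minimal sketch of the Colley method, which solves the linear system C r = b, where C has 2 plus a team's games played on the diagonal, minus the number of head-to-head games off the diagonal, and b is 1 + (wins - losses) / 2. The function name `colley_ratings` is hypothetical, and the Gauss-Seidel loop is simply a dependency-free way to solve the system; any linear solver would do.

```python
def colley_ratings(n_teams, games, sweeps=500):
    """Colley ratings from a list of (winner, loser) team-index pairs."""
    # Colley matrix starts as 2*I; right-hand side starts at 1 for every team.
    C = [[2.0 if i == j else 0.0 for j in range(n_teams)]
         for i in range(n_teams)]
    b = [1.0] * n_teams
    for winner, loser in games:
        C[winner][winner] += 1.0   # one more game played by each team
        C[loser][loser] += 1.0
        C[winner][loser] -= 1.0    # one more head-to-head meeting
        C[loser][winner] -= 1.0
        b[winner] += 0.5           # b = 1 + (wins - losses) / 2
        b[loser] -= 0.5

    # Gauss-Seidel sweeps; C is strictly diagonally dominant, so this converges.
    r = [0.5] * n_teams
    for _ in range(sweeps):
        for i in range(n_teams):
            off_diag = sum(C[i][j] * r[j] for j in range(n_teams) if j != i)
            r[i] = (b[i] - off_diag) / C[i][i]
    return r


# Example: team 0 beat teams 1 and 2, and team 1 beat team 2.
ratings = colley_ratings(3, [(0, 1), (0, 2), (1, 2)])
```

For this example the ratings are approximately 0.7, 0.5, and 0.3. Note that only wins, losses, and opponents enter the calculation; score differentials and game location do not.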

There are several advantages of this modeling approach:

  1. It takes the network structure of the matchups into account, which handles strength of schedule in a mathematically consistent way.
  2. It uses limited data that we can collect for future games in a simulation (e.g., useful point spreads are not available for games more than a week in advance).
  3. It automatically ranks the teams so that we can collect the top four teams in a large number of simulations without needing a human in the loop.
  4. It is not subject to human biases. For example, in human polls a team typically does not drop in the rankings unless it loses. This controversial issue came up in October 2014 when Florida State dropped from the number 1 spot without losing a game. From a mathematical point of view this makes sense, since the quality of a team’s wins changes week to week, and therefore we should not have to restrict how the rankings move from week to week.
  5. Most importantly, the polls like the AP and USA Today give the ranking right now, which only gives insights into who would be in a playoff if it were held today. Our method simulates the rest of the season to forecast who might be in the playoff at the end of the season by taking the remaining schedule (and strength of schedule) into account.

The methodology yields a set of ratings for the teams, which are then sorted to produce the rankings. There are several key assumptions in the modeling approach:

  1. Ratings (and ultimately rankings) depend on the game outcomes (win/loss), strength of schedule, score differentials, and home/away status.
  2. Rankings do not depend on other factors (spread, yards, etc.).
  3. The game by game win probabilities are based on the two teams’ relative ratings and home field advantage.
  4. A team’s rating (and ultimately its ranking) is a function of who it beats and who it loses to, as well as those opponents’ ratings (strength of schedule).

Since these ratings are generated prior to every simulated game, they are used to help forecast the outcome of the game. The two teams’ ratings are used to create a win/lose ratio that is weighted in favor of the team playing at home. This ratio is then used to flip a “weighted” coin to determine the outcome of the game.
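The weighted coin flip can be sketched as follows. The text does not give the exact functional form, so this example assumes a simple ratio in which the home team's rating is multiplied by a hypothetical `home_weight` factor; the function names and the default value of 1.1 are illustrative only.

```python
import random


def home_win_probability(home_rating, away_rating, home_weight=1.1):
    """Win probability for the home team from a home-weighted ratio.

    home_weight is a hypothetical multiplier (> 1 favors the home team);
    use 1.0 for a neutral-site game.
    """
    weighted_home = home_weight * home_rating
    return weighted_home / (weighted_home + away_rating)


def simulate_game(home_rating, away_rating, rng, home_weight=1.1):
    """Flip the weighted coin; return True if the home team wins."""
    p_home = home_win_probability(home_rating, away_rating, home_weight)
    return rng.random() < p_home


# Example: a stronger home team against a weaker visitor.
rng = random.Random(2014)
home_won = simulate_game(0.6, 0.3, rng)
```

A single flip says little on its own; the value of this step comes from repeating it across every remaining game and across many simulated seasons.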

After all the games for a certain week are simulated, the simulated outcomes are added to the voting system, and the rankings are recomputed. The process is repeated to simulate the rest of the season, resulting in a single playoff (one “experiment”). While the outcome of this one experiment is interesting, it’s probably not representative of what will actually happen in real life. We run the simulation a large number of times (10,000 in this case) to draw a more accurate picture of the season. As more and more actual games are recorded, we become more certain about what the playoff will look like.
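The week-by-week loop and the repeated experiments can be sketched as below. To keep the example short and self-contained, a smoothed win fraction stands in for the Markov chain ratings; that stand-in, the function names, the `home_weight` value, and the toy schedule are all assumptions made for illustration, not part of the actual model.

```python
import random
from collections import Counter


def win_fraction_ratings(teams, results):
    """Stand-in rating (NOT the Markov chain model): smoothed win fraction."""
    wins = Counter(winner for winner, _ in results)
    losses = Counter(loser for _, loser in results)
    return {t: (wins[t] + 1) / (wins[t] + losses[t] + 2) for t in teams}


def simulate_season(teams, played, schedule, rng, home_weight=1.1):
    """Run one experiment and return the top four teams at season's end."""
    results = list(played)                  # start from games actually played
    for week in schedule:                   # each week: list of (home, away)
        ratings = win_fraction_ratings(teams, results)  # re-rate every week
        for home, away in week:
            weighted_home = home_weight * ratings[home]
            p_home = weighted_home / (weighted_home + ratings[away])
            if rng.random() < p_home:       # weighted coin flip
                results.append((home, away))
            else:
                results.append((away, home))
    final = win_fraction_ratings(teams, results)
    return sorted(teams, key=final.get, reverse=True)[:4]


def playoff_probabilities(teams, played, schedule, n_sims=10_000, seed=0):
    """Share of simulated seasons in which each team finishes in the top four."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        counts.update(simulate_season(teams, played, schedule, rng))
    return {t: counts[t] / n_sims for t in teams}


# Toy example: five hypothetical teams, three played games, two weeks left.
teams = ["A", "B", "C", "D", "E"]
played = [("A", "B"), ("A", "C"), ("D", "E")]      # (winner, loser)
schedule = [[("A", "D"), ("B", "E")], [("C", "D"), ("E", "A")]]
probs = playoff_probabilities(teams, played, schedule, n_sims=1000)
```

Each simulated season contributes exactly four teams, so the resulting probabilities sum to four across all teams. Swapping a different rating function into `simulate_season` changes the model without changing the loop itself.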

Repeatedly simulating the season lets us draw insights such as who the most likely B1G champion is, and what probability that team has of winning the championship and then making it into the playoff.

There are a few limitations.

  1. The graph is sparse, because so many games are played within a conference and so few games are played compared to other sports. It is really hard to accurately forecast the last 7 weeks of the season with just 8 weeks of games.
  2. The method only takes basic information like wins and losses into account, not degree of win, yards allowed, or other information. With more information, we could more accurately forecast game outcomes ahead of time.
  3. The method is not subject to human evaluation of game outcomes that are not captured by the numbers, such as controversial game endings.


2014 methodology

Our first model debuted in 2014. We used a relatively simple but effective ranking method based on the PageRank algorithm. This modified version of PageRank creates a “voting” system, where for every game previously played the team that loses votes for the team that won.  This system can then be analyzed to find the relative ranking of every team. Teams are assigned a rating based on their wins (votes) as well as the quality of those wins (the vote from beating a great team counts for more than the vote from beating a poor team).

Fainmesser, I., Fershtman, C., & Gandal, N. (2009). A consistent weighted ranking scheme with an application to NCAA college football rankings. Journal of Sports Economics.

There are three versions of the model that allow for increasing levels of complexity:

  1. The wins only model that allows for home wins to count for more than away wins and does not account for who a team loses to.
  2. The wins and losses model that takes wins and losses into account but counts each win (and each loss) equally.
  3. The full model that combines the features of the wins only model and the wins and losses model to allow for wins, losses, and home field impact.

We believed we could improve upon this model, and we did so in our 2015 method using Markov chains.


* The Wisconsin-Minnesota rivalry goes way back to 1890. It is the most-played rivalry in NCAA football. The teams have played for the trophy “Paul Bunyan’s Ax” since 1948 (before that they played for the “Slab of Bacon” trophy).