At the moment (the morning of 02/11/2015), I've only posted the top 50 on the unweighted ratings page. I wanted to at least get these out there, but there are a few things I need to do before I post the final set ahead of the first round voting. I'm growing increasingly convinced that my method has undervalued the outcome of elim rounds, and I have been working on a method that resolves this issue (or at least improves it).
I did take out teams that I was pretty confident were defunct to give a cleaner picture of the rankings.
A few comments:
1. The biggest jumps from the previous ratings came from Michigan State ST, Oklahoma CY, and George Mason KL. MSU won the Texas tournament, and OU had a ridiculous prelim run (taking down #4, #6, #20, #28, and #41). GMU won the Pitt RR and had a very good Texas.
2. There is a clear difference between the top 4 and everybody else. The dropoff between 4 and 5 is almost as large as the dropoff from 5 to 10.
3. The spread at the first round borderline is tight. The difference between #14 and #19 is only 39 points. To put that in perspective, #14 would be only a 54% favorite over #19, which is less than 5:4 odds.
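For anyone who wants to see how a point gap turns into a win probability, here is a rough sketch using the expected-outcome formula from the published Glicko-1 description. It is not my actual ratings code: the function names are made up for illustration, and the deviation values in the example are hypothetical, since the result depends on how uncertain each team's rating is.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Discount factor: shrinks the effect of a rating gap as uncertainty grows."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def win_probability(r_a, rd_a, r_b, rd_b):
    """Expected score of team A against team B, using both rating deviations."""
    combined_rd = math.sqrt(rd_a**2 + rd_b**2)
    return 1 / (1 + 10 ** (-g(combined_rd) * (r_a - r_b) / 400))


# Hypothetical ratings 39 points apart. With no uncertainty at all, the gap
# is worth a little under 56%; any uncertainty pulls that toward 50%.
no_uncertainty = win_probability(1839, 0, 1800, 0)
some_uncertainty = win_probability(1839, 50, 1800, 50)  # RD of 50 is an assumed value
print(round(no_uncertainty, 3), round(some_uncertainty, 3))
```

The point of the sketch is just that a 39-point gap is close to a coin flip, and the more uncertain the two ratings are, the closer to a coin flip it gets.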
4. The Secret Word for today is "recency." Northwestern barely held onto its spot despite having what most would consider to be the best overall performance of the season. Now, some of this perception is because people (rightly or wrongly) put a lot of extra stock in elim success as its own independent criterion. Glicko ratings don't view winning a tournament as anything more than the sum of the individual rounds that the team won and lost. You don't get any bonus. The only things that matter are the head-to-heads. As I mentioned above, it's possible that I'm undervaluing elims because I use the fraction of the ballot count as a fractional win (a 3-2 decision counts as a .6 win), but even with a fix to this, the calculation will always be based only on the head-to-heads.
The reason Northwestern barely held onto its spot is the role that the recency of results plays in the calculation. The season is divided into rating periods. The rating at the end of each period is an adjustment of the rating from the previous period. The amount of change is based on how well the team performed against its expected performance, weighted by the size of its rating deviation. Since the deviation goes down as a team gets more rounds, the amount of fluctuation in the ratings also goes down. However, since the most recent rating is an adjustment of the previous rating, the calculation gives more weight to recent results.
This becomes apparent in the new ratings with the rise of Michigan AP and the fall of Northwestern MV. Michigan had an outstanding Dartmouth RR and a good Texas tournament. Northwestern had a disastrous Dartmouth and a somewhat disappointing Texas. Northwestern more than doubled its loss total on the season at Dartmouth.
Correcting for the undervaluing of elims mentioned above may be enough to extend Northwestern's lead back to where it was, but recency will always be a factor in a ratings system like Glicko. In fact, it is a strength of the system because we know that debaters evolve over the course of the season. It is possible to adjust variables so that new results are weighted more or less. This is something that I've been looking at, but there's not a ton of play. The decision on the "best" value will be determined by which is most likely to give more accurate predictions.
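To make the period-by-period adjustment concrete, here is a minimal sketch of a single rating-period update using the published Glicko-1 formulas, with ballot fractions used as scores. Treat it as an illustration under assumptions rather than my exact code: the function names are invented, the parameter `c` (which inflates the deviation between periods, so that new results count for more) is the standard Glicko-1 knob and may not match the variable I actually tune, and the sample numbers are the worked example from Glickman's description of the system, not real debate results.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Discount factor for an opponent's rating deviation."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, opp_r, opp_rd):
    """Expected score against one opponent."""
    return 1 / (1 + 10 ** (-g(opp_rd) * (r - opp_r) / 400))


def ballot_score(ballots_won, ballots_total):
    """Fractional result: a 3-2 decision counts as 0.6 of a win."""
    return ballots_won / ballots_total


def update(r, rd, results, c=0.0):
    """One rating period. results is a list of (opp_rating, opp_rd, score)."""
    # Deviation grows between periods; a larger c gives new results more weight.
    rd = min(math.sqrt(rd**2 + c**2), 350.0)
    if not results:
        return r, rd
    inv_d2 = Q**2 * sum(
        g(o_rd) ** 2 * expected(r, o_r, o_rd) * (1 - expected(r, o_r, o_rd))
        for o_r, o_rd, _ in results
    )
    surprise = sum(g(o_rd) * (s - expected(r, o_r, o_rd)) for o_r, o_rd, s in results)
    denom = 1 / rd**2 + inv_d2
    # The new rating is the old rating plus a deviation-weighted adjustment.
    return r + (Q / denom) * surprise, math.sqrt(1 / denom)


# Worked example from Glickman's write-up: one win and two losses.
period = [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)]
new_r, new_rd = update(1500, 200, period)
print(round(new_r), round(new_rd, 1))
```

Two things to notice. The adjustment is scaled by the team's own deviation, so a team with many rounds behind it moves less for the same result. And because each period starts from the previous period's rating, a bad final period is never averaged away by earlier results in the way a season-long win percentage would be.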
5. The possibility that round robins could be negatively distorting the ratings still needs to be considered. It's possible that very good teams that aren't invited to a round robin could be benefiting relative to the teams that are invited but do poorly. I'm merely speculating at this point, but it could be an explanation for those who think that Michigan State ST or Georgetown LM are rated too low (or that Harvard DH or Michigan State CZ are rated too high).
I continue to strongly believe that round robins should not be "no risk" propositions because it just doesn't make sense to say that a team's wins matter but their losses don't. I worked out the results from last year, but this issue simply needs more data before I can be confident about it.