At times I wonder if ratings are like what they say about sausage: it's better not to see them being made. The process is one of gradual fine-tuning, and I hope that my openness about it has at least been interesting for some.
The final ratings for the pre-first round bid period are now posted. There is one fairly significant change to the way the ratings are calculated, which should address the concern that some (including myself) have raised that the ratings don't adequately evaluate elimination rounds. I'm not going to go into a ton of depth (districts prep is weighing me down), but a short explanation should at least cover the big picture.
Previously, elim results were tabulated as a percentage of the ballot count of the panel. Thus, a win on a 3-0 would be 100% (or 1), a win on a 2-1 would be 67% (or .67), etc. This seemed logical and fair for a couple of reasons. First, a 3-0 is a stronger win than a 2-1. Second, it seemed like the loser of a 2-1 should get some credit.
However, based on an analysis of last year's data (and some eyeballing of this year's ratings), this solution isn't the most accurate option. For those with some stats background: I tried mapping ballot counts onto a log function, which would give a boost for the win and gradually even out as ballot counts grew more lopsided. That worked better, but what I ultimately found is that the most accurate predictions came from the ratings where elims were treated in the same binary manner as prelims. In other words, the ratings are most accurate when specific ballot counts are disregarded and each result is counted simply as a win or a loss. The new ratings reflect this change.
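The three approaches described above can be sketched in a few lines. This is purely illustrative: the function names and the exact shape of the log curve are my assumptions, not the actual code behind the ratings.

```python
import math

def fractional_score(ballots_won: int, panel_size: int) -> float:
    """Win share of the panel: 3-0 -> 1.0, 2-1 -> ~0.67, 3-2 -> 0.6."""
    return ballots_won / panel_size

def binary_score(ballots_won: int, panel_size: int) -> float:
    """Treat elims exactly like prelims: win = 1, loss = 0."""
    return 1.0 if ballots_won > panel_size / 2 else 0.0

def log_score(ballots_won: int, panel_size: int) -> float:
    """One possible log-shaped compromise (an assumption, not the
    system's actual formula): a win starts well above the fractional
    value and climbs toward 1.0 as the ballot margin widens."""
    frac = ballots_won / panel_size
    if frac > 0.5:
        return 0.5 + 0.5 * math.log1p(frac - 0.5) / math.log1p(0.5)
    return 0.5 - 0.5 * math.log1p(0.5 - frac) / math.log1p(0.5)
```

For example, `fractional_score(2, 3)` is about 0.67, `log_score(2, 3)` is a bit higher, and `binary_score(2, 3)` is simply 1.0; the binary version is the one the new ratings now use.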
How to deal with elims will be a continuing issue that I will look at down the road. It is something that might require creative solutions and will certainly involve lots of testing.
A couple of other notes:
1. The new ratings exclude any team that hasn't been together for at least 3 tournaments as a unit.
2. The ratings do include all of the round robins, but to minimize some distorting effects of the early round robins, I made the ratings pretend that they occurred after the regular Weber & Kentucky tournaments.
3. Strictly speaking, these ratings are not intended to be predictions of first round voting. However, it will be interesting to see how well they match up. My impression is that the most recent update will be more in line with voter tendencies than the previous version.
4. There's a problem with KSU's Klucas & Scott that I need to fix, but it will have to wait a few days.
At the moment (the morning of 02/11/2015), I've only posted the top 50 on the unweighted ratings page. I wanted to at least get these out there, but I have a few things that I need to do before I post the final set before the first round voting happens. I'm growing increasingly convinced that my method has undervalued the outcome of elim rounds, and I have been working on a method that resolves this issue (or at least improves it).
I did take out teams that I was pretty confident were defunct to give a cleaner picture of the rankings.
A few comments:
1. The biggest jumps from the previous ratings came from Michigan State ST, Oklahoma CY, and George Mason KL. MSU won the Texas tournament, and OU had a ridiculous prelim run (taking down #4, #6, #20, #28, and #41). GMU won the Pitt RR and had a very good Texas.
2. There is a clear difference between the top 4 and everybody else. The dropoff between 4 and 5 is almost as large as the dropoff from 5 to 10.
3. The spread at the first round borderline is tight. The difference between #14 and #19 is only 39 points. To put that in perspective, #14 would be only about a 54% favorite over #19 (less than 5:4 odds).
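For readers who want to see where a figure like that comes from: on the 400-point logistic scale that Elo and Glicko share, the raw expected score for a 39-point gap works out just above 55%. Glicko additionally discounts this toward 50% based on the opponent's rating deviation, which is presumably how a figure near 54% arises; the ratings and function below are illustrative, not taken from the actual system.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B on the 400-point logistic
    scale used by Elo and Glicko (no deviation adjustment)."""
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b) / 400))

# A 39-point gap, e.g. hypothetical ratings 1539 vs. 1500.
print(round(expected_score(1539, 1500), 3))  # about 0.556
```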
4. The Secret Word for today is "recency." Northwestern barely held onto its spot despite having what most would consider the best overall performance of the season. Now, some of this perception is because people (rightly or wrongly) put a lot of extra stock in elim success as its own independent criterion. Glicko ratings don't view winning a tournament as anything more than the sum of the individual rounds the team won and lost. You don't get any bonus. The only thing that matters is the head-to-head results. As I mentioned above, it's possible that I'm undervaluing elims due to the way I use the fraction of the ballot count as a percentage win (a 3-2 = .6 win), but even with a fix to this, the calculation will always be built only from head-to-heads.
The reason Northwestern barely held onto their spot is the role that the recency of results plays in the calculation. The season is divided into rating periods. The calculation of ratings at the end of each period is an adjustment of the rating of the previous period. The amount of change is based on how well the team performed against their expected performance, weighted by the size of their deviance. Since deviance goes down as a team gets more rounds, the amount of fluctuation in the ratings also goes down. However, since the most recent rating is an adjustment of the previous rating, the calculation gives more weight to recent results.
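The mechanics in the paragraph above (each period's rating is an adjustment of the previous one, sized by performance against expectation and weighted by deviance) correspond to the published Glicko-1 update. Here is a sketch of those standard formulas; this is the textbook version, not the author's actual implementation.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant

def g(rd: float) -> float:
    """Attenuation factor: results against uncertain opponents
    move your rating less."""
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * rd / math.pi) ** 2)

def glicko1_update(rating, rd, results):
    """One rating-period update.

    results: list of (opponent_rating, opponent_rd, score) tuples,
    where score is 1.0 for a win and 0.0 for a loss.
    Returns (new_rating, new_rd).
    """
    if not results:
        return rating, rd
    d2_inv = 0.0       # accumulates 1/d^2, the information gained
    delta_sum = 0.0    # accumulates (actual - expected), attenuated
    for opp_r, opp_rd, score in results:
        e = 1.0 / (1.0 + 10 ** (-g(opp_rd) * (rating - opp_r) / 400))
        d2_inv += (Q ** 2) * (g(opp_rd) ** 2) * e * (1 - e)
        delta_sum += g(opp_rd) * (score - e)
    denom = 1.0 / (rd ** 2) + d2_inv
    new_rating = rating + (Q / denom) * delta_sum
    new_rd = math.sqrt(1.0 / denom)  # always shrinks within a period
    return new_rating, new_rd
```

Note that `new_rd` is always smaller than the old `rd` once results come in: more rounds mean less uncertainty, which is exactly why an established team's rating fluctuates less.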
This becomes apparent in the new ratings with the rise of Michigan AP and the fall of Northwestern MV. Michigan had an outstanding Dartmouth RR and a good Texas tournament. Northwestern had a disastrous Dartmouth and a somewhat disappointing Texas; they more than doubled their loss total on the season at Dartmouth alone.
Correcting for the undervaluing of elims mentioned above may be enough to extend Northwestern's lead back to where it was, but recency will always be a factor in a ratings system like Glicko. In fact, it is a strength of the system, because we know that debaters evolve over the course of the season. It is possible to adjust variables so that new results are weighted more or less heavily. This is something I've been looking at, but there's not a ton of play. The decision on the "best" value will be determined by which is most likely to give accurate predictions.
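For the curious, the standard Glicko-1 knob for "how much do new results matter" is a constant, usually called c, that inflates every team's deviance at the start of each rating period: a bigger c makes the system forget old results faster. A sketch of that mechanism, with illustrative parameter values (the 350 cap is the conventional deviation for an unrated competitor):

```python
import math

def pre_period_rd(rd: float, c: float, periods_idle: int = 1,
                  rd_cap: float = 350.0) -> float:
    """Inflate rating deviation at the start of a new rating period.
    A larger c makes old results fade faster, so fresh results move
    the rating more. The deviation is capped at the unrated default."""
    return min(math.sqrt(rd ** 2 + (c ** 2) * periods_idle), rd_cap)

# With c = 35 (illustrative), an established team at RD 50 opens the
# next period a bit less certain, so its rating can still move.
print(round(pre_period_rd(50.0, 35.0), 1))  # about 61.0
```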
5. The possibility that round robins could be negatively distorting the ratings still needs to be considered. It's possible that very good teams that aren't invited to a round robin are benefiting relative to teams that are invited but do poorly. I'm merely speculating at this point, but it could be an explanation for those who think that Michigan State ST or Georgetown LM are rated too low (or that Harvard DH or Michigan State CZ are rated too high).
I continue to strongly believe that round robins should not be "no risk" propositions, because it just doesn't make sense to say that a team's wins matter but their losses don't. I worked out the results from last year, but this is an issue that needs more data before I can be confident about any conclusions.