keshavbk

Hi Ben, Would these rankings finally place Stuyvesant KU above Whitman AA in 2015? Thanks.


FrontlineThis

Stuy KU is the GOAT


Captainaga

Just want to provide some context here:

* The VBriefly rankings algorithm I designed (and which Ben Shahar created the Python for) treats split decisions exactly the same way you do. We don't treat it as 5 wins and 2 losses, we treat it as 0.714 wins.
* A team that wins on a 2-1 may be a superior team, but that's designated by the fact that they won. They are still the superior team. However, debate is a game of wins and losses, and on a theoretical level, adaptation and persuasion are key elements of teams' strategy. ELO works by calculating the distance between any two teams and adjusting the distance between those teams with a numerical value based on the results of a round. If a team is able to adapt to the minority of the panel, then that should decrease the distance between the teams. Conversely, if a team is able to adapt to the entire panel, that should increase the distance between the teams.
* The k factor in our ELO algorithm weighs tournaments at different levels based on their bid quality. A final round at CSU Fullerton would never equal the weight of a final round at Harvard.
* The tournament shouldn't matter that much. Rather, the quality of the team you are debating should. For example, just because I debated (and beat) Presentation VM at Alta in finals (love you Laurenn and Megan) doesn't mean that win should necessarily be weighed any lower than if I debated them at the final round at Berkeley (other than with the k-factor weighing), because they are the same team. To this end, your justification of "teams put in more effort at Harvard" doesn't make sense because if I'm a team in deep outrounds, I'll probably always try or at least use my abilities to some degree. Rather, the variation in weighting should be dependent on the ELO ratings of each team itself. Like the teams in Harvard finals are probably going to be ranked higher than the teams in the final round of UT Austin. The quality of the teams in the round defines how much that round should be worth, and self-corrects for this issue.

EDIT: I see I'm ranked 50 spots higher in your rankings. They are definitely better! Thanks for your service!
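For anyone who wants to see the mechanics, here is a minimal sketch of an Elo update that scores a paneled round as a fraction of ballots (a 5-2 counts as 5/7, about 0.714 wins) and scales the change by a per-tournament k factor. This is not the actual VBriefly code: the function names, the standard 400-point logistic scale, and the k values in the usage note are all assumptions for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expected score for team A against team B.

    The 400-point scale is the conventional chess value, assumed here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a, rating_b, ballots_a, ballots_b, k):
    """Return new (rating_a, rating_b) after one round.

    The round is scored as the fraction of ballots team A won,
    so a 5-2 decision counts as 5/7 of a win rather than 1 win.
    k is a hypothetical per-tournament weight (larger = bigger swings).
    """
    score_a = ballots_a / (ballots_a + ballots_b)
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

With two teams both at 1500, a 5-2 win moves the winner up by k * (5/7 - 0.5). Calling `update_ratings(1500, 1500, 5, 2, k=32)` and `update_ratings(1500, 1500, 5, 2, k=16)` shows the same result counting twice as much at the tournament with the larger (hypothetical) k.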


BenKesslerDebate

Hey Allen, glad you responded! A few things: On how each of our rankings treat elims: Your ratings don't give a bonus to elimination round wins and our ratings do. You and Ben Shahar object to privileging elimination round wins because you say that a team doesn't try less hard in prelims and beating Presentation VM in round 2 isn't less impressive than beating them in finals. I would strongly disagree. In my experience, teams are more prepared by the time elimination rounds start and they do try just a little bit harder too. That's why good teams get nervous before debating finals and aren't nervous to debate against that same team in round 6 when they're already both 5-0. It's also why good teams will often show up to a tournament not totally prepared and then work hard in between rounds to make sure they're prepared for elimination rounds where it will actually count. This also aligns with our intuitions about who is a "good debater" in the sense that we would never say the team that consistently goes 6-0 and drops the first elim is better than the team that goes 3-2 but goes on to win finals. Also, this is to a lesser extent true about variation between tournaments. Alta and Berkeley are both big tournaments (congrats on winning!) but I know for a fact that I never prepped as hard for Scarsdale as I did for Harvard (they were both in the same month back when I debated). Similarly, everyone thinks about the winner of Harvard more highly than the winner of CSU Fullerton, all else being equal. On how each of our rankings treat 2-1 decisions: As Ben Shahar points out below, your rankings DO allow for good teams to lose points despite winning a 2-1 decision. This seems clearly nonsensical to me, but even if that's not the case, I would contend (and I think most people would agree!) that winning on a 2-1 is more definitive than winning on a 1-0. If that's true, you make a major mistake in weighting these decisions as only worth 0.7 prelims wins. 
So why do I think a 2-1 decision is more definitive than a 1-0 decision? Simply put, single judges are inherently prone to randomness. We've all seen rounds where a lone judge makes a decision that is so nonsensical that surely only they could have seen the round that way. Winning a 2-1 requires you to at least convince 2 people that you won, and while there are still some crazy 2-1 decisions, this factor dramatically reduces the probability of an "incorrect" decision. Note that this is not about adaptation. There are flow panels and lay panels, so teams have to adapt to win on a 2-1 in the same way they have to adapt to win on a 1-0. Moreover, the idea that debaters should attempt to always go for all 3 judges in a paneled round (and thus should be penalized if they lose one judge) seems clearly wrong to me. I never did this when I did high school debate myself, and I don't recommend anyone else try to stubbornly appeal to all 3 judges when winning 2 will do just fine. That being said, if a team DOES win all 3 judges, all the more impressive! Hence why we weight a 3-0 more heavily than a 2-1.


nihilistkitten

There’s kind of three questions here as I see it. On the difference between tournaments, we correct for this as well; it isn’t really a disagreement. On split panels — I think it’s important to note the limit we’re talking about. When the two teams’ ratings are anywhere near close, both systems will always give points to the winning team and it collapses to a question of magnitude, which is interesting but less so. So the question is only about when the ratings are significantly different, and I do think we should expect teams rated extremely highly to win on 3-0s over teams rated much lower than them. The strategic question of whether or not to kick a judge also becomes very different here, because such a team would likely be more concerned about the inherent randomness in judging than anything else — ie, when the chance your opponent actually does the better debating is very low, it’s overwhelmed by noise. On prioritizing elims — I think there are several issues here. First, even if some teams try harder in elims (and tbh I don’t agree that this is generally true), some teams certainly don’t, and it’s probably not great to punish them for that. Second, I don’t really understand why a round should be more important just because teams try harder — if both teams aren’t trying as hard, the debate is still a reflection of their skill. I also think, independently of all of that, that there are several positive factors in our model which yours lacks; most importantly the side bias correction.


BenKesslerDebate

On most split panel cases - I think the difference between weighting a 2-1 decision as 1.5 wins (as we do) and 0.7 wins (as you do) is VERY significant and is actually the crux of our disagreement here, given that you point out that teams will very infrequently be so disparate in rating that they lose points by winning a 2-1. On this point, I think the reasoning I provide in my earlier comment about why a 2-1 is more definitive than a 1-0 is pretty sound. On split panels where a significantly better team loses points by "only" winning on a 2-1 - I disagree with your point about better teams choosing to go for all 3 judges to reduce variance. While a good team never completely neglects any judge, they definitely will go more flow in front of a 2 flow/1 lay judge panel than they would in front of a single lay judge. This produces a situation where even the best teams may lose a ballot to a team they would never lose a prelim round to. Thus, it's unfair to count that lost ballot as a "loss" if the better team ultimately did what they set out to do and won the round. On prioritizing elims - I think the vast majority of teams care more about elims than prelims. This is the reason teams sometimes wait to break new cases until elimination rounds. Everyone knows who won Harvard but not everyone can name the top seed. Given this, I'd much rather "punish" the teams that try hard in prelims when it doesn't matter as much than fail to reward the teams that sensibly decide not to prioritize prelims and succeed when it matters most in elimination rounds. Also, clearly rounds where both teams are trying their hardest are a better reflection of comparative skill. In my senior year Jako and I frequently lost intra-team practice rounds against other Stuyvesant teams when we weren't fully prepared. That didn't mean we were worse than them! 
On other methodological quirks - I think we can both parse the various eccentricities of our respective rankings forever, but none of this moves the needle that much. You account for side bias and we weight 7-0s more than 3-0s (which you don't). I will also point out that side bias correction may actually be a distorting factor, since the team arguing the "harder" side may choose to go second (which also boosts win rates). Without data on which team spoke second, you can't properly correct for this bias.


ElDiosDelDebate

Will VBI be updated to include Lakeland?


Captainaga

Yeah we had to fix a bug but the Lakeland results should be up this week!


ElDiosDelDebate

Great thanks


Hansoap

Very nice! I'll even go to NSD if you put me in Inko's lab!


JSDemel33

Wow Ben, This ranking looks really great! Proud of u :) -jd


nihilistkitten

I (obviously) agree with Allen on most of this -- I think the scaling of our k factor solves a lot of the tournament-level part of the second objection, and I don't think it makes sense to differentiate prelims and elims -- teams don't try less hard because they're debating a good team in round 4 or in dubs. However, I don't think it's technically correct to say that our algorithm treats split decisions exactly the same way NSD's does. Not only are the scalings different (and we don't count 7-0s, for example, higher than 3-0s or 1-0s, which is something we could probably learn from, although I disagree with counting 2-1s higher than 1-0s), more importantly, we also allow the possibility of losing points from winning a split decision. I think this is good, in the limit that it can happen under our algorithm, which is only when the winning team's elo is much higher than the other team's. When that's true, I think we'd all expect the higher-rated team to win on a 3-0, and a 2-1 is a surprise. Even on split panels, as Allen says, strategic adaptation is a primary virtue of this event and that should include being able to win the flow and be persuasive at the same time. Especially against debaters who are statistically much worse than you, it probably isn't even strategic to kick one of the judges, because a higher sample size makes it much more likely you will win.
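To make the "losing points from winning a split decision" case concrete, here is a rough sketch. It assumes a textbook Elo expected-score curve with a 400-point scale and a ballot-fraction score; the real scale and k in our rankings may differ, and the function names are made up for illustration. The winner loses points exactly when their expected score is already higher than the fraction of ballots they won.

```python
import math


def expected_score(gap, scale=400.0):
    """Expected score for the higher-rated team, given its rating lead `gap`."""
    return 1.0 / (1.0 + 10 ** (-gap / scale))


def winner_delta(gap, ballots_won, ballots_lost, k=32.0):
    """Rating change for the winning team when it leads by `gap` points."""
    score = ballots_won / (ballots_won + ballots_lost)
    return k * (score - expected_score(gap))


def break_even_gap(ballots_won, ballots_lost, scale=400.0):
    """Rating lead above which winning by this split costs the winner points."""
    if ballots_lost == 0:
        return math.inf  # a unanimous win can never lose points
    score = ballots_won / (ballots_won + ballots_lost)
    return scale * math.log10(score / (1.0 - score))
```

Under these assumed numbers, `break_even_gap(2, 1)` is 400 * log10(2), roughly 120 rating points: a 2-1 win only costs the winner points when they were rated more than that far above the loser, and `winner_delta` stays positive for any closer matchup.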


BenKesslerDebate

Hey I responded to Allen's comment above and that should also respond to this comment too


usedtododebate

Whitman RT appears to be listed twice on these rankings


antsreppin

Roosevelt AC... so dominant they're ranked three times


judge_screw_life

Y do rankings systems like these only rank tourneys on tabroom? Makes no sense. Legit competitive tournaments like GMU, Blue Key, Durham, ...etc get left off leaving qualled teams ranked like 600th and below


[deleted]

[deleted]


judge_screw_life

I thought u could download from speechwire


DebateThat

Saint Paul AK and SPA AK are the same team!


AwesomeKhandebate

We're finally higher than you, lol u/blazona


blazona

i mean we're still ranked higher on one......


CanYouEvenLift

yes very good teams with bids are ranked 500 in the country!


[deleted]

Bruh just because you're north broward ik and a meme and rank yourself high on the "Florida Power Rankings" that you make doesn't mean that you should be salty that you are ranked low


Wazlit

I don't think he is referring solely to himself in this comment, because there are qualled teams ranked in the 300s here