Deep Analysis - What's Wrong With Being Results-Oriented?

Eeny, meeny, miny, moe
Catch a tiger by his toe
If he hollers, let him go
Eeny, meeny, miny, moe

The other day, I was trying to decide where to eat with a friend. We could not come to a consensus, so the friend suggested Eeny-meeny-miny-moe.

For those unfamiliar with this process, children often use it to decide something “randomly,” such as determining who will have to go get the ball that went over the fence. You start by pointing to one person, then speak the first beat of the rhyme (“eeny”) as you point to the next person in line. As you progress through the rhyme, you keep pointing to the next successive person as you speak each beat (lapping back around as necessary), and whoever you are pointing to when the rhyme ends is the “randomly chosen” candidate. Naturally, you can also use this to decide between objects or restaurant choices instead of people.

While it is a quick way for children to settle things, it is hardly a fair way for two adults to come to a consensus.

“It’s deterministic,” I told my friend, and it was the truth.

Eeny-meeny-miny-moe is not a truly random way to select something because the outcome is entirely predictable. If it’s just me and Joe in a room, and I want to “randomly select” one of us (but I really want it to be me), I’ll offer to perform Eeny-meeny-miny-moe every time! As long as I start the rhyme pointing at Joe, I will end up being selected 100% of the time. I can determine the outcome ahead of time as long as I know the starting state; it is deterministic.
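To make the "deterministic" point concrete, here is a minimal sketch of the rhyme as a function. The names `eeny_meeny` and `rigged_start` are made up for illustration, and the beat count is an assumption: people count the rhyme differently, and which starting person wins depends on that count. What does not change is that, once the start and the count are fixed, the result is fixed too.

```python
def eeny_meeny(players, start, beats):
    """Return who is being pointed at when the rhyme ends.

    Pointing begins at players[start]; each spoken beat moves the
    pointer to the next player, wrapping around the group.
    """
    return players[(start + beats) % len(players)]


def rigged_start(players, target, beats):
    """Return the starting index that guarantees `target` is chosen."""
    return (players.index(target) - beats) % len(players)


players = ["Me", "Joe"]

for beats in (15, 16):  # assumption: two plausible ways of counting beats
    start = rigged_start(players, "Me", beats)
    # Repeating the rhyme fifty times never changes the answer.
    outcomes = {eeny_meeny(players, start, beats) for _ in range(50)}
    print(f"{beats} beats: start on {players[start]}, outcomes {outcomes}")
```

Whoever runs the rhyme only needs to know the count to pick the right starting point, which is exactly why it is not a fair way to choose.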

Unlike eeny-meeny-miny-moe, a Magic tournament is not deterministic. (Thanks, Heisenberg Uncertainty Principle!)

You can start a tournament off with the same players, running the same decks, with the same first-round pairings, and run that setup fifty times over (heck, even resetting the players’ memories in between tournaments, just to be safe) and get fifty different outcomes. Fifty different champions, fifty entirely different Top 8s, all the way down to fifty different people in last place – it really is impossible to determine how the tournament will play out ahead of time. There are more factors in play than we can possibly know.

A simple question: Did Luis Scott-Vargas have the best decklist at Pro Tour: Berlin? After all, he did win the thing.

This is at the heart of the “results-oriented” discussion that has spiked in popularity of late. Something that has inexplicably not been brought up is the fact that everyone is, to some degree or other, results-oriented.

At one end of the spectrum, you have people who insist that LSV had the best build, because he won. This might be a good argument if Magic were deterministic, but it is not. If we ran fifty reboots of PT: Berlin as described above, it would become instantly obvious that being 100% results-oriented will lead you astray. LSV wins the first one and Captain Results declares “he had the best list!” Then Saito wins the next one and Captain Results declares “Saito had the best list!” Then Thaler wins the one after that and the good Captain has a third answer for us.

Clearly there can be only one Best List in the Room, and the contestants for that title are locked in when Registration closes. If you look at fifty tournaments with identical decklist pools and come up with fifty different answers as to which list was the best, your method is flat-out awful.

So being 100% results-oriented is right out.
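For the curious, the fifty-reboots thought experiment can be sketched as a toy simulation. Everything here is an assumption for illustration: an eight-player single-elimination bracket stands in for the whole tournament, the strength numbers are invented, and a match is decided by a simple weighted coin flip. The point is only that a list with a real edge still loses the title most of the time.

```python
import random
from collections import Counter


def play_match(a, b, strength, rng):
    """Return the winner; stronger lists win more often, not always."""
    p_a = strength[a] / (strength[a] + strength[b])
    return a if rng.random() < p_a else b


def run_tournament(strength, rng):
    """Single-elimination bracket with random pairings; returns the champion."""
    field = list(strength)
    rng.shuffle(field)
    while len(field) > 1:
        field = [play_match(field[i], field[i + 1], strength, rng)
                 for i in range(0, len(field), 2)]
    return field[0]


# Hypothetical strengths: "Best List" has a real but modest edge.
strength = {"Best List": 1.3, **{f"List {i}": 1.0 for i in range(2, 9)}}

rng = random.Random(0)  # fixed seed so the run is repeatable
champions = Counter(run_tournament(strength, rng) for _ in range(50))
print(champions.most_common())
```

Captain Results would crown a different Best List after most of those fifty runs, even though the strengths never changed.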

What about 0% results-oriented? This would lead you to believe that Joe Shmo and his sixty-Swamp deck, tied for last place, has just as good a chance at having the Best List as LSV did. If you have time to test out every single decklist in the tournament for yourself, this is actually okay – but, let’s face it, no one has that kind of time. Not even Frank Karsten.

Really, you want to be somewhat results-oriented – just enough to filter out the lists that are not worth your time, without discounting any legitimate contenders. That’s the sweet spot.

The Inconvenience of Luck, The Importance of Testing

Let’s say you tend to fall on the side of “very results oriented,” and believe that the higher the finish, the better the deck – unless there’s a gap in playskill that might account for the difference in finish instead.

At Pro Tour: Berlin, Martin Juza finished higher in the Top 8 (and in the Swiss, by a full match win) than Jan Doise. According to a highly results-oriented evaluation, Juza must have done something better to earn that finish. As Jan has had a much longer history of PT success, it seems unlikely that Jan has materially less playskill (God Forbid he should have more playskill), so the only results-oriented explanation for Juza’s higher finish is that he had a better Elves! list.

How do you reconcile that notion with the fact that Juza only played 2 Summoner’s Pact, a top contender for MVP of the deck according to successful Elves! players across the globe? Jan certainly had the full four.

There are only two ways to reconcile this. One is to make the bold claim that, in Juza’s build – but not in any of the builds that finished below him with 4x Pact, including Jan’s – it was correct to play two Summoner’s Pact, or that all the other decks get other things so gravely wrong that they outweigh the massively suboptimal Pact count. As it flies in the face of mountains of testing data, this is also known as the “lalalala, can’t hear you” approach to Magic.

The other way to reconcile it is to take a deep breath and accept that luck is a part of Magic.

The harsh reality is that the outcome of a Magic tournament is determined not only by skill and by deck, but also by luck. This is incredibly inconvenient for the 100% results-oriented among us.

After all, you can certainly look at a Top 8 and adjust for playskill based on how well you know the players (leaving the rest to be explained by deck quality), but how do you adjust for luck? If a player had good draws and wins, do you value that win less? What if he would have won even with bad draws? Very, very inconvenient.

Inconvenience, however, is no excuse for burying your head in the sand and pretending luck does not influence finishes. It’s still there, and all you’re doing by ignoring it is making worse decisions.

Since luck provides a large degree of uncertainty to tournament results that you cannot avoid, you must inevitably turn to testing to get your answers. Once you’ve filtered down to your set of contenders – be it the first place finish, the top two decks, the top four, the top eight, the top sixteen, or whatever else – the only true way to decide which is best is cold, hard testing. Either test the decks out yourself, or, if you trust someone else to do a better job, follow their experience in testing instead.

Good testing is a better predictor of deck quality than tournament results for one simple reason: repetitions.

Since Magic is not deterministic, there are tons of factors that lead to the outcome of a tournament. In testing, you surgically remove a lot of those factors, so that you can drill down and just repeat the one scenario you care about – deck X versus deck Y – over and over. The more repetitions you play, the lower the chance that Magic’s inherent randomness will skew your conclusions.

Through repeated testing, you are helping to eliminate that inconvenient luck factor so that you can look at the results at face value, adjust for your playskill relative to your partner’s, and evaluate the strength of the deck based on its performance in the context of each player’s playskill. You can never eliminate luck entirely, but you sure can shrink its impact down a lot further than the level you see at a single tournament.
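Here is a rough sketch of why repetitions matter. It assumes each game is an independent coin flip with a fixed win rate, which is a simplification (it ignores playskill, sideboarding, and who is on the play), and the 55% figure is hypothetical. Under that assumption, the observed win rate from a testing session wobbles around the true rate, and the wobble shrinks roughly as sqrt(p(1-p)/n) for n games.

```python
import math
import random
import statistics


def estimate_win_rate(true_rate, games, rng):
    """Play `games` independent games and return the observed win rate."""
    wins = sum(rng.random() < true_rate for _ in range(games))
    return wins / games


def spread_of_estimates(true_rate, games, sessions=2000, seed=0):
    """Standard deviation of the observed win rate across many sessions."""
    rng = random.Random(seed)
    estimates = [estimate_win_rate(true_rate, games, rng)
                 for _ in range(sessions)]
    return statistics.pstdev(estimates)


TRUE_RATE = 0.55  # hypothetical: deck X beats deck Y 55% of the time

for games in (10, 40, 160):
    simulated = spread_of_estimates(TRUE_RATE, games)
    theoretical = math.sqrt(TRUE_RATE * (1 - TRUE_RATE) / games)
    print(f"{games:>3} games: spread {simulated:.3f} (theory {theoretical:.3f})")
```

Quadrupling the number of games roughly halves the spread, so a ten-game set tells you very little about a close matchup, while a long session starts to separate a 55% deck from a coin flip.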

Now for the bigger question: where do you draw the line on tournament results? If being 100% results-oriented is wrong and 0% is also wrong, where is that sweet spot?

The Partner in Crime Factor

Consider the Top 16 of PT: Berlin. We know that Elves! utterly dominated the Top 8, and many people use that as a proxy for saying “Elves! utterly dominated the tournament,” and a number of them follow through to conclude that “the sky is falling.”

Ninth place was Faeries featuring three maindeck copies of Glen Elendra Archmage, and one Azami, Lady of Scrolls. Moving down we have Goblins at tenth, then Fatestitcher Dredge with 4 maindeck Magus of the Moon and 4 maindeck Glimpse the Unthinkable, B/G Death Cloud with five maindeck Planeswalkers, Faeries again, mono-red Ritual-Blood Moon-Demigod-Empty the Warrens beatdown, Faeries again, and a second copy of the Fatestitcher/Magus/Glimpse Dredge deck.

Not a single Elves! player finished 9th-16th at PT: Berlin.

It made up 75% of the Top 8 – an overwhelming majority – but not even half of the Top 16.

In fact, if we look at the Top 16, Faeries did about half as well as Elves! did, accounting for 19% instead of 38%. The third-highest deck concentration actually came from Glimpse the Fatestitching Unthinkable Dredge, at 13%, and each of the remaining decks was tied for last with only one copy apiece.

Looking at the Top 16, it’s clear that both Faeries and Elves! had a much stronger showing than the rest of the field. We don’t think of this as a tournament dominated by “Faeries and Elves!” though; we only think of it as The Elves! Show, starring Elves! as Elves!

Are we right to think of it that way?

Why not consider the whole Top 16? Heck, from there, why not the whole Top 32? Top 64? Why not go as far as to count all Day 2 finishes? (And on and on we go, until we hit 0% results-oriented again.)

Actually, it’s not inconceivable that the Best List in the Room could turn out to be the one in last place. It could be that its pilot was saddled with the most heartbreakingly unlikely pairings for every round, starting from round 1, and did not win a single match despite playing the best deck in the entire tournament. Maybe he’d win the title in the next 49 reboots of the tournament, but this time around, a massive conspiracy of fate led him to last place instead.

Paul Cheon, LSV’s partner in crime, whose playskill is certainly very close to LSV’s, if not identical, finished 244th at Berlin, playing the same deck that LSV took to first place. If that list was truly the best one at the tournament, you sure wouldn’t know it just by looking at Cheon’s finish.

This actually happens pretty regularly – two players, or maybe a group of them, all with similar levels of playskill, show up with identical decklists and get widely varying results. Zac and I tested for GP: Columbus exclusively with one another, brought the same 75 cards, and I ended up in the money while he was out within the first few rounds. It certainly wasn’t the decklist that caused the difference, and Zac is, if anything, a better player than I am; it was the difference in pairings that did us in.

Solo operators that are knocked out by unfortunate pairings do not have the benefit of their teammates’ decklists doing well to shed light on their quality build. Imagine if Mark Herberholz, who broke with his team at PT: Honolulu and built his Gruul deck the morning of the tournament, had suffered bad pairings early on and lost out? No one would have ever given “that terrible Giant Solifuge/Flames of the Blood Hand deck” a second thought. Instead, since he won that PT, it was the premier beatdown deck of that entire PTQ season.

Then again, among the 50 players above and below Cheon at PT: Berlin, maybe one or two of them actually had a list worth testing. After all, having the Best List in the Room means that the right pilot had the best chances of winning the tournament with it, which means that the further down the standings you go, the less likely it is that the player had a phenomenal list but just got incredibly bad beats. So while your “Top X or Better” filter might accidentally miss a few good lists like Cheon’s, the amount of time you’re saving by not crashing those 50-or-so weak builds against each other more than makes up for what you miss out on.

Top 8 and Tiebreakers

Most people draw the line for “decks worth considering” at Top 8. Is that a good idea?

At the end of the Swiss, eventual champion LSV had an identical record to Tomohiro Aridome and his Faeries list. Following the Swiss, neither LSV nor Aridome lost another game, but while LSV was given the chance to continue accumulating wins (as both players had been doing the entire tournament), Aridome was booted out of the event on tiebreakers and was denied the opportunity to try. For all we know, if their tiebreakers had been reversed, Aridome could have won the whole shebang instead – Azami, Lady of Scrolls and all.

Doesn’t it seem reasonable that someone with an identical Swiss record to a Top 8 competitor should be given the same level of respect as the 5th-8th place finishers?

After all, you don’t have much control over your opponents’ match win percentages. Sometimes you will defeat a fantastic player in Day 2, who then goes on tilt and ruins your tiebreakers. Early round opponents can stay in after falling out of contention for Day 2, and will scoop people in when they are paired up for the round. A low tiebreaker really does not show that you had an easier time earning your record, so it’s extremely difficult to make the case that a Top 8-worthy record like Aridome’s was materially inferior to another Top 8-worthy record like LSV’s simply because the tiebreakers were different.

So let’s say we look at not just the Top 8, but also those who only missed Top 8 on tiebreakers.

You know how LSV’s record was tied for eighth place with Aridome’s? Well, it just so happened that Sebastian Thaler was tied with the same record, coming in at 7th place. Coincidentally, Johan Sadeghpour had the same record as LSV, Thaler, and Aridome. Oh, and Philipp Summereder had that record, too. Not to mention Carlos Amaya Tronosco. Did I mention Thomas Kannegiesser? Yeah, he had the same record. So anyway, besides Chicago’s Rashad Miller, who ended up 14th, still with the same Swiss record as eventual champion LSV, no one else in the Top 16 had a shot at Top 8 on tiebreakers.

Think about that. Eight players had identical records going into the Top 8, but only LSV and Thaler had fortunate enough tiebreakers to compete for the title. What if the breakers – again, a factor entirely out of the players’ hands – had been different? LSV might not have made Top 8, let alone won the tournament. Dredge could have won it, or Goblins, or Mono-Red for all we know. Without LSV to pull out the Quarterfinals upset, Tezzerator could have won.

Literally, even if the Top 14 competitors themselves had played exactly the same way in the Swiss, and their opponents had played exactly the same way in their matches, we still could have ended up with a quarter of the Top 8 being completely different if their opponents had simply done better or worse later on in the tournament.

Even if you do think the right place to draw the line in your search for the Best List is Top 8, you should absolutely consider the decks that had the record, but not the tiebreakers, to make it to the single elimination rounds as well. To do otherwise would be, frankly, lazy and misleading.

To answer the original question (“where is the sweet spot?”), my personal method is to look at the Top 16 and weight archetypes with more appearances more highly (that is, I think it is more likely that Elves! and Faeries are on to something than a deck with only one appearance, like Death Cloud). I also lend more credence to decks with more Swiss match wins (regardless of finish), meaning I’d value everyone who finished ninth through fourteenth at PT: Berlin exactly the same way. Finally, I adjust for playskill if a deck looks underpowered or otherwise deficient but has a fantastic pilot. From there, it’s all about the testing.

That said, I of course encourage you to develop your own method.

Tournament Size

The final piece of the puzzle to consider is tournament size. Olivier Ruel routinely shows up in the Top 8 of European GPs with around a thousand players in attendance. With so many consecutive match wins required to make it to the elimination rounds, that is nothing short of incredible.

On the other hand, I remember learning that someone had lost in the finals of a five-round PTQ. This meant that he had won five total matches of Magic that day. On that same day, someone across the country started off an eight-round PTQ with a similar 5-0 record, and then lost two rounds, missing out on Top 8 entirely.

Although both players won five matches and then lost out, many people browsing PTQ decklists would lend just as much credence to the five-win finalist’s decklist as they would to that of someone who had won eight matches to reach the finals of the larger PTQ. In reality, the five-win finalist’s finish was more comparable to that of the guy who started off 5-0 and then lost out, but we put so much emphasis on The Top 8 that the context of what it actually takes to reach Top 8 in a given tournament is often overlooked.

Think about Olivier Ruel the next time you look at a PTQ Top 8, and take a second to check how many players were in attendance.

All in all, what’s wrong with being results-oriented? Nothing, at least not inherently. It’s good to be somewhat results-oriented; what’s bad is being too results-oriented. The trick is finding the sweet spot.

So ask yourself: are you absolutely perfect at evaluating decklists, do you think you put too much stock in results, or are you wasting too much time playtesting decks that should have been filtered out earlier? Think about it.

See you next week!

Richard Feldman
Team :S

Bonus Section: Actually Random Eeny-Meeny-Miny-Moe Variant

If you want to actually decide something randomly, do this: select two people out of the group, and choose one to be the Starter and one to be the Counter. It doesn’t matter which is which. (Though if you’re having trouble deciding, may I suggest a Shahrazad-style subgame?)

The Counter closes his eyes, then the Starter points to someone as a starting point and closes his eyes as well. The Counter then holds up a random number of fingers, up to ten. On the count of three, both open their eyes. Then, starting with the person the Starter is pointing to, they count one person to the right (lapping around as necessary) for each finger the Counter is holding up. When all is said and done, the Starter is pointing to a random person from within the group!
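If you would like to sanity-check the procedure, here is a small simulation sketch. It rests on assumptions the rules above do not guarantee: the Starter's pointing choice and the Counter's finger count are modeled as independent uniform draws (real people are less random than that), the Counter shows between one and ten fingers, "to the right" is taken to mean the next position in the list, and the names are made up.

```python
import random


def pick(group, rng):
    """One round of the Starter/Counter procedure.

    Assumption: both choices are independent uniform draws, made
    without either person seeing what the other chose.
    """
    start = rng.randrange(len(group))   # Starter points at someone
    fingers = rng.randint(1, 10)        # Counter holds up 1 to 10 fingers
    return group[(start + fingers) % len(group)]


group = ["Ann", "Bob", "Cat", "Dan", "Eve"]  # hypothetical group
rng = random.Random(0)  # fixed seed so the run is repeatable

tally = {name: 0 for name in group}
for _ in range(10_000):
    tally[pick(group, rng)] += 1

for name in group:
    print(name, tally[name])
```

Under those assumptions every person comes up about equally often. The key is that neither the Starter nor the Counter knows the other's choice, so neither can steer the result the way you can with the original rhyme.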

This works easily for groups of up to 30 people, if the Counter gets clever with how he uses his hands. (If you have a group of more than 30 people, then guess what? One of you has a damn cell phone with a random number generator on it, smartass.)

Enjoy!