Chatter of the Squirrel - The Reveillark Fallacy, and the Application of Theory in Standard Pro Tours

In a shocking turn of events, instead of being self-reflectively emo I decided to write an article that might help you actually get better at Magic, or barring that, might at least help you win more games. Before I start, though, I do want to take a moment to thank all of the people that sent me feedback about my last article. I received more letters about it than about anything else I’ve written, and I’m glad to know that Magic has had a similar impact on scores of players besides myself. Beyond that, though, there’s something I want to make clear: by no means do I intend to stop being competitive, and I hope to keep writing until circumstances demand otherwise. There’s fight in this dog yet.

Especially when something really, really strikes at the wrong nerve.

I’m not known for my Regionals technology, since I’m notorious for hating Standard. Nevertheless, I can guarantee you that your chances of qualifying for Nats will demonstrably improve as a result of reading this article. Why? Because for the last week, the internet has been awash with a deluge of disinformation, leading to a series of terribly-misinformed but oft-repeated quote enlightened unquote conclusions about the “true” state of Standard in the Pro Tour’s wide-cast wake.

“Reveillark was the best deck to play at the PT,” people say. “And it’s a valid choice for Regionals. It doesn’t matter that it loses to Faeries – after all, two of them were in the Top 8!”

That’s a paraphrase, but you see where I’m going. You’ve heard it before. All of a sudden, people are re-considering ‘Lark as a viable Regionals choice, as an anchor of the metagame, when in truth that should not at all be the case. Not to say it won’t be – but now’s the opportunity to exploit a poor decision!

In fact, for the penny-pinchers out there who insist upon getting their 3¢/article in the form of concretely-applicable easily-digestible data bits, here’s the Chatter of the Squirrel Predicted Metagame Breakdown for your Average Regionals Field:

15% Faeries
15% GB Aggro
12% ‘Lark
10% Merfolk
10% Quick’n’Toast
5% RG Big Mana
5% Doran
5% RG Aggro (in the vein of Bram Snapplebottoms and/or New York.dec)
5% Mono-Red Aggro
5% Generic Combo
3% RB Tokens/Goblins
10% Other

Why is ‘Lark so tempting? Well, in addition to being talked about ad infinitum, it’s also the exact type of deck that people want to play. It’s full of fun, big spells. You draw a lot of cards. You have a cool, game-winning combo. It feels like control, so the “good” players who’ve “moved beyond” kiddie aggro decks will flock to it. It’s not Faeries. It’s “powerful in the abstract.”

And yet – it’s pretty bad, from the individual unit of analysis, if you want to win a tournament.

Let’s start from the perspective of the Pro Tour before we move on to Regionals. I’m going to lift a statistic from a Chapin article, but I understand that he was simply citing the math, so don’t construe this as an attack on his reasoning or his thesis or anything. Besides, his piece is hardly the only place inside of which you’ve heard it (or something similar):

Brief facts on Faeries versus Lark:

107 Faeries decks Day 1 (27% of field)
31 Faeries decks Day 2 (21%) (29% advanced compared to 35% for all)
1 Faerie deck Day 3 (3.5%) (compared to 6% for all)

18 Reveillark decks Day 1 (4.9%)
8 Reveillark decks Day 2 (6%) (44% advanced compared to 35% for all)
2 Reveillark decks Day 3 (25%) (25% advanced compared to 6% for all)

Thanks to BPM for pointing out how at every step of the way Faeries did worse than the field, continually dropping in numbers and underperforming. Reveillark, on the other hand, continually increased in numbers and always performed better than predicted.
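For anyone who wants to check the arithmetic behind those percentages, here is a minimal sketch that recomputes the advancement rates from the deck counts quoted above. The dictionary layout and function names are made up for illustration; only the counts come from the quoted figures.

```python
# Deck counts as quoted above. Nothing here is derived from any other source.
counts = {
    "Faeries": {"day1": 107, "day2": 31, "day3": 1},
    "Reveillark": {"day1": 18, "day2": 8, "day3": 2},
}


def conversion(stage_from, stage_to):
    """Fraction of the decks at one stage that reached the next stage."""
    return stage_to / stage_from


def conversion_table(deck_counts):
    """Day 1 -> Day 2 and Day 2 -> Day 3 advancement rates for each deck."""
    table = {}
    for deck, c in deck_counts.items():
        table[deck] = {
            "day1_to_day2": conversion(c["day1"], c["day2"]),
            "day2_to_day3": conversion(c["day2"], c["day3"]),
        }
    return table


rates = conversion_table(counts)
for deck, r in rates.items():
    print(
        f"{deck}: {r['day1_to_day2']:.1%} of Day 1 decks made Day 2, "
        f"{r['day2_to_day3']:.1%} of Day 2 decks made Day 3"
    )
```

Note how small the ‘Lark denominators are: with only eight decks on Day 2, a single pilot more or fewer on Sunday moves that last rate by 12.5 percentage points.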

Let’s first dissect some of the numbers themselves. It’s clear that we’re trying to gauge the decks’ overall performance, so the value of separating Day 1 / Day 2 / Day 3 out from one another is not inherently the status of playing on that day, but rather an indicator of the performance that allowed it to get there. In other words, we care about how often the decks win, or what their records are. With that in mind, we have to look at the decks whose records would have allowed them to play on Day 3, absent the inherently-random (in the sense of “uncontrollable” by the player) phenomenon of tiebreakers. The top twelve finishers all wound up with a record that could conceivably have allowed them to play on Sunday, and of those twelve, three of them were running Faeries. Thus, 25% of the “virtual” Sunday competitors were running the deck that comprised 21% of the previous day’s field – an improvement, actually, not a disappointment. You can say “well it was only PV who could make Day 2 with a 12-4 record,” and choose to exclude him from a hypothetical “who could get 37 total points” analysis, and conclude that 0 Faerie decks made the cut. But four members of the Top 8 were able to ID in the last round, and it doesn’t say much about a deck that it’s able to achieve a certain number of wins by a certain point – only that it wins X amount of time over a given tournament. So to get our most accurate number, the members of the Top 8 who IDd in the last round would have to play it out, and we’d all of a sudden have seven players at 36 points.

Furthermore, there are some problems with the conclusions that people tend to draw from the numbers. I will admit that the drop from 27% of the field to 21% is statistically significant, but that’s not indicative of a failing intrinsic to the deck. As Luis Scott-Vargas pointed out, “there are two types of people who are playing Faeries at this tournament – people who haven’t tested at all and played it because they thought it was the best deck, or people who tested a lot and played it because they knew it was the best deck.” There hasn’t been enough data collected to prove this thesis across multiple tournaments, but in my experience, overwhelmingly the most commonly-represented deck also has the largest total of mediocre pilots at the helm. That’s because the most heavily-played deck tends to be the “best” deck available in the mainstream, and so people run it because they know it’s not an embarrassingly bad choice and should give them at least a chance to win against almost anything they face. When you haven’t tested much, or don’t know all that much about a format, it’s a “safe” (although not necessarily “good”) choice to try and win the tournament.

So the 27-21% dropoff is likely to have been due at least in part to the players behind the decks, not just the Faeries archetype itself.

Contrast this to ‘Lark. The common knowledge going into the tournament was that ‘Lark was unsafe because of the preponderance of Faeries in the field on Day 1. This is not exactly high-level systemic thought, and I’d venture to say that over 90% of the players in Hollywood were at least familiar with this thesis. Therefore, anyone considering ‘Lark had undergone at least second-level thinking when it came to their deck-selection, which indicates to me a slightly higher level of preparedness for the tournament. It’s not the type of deck you’d just pick up and play without at least a modicum of reasoning behind the choice. My team, for example, had an excellent ‘Lark list together, so it’s not as if the idea to play ‘Lark just wasn’t occurring to people in the first place. To me, the decision to run that deck represented a deliberate choice.

The thing to understand about ‘Lark above all, though, is that that type of deck almost necessarily is going to take a couple of slots in the Top 8 because it’s so matchup-intensive. If I were betting on the composition of the Top 8 slots by archetype before the tournament, you can guarantee I’d put money on the fact that, given the reality that fifteen or so people would run the deck at the tournament, one of them would probably make the cut. The problem is that, when you’re choosing a deck to play yourself, you’re looking at the problem from a different unit of analysis. You’re not trying to get a deck into Sunday, you’re trying to get yourself, and so you have to look at an entirely different set of variables.

Let me explain. Across any given data set – take, for example, the eighteen people playing ‘Lark on Day 1 – you’re going to run across a more or less normal distribution of matchups that those players would face over the course of a given day. That is, supposing Faeries composes 30% of the field, the higher and higher your N (= number of people playing ‘Lark) becomes, the closer and closer it will average out to 30% of those matches being played against Faerie decks*. But with any normal distribution, you’re bound to have outliers. The more players play ‘Lark, the more likely it is that one of those players will fall at the far-right end of the curve, just not get paired against that many Faerie decks, and coast to the Top 8 by the virtue of his matchup pairings. This is in addition to the Magic reality that, simply due to statistics, a certain number of players are bound to get manascrewed fewer times on a given day, have their opponents mulligan more often, and be put in more opportunities to make game-winning masterful plays than the statistical average would suggest (just as a comparable number of players will be manascrewed more often than normal, etc. etc.).
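To put rough numbers on the outlier argument, here is a back-of-the-envelope sketch. It assumes every pairing is an independent draw from a field with a fixed Faeries share, which real Swiss pairings are not (see the footnote), so treat the output as an illustration rather than a measurement. The 30% share and the eighteen pilots come from the paragraph above; the sixteen rounds and the "at most two Faeries pairings" cutoff are hypothetical inputs chosen for the example.

```python
from math import comb


def prob_at_most(k, rounds, share):
    """P(one player meets at most k copies of a deck in `rounds` pairings),
    treating each pairing as an independent draw from the field."""
    return sum(
        comb(rounds, i) * share**i * (1 - share) ** (rounds - i)
        for i in range(k + 1)
    )


def prob_someone_dodges(players, k, rounds, share):
    """P(at least one of `players` pilots meets at most k copies),
    assuming the pilots' pairings are independent of one another."""
    single = prob_at_most(k, rounds, share)
    return 1 - (1 - single) ** players


# 30% Faeries and 18 'Lark pilots are the figures from the text;
# 16 rounds and a cutoff of 2 Faeries pairings are assumptions.
single = prob_at_most(2, 16, 0.30)
anyone = prob_someone_dodges(18, 2, 16, 0.30)
print(f"one given pilot dodges: {single:.1%}")
print(f"at least one of 18 dodges: {anyone:.1%}")
```

The gap between the two printed figures is the whole point: a lucky run of pairings that is unlikely for any one pilot becomes quite likely for somebody once enough pilots show up with the deck.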

Look at Owling Mine in Honolulu, for example. Just given the postulate (and I understand this is problematic, just as it’s problematic to postulate that across-the-board, ‘Lark always loses to Faeries – but it’s not an unreasonable assumption when evaluating the state of matchups in general) that Owling Mine will lose to aggro decks and beat everything else, if there are enough Owling Mine decks, it’s not unlikely that one or two of them will play on Sunday even if the field is as much as 50% aggro. Some of the pilots will get outlier pairings, and well, mise, boys.

Think about it from the position of a player, though. Do you want to rely upon your matchups to get you there, and sort of cross your fingers? Or do you want to play a deck that gives you, the player, the maximum amount of control over your own fate in every given situation? Both of those answers can actually be correct, but it’s a question you must ask. Honestly, if I had done zero playtesting, running the Owling Mine or ‘Lark equivalent** can be the demonstrably correct call, just as it could be if a certain plan to shore up your deck’s weaknesses had proven effective in playtesting.

For Regionals, you may hear that ‘Lark performed better than expected, that it’s a real contender in the current metagame. But realize that in terms of your overall record, it’s actually easier numerically to make the Top 8 of a Pro Tour than it is to make the Top 4 of most Regionals***. If you’re looking for that Top 4, you can afford to lose at most one round in most cases. A cursory analysis of 2006-2007 Regionals reports here on StarCityGames.com reveals that you typically need a 6-1-1 record to make the Top 8, and then you’ve got to win that last match (it’s not as simple as just needing 7-1-1 because you can go 8-0 at first and then lose that last round to get knocked out cold). That means you can lose once. By contrast, to Top 8 a Pro Tour, you’ve got twice the number of rounds but can afford to lose two or three more times (2x total T rounds, 3-4x total L number of allowable losses – I put it this way to make the discrepancy in ratio a little bit clearer). For a deck like ‘Lark, even assuming that you’ll lose to all the Faeries decks you play against, that still means you can get paired against them four times and have a shot at Sunday. At Regionals, you can only get paired against them once.
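The difference in breathing room can be sketched with the same kind of pairing arithmetic. The snippet below asks only how likely one pilot is to draw a survivable number of Faeries pairings under each structure; it says nothing about winning the other matches. The cutoffs of one and four pairings come from the paragraph above. The round counts (nine matches for a Regionals Top 4 run, sixteen for the Pro Tour) and the single shared 27% Faeries share are assumptions made to isolate the effect of the record requirement, and independent pairings are assumed throughout.

```python
from math import comb


def prob_survivable_pairings(rounds, max_bad_pairings, share):
    """P(a pilot is paired against the bad matchup at most
    `max_bad_pairings` times in `rounds` matches), with each pairing
    treated as an independent draw from a field with the given share."""
    return sum(
        comb(rounds, i) * share**i * (1 - share) ** (rounds - i)
        for i in range(max_bad_pairings + 1)
    )


# Assumed inputs: the same 27% Faeries share in both events, 9 matches
# at Regionals (8 Swiss rounds plus a quarterfinal), 16 at the Pro Tour.
FAERIES_SHARE = 0.27
regionals = prob_survivable_pairings(rounds=9, max_bad_pairings=1, share=FAERIES_SHARE)
pro_tour = prob_survivable_pairings(rounds=16, max_bad_pairings=4, share=FAERIES_SHARE)

print(f"Regionals, at most 1 Faeries pairing in 9 matches: {regionals:.1%}")
print(f"Pro Tour, at most 4 Faeries pairings in 16 matches: {pro_tour:.1%}")
```

Plugging in a different share for Regionals, such as the 15% from the predicted breakdown earlier, changes the numbers, so it is worth rerunning with whatever field you actually expect.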

From an individual unit of analysis, Reveillark was neither the best deck to play in Hollywood, nor will it be the best deck to play at Regionals.

As for what was the best deck? I can’t say for sure, but I will point out that in the modern era of Magic design – Invasion forward – there have been three Standard Pro Tours. All were won by aggressive decks with a truly remarkable amount of resilience or range – “range” being defined as the percentage chance to pull out a win against another deck’s maximum attempt to stabilize. Two of the decks had burn – Gindy in the form of Profane Command, Herberholz in the form of an impressive suite of Red direct damage spells – that would allow the deck to topdeck its way out of even the most daunting board position. Gindy’s man-lands provided additional insurance against removal, as threats that could be deployed to the board like delayed-capital investments, with the need to spend mana on them only coming when to do so would be the most opportune. Budde, meanwhile, could sling a Parallax Wave out of nowhere to clear even an opposing deck’s most threatening of creatures for a few turns, and could use Armageddon to shut out any chance of a comeback.

I don’t think it’s accidental that this type of deck continues to win. By presenting threats, you punish the opponent’s inability to establish his strategy in time, whatever it may be. More importantly, though, I believe it’s much easier to exploit the flaws in an opponent’s testing process, at least in Standard, if you’re helming an aggressive deck. Certain innovations like a higher man-land count, a greater density of burn spells, or even pieces of technology like Slaughter Pact, Orcish Librarian, Wild Ricochet, or Furystoke Giant can tip the scales in your favor by a huge percentage. This is because by definition most wins against aggro decks come via a relatively narrow margin; because (from the perspective of a deck that assumes the control role against a Gindy or a Herberholz) all you have to do is stay alive for X amount of time before your inevitability takes over, you design your deck to achieve that balance to your satisfaction against the best aggressive decks your testing group can come up with. You don’t side in favor of overkill because you’re typically sacrificing percentages against every one of your other matchups, and it’s impossible to anticipate other groups’ innovations (usually) in a way that yields high return (how, for example, do I beat Jaya Ballard, Simian Spirit Guide-fueled Magi of the Moon, maindeck Loxodon Warhammers, maindeck Magi of the Scroll, and aggro-deck Bitterblossoms even if I can see all of them coming with only one or two particular number-nudges, without having any idea of how many people know about those pieces of technology and, of those people, who will choose to run them?). Therefore, tweaking the numbers only slightly in a deck like GB Elves to make it flawless will produce extremely high percentages of yield relative to the seeming severity of the changes.

So what would I play at Regionals?

If the Fae don’t make you want to go stand in traffic while drinking bleach, then I think you’ve got favorable matchups against basically everything, particularly within a field that isn’t gunning for them. If, on the other hand, you simply can’t bear to play another game with those little pests, you can’t go wrong with a well-tested GB Elves…

… unless, as a little leprechaun told me, you still believe that combo is the real deal, and there aren’t nearly enough Faerie-mages around anymore to play predator to your nimble little gazelle.

Zac

* This doesn’t take into account certain archetypes “rising and falling” to certain tiers in the standings, but such an effect was minimal at PT: Hollywood
** Again, not saying ‘Lark and Owling Mine are necessarily all that similar to one another. ‘Lark’s Faeries matchup isn’t nearly as bad as Owling Mine’s Red Aggro matchup, but ‘Lark also doesn’t completely trounce its good matchups like Owling Mine could do, either. Trust me, we tried.
*** Pay attention to that “numerically” … I’m not saying that making the Top 8 of a PT and qualifying for Nats in Wherever, USA are even remotely comparable in terms of the skill required; I’m strictly talking about records here.