Last year I did a statistical mini-analysis of a Neutral Ground mock tournament, trying to see which decks fared better against others in the hopes of finding an edge. That tournament had 14 players and 38 matches; a small sample size but larger than your average playtest session. This Extended season, a mock tournament never really happened, but I still wanted to do some analysis. I wanted to get as much data as possible, and really see which decks were winning over time, not just those that won the tournament. There’s a long standing theory that if a deck shows up in numbers and ends up winning the whole thing, the success is based on brute force rather than actually being the best deck at the tournament. While I’m sure quantity plays a role, reducing the impact of random shuffling, pairings, and other effects, I do not want to assume that the deck only won because it had more chances. What if it was actually the best deck, and 30% of the field knew it and brought it? The most recent PTQ at Neutral Ground served as a surprisingly good test case for this theory.
With my curiosity piqued, I contacted Donald Lim of Replenish fame to see if I could gain access to all of the results from the PTQ. Not just the Top 8. Not just final standings. Everything. Round-by-round pairings and results, along with decklists. The round-by-round results were easy: Don just exported them from DCI Reporter and sent them to me via electronic mail. The decklists, unfortunately, were only in hard copy. Points of fact: I live in New Jersey, Don lives in New York, and my article deadline was approaching too quickly to entrust the United States Postal Service with the decklists. It looked like I was going to be going into the city for something other than a draft, which was no small feat. I don’t mind sitting in traffic for 60-90 minutes to get in a weekday draft when I only have a 30-minute ride home late at night after the traffic has subsided. Sitting in it both ways didn’t seem optimal to me. Thankfully, upon hearing my sob story, my wife volunteered to retrieve the stack of papers from Don while she was in the city for work. Yeah, I’m a lucky guy alright.
So now I had all of the data I needed. 180 decklists, just under 500 match records. After some tedious data-entry I managed to load it all into Excel, where the real magic happens. Initially I wanted to do some sort of readout on card choices, deck variations, etc. Imagine for a moment manually typing up 180 decklists. Got it in your head? Good, so you know why I didn’t do it either. What I did instead was set up broad categories and then try to do some further classification within them. Rock is a prime example. I have a broad category of “Rock” and several child categories like “Doran,” “Confidant,” “Generic,” and “Gifts.” Disclaimer #1: the naming conventions I used are either very recognizable names, or largely long-standing archetypes and/or card names. Some categories really only have one deck type, like Ideal. Disclaimer #2: generalizations were made; breaking everything up into its own category because it is six cards different from Deck A does not serve a function for us, as we’re trying to reach an understanding of a general archetype. Once that is achieved, further analysis can be done.
If you’d like, an Excel document covering this data can be provided. I’ve removed people’s names and replaced them with an arbitrary player_id, to protect those who, like myself, scrubbed out. Disclaimer #3: there were 2 decklists missing. I’m told they were various homebrews that were missing due to deck-registration failures. I’ve included them as “unknown homebrew.”
There is a lot of data here, so I’ll start off with a simple breakdown of decks and categories played.
|Deck_Cat||Deck||Total Decks||Percent to Total|
| ||Gaea’s Might Get There||6||3.33%|
| ||Next Level Blue||6||3.33%|
| ||Mono Black Control||2||1.11%|
| ||RG Ensnaring Bridge||1||0.56%|
|Mono Blue Control||Spire Blue||5||2.78%|
| ||Mono Blue Control||2||1.11%|
|Mono Blue Control Total|| ||7||3.89%|
|RG Burn||RG Burn||4||2.22%|
|RG Burn Total|| ||4||2.22%|
|Balancing Tings||Balancing Tings||3||1.67%|
|Balancing Tings Total|| ||3||1.67%|
|Cephalid Breakfast||Cephalid Breakfast||2||1.11%|
|Cephalid Breakfast Total|| ||2||1.11%|
|Mind’s Desire||Mind’s Desire||2||1.11%|
|Mind’s Desire Total|| ||2||1.11%|
|Tooth and Nail||Tooth and Nail||2||1.11%|
|Tooth and Nail Total|| ||2||1.11%|
|GW Midrange||GW Midrange Gaddock||2||1.11%|
|GW Midrange Total|| ||2||1.11%|
|White Control||Flores White||1||0.56%|
|White Control Total|| ||1||0.56%|
No real surprises so far. Rock decks, in all their forms, represented over a quarter of the field, which means that to win this PTQ (8 rounds plus Top 8) you should expect to play against the Rock three times. If you can’t beat the Rock, you’re going home without an envelope. Affinity, Zoo, and Counter-Top decks were the next-most represented, with every other category representing less than 10% of the field. I should note that I lumped the Chase-Rare deck and Next Level Blue together for this purpose.
So now we know what we’re dealing with, and Chris Pikula’s prediction of a heavy Rock field has been proven correct. The next layer is to examine how each deck performed overall. The table below shows how each category and deck performed throughout the day. Mirror matches have been removed. I’ve limited this to the top 10 categories in terms of matches played; the full readout is in the Excel document mentioned at the end.
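The “three times” figure is simple arithmetic, and a short sketch makes the assumptions behind it visible. It uses the 50 Rock decks in the 180-player field, counts 11 rounds (8 Swiss rounds plus three Top 8 rounds), and treats every opponent as an independent random draw from the field. That last assumption is mine and is only an approximation, since real pairings depend on record. The function names are illustrative and not part of the spreadsheet.

```python
from math import comb

def expected_matchups(deck_count, field_size, rounds):
    """Expected number of rounds paired against a given archetype."""
    return rounds * deck_count / field_size

def prob_at_least(k, deck_count, field_size, rounds):
    """Binomial probability of facing the archetype at least k times.

    Assumes each round's opponent is an independent random draw
    from the whole field (a simplification of real pairings).
    """
    share = deck_count / field_size
    return sum(
        comb(rounds, i) * share**i * (1 - share) ** (rounds - i)
        for i in range(k, rounds + 1)
    )

ROCK_DECKS, FIELD, ROUNDS = 50, 180, 11  # 8 Swiss rounds + 3 Top 8 rounds

print(f"Expected Rock opponents: {expected_matchups(ROCK_DECKS, FIELD, ROUNDS):.2f}")
print(f"Chance of at least one:  {prob_at_least(1, ROCK_DECKS, FIELD, ROUNDS):.1%}")
```

Under these assumptions the expectation comes out just above three, and dodging Rock entirely over 11 rounds would be rare.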
|Rock Sum of win %||60.34%|
|Loam Sum of win %||56.82%|
|Zoo||Gaea’s Might Get There||Count||42|
|Zoo Sum of win %||52.38%|
|Mono Blue Control||Mono Blue Control||Count||13|
|Mono Blue Control Count||37|
|Mono Blue Control Sum of win %||51.43%|
|Affinity Sum of win %||46.94%|
|Next Level Blue||Count||30|
|Counter-top Sum of win %||45.83%|
|Goblins Sum of win %||43.18%|
|RG Burn||RG Burn||Count||20|
|RG Burn Count||20|
|RG Burn Sum of win %||40.00%|
|Tron Sum of win %||39.02%|
|Mono Black Control||Count||11|
|RG Ensnaring Bridge||Count||5|
|homebrew Sum of win %||31.25%|
|Total Sum of win %||50.74%|
From this we again see the dominance of Rock, not just in terms of popularity but also with regard to getting the W. We can also see, though, that within the Rock category, old-school generic Rock decks and Doran-less versions outperformed the eventual winner. Gifts Rock was the clear family failure, but even that put up a .500 record. Loam, which represented less than 5% of the field, showed up as the second-best performing category. Tron was a clear disappointment, posting a sub-.400 record. Unsurprisingly, homebrews got steamrolled throughout the day.
“But Paul,” you say, “if homebrews got steamrolled all day, and Rock decks were steamrolling people all day, isn’t it possible that Rock’s win percentage is inflated from bashing those folks who chose to disregard the latest net-tech and bring their own concoction?” Sure it is, let’s take a look and see who Rock was beating up on.
|Mono Blue Control||Count||8|
|Tooth and Nail||Count||6|
|Total Sum of win %||62.00%|
Again excluding mirror matches, Rock most frequently played against Zoo decks and lit them up about 61% of the time, or about as often as it beat the rest of the field. Homebrews did get destroyed when playing against the Rock, but they were only 6.5% of Rock’s non-mirror matches, so we can’t really blame them for inflating the numbers; they represented 5.5% of the field anyway.
Most of these numbers look promising for Rock players, and the ones that don’t are sadly based on a very small sample. Balancing Tings, Mono Blue Control, RG Burn, and Ideal look like logical choices for beating the Rock, though they each have fewer than 10 matches, so it is hardly conclusive. But let’s assume for a moment that those numbers would at least mostly hold up with a larger sample size and that those four are your best bet for beating Rock. How well do they handle the rest of the field? Remember, Rock was just barely over a quarter of the field, so you still have to beat everyone else.
|Ideal Sum of win %||66.67%|
|Balancing Tings||Balancing Tings||Count||11|
|Balancing Tings Count||11|
|Balancing Tings Sum of win %||54.55%|
|Mono Blue Control||Mono Blue Control||Count||9|
|Mono Blue Control Count||29|
|Mono Blue Control Sum of win %||50.00%|
|RG Burn||RG Burn||Count||13|
|RG Burn Count||13|
|RG Burn Sum of win %||30.77%|
I wish we had more data for these, as the first three look promising, albeit on a small sample size. RG Burn, however, looks like it can handle Rock but basically runs away from anything else ever. Unfortunately, 55% of the matches Tings and Mono Blue played were against Affinity, meaning there’s virtually no data on the rest of the field. Ideal’s matches are split across five different decks, meaning the sample size is basically one for each of them, or largely irrelevant to us for now.
Without more data, we’re left thinking that Rock is the best choice for an upcoming PTQ. So if you’re going to be playing Rock, and expect a lot of Rock, what do you do? Let’s go a little further into this and see which Rock decks beat the other ones.
|player_deck||Data||Confidant Rock||Doran||Generic Rock||Gifts-Rock||Grand Total|
Now we can start to see that, while Doran did slightly worse against the field as a whole, it was slightly better in the Rock on Rock love. Again, we’re lacking enough data to fully analyze this, but we’re starting to get a clearer picture of how this particular PTQ went.
I would be remiss if I didn’t include some nuggets about the decks that had low representation. If we don’t limit ourselves to the Top 10 in matches played and just take the single deck with the best win percentage on the day, we have a White deck from Mike Flores, whose 5-2-1 record failed to secure him a Top 8 berth. 5-2-1 is good for 71%, though, which seems amazing when compared to the overall Rock score of 60%. This is where the danger in small sample sizes lies. While Mike did post the highest win percentage, he also had a long subway ride home to ponder his poor performance (as defined by not winning the blue envelope). His top win percentage, due to a small sample, did not translate into anything close to noteworthy.
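To put a rough number on that small-sample danger, here is a minimal sketch that puts a 95% Wilson score interval around a 5-2 record (the draw is left out, which is how 5-2-1 becomes 71%). The Wilson interval is my choice of tool for illustration and was not part of the original spreadsheet analysis. It also treats each match as an independent trial with a fixed win chance, which is a simplification.

```python
from math import sqrt

def wilson_interval(wins, matches, z=1.96):
    """Wilson score interval for a win rate; z=1.96 gives roughly 95%."""
    if matches <= 0:
        raise ValueError("need at least one decided match")
    p = wins / matches
    denom = 1 + z * z / matches
    center = (p + z * z / (2 * matches)) / denom
    half_width = z * sqrt(p * (1 - p) / matches + z * z / (4 * matches**2)) / denom
    return center - half_width, center + half_width

# 5 wins out of 7 decided matches (the single draw is excluded)
low, high = wilson_interval(5, 7)
print(f"Observed: {5 / 7:.0%}, plausible range: {low:.0%} to {high:.0%}")
```

With only seven decided matches the plausible range is very wide, stretching from well under 50% to above 90%, so a 71% day by itself says little about how the deck compares with Rock’s 60%.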
So what can you do with these numbers? You cannot take these numbers and say you expect to win 60% of your matches. If you do, you’re saying that your plan for the day is 3-2 drop. Obviously, somebody wins the tournament. The problem is that you are stating what everyone armed with the deck is expected to do. To single yourself out, the range of variation increases greatly.
What we’re saying is that in this field, I would expect Rock decks as a group to perform at a 60.34% win percentage. I can say this with a decent degree of certainty. The range of records for individual Rock players, however, is literally 0% – 100%. The eventual winner was 9-0-2, while five different Rock players failed to post a single victory. So when it comes to predicting an individual’s record with a specific deck, the model is wildly inaccurate. I could get into the entire statistics of it all with some advanced calculations, but it would prove to be busy-work, as we already know that the range is the widest possible. The distribution is nowhere near normal. 38%, or 19 of the 50 Rock players, had a record worse than 50%, while 48% posted a record above .500 (the remaining 14% were, obviously, equal to .500). The average record of a Rock player was only 51%, but the deck as a whole won 60% of its matches, meaning its top performers out-dueled its bottom performers to raise the overall score.
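The gap between the 51% average player record and the 60% overall figure can look odd at first. One way it can arise is that players who keep winning keep playing, so their records carry more matches into the pooled total, while an 0-2 drop counts as a full player but only two matches. The sketch below compares the two calculations. The records in it are made up purely for illustration and are not taken from the PTQ data, and draws are ignored.

```python
def pooled_win_pct(records):
    """Win rate over all matches combined (each match counts once)."""
    wins = sum(w for w, _ in records)
    played = sum(w + l for w, l in records)
    return wins / played

def mean_player_win_pct(records):
    """Average of each player's own win rate (each player counts once)."""
    rates = [w / (w + l) for w, l in records if w + l > 0]
    return sum(rates) / len(rates)

# Hypothetical (wins, losses) records, NOT the real tournament data
records = [(9, 0), (5, 3), (2, 2), (0, 2), (0, 2)]

print(f"Pooled over matches: {pooled_win_pct(records):.1%}")
print(f"Average per player:  {mean_player_win_pct(records):.1%}")
```

In this made-up example the pooled rate is well above the per-player average, which is the same direction as the gap described above.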
I hope this exercise has been as interesting for you as it has been for me. While I suspected that Rock was the leader in terms of representation, I was skeptical that it was the performance leader as well. I want to leave you with some caveats. Of course, this only represents one metagame, that of New York. Regional metagames can be highly different, so please take that into consideration when making your deck choice. That should only have an impact on the representation numbers, not necessarily on the performance indicators. Also, this represents Week 1 technology. It is entirely possible and even likely that a deck that falls into the same category as one of these finds a new card that totally destroys Rock, moving it from a 2-3 ‘dog to a 3-2 favorite. And finally, play skill will naturally be the final decider. If your opponent is just awful, but his deck is very good against yours, don’t pack it up so quickly. No matchup is 100%, so you can very possibly tip the scales in your favor by being as awesome as you are.
If I’m able to get my hands on more data later in the season, I certainly will try to do a follow-up to this, as well as a comparison. If anyone out there can get me non-New York-centric data, I’d love to compare regional/national differences.
[Editor’s Note – If you’d like a copy of the Excel document, contact us at https://sales.starcitygames.com/contactus/contactform.php?emailid=2 with “JORDAN EXCEL” in the subject line. – Craig.]