Feature Analysis – Extended PTQ Analysis

If there’s one thing that aspiring Pro Tour Qualifier winners look for above all, it’s the latest in deck technology. While Paul cannot promise that today, he can bring us the next best thing: oodles and oodles of PTQ statistics. With unprecedented access to the data from the recent New York PTQ, Paul wrings every ounce of insight from the numbers. Warning: This is not for the faint of heart…

Last year I did a statistical mini-analysis of a Neutral Ground mock tournament, trying to see which decks fared better against others in the hopes of finding an edge. That tournament had 14 players and 38 matches: a small sample, but larger than your average playtest session. This Extended season, a mock tournament never really happened, but I still wanted to do some analysis. I wanted to get as much data as possible, and really see which decks were winning over time, not just those that won the tournament. There’s a long-standing theory that if a deck shows up in numbers and ends up winning the whole thing, the success is based on brute force rather than on actually being the best deck at the tournament. While I’m sure quantity plays a role by reducing the impact of random shuffling, pairings, and other effects, I do not want to assume that the deck only won because it had more chances. What if it was actually the best deck, and 30% of the field knew it and brought it? The most recent PTQ at Neutral Ground served as a surprisingly good test case for this theory.

With my curiosity piqued, I contacted Donald Lim of Replenish fame to see if I could gain access to all of the results from the PTQ. Not just the Top 8. Not just final standings. Everything. Round-by-round pairings and results, along with decklists. The round-by-round results were easy: Don just exported them from DCI Reporter and emailed them to me. The decklists, unfortunately, were only in hard copy. Points of fact: I live in New Jersey, Don lives in New York, and my article deadline was approaching too quickly to entrust the decklists to the United States Postal Service. It looked like I was going to be going into the city for something other than a draft, which was no small feat. I don’t mind sitting in traffic for 60-90 minutes to get to a weekday draft when I only have a 30-minute ride home late at night, after the traffic has subsided. Sitting in it both ways didn’t seem optimal to me. Thankfully, upon hearing my sob story, my wife volunteered to retrieve the stack of papers from Don while she was in the city for work. Yeah, I’m a lucky guy alright.

So now I had all of the data I needed: 180 decklists and just under 500 match records. After some tedious data entry I managed to load it all into Excel, where the real magic happens. Initially I wanted to do some sort of readout on card choices, deck variations, etc. Imagine for a moment manually typing up 180 decklists. Got it in your head? Good, so you know why I didn’t do it either. What I did instead was set up broad categories and then try to do some further classification within them. Rock is a prime example. I have a broad category of “Rock” and several child categories like “Doran,” “Confidant,” “Generic,” and “Gifts.” Disclaimer #1: the names I used are either very recognizable deck names or long-standing archetype and card names. Some categories really only have one deck type, like Ideal. Disclaimer #2: generalizations were made; giving a deck its own category because it is six cards different from Deck A does not serve us here, as we’re trying to reach an understanding of a general archetype. Once that is achieved, further analysis can be done.

If you’d like, an Excel document covering this data can be provided. I’ve removed people’s names and replaced them with an arbitrary player_id, to protect those who, like myself, scrubbed out. Disclaimer #3: there were two decklists missing. I’m told they were homebrews whose lists went missing due to deck-registration failures. I’ve included them as “unknown homebrew.”

There is a lot of data here, so I’ll start off with a simple breakdown of decks and categories played.

Deck Category              Deck                     Total Decks  Percent of Total
Rock                       Doran                    34           18.89%
                           Gifts-Rock               6            3.33%
                           Confidant Rock           5            2.78%
                           Generic Rock             5            2.78%
Rock Total                                          50           27.78%
Affinity                   Affinity-Shrapnel        13           7.22%
                           Affinity-Goyf/Blast      2            1.11%
                           Affinity-Frenzy          2            1.11%
                           Affinity-Goyf            2            1.11%
                           Affinity-Bob             1            0.56%
Affinity Total                                      20           11.11%
Zoo                        Vindicate Zoo            11           6.11%
                           Gaea’s Might Get There   6            3.33%
                           RGb Zoo                  2            1.11%
Zoo Total                                           19           10.56%
Counter-top                Chase Rare               12           6.67%
                           Next Level Blue          6            3.33%
Counter-top Total                                   18           10.00%
homebrew                   Unknown                  2            1.11%
                           Mono Black Control       2            1.11%
                           Æthermage’s Touch        1            0.56%
                           BW aggro                 1            0.56%
                           Golgari.dec              1            0.56%
                           BG Elves                 1            0.56%
                           Orb Fish                 1            0.56%
                           RG Ensnaring Bridge      1            0.56%
homebrew Total                                      10           5.56%
Goblins                    Goblins                  9            5.00%
Goblins Total                                       9            5.00%
Tron                       UG Tron                  5            2.78%
                           UW Tron                  3            1.67%
Tron Total                                          8            4.44%
Loam                       Loam                     7            3.89%
                           Slide                    1            0.56%
Loam Total                                          8            4.44%
Mono Blue Control          Spire Blue               5            2.78%
                           Mono Blue Control        2            1.11%
Mono Blue Control Total                             7            3.89%
RG Burn                    RG Burn                  4            2.22%
RG Burn Total                                       4            2.22%
Dredge                     Dredge                   4            2.22%
Dredge Total                                        4            2.22%
Tog                        Tog                      3            1.67%
Tog Total                                           3            1.67%
Balancing Tings            Balancing Tings          3            1.67%
Balancing Tings Total                               3            1.67%
Cephalid Breakfast         Cephalid Breakfast       2            1.11%
Cephalid Breakfast Total                            2            1.11%
Mind’s Desire              Mind’s Desire            2            1.11%
Mind’s Desire Total                                 2            1.11%
Tooth and Nail             Tooth and Nail           2            1.11%
Tooth and Nail Total                                2            1.11%
GW Midrange                GW Midrange Gaddock      2            1.11%
GW Midrange Total                                   2            1.11%
Ideal                      Ideal                    2            1.11%
Ideal Total                                         2            1.11%
Storm                      Grapeshot Storm          1            0.56%
                           Dragonstorm              1            0.56%
Storm Total                                         2            1.11%
Charbelcher                Charbelcher              1            0.56%
Charbelcher Total                                   1            0.56%
Threshold                  Threshold                1            0.56%
Threshold Total                                     1            0.56%
Fish                       Fish                     1            0.56%
Fish Total                                          1            0.56%
Elves                      Oppo-Elves               1            0.56%
Elves Total                                         1            0.56%
White Control              Flores White             1            0.56%
White Control Total                                 1            0.56%
Grand Total                                         180          100.00%

No real surprises so far. Rock decks, in all their forms, represented over a quarter of the field. That means that to win this PTQ (8 rounds plus Top 8) you should expect to play against the Rock three times. If you can’t beat the Rock, you’re going home without an envelope. Affinity, Zoo, and Counter-Top decks were the next-most represented, with everything else at less than 10% of the field. I should note that I lumped the Chase Rare deck and Next Level Blue together for this purpose.
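The “three times” figure is just field share multiplied by the number of matches. Here is a minimal Python sketch of that arithmetic. It assumes every opponent is drawn at random from the whole field (real Swiss pairings go by record, so treat this as a rough guide only) and that winning the PTQ takes 8 Swiss rounds plus 3 Top 8 matches. The function name is made up for illustration.

```python
def expected_opponents(decks_of_type: int, field_size: int, matches: int) -> float:
    """Expected number of matches against a deck type, assuming random pairings."""
    share = decks_of_type / field_size
    return share * matches

# 50 Rock decks out of 180; 8 Swiss rounds + 3 Top 8 matches = 11 matches.
rock_matches = expected_opponents(50, 180, 8 + 3)
print(f"{rock_matches:.2f}")  # about 3.06
```

The same call with 20 Affinity decks gives a little over one expected Affinity opponent across the same 11 matches.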

So now we know what we’re dealing with, and Chris Pikula’s prediction of a heavy Rock field has been proven correct. The next layer is to examine how each deck performed overall. The table below shows how each category and deck performed throughout the day. Mirror matches have been removed. I’ve limited this to the top 10 categories in terms of matches played; the full readout is in the Excel document mentioned at the end.

Filter: mirror_deck = N (mirror matches excluded)

Deck Category              Deck                     Matches   Win %
Rock                       Generic Rock             34        65.63%
                           Confidant Rock           33        64.52%
                           Doran                    152       60.14%
                           Gifts-Rock               31        50.00%
Rock Total                                          250       60.34%
Loam                       Slide                    8         66.67%
                           Loam                     43        55.26%
Loam Total                                          51        56.82%
Zoo                        Gaea’s Might Get There   42        60.98%
                           Vindicate Zoo            55        47.17%
                           RGb Zoo                  12        45.45%
Zoo Total                                           109       52.38%
Mono Blue Control          Mono Blue Control        13        66.67%
                           Spire Blue               24        43.48%
Mono Blue Control Total                             37        51.43%
Affinity                   Affinity-Goyf            12        54.55%
                           Affinity-Frenzy          13        53.85%
                           Affinity-Shrapnel        61        45.76%
                           Affinity-Goyf/Blast      11        45.45%
                           Affinity-Bob             4         25.00%
Affinity Total                                      101       46.94%
Counter-top                Chase Rare               52        47.83%
                           Next Level Blue          30        42.31%
Counter-top Total                                   82        45.83%
Goblins                    Goblins                  45        43.18%
Goblins Total                                       45        43.18%
RG Burn                    RG Burn                  20        40.00%
RG Burn Total                                       20        40.00%
Tron                       UG Tron                  29        48.15%
                           UW Tron                  16        21.43%
Tron Total                                          45        39.02%
homebrew                   BW aggro                 4         50.00%
                           Mono Black Control       11        40.00%
                           BG Elves                 6         40.00%
                           Unknown                  7         33.33%
                           Æthermage’s Touch        8         25.00%
                           Golgari.dec              5         20.00%
                           Orb Fish                 5         20.00%
                           RG Ensnaring Bridge      5         20.00%
homebrew Total                                      51        31.25%
Total                                               791       50.74%

From this we again see the dominance of Rock. Not just in terms of popularity, but also with regards to getting the W. We can also see, though, that within the Rock category, old-school generic Rock decks and Doran-less versions outperformed the eventual winner. Gifts-Rock was the clear family failure, but even that put up a .500 record. Loam, which represented less than 5% of the field, showed up as the second-best performing category. Tron was a clear disappointment, posting a sub-.400 record. Unsurprisingly, homebrews got steamrolled throughout the day.
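For anyone who wants to rebuild a table like this outside of Excel, here is a rough Python sketch of the calculation. The record layout (one row per player per match, with category names and a W/L/D result) is a hypothetical stand-in for the spreadsheet, and the sample rows are invented. It also assumes draws count as matches played but are left out of the win percentage, which is how the 5-2-1 record later in this article comes out at 71%.

```python
from collections import defaultdict

def win_pct_by_category(matches):
    """matches: iterable of (player_cat, opp_cat, result), result in {"W", "L", "D"}.

    Returns {category: (matches_played, win_pct)} with mirror matches removed.
    Draws count toward matches played but not toward the win % denominator.
    """
    played = defaultdict(int)
    wins = defaultdict(int)
    decided = defaultdict(int)
    for player_cat, opp_cat, result in matches:
        if player_cat == opp_cat:
            continue  # skip mirror matches
        played[player_cat] += 1
        if result in ("W", "L"):
            decided[player_cat] += 1
            wins[player_cat] += result == "W"
    return {
        cat: (played[cat], wins[cat] / decided[cat] if decided[cat] else None)
        for cat in played
    }

# Made-up sample rows, one per player per match (not the PTQ data).
sample = [
    ("Rock", "Zoo", "W"), ("Zoo", "Rock", "L"),
    ("Rock", "Affinity", "W"), ("Affinity", "Rock", "L"),
    ("Rock", "Rock", "W"), ("Rock", "Rock", "L"),
    ("Rock", "Tron", "D"), ("Tron", "Rock", "D"),
    ("Zoo", "Affinity", "W"), ("Affinity", "Zoo", "L"),
]
summary = win_pct_by_category(sample)
print(summary["Rock"])  # (3, 1.0)
```

Grouping on the deck name instead of the category would give the per-deck rows, at the cost of even smaller samples.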

“But Paul,” you say, “if homebrews got steamrolled all day, and Rock decks were steamrolling people all day, isn’t it possible that Rock’s win percentage is inflated from bashing those folks who chose to disregard the latest net-tech and bring their own concoction?” Sure it is. Let’s take a look and see who Rock was beating up on.

Opposing Deck Category   Matches   Rock Win %
Zoo                      43        60.98%
Affinity                 31        74.19%
Counter-top              19        56.25%
Loam                     17        53.85%
Goblins                  15        66.67%
homebrew                 14        78.57%
Tron                     11        70.00%
Dredge                   9         62.50%
Mono Blue Control        8         42.86%
Balancing Tings          7         42.86%
RG Burn                  7         42.86%
Ideal                    6         40.00%
Tooth and Nail           6         83.33%
White Control            4         66.67%
Cephalid Breakfast       4         25.00%
Storm                    4         50.00%
GW Midrange              3         66.67%
Tog                      3         33.33%
Mind’s Desire            1         100.00%
Elves                    1         100.00%
Threshold                1         100.00%
Total                    214       62.00%

Again excluding mirror matches, Rock most frequently played against Zoo decks, and lit them up about 61% of the time, or about as often as they beat the rest of the field. Homebrews did get destroyed when playing against the Rock, but they were only 6.5% of Rock’s non-mirror matches, so we can’t really blame them for inflating the numbers; they represented about 5.6% of the field anyway.

Most of these numbers look promising for Rock players, and the ones that don’t are sadly based on very small sample sizes. Balancing Tings, Mono Blue Control, RG Burn, and Ideal look like logical choices for beating the Rock, though they each have fewer than 10 matches, so it is hardly conclusive. But let’s assume for a moment that those numbers would at least mostly hold up with a larger sample size and that those four are your best bet for beating Rock. How well do they handle the rest of the field? (Remember, Rock was just barely over a quarter of the field, so you still have to beat everyone else.)
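To put a rough number on “hardly conclusive,” here is a small Python sketch using the Wilson score interval, a standard way to put error bars on a win rate from a handful of matches. This is not something I ran in the spreadsheet. It takes Rock’s 42.86% against Balancing Tings to mean 3 wins in 7 matches, which is what the percentage works out to if there were no draws, and it assumes the matches are independent.

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a win rate of wins/n."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Rock vs. Balancing Tings: assumed 3 wins in 7 matches.
low, high = wilson_interval(3, 7)
print(f"{low:.0%} to {high:.0%}")  # roughly 16% to 75%
```

An interval that wide comfortably includes 50%, so seven matches cannot tell us whether Rock is actually the underdog in that matchup.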

Deck Category              Deck                     Matches   Win %
Ideal                      Ideal                    6         66.67%
Ideal Total                                         6         66.67%
Balancing Tings            Balancing Tings          11        54.55%
Balancing Tings Total                               11        54.55%
Mono Blue Control          Mono Blue Control        9         66.67%
                           Spire Blue               20        42.11%
Mono Blue Control Total                             29        50.00%
RG Burn                    RG Burn                  13        30.77%
RG Burn Total                                       13        30.77%

I wish we had more data for these, as the first three look promising even with a small sample size. RG Burn, however, looks like it can handle Rock but basically runs away from anything else ever. Unfortunately, 55% of the matches that Tings and Mono Blue played were against Affinity, meaning there’s virtually no data on the rest of the field. Ideal’s matches are split across five different decks, meaning the sample size is basically one for each of them, and largely irrelevant to us for now.

Without more data, we’re left thinking that Rock is the best choice for an upcoming PTQ. So if you’re going to be playing Rock, and expect a lot of Rock, what do you do? Let’s go a little further into this and see which Rock decks beat the other ones.

Player Deck     Data    Confidant Rock  Doran    Generic Rock  Gifts-Rock  Grand Total
Doran           Count   11                       2             4           17
                win %   44.44%          n/a      100.00%       50.00%      53.33%
Gifts-Rock      Count                   4                                  4
                win %   n/a             50.00%   n/a           n/a         50.00%
Confidant Rock  Count                   11       1                         12
                win %   n/a             55.56%   0.00%         n/a         50.00%
Generic Rock    Count   1               2                                  3
                win %   100.00%         0.00%    n/a           n/a         33.33%

Now we can start to see that, while Doran did slightly worse against the field as a whole, it was slightly better in the Rock-on-Rock love. Again, we’re lacking enough data to fully analyze this, but we’re starting to get a clearer picture of how this particular PTQ went.

I would be remiss if I didn’t include some nuggets about some of the decks that had low representation. If we don’t limit ourselves to the Top 10 in matches played and just take the single deck with the best win percentage on the day, we have a White deck from Mike Flores, whose 5-2-1 record failed to secure him a Top 8 berth. 5-2-1 is good for 71%, though, which seems amazing when compared to the overall Rock score of 60%. This is where the danger in small sample sizes lies. While Mike did post the highest win percentage, he also had a long subway ride home to ponder his poor performance (as defined by not winning the blue envelope). His top win percentage, due to a small sample, did not translate into anything close to noteworthy.

So what can you do with these numbers? You cannot take these numbers and say you expect to win 60% of your matches. If you do, you’re saying that your plan for the day is 3-2 drop. Obviously, somebody wins the tournament. The problem is that the 60% describes what everyone armed with the deck is expected to do collectively. Once you single out one player, the range of variation increases greatly.

What we’re saying is that in this field, I would expect Rock decks to perform at a 60.34% win percentage. I can say this with a decent degree of certainty. The range of records for all Rock players is literally 0% – 100%. The eventual winner was 9-0-2, while five different Rock players failed to post a single victory. So when it comes to predicting an individual’s record with a specific deck, the model is wildly inaccurate. I could get into the entire statistics of it all with some advanced calculations, but it would prove to be busy-work, as we already know that the range is incredibly wide (the widest possible). The distribution is nowhere near normal. 38%, or 19 of the 50 Rock players, had a record worse than 50%, while 48% posted a record above .500 (the remaining 14% were, obviously, equal to .500). The average record of a Rock player was only 51%, but the deck as a whole won 60% of its matches, meaning its top performers out-dueled its bottom performers to raise the overall score.
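The gap between the 51% average record and the 60% overall figure is easier to see with a toy example. The Python sketch below uses six invented win-loss records (not the PTQ data) and compares the two ways of averaging. Players who keep winning keep playing, so they contribute more matches to the pooled number than players who lose twice and drop.

```python
# Hypothetical (wins, losses) records for six players of the same deck.
records = [(9, 0), (6, 2), (4, 3), (1, 2), (0, 2), (0, 2)]

# Pooled: every match counts once, so long runs weigh more.
total_wins = sum(w for w, _ in records)
total_matches = sum(w + l for w, l in records)
pooled = total_wins / total_matches

# Per-player: every player counts once, however few matches they played.
mean_of_players = sum(w / (w + l) for w, l in records) / len(records)

print(f"pooled: {pooled:.1%}, average record: {mean_of_players:.1%}")
# pooled: 64.5%, average record: 44.2%
```

Neither number is wrong. The pooled figure describes the deck across all of its matches, while the per-player figure is closer to what a randomly chosen pilot experienced.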

I hope this exercise has been as interesting for you as it has been for me. While I suspected that Rock was the leader in terms of representation, I was skeptical that it was the performance leader as well. I want to leave you with some caveats. Of course, this only represents one metagame, that of New York. Regional metagames can be highly different, so please take that into consideration when making your deck choice. That should only have an impact on the representation numbers, not necessarily on the performance indicators. Also, this represents Week 1 technology. It is entirely possible and even likely that a deck that falls into the same category as one of these finds a new card that totally destroys Rock, moving it from a 2-3 ‘dog to a 3-2 favorite. And finally, play skill will naturally be the final decider. If your opponent is just awful, but his deck is very good against yours, don’t pack it up so quickly. No matchup is 100%, so you can very possibly tip the scales in your favor by being as awesome as you are.

If I’m able to get my hands on more data later in the season, I certainly will try to do a follow-up to this, as well as a comparison. If anyone out there can get me non-New York-centric data, I’d love to compare regional/national differences.


[Editor’s Note – If you’d like a copy of the Excel document, contact us at https://sales.starcitygames.com/contactus/contactform.php?emailid=2 with “JORDAN EXCEL” in the subject line. – Craig.]