Feature Analysis - Extended PTQ Analysis

Last year I did a statistical mini-analysis of a Neutral Ground mock tournament, trying to see which decks fared better against others in the hopes of finding an edge. That tournament had 14 players and 38 matches; a small sample size but larger than your average playtest session. This Extended season, a mock tournament never really happened, but I still wanted to do some analysis. I wanted to get as much data as possible, and really see which decks were winning over time, not just those that won the tournament. There’s a long standing theory that if a deck shows up in numbers and ends up winning the whole thing, the success is based on brute force rather than actually being the best deck at the tournament. While I’m sure quantity plays a role, reducing the impact of random shuffling, pairings, and other effects, I do not want to assume that the deck only won because it had more chances. What if it was actually the best deck, and 30% of the field knew it and brought it? The most recent PTQ at Neutral Ground served as a surprisingly good test case for this theory.

With my curiosity piqued, I contacted Donald Lim of Replenish fame to see if I could gain access to all of the results from the PTQ. Not just the Top 8. Not just final standings. Everything. Round-by-round pairings and results, along with decklists. The round-by-round results were easy: Don just exported them from DCI Reporter and sent them to me via electronic mail. The decklists, unfortunately, were only in hard copy. Points of fact: I live in New Jersey, Don lives in New York, and my article deadline was approaching too quickly to entrust the United States Postal Service with the decklists. It looked like I was going to be going into the city for something other than a draft, which was no small feat. I don’t mind sitting in traffic for 60-90 minutes to get in a weekday draft and only have a 30-minute ride home late at night when the traffic has subsided. Sitting in it both ways was something that didn’t seem optimal to me. Thankfully, upon hearing my sob story, my wife volunteered to retrieve the stack of papers from Don while she was in the city for work. Yeah, I’m a lucky guy alright.

So now I had all of the data I needed. 180 decklists, just under 500 match records. After some tedious data-entry I managed to load it all into Excel, where the real magic happens. Initially I wanted to do some sort of readout on card choices, deck variations, etc. Imagine for a moment manually typing up 180 decklists. Got it in your head? Good, so you know why I didn’t do it either. What I did instead was set up broad categories and then try to do some further classification within them. Rock is a prime example. I have a broad category of “Rock” and several child categories like “Doran,” “Confidant,” “Generic,” and “Gifts.” Disclaimer #1: the naming conventions I used are either very recognizable names, or largely long-standing archetypes and/or card names. Some categories really only have one deck type, like Ideal. Disclaimer #2: generalizations were made; breaking everything up into its own category because it is six cards different from Deck A does not serve a function for us, as we’re trying to reach an understanding of a general archetype. Once that is achieved, further analysis can be done.

If you’d like, an Excel document covering this data can be provided. I’ve removed people’s names and replaced them with an arbitrary player_id, to protect those who, like myself, scrubbed out. Disclaimer #3: there were 2 decklists missing. I’m told they were homebrews that went missing due to deck-registration failures. I’ve included them as “unknown homebrew.”

There is a lot of data here, so I’ll start off with a simple breakdown of decks and categories played.

Deck_Cat	Deck	Total Decks	Percent to Total
Rock	Doran	34	18.89%
	Gifts-Rock	6	3.33%
	Confidant Rock	5	2.78%
	Generic Rock	5	2.78%
Rock Total		50	27.78%
Affinity	Affinity-Shrapnel	13	7.22%
	Affinity-Goyf/Blast	2	1.11%
	Affinity-Frenzy	2	1.11%
	Affinity-Goyf	2	1.11%
	Affinity-Bob	1	0.56%
Affinity Total		20	11.11%
Zoo	Vindicate Zoo	11	6.11%
	Gaea’s Might Get There	6	3.33%
	RGb Zoo	2	1.11%
Zoo Total		19	10.56%
Counter-top	Chase Rare	12	6.67%
	Next Level Blue	6	3.33%
Counter-top Total		18	10.00%
homebrew	Unknown	2	1.11%
	Mono Black Control	2	1.11%
	Ǽthermage’s Touch	1	0.56%
	BW aggro	1	0.56%
	Golgari.dec	1	0.56%
	BG Elves	1	0.56%
	Orb Fish	1	0.56%
	RG Ensnaring Bridge	1	0.56%
homebrew Total		10	5.56%
Goblins	Goblins	9	5.00%
Goblins Total		9	5.00%
Tron	UG Tron	5	2.78%
	UW Tron	3	1.67%
Tron Total		8	4.44%
Loam	Loam	7	3.89%
	Slide	1	0.56%
Loam Total		8	4.44%
Mono Blue Control	Spire Blue	5	2.78%
	Mono Blue Control	2	1.11%
Mono Blue Control Total		7	3.89%
RG Burn	RG Burn	4	2.22%
RG Burn Total		4	2.22%
Dredge	Dredge	4	2.22%
Dredge Total		4	2.22%
Tog	Tog	3	1.67%
Tog Total		3	1.67%
Balancing Tings	Balancing Tings	3	1.67%
Balancing Tings Total		3	1.67%
Cephalid Breakfast	Cephalid Breakfast	2	1.11%
Cephalid Breakfast Total		2	1.11%
Mind’s Desire	Mind’s Desire	2	1.11%
Mind’s Desire Total		2	1.11%
Tooth and Nail	Tooth and Nail	2	1.11%
Tooth and Nail Total		2	1.11%
GW Midrange	GW Midrange Gaddock	2	1.11%
GW Midrange Total		2	1.11%
Ideal	Ideal	2	1.11%
Ideal Total		2	1.11%
Storm	Grapeshot Storm	1	0.56%
	Dragonstorm	1	0.56%
Storm Total		2	1.11%
Charbelcher	Charbelcher	1	0.56%
Charbelcher Total		1	0.56%
Threshold	Threshold	1	0.56%
Threshold Total		1	0.56%
Fish	Fish	1	0.56%
Fish Total		1	0.56%
Elves	Oppo-Elves	1	0.56%
Elves Total		1	0.56%
White Control	Flores White	1	0.56%
White Control Total		1	0.56%
Grand Total		180	100.00%

No real surprises so far. Rock decks, in all their forms, represented over a quarter of the field. That means that to win this PTQ (8 rounds plus Top 8), you should expect to play against the Rock about three times. If you can’t beat the Rock, you’re going home without an envelope. Affinity, Zoo, and Counter-Top decks were the next-most represented, with everything else representing less than 10% of the field. I should note that I lumped the Chase Rare and Next Level Blue decks together as Counter-Top for this purpose.
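The "three times" figure can be checked with a little arithmetic. This is a minimal sketch that assumes every opponent is an independent draw from the field as registered (50 Rock decks out of 180) and that the Top 8 is three single-elimination rounds. Real pairings are not random, since later rounds skew toward winning decks, so treat the result as a rough guide only.

```python
# Expected number of Rock opponents on the way to winning the PTQ.
# Assumption: each opponent is an independent random draw from the field.
rock_decks = 50
total_decks = 180
swiss_rounds = 8
top8_rounds = 3  # quarterfinal, semifinal, final

rock_share = rock_decks / total_decks
matches = swiss_rounds + top8_rounds
expected_rock_opponents = matches * rock_share

# Chance of dodging Rock in every match, under the same assumption
p_no_rock = (1 - rock_share) ** matches

print(f"Rock share of field: {rock_share:.2%}")
print(f"Expected Rock opponents in {matches} matches: {expected_rock_opponents:.2f}")
print(f"Chance of facing no Rock at all: {p_no_rock:.2%}")
```

Under these assumptions the expectation lands just above three, and the chance of never seeing Rock across eleven matches is only a few percent.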

So now we know what we’re dealing with, and Chris Pikula’s prediction of a heavy Rock field has been proven correct. The next layer is to examine how each deck performed overall. The table below shows how each category / deck performed throughout the day. Mirror matches have been removed. I’ve limited this to the top 10 categories in terms of matches played; the full readout is in the Excel document at the end.

mirror_deck	N

player_deck_cat	player_deck	Data	Total
Rock	Generic Rock	Count	34
		win %	65.63%
	Confidant Rock	Count	33
		win %	64.52%
	Doran	Count	152
		win %	60.14%
	Gifts-Rock	Count	31
		win %	50.00%
Rock Count			250
Rock Sum of win %			60.34%

Loam	Slide	Count	8
		win %	66.67%
	Loam	Count	43
		win %	55.26%
Loam Count			51
Loam Sum of win %			56.82%

Zoo	Gaea’s Might Get There	Count	42
		win %	60.98%
	Vindicate Zoo	Count	55
		win %	47.17%
	RGb Zoo	Count	12
		win %	45.45%
Zoo Count			109
Zoo Sum of win %			52.38%

Mono Blue Control	Mono Blue Control	Count	13
		win %	66.67%
	Spire Blue	Count	24
		win %	43.48%
Mono Blue Control Count		37
Mono Blue Control Sum of win %		51.43%

Affinity	Affinity-Goyf	Count	12
		win %	54.55%
	Affinity-Frenzy	Count	13
		win %	53.85%
	Affinity-Shrapnel	Count	61
		win %	45.76%
	Affinity-Goyf/Blast	Count	11
		win %	45.45%
	Affinity-Bob	Count	4
		win %	25.00%
Affinity Count			101
Affinity Sum of win %			46.94%

Counter-top	Chase Rare	Count	52
		win %	47.83%
	Next Level Blue	Count	30
		win %	42.31%
Counter-top Count			82
Counter-top Sum of win %		45.83%

Goblins	Goblins	Count	45
		win %	43.18%
Goblins Count			45
Goblins Sum of win %			43.18%

RG Burn	RG Burn	Count	20
		win %	40.00%
RG Burn Count			20
RG Burn Sum of win %		40.00%

Tron	UG Tron	Count	29
		win %	48.15%
	UW Tron	Count	16
		win %	21.43%
Tron Count			45
Tron Sum of win %			39.02%

homebrew	BW aggro	Count	4
		win %	50.00%
	Mono Black Control	Count	11
		win %	40.00%
	BG Elves	Count	6
		win %	40.00%
	Unknown	Count	7
		win %	33.33%
	Ǽthermage’s Touch	Count	8
		win %	25.00%
	Golgari.dec	Count	5
		win %	20.00%
	Orb Fish	Count	5
		win %	20.00%
	RG Ensnaring Bridge	Count	5
		win %	20.00%
homebrew Count			51
homebrew Sum of win %		31.25%

Total Count			791
Total Sum of win %			50.74%

From this we again see the dominance of Rock. Not just in terms of popularity, but also with regards to getting the W. We can also see though, that within the Rock category, old-school generic Rock decks and Doran-less versions outperformed the eventual winner. Gifts rock was the clear family failure, but even that put up a .500 record. Loam, which represented less than 5% of the field, showed up as the 2nd best performing category. Tron was a clear disappointment, posting a sub .400 record. Unsurprisingly, homebrews got steamrolled throughout the day.
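For anyone who would rather reproduce this kind of readout outside Excel, here is a minimal sketch. The row layout, the `summarize` function, and the sample rows are all hypothetical and are not taken from the PTQ data. It also assumes that win % is wins divided by decided matches, with draws counted in the match total but left out of the percentage. That assumption is consistent with figures such as 65.63% on 34 matches (21 of 32), but it is an inference, not something confirmed by the spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows: (player_deck_cat, player_deck, opp_deck, result)
# result is "W", "L", or "D" from the player's side. Not real PTQ data.
matches = [
    ("Rock", "Doran", "Vindicate Zoo", "W"),
    ("Rock", "Doran", "Doran", "W"),          # mirror match, excluded
    ("Rock", "Doran", "Goblins", "L"),
    ("Rock", "Generic Rock", "Loam", "W"),
    ("Rock", "Generic Rock", "Doran", "D"),   # counted, but not in win %
    ("Zoo", "Vindicate Zoo", "Doran", "L"),
    ("Zoo", "Vindicate Zoo", "Goblins", "W"),
]

def summarize(rows):
    """Return {category: (match_count, win_rate)} with mirrors removed."""
    counts = defaultdict(int)
    wins = defaultdict(int)
    decided = defaultdict(int)
    for cat, deck, opp_deck, result in rows:
        if deck == opp_deck:
            continue  # skip mirror matches
        counts[cat] += 1
        if result in ("W", "L"):
            decided[cat] += 1
            wins[cat] += result == "W"
    return {
        cat: (counts[cat], wins[cat] / decided[cat] if decided[cat] else None)
        for cat in counts
    }

summary = summarize(matches)
for cat, (n, rate) in summary.items():
    print(cat, n, "n/a" if rate is None else f"{rate:.2%}")
```

Note that only exact deck mirrors are dropped here, so a Doran deck facing Generic Rock still counts toward the Rock category, which matches how the Rock-on-Rock matches appear to be handled in the tables.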

“But Paul,” you say, “if homebrews got steamrolled all day, and Rock decks were steamrolling people all day, isn’t it possible that Rock’s win percentage is inflated from bashing those folks who chose to disregard the latest net-tech and bring their own concoction?” Sure it is, let’s take a look and see who Rock was beating up on.

opp_deck_cat	Data	Total
Zoo	Count	43
	win %	60.98%
Affinity	Count	31
	win %	74.19%
Counter-top	Count	19
	win %	56.25%
Loam	Count	17
	win %	53.85%
Goblins	Count	15
	win %	66.67%
homebrew	Count	14
	win %	78.57%
Tron	Count	11
	win %	70.00%
Dredge	Count	9
	win %	62.50%
Mono Blue Control	Count	8
	win %	42.86%
Balancing Tings	Count	7
	win %	42.86%
RG Burn	Count	7
	win %	42.86%
Ideal	Count	6
	win %	40.00%
Tooth and Nail	Count	6
	win %	83.33%
White Control	Count	4
	win %	66.67%
Cephalid Breakfast	Count	4
	win %	25.00%
Storm	Count	4
	win %	50.00%
GW Midrange	Count	3
	win %	66.67%
Tog	Count	3
	win %	33.33%
Mind’s Desire	Count	1
	win %	100.00%
Elves	Count	1
	win %	100.00%
Threshold	Count	1
	win %	100.00%
Total Count		214
Total Sum of win %	62.00%

Again excluding mirror matches, Rock most frequently played against Zoo decks, and lit them up about 61% of the time, or about as often as they beat the rest of the field. Homebrews did get destroyed when playing against the Rock, but they were only 6.5% of Rock’s non-mirror matches (14 of 214), so we can’t really blame them for inflating the numbers, since they represented 5.5% of the field anyway.

Most of these numbers look promising for Rock players, and the ones that don’t are sadly a very small sample size. Balancing Tings, Mono Blue Control, RG Burn, and Ideal look like logical choices for beating the Rock, though they each have fewer than 10 matches, so it is hardly conclusive. But let’s assume for a moment that those numbers would at least mostly hold up with a larger sample size and that those 4 are your best bet for beating Rock. How well do they handle the rest of the field? (Remember, Rock was just barely over a quarter of the field, so you still have to beat everyone else.)

player_deck_cat	player_deck	Data	Total
Ideal	Ideal	Count	6
		win %	66.67%
Ideal Count			6
Ideal Sum of win %			66.67%

Balancing Tings	Balancing Tings	Count	11
		win %	54.55%
Balancing Tings Count		11
Balancing Tings Sum of win %		54.55%

Mono Blue Control	Mono Blue Control	Count	9
		win %	66.67%
	Spire Blue	Count	20
		win %	42.11%
Mono Blue Control Count		29
Mono Blue Control Sum of win %		50.00%

RG Burn	RG Burn	Count	13
		win %	30.77%
RG Burn Count			13
RG Burn Sum of win %		30.77%

I wish we had more data for these, as it looks promising for the first three with a small sample size. RG Burn, however, looks like it can handle Rock but basically runs away from anything else ever. Unfortunately, 55% of the matches that Tings and Mono Blue played were against Affinity, meaning there’s virtually no data on the rest of the field. Ideal’s matches are split across five different decks, meaning the sample size is basically one for each of them, or largely irrelevant to us for now.
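To put a number on how little a handful of matches tells us, one option is a Wilson score interval for the win rate. This is a sketch of a standard technique, not something that was run on the PTQ spreadsheet, and the record of 4 wins in 7 decided matches is a hypothetical example roughly the size of the samples discussed above. It also treats matches as independent, which is a simplification.

```python
import math

def wilson_interval(wins, decided, z=1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    if decided == 0:
        raise ValueError("need at least one decided match")
    p = wins / decided
    denom = 1 + z * z / decided
    center = (p + z * z / (2 * decided)) / denom
    half = z * math.sqrt(p * (1 - p) / decided + z * z / (4 * decided ** 2)) / denom
    return center - half, center + half

# Hypothetical record: 4 wins in 7 decided matches
low, high = wilson_interval(4, 7)
print(f"{4 / 7:.1%} observed, plausible range {low:.1%} to {high:.1%}")

# Same win rate over ten times as many matches gives a much tighter range
low_big, high_big = wilson_interval(40, 70)
print(f"{40 / 70:.1%} observed, plausible range {low_big:.1%} to {high_big:.1%}")
```

With only seven matches the plausible range spans more than half of the 0% to 100% scale, which is why a 57% record over seven matches cannot separate a good matchup from a bad one.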

Without more data, we’re left thinking that Rock is the best choice for an upcoming PTQ. So if you’re going to be playing Rock, and expect a lot of Rock, what do you do? Let’s go a little further into this and see which Rock decks beat the other ones.

player_deck	Data	Confidant Rock	Doran	Generic Rock	Gifts-Rock	Grand Total
Doran	Count	11		2	4	17
	win %	44.44%	n/a	100.00%	50.00%	53.33%
Gifts-Rock	Count		4			4
	win %	n/a	50.00%	n/a	n/a	50.00%
Confidant Rock	Count		11	1		12
	win %	n/a	55.56%	0.00%	n/a	50.00%
Generic Rock	Count	1	2			3
	win %	100.00%	0.00%	n/a	n/a	33.33%

Now we can start to see that, while Doran did slightly worse against the field as a whole, it was slightly better in the Rock-on-Rock love. Again, we’re lacking enough data to fully analyze this, but we’re starting to get a clearer picture of how this particular PTQ went.

I would be remiss if I didn’t include some nuggets about some of the decks that had low representation. If we don’t limit ourselves to the Top 10 in matches played and just take the single deck with the best win percentage on the day, we have a White deck from Mike Flores, whose 5-2-1 record failed to secure him a Top 8 berth. 5-2-1 is good for 71%, though, which seems amazing when compared to the overall Rock score of 60%. This is where the danger in small sample sizes lies. While Mike did post the highest win percentage, he also had a long subway ride home to ponder his poor performance (as defined by not winning the blue envelope). His top win percentage, due to a small sampling, did not translate into anything close to noteworthy.

So what can you do with these numbers? You cannot take these numbers and say you expect to win 60% of your matches. If you do, you’re saying that your plan for the day is 3-2 drop. Obviously, somebody wins the tournament. The problem is that you are stating what everyone armed with the deck is expected to do. To single yourself out, the range of variation increases greatly.

What we’re saying is that in this field, I would expect Rock decks to perform at a 60.34% win percentage. I can say this with a decent degree of certainty. The range of records for all Rock players is literally 0% – 100%. The eventual winner was 9-0-2, while five different Rock players failed to post a single victory. So when it comes to predicting an individual’s record with a specific deck, the model is wildly inaccurate. I could get into the entire statistics of it all with some advanced calculations, but it would prove to be busy-work as we already know that there is an incredibly wide range (widest possible). The distribution is nowhere near normal. 38%, or 19 of the 50 Rock players, had a record worse than 50%, while 48% posted a record above .500 (the remaining 14% were, obviously, equal to .500). The average record of a Rock player was only 51%, but the deck as a whole won 60% of its matches, meaning its top performers out-dueled its bottom performers to raise the overall score.
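The gap between the average player record and the deck’s overall win rate is easy to see with a toy example. The records below are hypothetical and are not the actual Rock players’ results. The sketch compares the pooled win rate (all wins over all decided matches) with the plain average of each player’s own win rate.

```python
# Hypothetical (wins, losses) records for five players of one deck; not real data.
records = [(9, 0), (6, 2), (3, 3), (0, 2), (0, 2)]

# Pooled win rate: all wins over all decided matches.
total_wins = sum(w for w, _ in records)
total_matches = sum(w + l for w, l in records)
pooled = total_wins / total_matches

# Average record: mean of each player's own win rate.
per_player = [w / (w + l) for w, l in records]
average = sum(per_player) / len(per_player)

print(f"pooled {pooled:.1%}, average of players {average:.1%}")
```

Players who lose early and drop contribute few matches to the pooled figure but count as a full player in the average, while the players who keep winning keep adding matches. That alone is enough to pull the pooled number above the average record.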

I hope this exercise has been as interesting for you as it has been for me. While I suspected that Rock was the leader in terms of representation, I was skeptical that it was the performance leader as well. I want to leave with some caveats. Of course, this only represents one metagame, that of New York. Regional metagames can be highly different, so please take that into consideration when making your deck choice. That should only have an impact on the representation numbers, not necessarily on the performance indicators. Also, this represents Week 1 technology. It is incredibly possible and even likely that a deck that falls into the same category as one of these finds a new card that totally destroys Rock, moving it from a 2-3 ’dog to a 3-2 favorite. And finally, naturally play skill will be the final decider. If your opponent is just awful, but his deck is very good against yours, don’t pack it up so quickly. No matchup is 100%, so you can very possibly tip the scales in your favor by being as awesome as you are.

If I’m able to get my hands on more data later in the season, I certainly will try to do a follow-up to this, as well as a comparison. If anyone out there can get me non-New York-centric data, I’d love to compare regional/national differences.

[Editor’s Note – If you’d like a copy of the Excel document, contact us at https://sales.starcitygames.com/contactus/contactform.php?emailid=2 with “JORDAN EXCEL” in the subject line. – Craig.]