Mixed kNuts: Lies, Damn Lies, And Statistics

For those looking for a Statistics 101 class, I suggest you check your local university, as the knowledge gained in those classes can prove very valuable in one’s everyday life. However, what I’m specifically addressing today is how playtesting numbers presented in Magic articles may or may not mean anything, depending on how they were generated. This is one of those subtle “help you be a better player” articles, so don’t be completely surprised if you learns you somethin’ before you is done.

This article could also be titled “When 550 still equals 550, but means nothing,” as I wouldn’t have thought to write it until Jose Argao submitted his “The Case For Mobilization: 550 Games With U/W Opposition.” Jose did a great deal of playtesting to prove that his U/W Mobilization deck was better than his detractors were saying it was, and the results seem to indicate that he was correct. However, the methodology that Jose used to conduct his testing made those 550 games little more than a test of his playing skills and not a test of the strength of the deck.

Aside: It looks like I’m picking on Jose again, but the truth is that I could have written this article about any number of publications in the last year. Jose is a smart guy and a solid writer, who will probably be producing excellent articles in the future… But for now, he provided a useful example and served as a catalyst in the development of this article.

Methodology And Why It Matters

Methodology is defined as “the analysis of the principles or procedures of inquiry.” For purposes of scientific research, Methodology is God. If the results of your research cannot be reproduced by the scientific community at large, then they might as well not have been published. If you produce an article for publication in a prestigious journal and your methodology turns out to have a lot of holes in it, then you are going to take a bunch of lumps from the academic community as a whole and your reputation will suffer for it.

When it comes to writing about Magic, Methodology is equally important. If you can’t apply the things you learn about Magic to the community as a whole, then they don’t have much use, do they? Let’s face it, nobody cares about how deck Foo performs for you when you play it against a bunch of scrubs. The fact that Bobby J can take his teched-out Skirk Prospector/Rorix/Land Destruction deck to the finals of every Friday Night Magic in Sarasota, Florida doesn’t mean a damn thing on a statistical level unless you can do something similar at your FNM. If we cannot replicate the results, then the deck has to be set aside as a “nice idea,” but not one that has much practical application outside of Sarasota.

Alternately, if you win 80% of the time against any opponent you play, but only win 70% of the time with your new deck, then your new deck is actually worse than what you were playing before. Unfortunately, if you don’t provide that baseline up front to your readers, they will have little choice but to disregard the evidence you provide as the ranting of a crazy man and move on (much as they do with my own work…).

For the purposes of this article, Methodology boils down to how you go about doing analysis of X, where X can be the results of drafts from Grand Prix: Boston, the strength of Goblins in Onslaught Block Constructed, or how good U/W Opposition with Mobilization is in Type 2. I’m going to focus on playtesting today, since that’s what started the argument in the first place and is also the subject that applies to the broadest audience.

I think we can all agree that you will get different results when testing against different scenarios. Fifty games against Kai Budde are going to yield vastly different numbers than fifty games against Toby Wachter’s dog (no matter how good Toby’s dog happens to be at playing Psychatog). The irony here is that the results from either opponent may not be reflective of how you can expect to do against the field at large… But that’s a subject for another day.

In addition to the skill of your opponent, other variables that can affect how reflective your playtesting is of the real-world results you can expect include whether a deck is sideboarded, the time it takes to play a game/match, and your own comfort level with the specific decks you (and your playtest partner) are testing.

That’s a lot to consider when all you really want to do is figure out what “the best deck” for a particular environment is. If you go about doing your testing correctly, though, all these factors can be controlled for. The idea behind controlling for certain factors or variables is that you limit the impact that they have on your actual results. Like I said before, playtesting a single deck against Kai is probably going to lead you to believe that the deck you are playing is really bad. However, if you were to play a whole bunch of decks against Kai, it might start to become clear that all the decks appear to be bad, illustrating that perhaps the problem is not with the decks themselves, but with the skill disparity between the players.

Lucky for you that I’m here then, as I’m going to point out ways that you can limit the effect these variables have on your testing, and provide mechanisms to control for certain variables in order to figure out the answers to “What is the best deck?”

Controlling For Skill

Let’s start off with the skill of your playtest partner. Obviously, if you want to obtain results that will be reflective of what you will see when playing an average player, you would playtest against the average player you think you will face in an upcoming tournament. However, since Magic is a competitive endeavor, that approach doesn’t make a great deal of sense. In a six-round tournament you will probably be facing above-average players after round 3 (assuming you go 3-0). If you can’t beat the above-average opponent, then you won’t do very well in rounds 4-6, and you’ll be knocked into the Bean Bracket with some quickness.

Therefore it makes sense to playtest against the best players you can find. Doing this will not only give a good indication as to how a deck will do for the life of a tournament, but it will often improve your own play in the process.

Now assuming that you can find a good player to test against, the next step is to correct for the play skill of the testers. The easiest way to do this is to simply switch decks in the middle of your testing (yes, this assumes that both players are equally comfortable with both decks… But I’ll get to that in a minute). If you are planning to test forty games for a specific matchup, run the first twenty with you playing Deck A and your partner playing Deck B, and run the next twenty with you playing Deck B and your partner playing Deck A.

Record the results of the deck for each player and it will give you some idea as to whether the deck itself is strong, or whether the results are largely skewed by the skill (or lack thereof) of one of the testers.
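
As a rough sketch of that bookkeeping (in Python, with made-up numbers and hypothetical names like `summarize_swap_test`; none of this comes from a real testing run), you could tally a forty-game run with a deck swap at the halfway point by deck and by player:

```python
def summarize_swap_test(games):
    """Tally win rates per deck and per player.

    Each game is a tuple (pilot_of_deck_a, pilot_of_deck_b, winning_deck),
    where winning_deck is "A" or "B". This record format is an assumption
    made for illustration only.
    """
    if not games:
        raise ValueError("need at least one recorded game")
    deck_wins = {"A": 0, "B": 0}
    player_wins = {}
    for pilot_a, pilot_b, winning_deck in games:
        deck_wins[winning_deck] += 1
        # Make sure both players appear in the tally, even with zero wins.
        player_wins.setdefault(pilot_a, 0)
        player_wins.setdefault(pilot_b, 0)
        winner = pilot_a if winning_deck == "A" else pilot_b
        player_wins[winner] += 1
    total = len(games)
    return {
        "deck_win_rate": {d: w / total for d, w in deck_wins.items()},
        "player_win_rate": {p: w / total for p, w in player_wins.items()},
    }


# Invented example: I pilot Deck A for 20 games, then we swap decks.
games = (
    [("me", "partner", "A")] * 14 + [("me", "partner", "B")] * 6
    + [("partner", "me", "A")] * 8 + [("partner", "me", "B")] * 12
)
summary = summarize_swap_test(games)
print(summary["deck_win_rate"])    # Deck A wins 22 of 40
print(summary["player_win_rate"])  # "me" wins 26 of 40
```

In this invented run, Deck A looks like a 70% deck if you only count the first twenty games, but over all forty it wins 55% while one player wins 65% regardless of which deck he is holding. That gap is the sign that skill, not the deck, is driving the numbers.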

The Sideboard

This may come as a surprise to some of the more casual players, but potentially 67% of your games will be played after sideboarding (two of the three games in a match that goes the distance). That means that those fifteen cards that you can add to your deck after Game 1 can have a huge effect on the actual results you will obtain with a deck. However, a lot of people don’t bother to test matchups post-sideboard, thereby depriving themselves of up to 67% of the information they need to know how a deck will do against another deck over the course of a match.

Here’s an example: I’m playing the Windborn Opposition deck I talked about last week against somebody running Astral Slide. Game 1 isn’t particularly great for me, since they have a lot of creature removal, and getting the right mana to Counterspell an early Lightning Rift is a problem, but my Static Orbs still allow me to compete. Game 2, however, is a completely different story. Since I’m running four Ray of Revelation in my sideboard, I now have enough enchantment removal to kill every Slide and Rift that hit the board. Games 2 and 3 suddenly become a much better battle for my deck, and a matchup that looks like a difficult one in Game 1 becomes a relatively easy one over the course of a match.

Also, don’t change sideboard plans in the middle of gathering your results, or you’ll invalidate that testing run. If your initial sideboard plan isn’t doing anything in a particular matchup, you either need to finish that testing run before you attempt to make improvements, or you need to abandon it completely and start over at the beginning of the run. Changing in the middle will muddy your numbers and fail to provide meaningful results.

In order to get solid results from playtesting that are reflective of tournament play, you should play at least 50% of the games post-sideboard. If you don’t, you don’t really have a clue as to how things will play out over an entire match.
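
For the curious, here is a small Python sketch of where the 50% and 67% figures come from. It assumes best-of-three matches in which only Game 1 is played pre-sideboard; the function name `post_sideboard_fraction` is made up for this illustration.

```python
def post_sideboard_fraction(two_game_matches, three_game_matches):
    """Share of all games that are played after sideboarding.

    Assumes best-of-three matches: Game 1 is pre-sideboard, and every
    later game in the match is post-sideboard.
    """
    total_games = 2 * two_game_matches + 3 * three_game_matches
    if total_games == 0:
        raise ValueError("need at least one match")
    post_board_games = two_game_matches + 2 * three_game_matches
    return post_board_games / total_games


print(post_sideboard_fraction(10, 0))  # every match ends 2-0: half the games
print(post_sideboard_fraction(0, 10))  # every match goes to Game 3: two thirds
print(post_sideboard_fraction(5, 5))   # an even mix lands in between
```

Whatever the mix of two-game and three-game matches, at least half of your tournament games are sideboarded games, which is why 50% is the floor for your testing.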

Time

This variable matters more than you might think, if only because it can come back to screw you when you least expect it. This year at States, I was playing The Ralphie Treatment against a Burning Bridge deck (which I didn’t expect, but also wasn’t particularly surprised to see). Unfortunately for me, I hadn’t played Ralphie enough at the time to know that, in order not to get a draw after I lost Game 1, I needed to play fast. Speeding bullet fast. Zvi Mowshowitz fast. Instead, I continued to play somewhat deliberately, won Game 2, had only 3 minutes left on the clock for Game 3, and got saddled with a draw.

If I had playtested the deck more against other control decks (I was mostly worried about whether the deck could win against U/G at the time, since I knew we beat on Tog pretty badly), I would have realized that I needed to pick up the pace. If I had picked up the pace, I might have gotten a win there, avoided a bad matchup the next round, and perhaps finished better than 9th place.

Anyway, the point is that while you can’t necessarily control for Time as a variable, you still need to know how fast or slow a deck plays. If you find yourself hitting the fifty-minute mark a lot when testing against a common matchup, you may need to choose a different deck that isn’t as likely to get you stuck in the draw bracket.

Then again, if you are Trey Van Cleave, you can manage to consistently go into extra time while playing Red Deck Wins, so sometimes it’s not so much the matchup as it is the opponent…

Aside: It’s really too bad Trey is banned again, as he provides a great punchline to almost any joke about Magic. “Hey, you hear the one about the priest, the rabbi, and the Magic player?” “Ha ha, yeah… Trey Van Cleave!”

Comfort Level

This one is perhaps the hardest of the variables to control for, because you’re either comfortable playing a deck or you’re not. If you don’t know what you are doing, it will drag your results down, and if your partner is clueless then it will do the opposite.

In a situation like this, it’s usually best to play some practice games to get the feel of a deck before you begin recording the results. Another thing to consider is giving more weight to games that occur towards the end of a testing run. If you haven’t played a deck much and go 2-3 in the first five games with it, but 5-0 in the last five, it’s possible that the earlier results are affected by your comfort level and the later ones are more indicative of the actual quality of the deck.

Then again, when this sort of thing happens it’s probably best just to run another five games and see what they look like. If you are doing meaningful playtesting, then there’s no such thing as too much data.
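
If you do want to give later games more weight, here is one way it could look in Python. The linear weighting (first game counts half as much as the last) is purely an assumption picked for illustration, as is the name `weighted_win_rate`; there is nothing magic about those particular weights.

```python
def weighted_win_rate(results, start_weight=0.5, end_weight=1.0):
    """Win rate with weights rising linearly from the first game to the last.

    results: list of 1 (win) or 0 (loss), in the order the games were played.
    start_weight and end_weight are arbitrary illustrative choices.
    """
    n = len(results)
    if n == 0:
        raise ValueError("need at least one game")
    if n == 1:
        return float(results[0])
    step = (end_weight - start_weight) / (n - 1)
    weights = [start_weight + step * i for i in range(n)]
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)


# The example from above: 2-3 in the first five games, 5-0 in the last five.
results = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]
print(sum(results) / len(results))  # plain win rate
print(weighted_win_rate(results))   # later games count for more
```

With these invented weights the 7-3 record moves from a plain 70% to roughly 74%, a modest nudge rather than a rewrite of the results. If the weighting changes your conclusion dramatically, that is a hint you simply need more games.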

Useless Complaints Department

I’m a numbers guy, and I’m all for people providing objective evidence for whatever idea they want to argue. In fact, in a community renowned for misinformation and where 90% of the time you will hear “My deck is good for reasons X, Y, and Z” with no proof to back it up, use of numbers is often a refreshing concept.

However, if you write an article with all sorts of silly numbers that make no sense and attempt to use them as proof that your hypothesis is correct, I’m probably going to beat on you for it. These are the sort of lumps that you take for posting your ideas in a public forum for all to see.

On the flip side, if your methodology is solid, I’m not going to have much to complain about. When you state “I got these results in these conditions” and everyone else can replicate those results, you are doing the community a great service, and deserve kudos, applause, and many pats on the back. This eventually leads to “Celebrity Writer” status, which can get you all sorts of things like space on the floor of a hotel room, people walking up to you at tournaments and saying your writing is shiite, and even a membership to the “Your writing is good, so why are you such a bad player?” country club. Let me tell you, I wouldn’t give up these perks for anything!

As a final complaint, the statement “Playtesting is playtesting” is just plain dumb, because it definitely is not. What you get out of your playtesting is entirely dependent on what you put into the testing and the questions you want answers to. In my opinion, unless you are simply trying to improve your skills through playing games, there is good playtesting and there is meaningless playtesting, and very little lies in between.

Going Forward

In the future, if you are going to write an article about a deck that you have tested, you should probably try to incorporate the techniques detailed above into your testing. They may be slightly more time consuming than your normal procedures, but the benefit you will see from getting legitimate results will be enormous.

Along with beginning to control for problem variables with the techniques above, it would also be preferable if writers actually included the methodology of their testing at the beginning of their articles. That way, those of us who care enough to read the article can tell what the numbers provided actually mean.

Conclusion

I’ve outlined most of the problematic variables you have to deal with in playtesting above, and I’ve also tried to provide guidance on how to control these variables so that you can accurately measure the answer to “What is the best deck?” By using the proper methodology, you will not only improve your own test results, but you will also enable yourself to provide useful information to the community if you should write an article.

As always, if you think I missed something or would like to discuss further questions you might have, send me an e-mail or post it in the forums and I will be happy to help.

Until next time, remember that I don’t brag, I mostly boast…

The Holy Knut

mixedknuts@yahoo.com