Investigating Skill in Vintage

Vintage is a format overflowing with powerful decks. While innovation is certainly possible, new decks invariably face a trial by fire as soon as they are suggested: after all, the established players are all adept at winning matches with their long-played creations. Today, Stephen introduces a quantifiable way of measuring playskill, and presents us with the tools to increase our understanding of the true nature of Vintage power.

Deck construction and deck design dominate Vintage strategy articles. Yet as all Magic players are well aware, the deck you play is only half of the equation. Skill is often just as important, and in many cases more so.

Playskill, narrowly defined, is the capacity to make optimal or correct plays. When thinking about skill we typically have two lines of demarcation. We think of better or worse players, and we think of harder and easier decks. In other words, we say that deck A is easy to play (hence requiring less skill) and deck B is harder to play (requiring more skill). These binaries are nice and neat, but they can be downright misleading. The truth is that skill is always relative. The “required” skill is only that necessary to beat your opponent. This article sketches a framework for thinking about skill, and the relevance of skill, that is more helpful than the traditional understandings that exist so far.

The Components of Playskill

In my experience, skillful plays result primarily from two processes: first, forward thinking; and second, pattern recognition. Forward thinking is the process of evaluating decisions by weighing the consequences of particular plays. Typically, this type of thinking can be put in “If… then… ” terms. For example, “If I play X, then my opponent can do Y.” You examine all of the possibilities and try to weigh the risks. You then choose the play that maximizes your chances of winning. Typically, bad play arises because people don’t evaluate the consequences of particular plays fully. Or, if they do, they do not properly weigh the costs and benefits of a particular line of play and make suboptimal plays.

Forward thinking is necessary at both the tactical and the strategic level. You need to evaluate the consequences of various tactical plays. But you also must evaluate whether your decision-making will create the sort of game state that your deck prefers to see (strategic forward thinking). Thus, forward thinking incorporates both tactical and strategic thinking and requires an approach to Magic that evaluates plays by choosing one play in a decision-tree.

The second process that helps players make optimal plays is pattern recognition. The more a player plays a deck, the more familiar the pilot becomes with various situations that typically arise in the course of a game. The more familiar the player is with those situations, the better feel the player will have for the plays that will lead to a game win. Simply put, if you are a Fish player and you played turn 2 Null Rod instead of Meddling Mage and you find that every time you make that play you win the game, you will be more likely to make that play in the future. This process of pattern recognition actually feeds the first process of forward thinking. The more familiar the pilot becomes with the outcomes of a particular line of play, the better the player will become at weighing the risks of various lines of play and arriving at the correct play. Thus, although forward thinking is important, forward thinking becomes more accurate and more deadly when aided by a great deal of pattern recognition.

Many players say that they play “intuitively.” What this means is that they have so much experience with their deck through pattern recognition that they no longer need to engage in full throttle forward thinking. They rely on pattern recognition more and more. Pattern recognition thus enhances forward thinking by making your evaluative capacity more accurate, but it also can come to substitute for forward thinking and help you play faster.

I put on my robe and wizard hat


I have devised a conceptual measure of playskill that integrates the relationship between individual skill, deck difficulty, and the relative nature of skill. I call it a deck’s Average Skill Level (ASL). If skill is measured by correct or optimal play, then deviation from this ideal, when observed among all players, provides our scale. The more mistakes a player makes, the less skillful they are.

If we sum up the skill level of all players playing a particular deck and divide by the number of players, we have the Average Skill Level of that particular deck. For simplicity’s sake, let’s put skill on a ten-point scale. Perfect play is level 10, and a level 1 would mark the point of greatest deviation from this ideal in your population set. Thus, decks have different average skill levels that fall between 1 and 10. I assert that some decks have a higher ASL than others. Decks with a lower ASL have a greater vertical distance between the average player and the best. Decks with a higher ASL have a lesser vertical distance between the average player and the best.
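The arithmetic here is simple enough to sketch in a few lines of Python. This is only an illustration of the definition above: the function names and the two lists of ratings are my own inventions for the example, not measurements of any real group of players.

```python
from statistics import mean


def average_skill_level(skills):
    """Sum of the pilots' skill levels (1-10 scale) divided by the number of pilots."""
    if not skills:
        raise ValueError("need at least one player")
    return mean(skills)


def vertical_distance(skills):
    """Gap between the best pilot of a deck and that deck's ASL."""
    return max(skills) - average_skill_level(skills)


# Hypothetical ratings, invented purely for illustration.
deck_a_pilots = [10, 9, 8, 8, 7]   # many experts: high ASL
deck_b_pilots = [10, 5, 4, 3, 3]   # one expert, many novices: low ASL

for name, pilots in [("Deck A", deck_a_pilots), ("Deck B", deck_b_pilots)]:
    print(name, "ASL:", average_skill_level(pilots),
          "distance to best:", vertical_distance(pilots))
```

Both decks have a level 10 pilot, but the deck with the lower ASL shows a much larger gap between its best pilot and its average one.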

I assert that Mana Drain decks in Vintage have a disproportionately high average skill level. There is a very large batch of highly skilled Mana Drain players, not only in the Northeastern United States but also around the world. Skill in Vintage was measured by how good you were with Brian Weissman’s “The Deck” for such a long time, and players seem to have a natural affinity for control, such that the expertise for Mana Drain decks is higher than the expertise for any other archetype. The greater number of these Drain experts means that there will be less of a vertical distance between the best Mana Drain player and the Average Skill Level of a Mana Drain deck than might be the case for, say, Grim Long.

A Pedantic Distinction

I want to make an obvious, but never-mentioned, distinction between the actual physical plays one makes (the cards that one puts onto the table) and the thought process that motivates those plays. As observers, we can’t actually peer inside a player’s brain and read the thought process that leads to one particular play. What we observe is the physical manifestation of those thoughts, measured by the quality of the plays a player makes (indirectly measured by game wins and tournament success).

The reason I make a distinction between the mental process and the physical plays is because there is an asymmetrical relationship between the two. It will be easier to illustrate my point with examples. Take Robert Vroman. Vroman has invented a Vintage deck called “Uba Stax,” and is an expert with it. He has tested every matchup, and has been playing his deck for well over a year. He knows precisely what he needs in almost every situation. He combines extensive pattern recognition with a great deal of forward thinking. Yet, a player could take 10% of the mental steps that Vroman takes and produce 90% of the same plays.

This has important implications for both your testing and your tournament deck choice. Since pattern recognition significantly enhances forward thinking, and is a critical component of optimal playmaking in its own right, one goal should be to minimize your opponent’s ability to use pattern recognition to make optimal plays. This is one reason that, although I adore Mana Drains, I am less inclined to pilot a Mana Drain deck… unless I find exceedingly persuasive reasons for doing so.

Deck Frequency and the ASL

Most decks in Vintage are designed with the metagame in mind. Since some staple cards are so prevalent, deck design reflects choices that are inadvertently aimed at those strategies. Some decks, like Fish, are often aimed directly at a particular deck, like Mana Drain control decks. The particular lock components played in Stax often reflect the anticipated resistance. One set of lock components may be preferable if the metagame were beatdown decks instead of mostly control. Thus, it is more likely that intuitive plays will be optimal plays where you are fighting decks that are prevalent. For example, if you are a Stax player and you open a hand that has Mishra’s Workshop and Tangle Wire, and you make that play, it is more likely to be the correct play if you are playing against the more common control or aggro-control decks than if you are fighting a less common combo deck. Similarly, if you are playing a Fish deck, your intuitive plays are more likely to be correct against a control deck than against a combo deck.

Deck frequency has another important relationship to pattern recognition. If you are in a tournament and you are playing Stax, and you find that turn 1 Smokestack won the game against your first opponent (who was playing Control Slaver), you are more likely to make that same play over another play when you face another Control Slaver deck. Even if that play does not win the game the second time it arises, it will force you to hypothesize why the play worked in one instance and not in another. This insight will provide a basis for guiding future decisions that is more likely to lead to optimal play.

In short, the deck you choose will have an effect on the Average Skill Level of your opponent. The ASL of a deck in a tournament is the sum of its pilots’ skill levels divided by the number of pilots. Thus, if three people are playing Gifts in a tournament, the worst player is playing at a level 1, the best player is our level 10, and the middle player defines our average skill level. Say the middle player plays five matches: Control Slaver, Dragon, Fish, Stax, and Control Slaver. Against the Slaver players the middle Gifts player may be an 8, but against Dragon, the Gifts player may be a 4. Overall, the middle Gifts player may be playing at a level 7. The unfamiliarity with the Dragon match results in a weaker game.
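To make the Gifts example concrete, here is a rough sketch of how the per-round levels might roll up into the overall figure. The Slaver and Dragon levels are the ones given above; the Fish and Stax levels are assumptions I have filled in so that a simple average lands on level 7, and treating the overall level as a plain average of rounds is itself an assumption.

```python
from statistics import mean

# Skill level of the middle Gifts pilot in each round.
# Slaver (8) and Dragon (4) come from the example in the text.
# Fish and Stax are hypothetical values, not taken from the article.
rounds = [
    ("Control Slaver", 8),
    ("Dragon", 4),
    ("Fish", 7),            # assumed
    ("Stax", 8),            # assumed
    ("Control Slaver", 8),
]


def overall_level(rounds):
    """Plain average of the pilot's level across all rounds played."""
    return mean(level for _, level in rounds)


def level_by_matchup(rounds):
    """Average level per opposing deck, showing where the pilot is weakest."""
    by_deck = {}
    for deck, level in rounds:
        by_deck.setdefault(deck, []).append(level)
    return {deck: mean(levels) for deck, levels in by_deck.items()}


print(overall_level(rounds))
print(level_by_matchup(rounds))
```

The per-matchup breakdown is the useful part: the overall number hides the fact that the unfamiliar Dragon match drags the pilot well below their usual level.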

This leads to an important subtlety: ASL changes in different matchups. The reasons for this should be clear. Pattern recognition is a key component of skill. If you are playing a deck that is not mainstream, it is less likely that your opponent’s deck design will account for your deck (and hence their intuitive plays will be less likely to be optimal), or that your opponent will be able to rely on experience to guide their decisions. If you play a mainstream deck, you are effectively raising your opponents’ ASL. In other words, you are more likely to lose to an inferior player.

Deck Design, Testing, and the ASL

One of the most dangerous things I see in Vintage is people picking up a new deck, playing a few games, and then drawing some conclusions about it. Unusual new decks will never play as well in testing as they will when mastered, for reasons that should now be apparent. If your gauntlet partner is playing a deck that they are very good with and they know what you are trying to do, they are much more likely to play the opposing deck at a higher skill level than what you are going to face at tournament. Moreover, you are playing your deck at a much lower skill level than you will when you become more familiar with it. Thus, it is very easy to straw man a deck.

In my entire experience in Vintage, I have rarely seen a deck, when first conceived, beat the top tier. Even the one deck that everyone agrees was truly broken, GroAtog, met fierce resistance from traditional Control players to the assertion that GroAtog was the deck to beat. It wasn’t until six months or so of tuning and improving the deck that GroAtog finally proved that it was the better deck. It had to overcome the great resistance of beating the very best and most experienced Keeper players. Even in my online match against Oscar Tan, I only took 50 percent of the games against his Keeper in the Spring of 2003. Decks like the utterly broken Long.dec only took 50 percent from my team’s Psychatog deck in testing in Fall of 2003. And those were two of the most broken designs to emerge from Vintage in the last five years. Imagine what more modest new concepts have to go through in order to receive a fair hearing, and for them to reach their full potential. I think that Ichorid is a great example of this. The deck is fundamentally unfair, but many dismiss the deck because they can’t achieve X percent wins against this or that deck. They may simply not be playing the deck at a high enough skill level relative to the presumably higher-than-average skill level at which their opponent is piloting the gauntlet deck.

I think that a great deal of the commentary about the quality of decks in Vintage reflects the built up expertise within the format. Thus, judgments about the quality of various decks quite often reflect straw man testing. Just as important, many ideas are dismissed before their full potential is tapped. A great number of people saw the inherent brokenness in Ichorid, but dismissed the deck after being unable to get the deck where they wanted it. They gave up too soon. Often the final tweaks can make all the difference. And they overestimate the strength of the opposing deck, since they are likely playing those decks at a higher ASL than they will actually face at tournament in a match of three games.

Deck Choice and ASL

If the ASL changes in different matchups, then what matters for any given matchup is not necessarily how your deck performs when the opponent is playing at a level 10, but how your deck performs when your opponent is playing at the ASL for that deck in this matchup.

Take the Ichorid versus Intuition Tendrils match. Here is a hypothetical scale we can use to illustrate this point. Holding the Ichorid player’s skill constant for the moment:

IT player’s skill / Ichorid’s likelihood of winning
1 - 80%
2 - 70%
3 - 65%
4 - 62%
5 - 59%
6 - 54%
7 - 51%
8 - 48%
9 - 45%
10 - 40%

As the IT player’s skill increases, Ichorid’s chance of winning decreases. But if the average IT player’s skill is often low in this matchup, say level 6 or level 5, then the fact that you lose to the best IT players should not be overly concerning. Some decks are so hard to play that the average playskill with them is going to be low, regardless of the number of players playing them. Thus, decks like Grim Long may be a powerful metagame threat in an objective sense, but the average Grim Long player is typically of no concern. Testing against and thinking about the Grim Long match may quickly become a waste of time. Your time would be better spent thinking about decks that have a high average playskill.
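The same point can be put in numbers. The sketch below uses the hypothetical scale from above and asks what Ichorid’s win chance looks like against a field of IT pilots rather than against a single level 10 pilot. The field itself is an assumption made up for the example, as are the names used in the code.

```python
# Ichorid's chance (in percent) of beating Intuition Tendrils, keyed by the
# IT pilot's skill level. These are the hypothetical figures from the scale above.
ICHORID_WIN_PCT = {
    1: 80, 2: 70, 3: 65, 4: 62, 5: 59,
    6: 54, 7: 51, 8: 48, 9: 45, 10: 40,
}


def expected_win_pct(opponent_levels):
    """Average Ichorid win chance across a field of IT pilots."""
    if not opponent_levels:
        raise ValueError("need at least one opponent")
    total = sum(ICHORID_WIN_PCT[level] for level in opponent_levels)
    return total / len(opponent_levels)


# Assumed field: mostly level 5 and 6 pilots plus one expert. Invented for illustration.
it_field = [5, 6, 6, 5, 10]

print("Against a level 10 pilot:", ICHORID_WIN_PCT[10])
print("Against the assumed field:", expected_win_pct(it_field))
```

Under these assumed numbers the matchup is unfavorable against the expert but better than even against the field as a whole, which is the situation that actually matters in the Swiss rounds.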

Not only are decks dismissed too early because they do not perform well enough in testing, but design is often affected by this concern. It is very easy to dismiss a deck because it loses to deck A when played at a level 10, but this may not be a relevant concern. If your goal is to win the tournament, then you will probably need to have all of those considerations in mind. But if your goal is simply to top 8, then tweaking and tuning your deck to beat every major deck when played at a level 10 is more than likely going to keep you out of the top 8 by exposing your deck to other vulnerabilities. You will be less likely to survive the Swiss rounds, even if you are more likely to wreak havoc in the top 8. I have never won a Vintage tournament with over 100 players. However, I’ve lost count of the number of times I’ve made top 8. The reason for my success is coupled with the reason that I’ve never taken the top spot at these huge events. People who tend to win tournaments in Vintage play decks with which they are intimately familiar, tweaked for their anticipated metagame. Thus, once they survive the Swiss, they are well equipped to defeat the top 8. I like to build new decks to decrease my opponent’s skill so that I can maximize my chances of making top 8. In other words, I generally play decks that are designed to beat the Swiss, but not as focused on beating the top decks when piloted by the top players. I do not have the mastery and expertise with these decks necessary to take home the top prize. I made multiple play mistakes in my top 8 match with Ichorid at the last major SCG P9 event. These decks are often not quite fully tuned when I take them to tournament. It often takes a large tournament experience to see the remaining design flaws.


Skill is at its apex in Vintage in terms of its importance. Maximizing your skill advantage in a tournament has never been more vital. Testing is absolutely crucial for most players. But it is too easy for new decks to be dismissed before they have a chance to flower, because of the issues outlined in this article. It is also easy for testing to be a weak predictor of tournament results because of the issues discussed here. Our binary conceptions of skill do not provide an adequate framework for guiding our design, testing, and tournament deck choice. The ASL concept integrates individual player skill, deck difficulty, and the relative nature of skill in a single, useful measure. It helps focus our attention on the things that really matter. It helps us realize that there are many gradations in skill and in how well we pilot our decks. It also helps us see that the decks that should be most concerning are those where the vertical distance between the average playskill and the playskill of the top player is small. In other words, your time is best spent testing against decks with a high average playskill.

Until next time,

Stephen Menendian