Power Level Outliers And How To Handle Them In Magic Cubes

Howdy, gamers. With the new year upon us and the current season of Vintage Cube on Magic Online (MTGO) waning, I have found myself in a contemplative mood, spending time curating my own Cube lists and evaluating the fundamentals of my approach to doing so. A common topic in the world of Cube design is that of power level outliers, and today I would like to speak to how to identify power level outliers and how I approach potential outliers in Cube curation.

The Constructed Comparison

As a player who fell in love with Magic through competitive Constructed, I approach power level outliers in Cube similarly to how Wizards of the Coast (WotC) approaches Banned List updates. To back up from that a little, I generally try to avoid including cards in my Cubes that I believe are likely to be power level outliers, much like developers do with Magic sets internally, but inevitably, some slip through.

Once I have a Cube together, I become invested in the balance and net fun of the environment. Many Cube designers and drafters come from more of a Limited background, which I imagine contributes some to the difficulty many Cube owners have in cutting cards from their Cubes. It’s tough to ban cards from Booster Draft, after all. For me, much like for WotC curating their Banned Lists, I see cutting some power level outliers as an unfortunate inevitability in Cube.

A surface-level understanding of the concept of power level outliers means removing cards from your Cube on the basis that they win a disproportionate percentage of games. This notion is, in practice, worthless. It’s just not realistic to draw this kind of conclusion from data you can acquire from actual matches played, and win percentage isn’t the only consideration in this realm either. There’s a lot of wisdom in old Constructed B&R updates that speak to this, which is why I wanted to draw the comparison between managing Constructed formats and Cube curation early.

Now and again you’ll see a handful of cards in Constructed formats referred to as format “pillars”. These cards have massive impacts on a format’s metagame, but in a way considered to be for the better. Formats and metagames ultimately end up shaping around something, so being a format pillar is not inherently problematic, but it can be difficult to understand the difference between a pillar and a problem. What makes Brainstorm a pillar of the Legacy format, while Faithless Looting ultimately fell to the banhammer in Modern? And how can we apply this concept to Cube design?

Dissecting Win Rates

Let’s assume that you record the decklists of every player of every draft of your Cube, and track what they played against and their win rates. This is a ton of work that I imagine close to zero people actually do, but even if you do, it would still take ludicrous amounts of playing to actually get a statistically significant sample size. If we’re lucky, we get to paper Cube once or more a week. How long does it take to then record even 100 matches of data for a given card in Cube Draft? Let’s be real. Conversations about win rates in Cube are often just looking at whether a deck went 2-1 or 3-0 in one draft and arguing what that does or doesn’t mean based on what we already believed.
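
To put rough numbers on the sample size problem, here is a minimal sketch that computes an approximate 95% Wilson score interval for a win rate. The function name `wilson_interval` and the example records are illustrative assumptions, not data from any real Cube, and the method treats every match as an independent coin flip, which real Cube matches are not.

```python
import math

def wilson_interval(wins, matches, z=1.96):
    """Approximate Wilson score interval for a win rate (z=1.96 is about 95%)."""
    if matches <= 0:
        raise ValueError("need at least one match")
    p = wins / matches
    denom = 1 + z**2 / matches
    center = (p + z**2 / (2 * matches)) / denom
    half_width = z * math.sqrt(p * (1 - p) / matches + z**2 / (4 * matches**2)) / denom
    return center - half_width, center + half_width

# Hypothetical records, all at a 60% observed win rate.
for wins, matches in [(6, 10), (60, 100), (600, 1000)]:
    low, high = wilson_interval(wins, matches)
    print(f"{wins}-{matches - wins}: plausible win rate {low:.1%} to {high:.1%}")
```

Under these assumptions, a 60-40 record over 100 matches is still consistent with a true win rate anywhere from roughly 50% to roughly 69%, which is a wide enough range to support almost any opinion you walked in with.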

Let’s say that we were able to set aside everything that makes us human and track the data without bias, and were able to do so long enough to make significant statistical claims. There are still a ton of lurking variables that muddy the waters here. For starters, any Cube deck could realistically drop a match to mana flood or screw. This counts in both directions, but is particularly damning to anybody trying to say that a card can’t be too good because a drafter “didn’t even 3-0”. You also sometimes just don’t draw the card in question.

And then when you do draw the card there’s more going on there, too. I often talk about how Hullbreacher is a rancid card to include in a Vintage Cube, and something very funny happened when the most recent update added the card to the current list. I was watching a friend draft, and his opponent cast it and then blocked to trade it away at the first possible opportunity. Their life total was high, and my friend controlled Islands. Sometimes players treat their broken cards like they’re Neck Snaps, leaving you to wonder if Hullbreacher would actually be more powerful if it wasn’t able to block. Cube is rarely played for very high stakes, and there’s no reasonable expectation that the games are always expertly navigated.

Quality of deck is also a factor, one impacted by every drafter at the table. This is sort of invisible when it comes to drafting online because you don’t tend to see the decks of other players at your table, but paper Cubers all have the experience of one player colossally train-wrecking their draft and the impact that this has on the rest of the table. The players being passed to immediately by the person who is drafting incoherently tend to either draft something far above or below expectation, with the players exactly opposite them at the table experiencing the reverse of that effect.

But let’s say that we all draft and play perfectly every time. Let’s also say that we draw the potential power level outlier in question every game, too. It’s still the case that most cards have a significant delta in power level regarding the phase of the game you would prefer to draw them. I consider Mother of Runes too powerful for most Cubes, but if you play a lot of games where it appears late, it doesn’t look like much. And how bad of a topdeck is Mox Diamond? Conversely, Inferno Titan tends to look pretty weak in opening hands.

Inferno Titan might be an eyebrow-raising example as a potential power level outlier, but I don’t think it’s all that difficult to imagine a Cube where this is the case. Mostly, I wanted to talk about a six-drop, because that point on the curve introduces additional variables to muddy the water. Magic players tend to have a bias towards crediting the card that actually dealt lethal damage with winning the game. As such, a card like Inferno Titan might read as a power level outlier when powered out by true villains like Grim Monolith and Coalition Relic that allow players to skip the mid-game. It will seem obvious to some that Grim Monolith is a messed-up Magic card, but I’ve seen Sol Ring in enough unpowered Cube lists to make this concept worth mentioning.

This becomes even less approachable when you start to look at actual combos, specifically with regard to cards generally not played beyond the context of the combo. Given the extra hoops involved in drafting a Splinter Twin combo deck, how do you evaluate the performance of Splinter Twin relative to cards with lower barriers to playing them and therefore higher rates of play? Similarly, larger Cube sizes that leave more cards undrafted every time you play can dampen the perceived impact of power level outliers.
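
The Cube size point comes down to simple arithmetic, sketched below. The defaults are assumptions for illustration: an eight-player draft with three 15-card packs each, a 40-card deck, and a single copy of the card. The function names are hypothetical, and the second estimate ignores mulligans, card draw, and tutors.

```python
def draft_appearance_rate(cube_size, players=8, packs=3, pack_size=15):
    """Chance that a specific one-of card is opened at all in a given draft."""
    opened = players * packs * pack_size
    return min(1.0, opened / cube_size)

def seen_in_game_rate(cards_seen, deck_size=40):
    """Chance that a specific one-of in a deck is among the first cards_seen cards."""
    return min(1.0, cards_seen / deck_size)

# Assumed Cube sizes; 360 is the size where every card is opened every draft.
for cube_size in (360, 540, 720):
    print(f"{cube_size}-card Cube: opened in {draft_appearance_rate(cube_size):.0%} of drafts")

# Opening hand of 7 plus 7 draws.
print(f"Seen by the 14th card: {seen_in_game_rate(14):.0%} of games")
```

With these assumptions, a card in a 540-card Cube is only opened in about two out of three drafts, and even when it makes a deck it shows up within the first 14 cards in roughly a third of games, so each extra layer thins out the data you could collect on it.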

I worry that I may be beating a dead horse here, but hopefully I haven’t lost anybody. This is all just to say that power level floors and ceilings both matter, and that context is crucial for making meaningful claims about cards in Cube. This just isn’t an area where you can prove much with game data.

Win rates can tell you something, and indeed it is worth noting if you have a stretch of Cube nights where a particular card or archetype dominates over and over again, but for the most part, you just won’t have meaningful data to tell you which cards in your Cube are power level outliers. That’s why it’s far more important to focus on cards that your players find actively unfun.

Finding the Fun

Magic as a game selects for very analytical players and tends to attract people who like spreadsheets and data. As such, it makes sense that so many conversations about Magic center around statistics and trying to “prove” something. Really, though, when we complain about cards, we are mostly expressing that we don’t find the experience fun. There are cultural reasons why we worry that expressing our opinions is for some reason less valuable than data, but this is Cube, and what people like is both more approachable in practice and more meaningful in this sphere.

This is where some of the most significant Banned List announcements in Magic history come into play. First, I want to talk about Tibalt’s Trickery. Tibalt’s Trickery was banned in Modern as part of a pretty lengthy update otherwise, but the statement on the card itself was the loudest part of the announcement in my mind.

“While the overall win rate of the deck hasn’t shown to be problematic, we believe it contributes to non-games that make Modern less fun to play.”

This is in stark contrast to many historical bans. I can’t help but compare this to the Standard banning of Stoneforge Mystic and Jace, the Mind Sculptor, which many would argue came into effect long after the damage happened. With Tibalt’s Trickery, they just said, “Yeah, this card just kind of sucks to play against, so you can’t play with it.” And they were right to do so! The bar for removing Uro, Titan of Nature’s Wrath from a Cube shouldn’t be winning every draft with the card every week. It should be identifying that the play experience is undesirable and having an honest conversation about that with your playgroup.

There’s a great story regarding the history of Inverter of Truth in Pioneer as well. In March 2020, Dimir Inverter was cited as only having a 49% win rate in Pioneer Leagues on Magic Online. There was heat on the deck and other assorted combos then, but the data said that they were fine. Then, in August, Inverter of Truth was banned alongside three other combo cards. Kethis combo received a citation as having potentially problematic win rates, though there was no update on Dimir Inverter in this regard. It’s safe to assume that the deck was winning more than the previously reported 49%, certainly so in the hands of Peter Ingram, but this update wasn’t about win percentages.

“Although we continue to see many different decks have success in Pioneer, and no decks with problematic win rates against the field, we do see that combo decks as a group make up a large portion of the competitive metagame. We’ve heard feedback that the frequency at which one finds themselves facing an opposing combo deck restricts deck-building options and can make play experiences unenjoyable. While win rate data may not point to change being needed, a different, more important set of data does: player participation.”

The long and short of this update is a reflection of combo being more present in Pioneer than the format’s players seemed to desire. This ban was blunt, and it moved the format in a clear direction. I do find aspects of this update and the format’s curation from that point disagreeable, but I will endlessly defend the notion of removing undesirable cards from formats.

What Makes a Card Undesirable?

Ultimately, determining which cards to cut from a Cube is a purely subjective matter. We can talk about win rates and put on our data scientist hats as much as we like, but at the end of the day, we’re just putting together lists of cards that we enjoy playing. You won’t always be able to tell which cards are power level outliers in a given environment, and indeed some power level outliers can be desirable and fun aspects of drafting a particular Cube! That’s kind of the whole idea behind powered Cubes.

There are some more objective ways to approach this matter than others, though. In my experience and from what I’ve seen commonly expressed by other players, cards that on some level fundamentally change the game being played are the power level outliers that we would do best to remove from our Cubes. A loud example of this that comes to mind is when I removed every monarch card from my Pauper Cube. A game of Magic is suddenly different in a way that you can’t really opt out of once the monarchy is introduced.

The initiative also gets a lot of attention in this regard, and to a lesser extent The Ring emblem does add a certain texture to games of Magic that feels very different from games not featuring the mechanic. There’s an inherent drawback to claiming the initiative or the monarchy in that your opponent can hoist you by your own petard by successfully attacking you, but again, the win rate is less meaningful than the experience.

Any card with a repeatable effect can have similar impact. Recurring Nightmare looks pretty modest in Vintage Cube today, but try adding it to any lower-powered Cube and you can see how, once it comes down, you’re no longer playing a game of Magic, you’re playing a game of Recurring Nightmare. I had a similar experience with Survival of the Fittest in Spooky Cube, where just having access to every creature in your deck in a lower-powered graveyard-centric environment proved to be entirely too warping to provide with just one card. The aforementioned Uro leads to similar experiences due to its escape ability.

It’s not uncommon to meet a player who has a distaste for planeswalkers as a card type even after all these years. And to be fair to their position, planeswalkers do fundamentally alter games. They make it more important to control creatures that can make good attacks and/or to have removal spells that interact with planeswalkers rather than just creatures. I subscribe to the exact opposite sentiment regarding playing with planeswalkers, but I will offer that Oko, Thief of Crowns is specifically warping in a way where you can’t really get around playing a game of Oko while the card is on the battlefield due to the game-state agnostic way Oko just turns everything into Elk.

I imagine that some readers opened this article hoping for a more empirical approach to identifying power level outliers, and in a way I’m sorry, but in another, more honest way, I hope that those readers walk away with a new perspective regarding personal taste and the significance thereof. There’s a scene in Sam Raimi’s Spider-Man featuring a montage of New Yorkers being asked for their opinions of the titular hero. One opinion offered is that, “He stinks, and I don’t like him.” I often find myself saying this out loud as I cut cards from my Cubes. What more justification could you possibly need?

Cube design isn’t an exercise in demonstrating truths; it’s one of feeling and of expressing taste. If your group enjoys playing with some cards that are clearly more powerful than some of the other cards that they enjoy playing with, then by all means, that’s the experience that you should provide! The customer is always right in matters of taste, and I’ll say it for the thousandth time: your Cube, your rules.