Sorry I haven’t posted in so long. I ran out of time to play Magic for much of this past Standard season. Part of that was a lack of motivation to set aside time for a format I wasn’t enjoying as much as I had previously. Another part was a number of real-life issues cropping up in my PhD program that required more of my attention. But now I’m back, just as Amonkhet (AKH) is about to release.

In fact, my LGS holds a final “Top 8” tournament right before tonight’s midnight prerelease in which the top “point earners” compete for the title of “Store Champion” for the current Standard season. Somehow I’m still locked in for Top 8 in this mostly-for-fun tournament (the main prize is getting to choose where you sit for FNM games for all of next Standard and getting your name on a trophy :D) despite missing almost 5 weeks of FNMs. The way you earn points is by getting good records during our four-round FNMs each week. A 4-0 record is worth 5 points, a 3-1 record is worth 3 points, and a 2-2 record is worth 1 point.

Anyways, as I was idling at work today during my office hours I spotted a thread on Reddit discussing deck testing, and it reminded me that I’ve wanted to talk about the danger of blindly comparing win percentages between variants of a deck. Read on to learn more.

My graduate program is not in statistics, but as a researcher I rely heavily on statistics to make arguments about my data and its relevance (or, lately, lack thereof…). In a recent thread on the Spikes subreddit, people were discussing the difference between the win rates of UB Zombies and BR Zombies, as posted by the OP. Here is the basis for what turned into a minor statistics argument:

In AER Standard, I played Zombies exclusively in MTGO Competitive Leagues. I started with B/R Zombies. My win rate was 56.43% in 140 matches. I switched to B/U Zombies and found better results (57.81% in 64 matches).

The OP was trying to argue that UB Zombies was better than RB Zombies based on this observed difference. The problem is that these two win rates are point estimates (attempts to use sample data to offer a “best guess” about what the true value – win rates, here – might be in the broader population). But point estimates by themselves are not terribly useful, which is why inferential statistics also relies on estimates of the likely error of those point estimates.

The danger in relying on these point estimates is that it can be easy to see a numeric difference and assume it is meaningful. As was the case in this Reddit post, the person relying on these point estimates claims that UB Zombies has performed better (57.81% wins in 64 matches) compared to RB Zombies (56.43% wins in 140 matches). This is numerically true – 57.81 is greater than 56.43. But what is lacking here is an estimate of the error in these point estimates of win rates. Because a match of Magic either results in a win or not (we’ll collapse draws and losses here, for simplicity’s sake), each match is a Bernoulli trial, and the number of wins in a fixed number of matches follows a binomial distribution.
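To get a feel for just how noisy a 64-match sample is, here’s a quick simulation sketch. The 57% “true” win rate is a hypothetical value I’m assuming for illustration – it is not a claim about either deck:

```python
import random

random.seed(42)  # make the simulation reproducible

TRUE_WIN_RATE = 0.57  # hypothetical "true" skill level, assumed for illustration
N_MATCHES = 64        # the UB Zombies sample size from the post
N_SEASONS = 10_000    # number of repeated 64-match samples to simulate

# Each simulated "season" counts wins out of 64 Bernoulli trials,
# then records the observed win rate for that season.
observed_rates = []
for _ in range(N_SEASONS):
    wins = sum(random.random() < TRUE_WIN_RATE for _ in range(N_MATCHES))
    observed_rates.append(wins / N_MATCHES)

print(f"min observed win rate: {min(observed_rates):.2%}")
print(f"max observed win rate: {max(observed_rates):.2%}")
```

Run this and you’ll see that a player with the same underlying 57% win rate routinely records anywhere from the low 40s to around 70% over a 64-match stretch – far more spread than the 1.38% gap being argued about.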

One of the facts of the binomial distribution is that the variance (or potential error) of a proportion estimate – p(1 − p)/n – is largest near the middle of the probability range (~0.5, or ~50%). This means that this minor (1.38%) difference in win rates between the two deck types is less meaningful when the win rates are near 50% than it would be if the win rates were near 0% or 100%. Because of this, before concluding that one deck is “better” than the other on the basis of these point estimates, it would be helpful to compare the two values using inferential statistics to see if they are “significantly different.” Spoiler alert: they are not.
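You can see this directly from the standard error formula for a proportion, sqrt(p(1 − p)/n). A small sketch, using n = 64 to match the smaller sample above:

```python
import math

N = 64  # matches, matching the UB Zombies sample size

def standard_error(p, n=N):
    """Standard error of an observed win rate when the true rate is p."""
    return math.sqrt(p * (1 - p) / n)

for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"true win rate {p:.0%}: standard error of the estimate is ±{standard_error(p):.1%}")
```

Near 50%, a single standard error over 64 matches is about ±6.3 percentage points – more than four times the 1.38% gap the OP observed – while near 5% or 95% it shrinks to about ±2.7 points.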

We can create a 2×2 table of # wins and # losses by deck type and use a chi-squared test to evaluate the difference between these two proportions with unequal sample sizes. In this case, it turns out that our chi-squared test statistic for these data equals 0.034 with 1 degree of freedom, which corresponds to a p-value of about 0.85. In other words, if the two decks truly had identical win rates, we would expect to see a difference at least as large as the one observed about 85% of the time purely by chance. That is nowhere near the conventional 0.05 threshold for statistical significance, so these data give us no reason to believe the decks actually differ.
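Here’s that test reproduced in pure Python (no scipy needed – for 1 degree of freedom, the chi-squared p-value can be computed with `math.erfc`). The win counts are back-solved from the posted win rates: 56.43% of 140 is 79 wins, and 57.81% of 64 is 37 wins.

```python
import math

# 2x2 table: rows = deck, columns = (wins, losses)
# RB Zombies: 79 wins, 61 losses (140 matches)
# UB Zombies: 37 wins, 27 losses (64 matches)
table = [[79, 61],
         [37, 27]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Pearson chi-squared statistic (no continuity correction):
# sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed - expected) ** 2 / expected

# With 1 df, chi2 is the square of a standard normal, so
# p = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-squared = {chi2:.3f}, p = {p_value:.3f}")
```

This reproduces the 0.034 test statistic and the p-value of roughly 0.85 quoted above.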

Not only do the two values offered by the original poster not differ enough to allow us to draw conclusions, but if we were to construct a 95% confidence interval (a range built so that, across repeated samples, 95% of such intervals would contain the TRUE difference in win rates between these two decks), that interval would range from -14.07% (UB wins 14% less than RB) to +16.37% (UB wins 16% more than RB). In other words, the data we have available aren’t anywhere close to precise enough to decide which deck is better.
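A simple Wald-style interval for the difference between two proportions can be sketched like this. It comes out slightly narrower than the numbers quoted above (which likely came from a corrected variant of the interval), but the conclusion is identical: the interval is enormous and straddles zero.

```python
import math

# UB Zombies: 37 wins in 64 matches; RB Zombies: 79 wins in 140 matches.
wins_ub, n_ub = 37, 64
wins_rb, n_rb = 79, 140

p_ub = wins_ub / n_ub
p_rb = wins_rb / n_rb
diff = p_ub - p_rb  # positive means UB wins more often

# Standard error of the difference between two independent proportions.
se = math.sqrt(p_ub * (1 - p_ub) / n_ub + p_rb * (1 - p_rb) / n_rb)

margin = 1.96 * se  # 1.96 = z critical value for a 95% interval
low, high = diff - margin, diff + margin

print(f"observed difference: {diff:+.2%}")
print(f"95% CI for the difference: {low:+.2%} to {high:+.2%}")
```

The point estimate of the difference is +1.38%, but the interval spans roughly -13% to +16% – consistent with UB being meaningfully worse, meaningfully better, or identical.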

So be VERY careful when you see people posting numbers about win rates – they often aren’t as informative as you might think!

I’m off to FNM and some prerelease action. Hopefully I’ll find time to write about my prerelease experience later this weekend. I also want to write about my Approach of the Second Sun deck idea but I’d like to test it a little more first – good luck to all of you heading to prerelease events this weekend!