Nate Silver and Forecasting as an Imperfect Science

By Andrew Mooney

For one small subcommunity of America last night, the man who benefited the most from the country’s decisions at the polls was not Barack Obama—it was Nate Silver, statistician and creator of the FiveThirtyEight blog. Based on current election returns, Silver seems to have correctly predicted the outcomes of all 50 states, with only the result in Florida still pending. Given his track record—he got 49 out of 50 right in 2008—Silver appears to have ushered in a new level of credibility for statistical analysis in politics.

But if Silver has a crystal ball, its surface is still somewhat clouded; in any sort of forecasting, there are elements of uncertainty and margins of error, something Silver notes constantly in his writing. Still, near-perfect results two elections in a row suggest that Silver’s model is particularly powerful, especially considering the confused pundit-blather in the weeks preceding Election Day. Just how unlikely was it that Silver would go 50-for-50?

The best place to turn is Silver’s own projections. Based on state polling data, Silver projected the probability that either Obama or Romney would carry each state. In one sense, much of the work was already done for him; the majority of states were so polarized as to be no-brainers. According to Silver, in 38 states the favored candidate had better than a 99 percent chance of winning, and in 44 states the favorite was at least 90 percent likely to win. Essentially, Silver was faced with the task of calling five or six states in which significant uncertainty remained.

Now, finding the probability that Silver would go a perfect 50-for-50 isn’t as simple as multiplying all the individual probabilities for each state. That would assume that each state’s polling was independent from that of all of the other states, which doesn’t seem realistic, especially since the same polling companies—YouGov, PPP, etc.—factor into Silver’s analysis for many different states. In fact, Silver was guilty of this same error in a post he authored following the conclusion of the 2011 MLB season, when he attempted to calculate the unlikelihood of the events of the season’s last day.
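
To make the independence point concrete, here is a minimal sketch in Python. The win probabilities are made up for illustration (they are not Silver’s actual figures), and the split between national and state-level polling error is an assumption; the point is only to compare the naive product of per-state probabilities with a simulation in which part of the error is shared across states.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical favorite-win probabilities: many near-locks plus a few
# genuine toss-ups, loosely mimicking the 2012 map (illustration only).
p_fav = np.array([0.99] * 44 + [0.90, 0.85, 0.80, 0.75, 0.65, 0.55])

# Naive answer: treat the 50 states as independent coin flips.
p_perfect_indep = p_fav.prod()

# Correlated answer: keep each state's marginal win probability the same,
# but let half of the error variance be a single national swing shared by
# every state (rho = 0.5 is an assumption, not an estimate).
z = norm.ppf(p_fav)                     # margin in units of total error s.d.
rho = 0.5                               # share of error variance that is national
n_sims = 200_000
shared = rng.normal(0.0, np.sqrt(rho), size=(n_sims, 1))
local = rng.normal(0.0, np.sqrt(1 - rho), size=(n_sims, len(p_fav)))
favorite_wins = (z + shared + local) > 0
p_perfect_corr = favorite_wins.all(axis=1).mean()

print(f"independent product:     {p_perfect_indep:.3f}")
print(f"with correlated polling: {p_perfect_corr:.3f}")
```

The exact numbers depend entirely on those assumptions, but the second figure comes out consistently larger than the first, which is why the naive product understates the chance of a clean sweep.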

However, we can look elsewhere in Silver’s analysis for a better answer. On his blog, Silver also provides a histogram representing the probabilities of President Obama winning specific numbers of electoral votes. He lists the probability of Obama winning exactly 332 electoral votes—which, assuming Florida goes to the president, would match Silver’s 50-for-50 prediction—at just over 20 percent. This suggests that Silver was the beneficiary of quite a bit of luck himself; the odds against his perfectly predicting every state were roughly four to one.

But there may be a better way of evaluating Silver’s predictions than a binary right-wrong analysis. After all, the large number of states that were sure things makes it difficult to determine just exactly how impressive his accomplishment was. To see just how precise Silver’s projections were, it is more instructive to compare the exact percentages he predicted for each state with the actual results from Election Day. Below, I’ve listed these numbers along with the margin of error Silver estimated in his predictions for each state and the amount his projections differed from Tuesday’s returns—the actual margin of error.
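
The per-state comparison itself is mechanical; here is a minimal sketch of the check, with placeholder projections and margins rather than the values from the table:

```python
# Placeholder data: (projected Obama share, projected MOE, actual Obama share).
# These are illustrative numbers only, not the values from the table.
states = {
    "State A": (52.0, 3.0, 51.2),
    "State B": (48.5, 2.5, 45.0),
    "State C": (55.0, 4.0, 58.9),
}

misses = 0
for name, (projected, moe, actual) in states.items():
    actual_error = abs(actual - projected)   # how far the projection was off
    inside = actual_error <= moe             # did the result land inside the MOE?
    misses += 0 if inside else 1
    status = "within MOE" if inside else "outside MOE"
    print(f"{name}: off by {actual_error:.1f} points (MOE {moe:.1f}) -> {status}")

print(f"{misses} of {len(states)} states fell outside the projected margin of error")
```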

Using this methodology, Silver’s record looks a lot less clean. The actual election results in 16 states fell outside the margin of error Silver allotted himself in his projections, reducing his total to 34-for-50, or 68 percent. He was furthest off in Mississippi, which wasn’t nearly as lopsided as he predicted, and West Virginia, which voted more Republican than expected. Of course, Silver was still within two percent on 19 states, an impressive feat in itself.

The takeaway here is that, while Silver’s work the last four years has been impressive, he is not a mysterious wizard—for example, both the Huffington Post and Princeton’s Sam Wang had similarly accurate results. He is also not infallible, and he would be the first to admit it. Forecasting is never an area where we should expect 100 percent accuracy, and though Silver’s work is bringing a lot of positive attention to statistical analysis in general, it’s important that people keep their expectations of its applications realistic.

UPDATE: The table above actually understates the projected margin of error Silver allows himself by a factor of two. Here is the updated table.

[silver2.png: updated table of state-by-state projections, corrected margins of error, and actual results]

Silver did much better than I gave him credit for initially. Forty-eight out of 50 states actually fell within his margin of error, giving him a success rate of 96 percent. And assuming that his projected margin of error figures represent 95 percent confidence intervals, which they likely did, Silver performed almost exactly as well as he would expect to over 50 trials. Wizard, indeed.
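
As a quick sanity check on that last point (assuming the projected figures really are 95 percent intervals, and treating the 50 states as independent trials, which the comments below note is only an approximation), here is how surprising two misses out of 50 would be:

```python
from scipy.stats import binom

# If each of 50 states independently has a 95% chance of landing inside the
# projected margin of error, the number of misses is Binomial(50, 0.05).
n_states, p_miss = 50, 0.05

expected_misses = n_states * p_miss                    # 2.5 misses expected
p_two_or_fewer = binom.cdf(2, n_states, p_miss)        # P(at most 2 misses)

print(f"expected misses: {expected_misses}")           # 2.5
print(f"P(2 or fewer misses): {p_two_or_fewer:.2f}")   # roughly a coin flip
```

Two misses is, if anything, slightly better than par; it is nowhere near evidence of miscalibration.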


15 Comments

  • Are you sure you’ve defined “margin of error” the same way in both columns? If it’s defined like most polls reported in the news, the margin of error in the 538 estimate is the 2-sigma error on each vote share individually, rather than the sum of both differences.

    • I agree, A.C. I believe that deeming projections right or wrong is inappropriate. Rather, we can determine that a likely or unlikely event took place, given the distribution created by Silver. In reality, when the event is deemed a “tail event,” we likely reject the distribution that he assumed. However, we can only reject or fail to reject his projections at certain probability levels. To Andrew Mooney’s credit, his interpretation is how people generally interpret these analyses, even if inappropriate.

  • Note that most of the states marked “wrong” were not swing states. The more polling done in a state, the easier it is to predict. It’s much easier to predict Ohio & Florida when there were 5 polls each day, while Mississippi might not have had 5 polls during the entire election cycle.

  • Because there are substantial national couplings among states (through both a media system and an economy), deviations from state-to-state can be nontrivially correlated, which means some care is required in evaluating the distribution of deviations.

    Second, as AC Thomas noted, the error in the difference is twice the polling MOE, and if indeed the sixth column needs to be doubled, then by my count only 3 of 50 were missed. You’d have expected 2.5 of 50 outside the MOE (because polling MOEs are generally 95% CLs), so that suggests the statistics are pretty much correctly calibrated. In fact you could probably do an analysis and limit the amount of state-to-state correlation there could be based on how close these are. (Though I expect the limit is weak.)

  • Another way to look at it is that if the sixth column needs to be doubled, it’s really a ±1 sigma confidence band, i.e. 68%, which agrees well with what you find, again suggesting the statistics are well calibrated.

  • It appears that the results for Iowa were flipped in the Percent Dem/Rep columns, which when fixed would lower your calculated MOE for the state to 2.4, a number much more in line with a “battleground” state with lots of polling.

  • Hey guys, thanks for the comments. You’re right about the MOE and Iowa; will update to reflect that later in the day.

  • The obvious question, of course, is: if his projected vote percentages have an error that is typically in the range of 3 or 4 percent (consistent with many estimates that I have seen), then how does that translate into near perfection in getting the states right?

    I’d also make a second point. A probability is not a prediction. If he says Obama has a 50.3% chance of winning Florida, and he wins Florida, then he did NOT PREDICT the winner correctly. What happened is that “the outcome was not inconsistent with his probability estimate.” Had Romney won with a 49.7% chance of winning, again the result is consistent with his estimated probability. Of course, that applies to all states which did not have a 100 percent certain winner predicted (and he did get one of those wrong in 2008).

  • I agree that Silver should get credit for correctly predicting Florida to be a toss-up state. The “predictions” I’ve imputed to him are simply the most likely results, given the individual state probabilities he assigned. If his model’s point estimate had Florida at a 50.3 percent chance for an Obama win and he was asked to make a prediction regarding Florida, it seems safe to say he would give it to Obama.

  • “The “predictions” I’ve imputed to him are simply the most likely results”

    But that is not right. If his probability (not prediction) states that Romney should win 49.7% of the time, then Romney has to win 49.7% of the time. Not zero. In Silver’s state results, the underdog lost all of them, when Silver’s probability model specifically shows that the underdog should have won many of them. Conclusion: his probabilities were wrong.

    To talk about one state is meaningless; it’s like me saying a coin flip is 50/50, and you flip a heads. Do I claim I was right? Of course not. The probability did not predict heads. It merely gave you the ratio of heads to tails, if one were to do a large number of flips.

    In looking at the state results from Silver, in 2008 and 2012, we have 100 trials, so we can look at how often the underdog should have won. The underdog didn’t win any (except the Indiana case in 2008).

    • I know what you’re saying, but in analyses like logistic regression it’s not uncommon to look at specificity and essentially tally how many correct classifications one made. So long as we know what we are doing, it’s also not wrong.

      From another perspective, as you stated, the probabilities suggested that, over the last two elections, more than one of the underdog states probably should have won. Maybe Silver is more conservative (closer to 50%) with his estimations? Maybe that’s randomness?

  • John: First, if there are correlations, the underdog losing every single one is possible even when all the individual errors are correctly estimated.

    Second, this statement is, I believe, factually wrong: “In Silver’s state results, the underdog lost all of them, when Silver’s probability model specifically shows that the underdog should have won many of them.” The nine battleground states (as described by the networks on election night) were NC, FL, OH, CO, VA, NV, WI, IA, NH. Silver’s probabilities for a Romney win were 0.74, 0.50, 0.09, 0.20, 0.21, 0.07, 0.03, 0.48, and 0.15. If I can add, these sum to 2.47; so your statement should be “when a model much like Silver’s probability model, but neglecting correlations, specifically shows that the underdog should have won 2.47 of them.” The Poisson probability to expect 2.47 and see 1 (NC) or fewer is 29%, which is a very healthy number (it’s effectively a p-value; the arithmetic is sketched at the end of the comments). That 29% will be an even larger number if you include correlations. In short: there is no evidence of statistical miscalibration from these results.

    Andrew: it would be very helpful if your “actual MOE” column were signed (i.e. if it were positive when the model predicted more Obama votes than really occurred, and negative when the model predicted more Romney votes. Or whichever sign convention you prefer.)
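
For reference, the 2.47 and 29 percent figures in the comment above are straightforward to reproduce. Here is a minimal sketch that recovers them with the same Poisson approximation, plus the exact Poisson-binomial figure, which comes out closer to 20 percent but supports the same conclusion:

```python
from math import exp

# Romney-win probabilities quoted above for NC, FL, OH, CO, VA, NV, WI, IA, NH.
p_romney = [0.74, 0.50, 0.09, 0.20, 0.21, 0.07, 0.03, 0.48, 0.15]

lam = sum(p_romney)                            # expected Romney wins: 2.47

# Poisson approximation used in the comment: P(1 or fewer Romney wins).
p_poisson = exp(-lam) * (1 + lam)              # about 0.29

# Exact Poisson-binomial distribution, built up one state at a time.
dist = [1.0]                                   # dist[k] = P(k Romney wins so far)
for p in p_romney:
    new = [0.0] * (len(dist) + 1)
    for k, prob in enumerate(dist):
        new[k] += prob * (1 - p)               # Obama carries the state
        new[k + 1] += prob * p                 # Romney carries the state
    dist = new
p_exact = dist[0] + dist[1]                    # about 0.20

print(f"expected Romney wins: {lam:.2f}")
print(f"Poisson P(<=1 win):   {p_poisson:.2f}")
print(f"exact   P(<=1 win):   {p_exact:.2f}")
```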
