## The [95%] Confidence of Nate Silver

The headlines have been buzzing with the “triumph” of statistics and math in this election. But before I jump into how well statistics served us, let’s do a little primer on the margin of error.

Whenever we measure less than the whole population, there will be some variability in the sample; chances are good the sample will not precisely match the entire population. However, we can calculate a range that expresses our confidence about the true parameter, and that range is the margin of error. Most polls and surveys use a 95% confidence interval to calculate the margin of error. It’s important to know that this isn’t a 95% probability; it is a confidence interval. That is, 95% of the time we expect the true value to fall within the margin of error (and yes, that’s different from saying the true value has a 95% chance of being within the margin of error). In other words, if we make one hundred predictions with 95% confidence, we would expect the true value to fall outside our margin of error for about 5 of them (5%).
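To make that coverage idea concrete, here’s a quick simulation sketch in Python (not from the original post): we repeatedly poll a sample of “voters,” build a 95% interval each time, and count how often the interval actually contains the true support level.

```python
import random

def ci_covers_true_mean(true_p=0.5, n=1000, z=1.96):
    """Poll a sample of n 'voters', build a 95% interval for the
    true support level, and report whether the interval covers it."""
    sample = [1 if random.random() < true_p else 0 for _ in range(n)]
    p_hat = sum(sample) / n
    # standard error of a proportion, then the 95% margin of error
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    moe = z * se
    return p_hat - moe <= true_p <= p_hat + moe

random.seed(42)
trials = 10_000
coverage = sum(ci_covers_true_mean() for _ in range(trials)) / trials
print(f"{coverage:.1%} of the 95% intervals covered the true value")
```

Run it and the coverage comes out close to 95%: most intervals capture the truth, and roughly 1 in 20 misses.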

Now we get to Nate Silver’s work. We cannot judge it by how many states he got “right” or “wrong” (though many people are). Instead we have to judge the accuracy of the forecasts within their confidence intervals. He made 51 forecasts of how the states (and D.C.) would turn out, each with 95% confidence. Therefore, we cannot expect him to be right 100% of the time, only 95% of the time; if he were right 100% of the time, that would itself be a little surprising. We would expect (51 forecasts × 5% wrong = ) 2.55 states to fall outside his confidence intervals, or in other words, 2 or 3 election results should fall outside the confidence intervals. Even getting 1 or 4 misses may still be expected. However, it’d be worthy of raising an eyebrow if all 51 forecasts fell within the margin of error.

We should expect 2 or 3 states to have election results that fall outside the margin of error; it would be a rarer outcome for all 51 to fall within it.
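The arithmetic above follows from treating the 51 forecasts as independent trials, each with a 5% miss probability (a simplification, since state races may be correlated). A short Python sketch shows the expected number of misses and how likely each count is:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k misses out of n independent forecasts."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p_miss = 51, 0.05  # 51 forecasts, each with a 5% chance of missing

print(f"expected misses: {n * p_miss:.2f}")  # 51 * 5% = 2.55
for k in range(5):
    print(f"P({k} misses) = {binom_pmf(k, n, p_miss):.3f}")
```

Under these assumptions, 2 or 3 misses are the most likely outcomes, while 0 misses (all 51 within the margin of error) still carries a roughly 7% chance, which is why it would raise an eyebrow without being impossible.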

### So how’d he do?

Pulling data from Nate’s blog (he lists all 51 forecasts on the right side), I was able to make a list. For example, in Alabama he listed Obama at 36.7% of the vote and Romney at 62.8%, with a margin of error of 3.8%. That means, come election day, we expect Obama to get between 32.9% and 40.5% of the vote and Romney between 59.0% and 66.6% (with 95% confidence).

Next we pull the actual results. I grabbed data from uselections.org, and sure enough, in Alabama Obama received 38.56% of the vote and Romney got 60.52%. Both fall within the margin of error; congratulations, statistics.
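That per-state check is easy to automate. Here’s a small Python sketch using the Alabama numbers quoted above (the helper function name is mine, not from the original analysis):

```python
def within_moe(forecast, moe, actual):
    """True if the actual vote share falls inside forecast ± moe."""
    return forecast - moe <= actual <= forecast + moe

# Alabama figures from the post: (forecast %, margin of error %, actual %)
alabama = {
    "Obama":  (36.7, 3.8, 38.56),
    "Romney": (62.8, 3.8, 60.52),
}

for candidate, (forecast, moe, actual) in alabama.items():
    hit = within_moe(forecast, moe, actual)
    print(f"{candidate}: {forecast}% ± {moe}% vs actual {actual}% "
          f"-> {'hit' if hit else 'miss'}")
```

Repeating this over all 51 forecasts is how the miss count below was tallied.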

When it’s all said and done, **Nate Silver correctly forecast 48 of the 51 election results, and that’s great!** We expected about 2.5 states to fall outside the margin of error, and 3 did. **He could hardly have been more accurate.** If he had gotten 51 of 51 states correct, that would actually have been the more surprising outcome, because these are estimates with 95% confidence.

### What’s that look like?

Combining the two data sources (forecasts with results), we can see that the three states that fell outside the margin of error were Connecticut, Hawaii, and West Virginia (marked with asterisks). Looking down the list, we can see how varied yet specific the forecasts were, and how well almost all of them did. Another interesting detail: in states that were not contested, the margin of error was much larger (since fewer polls were conducted there), so the lines are longer; in the swing states the margin was considerably smaller.

For those interested, this visualization was done with R; the code and source data are both located on GitHub.

### Comments

Excellent analysis, Jay. Well done!

Getting 51 of 51 would not be more wrong. There’s still a 7% chance of getting 51 of 51 using accurate 95% CIs. It’s just that we shouldn’t expect it, because it would be rare.

Absolutely, and thanks for making that point. It’s like flipping a coin ten times and getting 8 or 9 heads. It’s not wrong or incorrect, just rare, as you say.

Interesting analysis. One quibble, though: I would have assumed that the races were correlated to some degree, which would make things more complicated. What I take from this analysis is that either the correlation turned out to be surprisingly weak or Silver’s confidence intervals were off.

“it’d be worthy of raising an eyebrow if all 51 forecasts fell within the margin of error.”

What would be some possible reasons or ways for such an “anomaly” to arise? Why should we be suspicious, other than the fact that it is somewhat unlikely?

Kevin, I struggled with how to word that. Think of flipping a coin ten times: we should expect 5 heads, but it wouldn’t be weird to see between 3 and 7 heads either. However, as we get toward either extreme we’d probably have reason to “raise an eyebrow,” as I say. It doesn’t mean the coin is unfair or someone is cheating, just that the results were a bit unexpected.

Example: let’s say I flip a coin ten times and repeat that ten times, counting the heads in each batch of 10, with these results: (5 1 8 6 6 4 6 6 7 4). I calculate the mean (5.3) and the margin of error with 95% confidence (1.4). I can estimate that the true mean is between 3.9 and 6.7, and since I used 95% confidence we should expect that estimate to match the true parameter 95 out of 100 times. If the results deviate from that (too high or too low), we may want to take a closer look.
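For anyone who wants to reproduce that coin-flip arithmetic, here’s a Python sketch using the same ten batch counts (the t critical value for 9 degrees of freedom is hardcoded, since the standard library has no t-distribution quantile function):

```python
import statistics

heads = [5, 1, 8, 6, 6, 4, 6, 6, 7, 4]  # heads per batch of 10 flips
n = len(heads)

mean = statistics.mean(heads)    # 5.3
sd = statistics.stdev(heads)     # sample standard deviation
t_crit = 2.262                   # t value for 95% confidence, 9 df
moe = t_crit * sd / n ** 0.5     # margin of error, ~1.4

print(f"mean = {mean:.1f}, 95% CI = ({mean - moe:.1f}, {mean + moe:.1f})")
```

This reproduces the interval quoted above: a mean of 5.3 with a 95% confidence interval of roughly (3.9, 6.7).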