## The [95%] Confidence of Nate Silver

The headlines have been buzzing with the “triumph” of statistics and math in this election. But before I jump into how well statistics served us, let’s do a little primer on the margin of error.

Whenever we measure less than the whole population we’ll have some variability in the sample. Chances are good that the sample will not precisely match the entire population. However, we can do some calculations to estimate how far the sample is likely to be from the true parameter, and that’s called the margin of error. Most polls and surveys will use a 95% confidence interval to calculate the margin of error. It’s important to know that this isn’t a 95% probability, it is a confidence interval. That is, 95% of the time, we expect the true value to be within the margin of error (and yes, that’s different than saying the true value has a 95% chance to be within the margin of error). In other words, if we make one hundred predictions with 95% confidence, we would expect that the true value of 5 of those (5%) would be outside our margin of error.

Now we get to Nate Silver’s work. We cannot judge his work by how many states “right” or “wrong” he actually got (though many people are). Instead we have to judge by the accuracy of the forecasts within the confidence intervals. He made 51 forecasts of how the states (and D.C.) would turn out with 95% confidence. Therefore, we cannot expect him to be right 100% of the time, only 95% of the time. If he were right 100% of the time, then there’s something wrong. Instead we would expect (51 forecasts * 5% wrong = ) 2.55 states should be outside of his confidence intervals, or in other words, 2 or 3 election results should fall outside the confidence intervals. But even getting 1 or 4 states may still be expected. However, it’d be worthy of raising an eyebrow if all 51 forecasts fell within the margin of error.
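To put rough numbers on that reasoning, here is a small Python sketch. It assumes each of the 51 forecasts misses its 95% interval independently with probability 5%, which is a simplification (state results tend to move together), and the helper name `prob_misses` is only for illustration.

```python
from math import comb

N_FORECASTS = 51   # 50 states plus D.C.
MISS_RATE = 0.05   # a 95% interval is expected to miss 5% of the time

def prob_misses(k, n=N_FORECASTS, p=MISS_RATE):
    """Binomial probability that exactly k of n intervals miss (independence assumed)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

expected_misses = N_FORECASTS * MISS_RATE
prob_no_misses = prob_misses(0)
prob_one_to_four = sum(prob_misses(k) for k in range(1, 5))

print(f"expected misses:        {expected_misses:.2f}")
print(f"P(all 51 inside):       {prob_no_misses:.3f}")
print(f"P(1 to 4 fall outside): {prob_one_to_four:.3f}")
```

Under that independence assumption, all 51 results landing inside their intervals would happen only about 7% of the time, which is why a perfect record would be worth a raised eyebrow.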

We should expect 2 or 3 states to have election results that fall outside the margin of error. The forecasts would be more inaccurate if all 51 fell within the margin of error.

### So how’d he do?

Pulling data from Nate’s blog (he lists all 51 forecasts on the right side), I was able to make a list. For example, in Alabama, he listed Obama as getting 36.7% of the vote and Romney getting 62.8% with a margin of error of 3.8%. That means, come election day, we expect Obama to get between 32.9% and 40.5% of the vote and Romney to get between 59.0% and 66.6% (with 95% confidence).

Next we pull the actual results. I grabbed data from uselections.org and sure enough in Alabama, Obama received 38.56% of the vote and Romney got 60.52%. Both fall within the margin of error, congratulations statistics.
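The check itself is simple enough to sketch in a few lines of Python. This is not the R code used for the post; the function name `within_margin` and the data layout are hypothetical, and only the Alabama figures quoted above are used.

```python
MARGIN = 3.8  # Alabama margin of error, in percentage points

# candidate: (forecast share, actual share), both in percent
alabama = {
    "Obama": (36.7, 38.56),
    "Romney": (62.8, 60.52),
}

def within_margin(forecast, actual, margin=MARGIN):
    """True when the actual result lands inside forecast +/- margin."""
    return forecast - margin <= actual <= forecast + margin

results = {name: within_margin(f, a) for name, (f, a) in alabama.items()}
for name, inside in results.items():
    print(f"{name}: {'inside' if inside else 'outside'} the margin of error")
```

Repeating the same comparison for all 51 forecasts gives the count of results that fell outside their intervals.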

When it’s all said and done, **Nate Silver correctly forecasted 48 of the 51 election results and that’s great!** We expected 2.5 states to be outside the margin of error and 3 were. **He could not have been more accurate.** If he had gotten 51 of 51 states correct, the forecasts would be more wrong because these are estimates with 95% confidence.

### What’s that look like?

Combining the two data sources (forecasts with results) we can see that the three states that fell outside the margin of error were Connecticut, Hawaii and West Virginia (marked with asterisks). But looking down the list, we can see how varied the forecasts were and yet how specific they were and how well almost all of them did. Another interesting thing to pick out: in states that were not contested, the margin of error was much larger (since fewer polls were done in those states) so the lines were longer, and in the swing states that margin was considerably smaller.

For those interested, this visualization was done with R and code and source data are both located on github.

## Visualizing Probability: Roulette

I wrote a post over on the Society of Information Risk Analysts blog and I was having so much fun, I just had to continue. I focused this work on the American version of roulette, which has “0” and “00” (the European version only has “0”, producing odds less in favor of the house). The American version also has a “Five Numbers” betting option, which the European version doesn’t have.

According to this site, the American version of roulette could do about 60-100 spins in an hour. I figured maybe 4 hours in the casino and, being conservative, I decided to model 250 spins of roulette. I then chose $5 bets; the amount isn’t significant, since changing the bet would only change the scale on the left, not the visuals produced. I then ran 20,000 simulations of 250 roulette spins and recorded the loss or gains from the bets along the way. One way to think of this is like watching 20,000 people play 250 spins of roulette and recording and plotting the outcomes.
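For anyone who wants to try this at home, the simulation can be sketched roughly as follows. This is a Python approximation of the idea, not the code behind the graphics; the function names are made up for illustration, and it defaults to 2,000 players rather than 20,000 so that it runs quickly.

```python
import random

POCKETS = 38  # American wheel: 1-36 plus "0" and "00"

def play(numbers_covered, payout, spins=250, bet=5, rng=random):
    """Return one player's running profit/loss after each spin."""
    total, path = 0, []
    for _ in range(spins):
        if rng.random() < numbers_covered / POCKETS:
            total += bet * payout   # win: paid at payout-to-1
        else:
            total -= bet            # loss: the bet is gone
        path.append(total)
    return path

def simulate(numbers_covered, payout, players=2000, seed=1):
    """Final profit/loss for many players making the same bet every spin."""
    rng = random.Random(seed)
    return [play(numbers_covered, payout, rng=rng)[-1] for _ in range(players)]

# Betting on a single number pays 35 to 1
finals = simulate(numbers_covered=1, payout=35)
mean_final = sum(finals) / len(finals)
share_losing = sum(f < 0 for f in finals) / len(finals)
print(f"mean result: ${mean_final:,.0f}, share ending below zero: {share_losing:.0%}")
```

Keeping the whole `path` from `play` is what would let you draw each player's line across the 250 spins; the list of final values is what feeds the distribution at the end.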

I present this as a way to understand the probabilities of the different betting options in roulette. I leveraged the names and payout information from fastodds.com. The main graphic represents the progression of the 20,000 players through the spins. Everyone starts at zero and either goes up or down depending on lady luck. The distribution at the end shows the relative frequency of the outcomes.

Enough talking, let’s get to the pictures.

### Betting on a single number

What’s interesting are the patterns forming from the slow steady march of losing money punctuated by large (35 to 1) wins. Notice there would be a few unlucky runs with no wins at all (the red line starts at zero and proceeds straight down to a loss of $1,250). Also notice in the distribution on the right that just over half of the distribution occurs under zero (the horizontal line). The benefit will always go to the house.

### Betting on a pair of numbers

Same type of pattern, but we see the scale changing, the highs aren’t as high and the lows aren’t as low. None of the 20,000 simulations lost the whole time.

### Betting on Three Numbers

### Betting on Four Number Squares

### Betting on Five Numbers

### Betting on Six Numbers

### Betting on Dozen Numbers or a Column

### Betting on Even/Odd, Red/Black, High/Low

And by this point, when we have 1 to 1 payout odds, the pattern is gone along with the extreme highs and lows.

### Mixing it Up

Because it is possible to simulate most any pattern of betting, I decided to try random betting. During any individual round, the bet would be on any one of the eight possible bets, all for $5. The output isn’t really that surprising.

### Rolling it up into one

Pun intended. While these graphics help us understand the individual strategies, they don’t really help us compare between them. In order to do that I created a violin plot (the red line represents the mean across the strategies).

Looking at the red line, they all have about the same mean with the exception of Five Numbers (6-1). That means that over time, the gambler should average just over a 5% loss (or close to an 8% loss with five number bets). We can see that larger odds stretch the range out, while smaller odds cluster much more around a slight loss. The “scatter” strategy does not improve the outcome and is just a combination of the other distributions. As mentioned, the 6-1 odds (Five Numbers) bet does stick out here as a slightly worse bet than the others.
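Those averages can also be worked out exactly without simulating anything. The sketch below assumes the standard American roulette payouts (the 11-1, 8-1 and 5-1 figures are not quoted in this post, so treat them as my assumption) and computes the expected return per dollar bet.

```python
from fractions import Fraction

POCKETS = 38  # American wheel

# bet name: (numbers covered, payout to 1); standard payouts assumed
BETS = {
    "Single number": (1, 35),
    "Pair": (2, 17),
    "Three numbers": (3, 11),
    "Four number square": (4, 8),
    "Five numbers": (5, 6),
    "Six numbers": (6, 5),
    "Dozen or column": (12, 2),
    "Even money": (18, 1),
}

def expected_return(covered, payout):
    """Expected profit per $1 bet: win payout with p, lose the $1 otherwise."""
    p_win = Fraction(covered, POCKETS)
    return p_win * payout - (1 - p_win)

edges = {name: expected_return(*bet) for name, bet in BETS.items()}
for name, edge in edges.items():
    print(f"{name:20s} {float(edge):+.2%}")
```

Every bet comes out at -2/38 (about -5.26%) except Five Numbers at -3/38 (about -7.89%), which matches the red line in the violin plot.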

Lastly, I want to turn back to a comment on the fastodds.com website:

While I may disagree that the only bets to avoid are limited to those (had to get that in), I also disagree with the blanket statement. Since they all lose more often than they win, trying to get less-sucky-odds seems a bit, well, counter-intuitive. I would argue that the bets to avoid are not the same for every gambler. The bets should align with the tolerance of the gambler. For example, if someone is risk-averse, staying with the 2-1 or 1-1 payouts would limit the exposure to loss, while those more risk-seeking, may go for the 17-1 or 35-1 payout – the bigger the risk, the bigger the reward. Another thing to consider is that the smaller odds win more often. If the thrill of winning is important, perhaps staying away from the bigger odds is a good strategy.

Now that you’re armed with this information, if you still have questions, the Roulette Guru is available to advise based on his years of experience.

## AES on the iPhone isn’t broken by Default

I wanted to title this “CBC mode in the AES implementation on the iPhone isn’t as effective as it could be” but that was a bit too long. Bob Rudis forwarded this post, “AES on the iPhone is broken by Default” to me via twitter this morning and I wanted to write up something quick on it because I responded “premise is faulty in that write up” and this ain’t gonna fit in 140 characters. Here is the premise I’m talking about:

In order for CBC mode to perform securely, the IV must remain impossible for the attacker to derive or predict.

This isn’t correct. In order for CBC mode to be **effective** the initialization vector (IV) should be unique (ideally random), preferably per individual use. To correct the statement:

**In order for AES to perform securely, the key must remain impossible for the attacker to derive or predict.** There is nothing in the post that makes the claim that the key is exposed or somehow able to be derived or predicted.

Here’s the thing about IVs: they are not secret. If there is a requirement to keep an IV secret, the cryptosystem is either designed wrong or has some funky restrictions and I’ll be honest, I’ve seen both. In fact, the IV is so *not secret*, the IV can be passed in the clear (unprotected) along with the encrypted message and it does not weaken the implementation. Along those lines, there are critical cryptosystems in use that don’t use IVs. For example, some financial systems leverage ECB mode which doesn’t use an IV (and has its own problems). Even a bad implementation of CBC is better than ECB. Keep that in mind because apparently ECB is good enough for much of the world’s lifeblood.

So what’s the real damage here? As I said, in order for CBC to be effective, the IV should not be reused. If it is reused (as it appears to be in the iPhone implementation Nadim wrote about), we get a case where an attacker may start to pull out patterns from the first block. That means if the first block contains repeatable patterns across multiple messages, it may be possible to detect that repetition and infer some type of meaning. For example, if the message started out with the name of the sender, a pattern could emerge (across multiple encrypted messages using the same key and IV) in the first block that may enable some inference as to the sender on that particular message.
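Here is a rough illustration of that first-block leak. To keep it runnable with only the Python standard library, it swaps in a toy Feistel block cipher built from HMAC in place of AES; that toy cipher is my stand-in and is not secure, and none of this is the iPhone's actual code. Only the CBC chaining is the point.

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes, the same block size as AES

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(key, block):
    """Toy 4-round Feistel permutation standing in for AES. NOT secure."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key, iv, plaintext):
    """CBC: XOR each plaintext block with the previous ciphertext block, then encrypt."""
    assert len(plaintext) % BLOCK == 0, "this sketch skips padding"
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

key = os.urandom(16)
msg_a = b"From: Alice".ljust(BLOCK) + b"Lunch at noon?".ljust(BLOCK)
msg_b = b"From: Alice".ljust(BLOCK) + b"Call me later.".ljust(BLOCK)

# Same key and same (fixed) IV: the matching first block shows through
fixed_iv = bytes(BLOCK)
reused_a = cbc_encrypt(key, fixed_iv, msg_a)
reused_b = cbc_encrypt(key, fixed_iv, msg_b)
print("fixed IV, first blocks equal:", reused_a[:BLOCK] == reused_b[:BLOCK])

# Fresh random IV per message (sent in the clear alongside the ciphertext)
fresh_a = cbc_encrypt(key, os.urandom(BLOCK), msg_a)
fresh_b = cbc_encrypt(key, os.urandom(BLOCK), msg_b)
print("fresh IVs, first blocks equal:", fresh_a[:BLOCK] == fresh_b[:BLOCK])
```

Notice that in neither case does the attacker learn the key or the plaintext; with the fixed IV they only learn that two messages start the same way.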

Overall, I think the claim of “AES is broken on the iPhone” is a bit overblown, but it’s up to the interpretation of “broken”. If I were to rate this finding on a risk scale from “meh” to “sky is falling”, off the cuff I’d say it was more towards “meh”. I’d appreciate this being fixed by Apple at some point… that is, if they get around to it and can squeeze it in so it doesn’t affect when I can get an iPhone 5… I’d totally apply that patch. But I certainly wouldn’t chuck my phone in the river over this.

## A Call to Arms: It is Time to Learn Like Experts

I had an article published in the November issue of the ISSA journal by the same name as this blog post. I’ve got permission to post it to a personal webpage, so it is now available here.

The article begins with a quote:

When we take action on the basis of an [untested] belief, we destroy the chance to discover whether that belief is appropriate. – Robin M. Hogarth

That quote is from his book, “Educating Intuition”, and it really caught the essence of what I see as the struggles in information security. We are making security decisions based on what we believe and then we move onto the Next Big Thing without seeking adequate feedback. This article is an attempt to say that whatever you think of the “quant” side of information security, it needs to be compared to what we have without quants, which is an intuitive approach. What I’ve found in preparing for this article is that the environment we work in is not conducive to developing a trustworthy intuition on its own. As a result, we have justification in challenging unaided opinion when it comes to risk-based decisions and we should be building feedback loops into our environment.

Have a read. And by all means, feedback is not only sought, it is required.

## The Simple Power of OpenPERT: ALE 2.0

Yup, Chris Hayes (and I) have released the 1.0 version of OpenPERT. I had a sneaking suspicion that most people would do what I did with my first Excel add-in. “Okay, I installed it, now what?” then perhaps skim some document on it, type in a few formulas and walk away thinking about lunch or something. In an attempt to minimize OpenPERT being *that* add-in, we created something to play with – the next generation of the infamous ALE model. We call it “ALE 2.0”.

### About ALE

ALE stands for Annualized Loss Expectancy and it is taught (or was taught back in my day) to everyone studying for the CISSP exam. The concept is easy enough: estimate the annual rate of occurrence (ARO) of an event and multiply it by the single loss expectancy (SLE) of an event. The output of that is the annualized loss expectancy or ALE. Typically people are instructed to use single point estimations and if this method has ever been used in practice it’s generally used with worst-case numbers or perhaps an average. Either way, you end up with a single number that, while precise, will most likely not line up with reality no matter how much effort you put into your estimations.

### Enter the next generation of ALE

ALE 2.0 leverages the BETA distribution with PERT estimates and runs a Monte Carlo simulation. While that sounds really fancy and perhaps a bit daunting, it’s really quite simple and most people should be able to understand the logic by digging into the ALE 2.0 example.

Let’s walk through ALE 2.0 by going through a case together: let’s estimate the annual cost of handling virus infections in some made-up mid-sized company.

For the annual rate of occurrence, we think we get around 2 virus infections a month on average, some months there aren’t any, and every few years we get an outbreak infecting a few hundred. Let’s put that in terms of a PERT estimate. At a minimum (in a good year), we’d expect maybe 12 a year, most likely there are 30 per year and in bad years we could see 260 infections.

For the single loss expectancy, we may see nothing, the anti-virus picks it up and cleans it automatically and there’s no loss. Most likely, we spend 30 minutes to manually thump it, worst case we do system rebuilds, taking 2 hours, non-dedicated time but there are some other overhead tasks. Putting that in terms of money, we may say minimum is $0, most likely, oh $50 of time, worst case, $200.

Now let’s hit the button. Some magic happens and out pops the output:

There were a couple of simulations with no loss (min value) from responding to viruses given those inputs, on average there was about $4,400 in annualized loss and there were some really bad years (in this case) going up to around $35k. There are some statements to make from this as well, like “10% of the ALE simulations exceeded $9,000.”

But the numbers, or even ALE for that matter aren’t what this is about, it’s about understanding what OpenPERT can do.

### What’s going on here?

Swap over to the “Supporting Data” tab in Excel, that’s where the magic happens. Starting in cell A2, we call the OPERT() function with the values entered in for ARO. The OPERT function takes in the minimum, most likely, maximum values and an optional confidence value. In the cell, the function returns a random value based on a beta distribution of values. This ARO calculation is repeated for 5000 rows in column A, that’s the Monte Carlo portion. Column B has all of the SLE calculations (OPERT function calls in SLE estimations) for 5000 simulations and column C is just the ARO multiplied by the SLE (the ALE for the simulation).

In summary, this leverages the OPERT() function to return a single instance for the two input estimations (ARO and SLE) and we repeat that to get a large enough sample size (and 5000 is generally a large enough sample size, especially for this type of broad-stroke ALE technique).
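If you’d rather poke at the logic outside of Excel, the same three columns can be sketched in Python. To be clear, this is not the OpenPERT source: `pert_sample` is a hypothetical stand-in for OPERT() that uses the common PERT-to-beta mapping with a shape weight of 4, and whether OPERT’s optional confidence value corresponds to that weight is an assumption I haven’t confirmed here.

```python
import random

def pert_sample(minimum, most_likely, maximum, rng, lam=4.0):
    """Draw one value from a beta distribution shaped by a PERT estimate.

    lam=4 is the conventional PERT weight on the most likely value (assumed).
    """
    span = maximum - minimum
    alpha = 1 + lam * (most_likely - minimum) / span
    beta = 1 + lam * (maximum - most_likely) / span
    return minimum + rng.betavariate(alpha, beta) * span

def simulate_ale(aro, sle, runs=5000, seed=42):
    """Each run mirrors one spreadsheet row: ARO draw times SLE draw = ALE."""
    rng = random.Random(seed)
    return sorted(pert_sample(*aro, rng) * pert_sample(*sle, rng)
                  for _ in range(runs))

# Inputs from the virus example: (minimum, most likely, maximum)
ales = simulate_ale(aro=(12, 30, 260), sle=(0, 50, 200))

mean_ale = sum(ales) / len(ales)
p90 = ales[int(0.9 * len(ales))]   # 10% of simulated years exceed this
point_estimate = 30 * 50           # classic ALE from the most likely values

print(f"point estimate: ${point_estimate:,}")
print(f"simulated mean: ${mean_ale:,.0f}")
print(f"90th percentile: ${p90:,.0f}")
```

The simulated mean should land in the neighborhood of the average reported above, while the single point estimate built from the most likely values comes out far lower, which is the whole argument for not trusting one number.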

Also, if you’re curious, the table to the right of column C is the data used to construct that pretty graph on the first tab (shown above).

### Next Steps

The ALE method itself may have limited application, but it’s the thought process behind it that’s important. The combination of PERT estimations with the beta distribution to feed into Monte Carlo simulations is what makes this approach better than point estimates. This could be used for a multitude of applications, say for estimating the piano tuners in Chicago or any number of broad estimations, some of them even related to risk analysis hopefully. We’re already working on the next version of OpenPERT too, in which we’re going to integrate Monte Carlo simulations and a few other features.

It would be great to build out some more examples. Can you think of more ways to leverage OpenPERT? Having problems getting it to work? Let us know, please!

## Risk Analysis is a Voyage

OWASP Risk Rating Methodology. If you haven’t read about this methodology, I highly encourage that you do. There is a lot of material there to talk and think about.

To be completely honest, my first reaction is “what the fudge-cake is this crud?” It symbolizes most every challenge I think we face with information security risk analysis methods. However, my pragmatic side steps in and tries to answer a simple question, “Is it helpful?” Because the one thing I know for certain is the value of risk analysis is relative and on a continuum ranging from really harmful to really helpful. Compared to unaided opinion, this method may provide a better result and should be leveraged. Compared to anything else from current (non-infosec) literature and experts, this method is sucking on crayons in the corner. But the truth is, I don’t know if this method is helpful or not. Even if I did have an answer I’d probably be wrong since its value is relative to the other tools and resources available in any specific situation.

But here’s another reason I struggle, risk analysis isn’t easy. I’ve been researching risk analysis methods for years now and I feel like I’m just beginning to scratch the surface – the more I learn, the more I learn I don’t know. It seems that trying to make a “one-size fits all” approach always falls short of expectations, perhaps this point is better made by David Vose:

I’ve done my best to reverse the tendency to be formulaic. My argument is that in 19 years we have never done the same risk analysis twice: every one has its individual peculiarities. Yet the tendency seems to be the reverse: I trained over a hundred consultants in one of the big four management consultancy firms in business risk modeling techniques, and they decided that, to ensure that they could maintain consistency, they would keep it simple and essentially fill in a template of three-point estimates with some correlation. I can see their point – if every risk analyst developed a fancy and highly individual model it would be impossible to ensure any quality standard. The problem is, of course, that the standard they will maintain is very low. Risk analysis should not be a packaged commodity but a voyage of reasoned thinking leading to the best possible decision at the time.

-David Vose, “Risk Analysis: A Quantitative Guide”

So here’s the question I’m thinking about, without requiring every developer or infosec practitioner to become experts in analytic techniques, how can we raise the quality of risk-informed decisions?

Let’s think of the OWASP Risk Rating Methodology as a model, because, well, it is a model. Next, let’s consider the famous George Box quote, “All models are wrong, but some models are useful.” All models have to simplify reality at some level (thus never perfectly represent reality) so I don’t want to simply tear apart this risk analysis model because I can point out how it’s wrong. Anyone with a background in statistics or analytics can point out the flaws. What I want to understand is how *useful* the model is, and perhaps in doing that, we can start to determine a path to make this type of formulaic risk analysis *more useful*.

Risk Analysis is a voyage, let’s get going.

## Yay! We Have Value, Part 2

There are very few things more valuable to me than someone constructively challenging my thoughts. I have no illusions thinking I’m right and I’m fully aware that there is always room for improvement in everything. That’s why I’m excited that lonervamp wrote up “embrace the value, any value, you can find” providing some interesting challenges to my previous post on “Yay! we have value now!”

Overall, I’d like to think we’re more in agreement than not, but I was struck by this quote:

Truly, we will actually never get anywhere if we don’t get business leaders to say, "We were wrong," or "We need guidance." These are the same results as, "I told ya so," but a little more positive, if you ask me. But if leaders aren’t going to ever admit this, then we’re not going to get a chance to be better, so I’d say let ’em fall over.

Crazy thought here… What if they aren’t wrong? What if security folks are wrong? I’m not going to back that up with anything yet. But just stop and think for a moment, what if the decision makers have a better grasp on expected loss from security breaches than security people? What would that situation look like? What data would we expect to find to make them right and security people wrong? Why do some security people find some pleasure when large breaches occur? Stop and picture those for a while.

I don’t think anyone would say it’s that black and white and I don’t think there is a clear right or wrong here, but I thought I’d attempt to shift perspectives there, see if we could try on someone else’s shoes. I tend to think that hands down, security people can describe the failings of security way better than any business person. However, and this is important, that’s not what matters to the business. I know that may be a bit counter-intuitive, our computer systems are compromised by the bits and bytes. The people with the best understanding of those are the security people, how can they not be completely right in defining what’s important? I’m not sure I can explain it, but that mentality is represented in the post that started this discussion. This sounds odd, but perhaps security practitioners know too much. Ask any security professional to identify all the ways the company could be shut down by attackers and it’d probably be hard to get them to stop. Now figure out how many companies have experienced losses anything close to those and we’ve got a very, very short list. That is probably the disconnect.

Let me try and rephrase that, while security people are shouting that our windows are susceptible to bricks being thrown by anyone with an arm (which is true), leaders are looking at how often bricks are thrown and the expected loss from it (which isn’t equal to the shouting and also true). That disconnect makes security people lose credibility (“it’s partly cloudy, why are they saying there’s a tornado?”) and vice versa (“But Sony!”). I go back to neither side is entirely wrong, but we can’t be asking leadership to admit they’re wrong without some serious introspection first.

I’d like to clarify my point #3 too. Ask the question: how many hack-worthy targets are there? Whether explicit or not, everyone has answered this in their head, and most everyone is probably off (including me). When we see poster children like RSA, Sony, HBGary and so on, we have to ask ourselves how likely is it that we are next? There are a bazillion variables in that question, but let’s just consider it as a random event (which is false, but the exercise offers some perspective). First, we have to picture “out of how many?” Definitely not more than 200 Million (registered domain names), and given there are 5 Million U.S. companies (1.1 Million making over 1M, 7,500 making over 250M), can we take a stab at how many hack-worthy targets there are in the world? 10 thousand? Half a million? Whatever that figure is, compare it to the number of seriously impactful breaches in a year. 1? 5? 20? 30? Whatever you estimate here, it’s a small, tiny number. Let’s take worst case of 30/7,500 (max breaches over min hack-worthy) that comes out to a 1 in 250 chance. That’s about the same chance a white person in the US will die of myeloma or that a U.S. female will die of brain cancer. It might even be safe to say that in any company, female employees will die of brain cancer more often than a major/impactful security breach will occur. Weird thought, but that’s the fun of reference data points and quick calculations.
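The napkin math is easy to rerun with your own guesses. This sketch treats a breach as a random draw from the pool of hack-worthy targets (which, as noted, is false but useful), and the helper name `one_in` is just for illustration.

```python
def one_in(breaches, targets):
    """Express breaches/targets as the N in '1 in N', assuming a random draw."""
    return targets / breaches

# Worst case from the text: max breaches over min hack-worthy targets
worst_case = one_in(30, 7_500)
print(f"worst case: 1 in {worst_case:,.0f}")

# The other guesses floated above, for comparison
scenarios = {
    (breaches, targets): one_in(breaches, targets)
    for targets in (7_500, 10_000, 500_000)
    for breaches in (1, 5, 20, 30)
}
for (breaches, targets), n in scenarios.items():
    print(f"{breaches:>2} breaches / {targets:>7,} targets -> 1 in {n:,.0f}")
```

Every other combination of those guesses gives longer odds than the 1 in 250 worst case.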

This is totally back-of-the-napkin stuff, but people do these calculations without reference data and in their head. Generally people are way off on these estimations. It’s partly why we think Sony is more applicable than it probably is (and why people buy lottery tickets). The analogy LonerVamp made about the break-ins in the neighborhood doesn’t really work, it puts the denominator too small in our heads. Neighborhoods are pictured, I’d guess as a few dozen, maybe 100 homes max, and makes us think we’re much more likely to be the next target. Perhaps we could say, “imagine you live in a neighborhood of 10,000 houses and one of them was broken into…” (or whatever the estimate of hack-worthy targets is).

I bet there’s an interesting statistic in there, that 63% of companies think they are in the top quarter of prime hack-worthy targets (yeah, made that up, perhaps there’s some variation of the Dunning-Kruger effect for illusory hack-worthiness). Anyway, I’m cutting the rest of my points for the sake of readability. I’d love to continue this discussion and I hope I didn’t insult lonervamp (or anyone else) in this discussion, that isn’t my intent. I’m trying to state my view of the world and hope that others can point me in whatever direction makes more sense.