Leaked Password Analysis – 2011-06 Edition

As most of you likely know, the last several months saw a shift in how a certain type of attack is carried out on the Internet. Instead of breaking into a website and simply stealing information, people began breaking into sites to steal information and then release it publicly on the Internet. It is not my intent to discuss the choice of targets or the motivations of these groups. Others have written plenty on this topic and really, if you’re not working for a target or one of the attackers, anything you can say about their motivations is likely to be guesswork at best.

Instead, I want to talk about the passwords. I’ve been following these leaks and collecting password information. My goal is not to break into people’s accounts or to discuss whether or not the leaked data supports the claims of either side. I only have one goal in doing this. I want to find out what I can about people and passwords so I can help everyone choose better ones. So here is my initial analysis. If time permits, I hope to come back to this and do the analysis with more rigor and dive more deeply. However, since my initial rough analysis is done, I wanted to share my preliminary findings. I think they’re interesting, so I hope that you will as well.

Data Set

I’ve combed leaked data for all the cleartext passwords I could find. Realistically, this means that the passwords I’ve analyzed here fall into two categories. The first category is passwords that were stored unencrypted or very weakly. The second is passwords that were weak to begin with and were easily cracked by those who released the data sets or analyzed them later. So, the important takeaway here is that this is not an analysis of typical passwords used on the Internet. This is an analysis of bad passwords used on the Internet combined with passwords that were stored poorly. Still, since I want to learn what not to do, this seems like a worthy use of my time.

The data set exceeded half a million passwords… but likely involved some duplicate records. I hope to tighten up the analysis in my next go-around.

Common Passwords

Everyone starts these analyses with a list of the most common passwords. I do not wish to disappoint, so here is what I found.

So what can we learn from this? First of all, note the number of passwords that are just numbers. 123456, 12345678, 12345, 111111, 1234, 1234567, and 123456789 were seven of the top 20 bad passwords. This is ridiculous. Who on earth thinks this is a good idea? A lot of people, apparently.

Second, notice the surprisingly large number of people who thought that trustno1, baseball and superman were good choices. Perhaps choosing passwords based on popular culture is unwise.
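
If you want to build the same kind of list from your own data, a minimal sketch follows. It assumes the passwords are already loaded into a Python list; the function name and the sample values are mine for illustration and are not the leaked data.

```python
from collections import Counter


def most_common_passwords(passwords, n=20):
    """Return the n most frequent passwords as (password, count) pairs."""
    return Counter(passwords).most_common(n)


def share_all_digits(passwords):
    """Return the fraction of passwords made up only of the digits 0-9."""
    if not passwords:
        return 0.0
    digits_only = sum(1 for p in passwords if p and all(c in "0123456789" for c in p))
    return digits_only / len(passwords)


# Hypothetical sample for illustration only.
sample = ["123456", "baseball", "123456", "trustno1", "123456", "baseball"]
print(most_common_passwords(sample, n=2))
print(share_all_digits(sample))
```

Counting before deduplicating matters here: the whole point of the list is how often the same string shows up.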

Password Lengths

I then looked at the average password length. There’s not much of a surprise here, but here’s the graph if you’re interested:

What I found most interesting was how relatively few passwords were seven characters long. I expected six and eight to be large, but not for seven to be so small. Also, note how quickly it drops off after 8. Passwords of nine characters and up are ridiculously rare.
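
A length breakdown like this is easy to reproduce. The sketch below is one way to do it, again assuming a plain list of passwords; the function name and the sample are hypothetical.

```python
from collections import Counter


def length_distribution(passwords):
    """Map each password length to the fraction of passwords with that length."""
    counts = Counter(len(p) for p in passwords)
    total = len(passwords)
    return {length: counts[length] / total for length in sorted(counts)}


# Hypothetical sample for illustration only.
sample = ["123456", "baseball", "superman", "1234"]
for length, share in length_distribution(sample).items():
    print(f"{length:2d} characters: {share:.0%}")
```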

Keyspaces

This is where things get interesting. We have been talking for years about how people should use a mix of lower case, upper case, numbers and symbols in their passwords. I don’t want to bore you with math, but the reason is that the more characters you have to pick from, the longer it’s going to take to guess the password. For example, if your password is one character long and you use a lowercase letter, an attacker who tries those first will need at most 26 tries to get it. If you use a character from any of these sets, it can take up to 26 (lower case) + 26 (upper case) + 10 (numbers) + 32 (symbols) = 94 tries. Each additional character multiplies the number of possibilities again, so longer passwords get harder to guess very quickly.
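
For those who do like the math, here is a small sketch of the arithmetic. It assumes the four character sets are the ASCII ones that Python's string module defines, and that the attacker simply tries every combination; the function name is mine.

```python
import string

LOWER = len(string.ascii_lowercase)   # 26
UPPER = len(string.ascii_uppercase)   # 26
DIGITS = len(string.digits)           # 10
SYMBOLS = len(string.punctuation)     # 32
ALL_FOUR = LOWER + UPPER + DIGITS + SYMBOLS  # 94


def keyspace(alphabet_size, length):
    """Number of possible passwords of exactly this length over this alphabet."""
    return alphabet_size ** length


# Compare lower-case-only passwords with passwords drawn from all four sets.
for length in (1, 6, 8, 10):
    print(length, keyspace(LOWER, length), keyspace(ALL_FOUR, length))
```

The gap between the two columns widens with every added character, which is why both length and character mix matter.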

Let’s use a few pictures to make this easier to talk about.

This is what we’d like to think people are doing. We know that not everyone is following our advice, but at a guess, we’d expect there to be a reasonable mix of people doing it the right way and some overlaps within the other spaces.

Our ideal, of course, would be to widen the overlapping space. This way, more people are using more complex passwords and would be safer.

… and this is where we actually are today. The spaces aren’t the same size, which isn’t terribly surprising I guess. However, I didn’t expect the special characters space to be so small, and I didn’t expect the overlap to be so tiny. In fact, of the 519,229 passwords I analyzed, only 315 had a mix of lower case letters, upper case letters, numbers and special characters. No wonder they got hacked. This means that only 0.06% of all the passwords could be considered minimally secure.
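
One way to sort passwords into these spaces is sketched below. It is an assumption on my part that anything which is not an ASCII letter or digit counts as a special character; the function names are hypothetical.

```python
import string


def character_classes(password):
    """Return the set of character classes that appear in the password."""
    classes = set()
    for ch in password:
        if ch in string.ascii_lowercase:
            classes.add("lower")
        elif ch in string.ascii_uppercase:
            classes.add("upper")
        elif ch in string.digits:
            classes.add("digit")
        else:
            # Assumption: everything else (symbols, spaces, non-ASCII) is "special".
            classes.add("special")
    return classes


def uses_all_four(password):
    """True if the password mixes lower case, upper case, digits and specials."""
    return len(character_classes(password)) == 4


def share_using_all_four(passwords):
    """Fraction of passwords that use all four character classes."""
    if not passwords:
        return 0.0
    return sum(1 for p in passwords if uses_all_four(p)) / len(passwords)


print(character_classes("trustno1"))
print(uses_all_four("Tr0ub!e"))
# The figure quoted above: 315 of 519,229 passwords, as a percentage.
print(round(315 / 519229 * 100, 2))
```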

Really… is it so hard to add an exclamation point or question mark in there somewhere? Here, I’ll even give you some you can use. I mean, really!?!?!?!?!?!?

Other Metrics of Interest

When I compared the list of passwords to itself and weeded out the duplicates, I found that 65.71% of the passwords overlapped. I must say, folks are just not as creative as I had hoped.
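
For the curious, the sketch below shows one way to get such a number. It assumes the overlap is the share of entries that repeat a password already seen; under that reading, the two counts quoted in this post (519,229 passwords in total, 178,049 of them unique) do work out to 65.71%. The function name is mine.

```python
def duplicate_rate(passwords):
    """Fraction of entries that repeat a password seen elsewhere in the list."""
    total = len(passwords)
    if total == 0:
        return 0.0
    return 1 - len(set(passwords)) / total


# Hypothetical sample: four entries, three distinct values.
print(duplicate_rate(["123456", "123456", "baseball", "superman"]))

# Same formula applied to the counts quoted in this post.
print(round((1 - 178049 / 519229) * 100, 2))
```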

For those that follow math, the average entropy score of the password set was 29.63. I hope to make a neat graph comparing entropy to things like length and commonality, but will apparently have to get more proficient with better graphing tools first. My existing tools found graphing 500,000+ data points somewhat challenging. :)
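
There are several ways to score password entropy, and this sketch is not necessarily the one behind the 29.63 figure. It shows one common estimate: the password length multiplied by log2 of the size of the character pool the password draws from. The pool sizes and function names are assumptions for illustration.

```python
import math
import string


def pool_size(password):
    """Size of the combined ASCII character sets the password draws from."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation for c in password):
        size += 32
    return size


def estimated_entropy_bits(password):
    """Estimate entropy as length * log2(pool size); 0.0 if no known set is used."""
    size = pool_size(password)
    if size == 0:
        return 0.0
    return len(password) * math.log2(size)


for candidate in ("123456", "baseball", "trustno1"):
    print(candidate, round(estimated_entropy_bits(candidate), 2))
```

Keep in mind that this kind of estimate flatters common passwords. It treats baseball as if it were eight random letters, which it very much is not.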

When I ran the list of passwords against the standard Linux word list, I got 85,196 hits out of 178,049 unique passwords. That’s a 47.85% rate of people that aren’t even trying. Again, we’re talking about the easily-cracked passwords, so this number is inflated… but it’s still much too high.
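
A check like this can be sketched in a few lines. The file path in the comment is only where many Linux systems keep their word list, and matching without regard to case is my assumption; the function names are hypothetical.

```python
def load_words(path):
    """Load a one-word-per-line file into a lower-cased set."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def dictionary_hits(unique_passwords, words):
    """Return the passwords that appear in the word set, ignoring case."""
    return [p for p in unique_passwords if p.lower() in words]


# On many Linux systems the list could be loaded with something like:
#   words = load_words("/usr/share/dict/words")   # path is an assumption
# A tiny stand-in set keeps this example self-contained.
words = {"baseball", "superman"}
hits = dictionary_hits(["baseball", "trustno1", "Superman"], words)
print(hits)

# The hit rate quoted above: 85,196 of 178,049 unique passwords.
print(round(85196 / 178049 * 100, 2))
```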

Surprisingly, I did not see many passwords that were just dates. Those stories of people using their kids’ birthdays as passwords seem to have been exaggerated… or perhaps people today don’t care about their kids very much. :)

So What Do We Do?

Given that this was a set of easily broken passwords, the key thing to do to prevent your password from being broken is to make it not fit these patterns. This means:

Use a mix of lower case letters, upper case letters, numbers and special characters. Use at least one of each.
Make your passwords longer than eight characters. To lie outside of this data set, 10 would be fine. Personally, I’m going up to 16. After all, if you can remember an eight character password, you should be able to remember two of them stuck together.
Avoid basing your password on popular culture, sequences of numbers (or keys on the keyboard) or sports. Those passwords are much more common than you’d think.
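
The three rules above can be sketched as a simple checker. The minimum length of 10 comes from the advice above; the function name, the messages and the tiny blocklist (taken from the passwords named earlier in this post) are illustrative assumptions, and a real blocklist would need to be far larger.

```python
import string

# Deliberately tiny, illustrative blocklist; not a complete list.
COMMON = {
    "123456", "12345678", "12345", "111111", "1234", "1234567", "123456789",
    "trustno1", "baseball", "superman",
}


def check_password(password, min_length=10, blocklist=COMMON):
    """Return a list of problems; an empty list means all three rules pass."""
    problems = []
    if not any(c in string.ascii_lowercase for c in password):
        problems.append("no lower case letter")
    if not any(c in string.ascii_uppercase for c in password):
        problems.append("no upper case letter")
    if not any(c in string.digits for c in password):
        problems.append("no number")
    if not any(c in string.punctuation for c in password):
        problems.append("no special character")
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if password.lower() in blocklist:
        problems.append("on the common-password list")
    return problems


print(check_password("trustno1"))
print(check_password("Correct-Horse-42"))
```

Passing this check only means a password falls outside the patterns seen in this data set. It says nothing about whether the same password is reused on other sites.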

That’s it. If you do these three steps, you’ll be well outside of this data set and therefore, much less likely to get your password stolen. Of course, the one thing I couldn’t measure was how much these passwords are shared between accounts of the same person. The 65.71% overlap rate suggests that there is a lot of this going on, but I can’t prove it. Still, it’d be a good idea not to do that.

Do these suggestions sound familiar? They should. If you’re still not following them, maybe you should. We don’t suggest them to be annoying or to help protect against some amorphous threat in the future. We suggest them because if you don’t follow these rules, you will be hacked.

We’ve just seen it happen.

Over half a million times in the last six months.

Cuttlefish

I know, I know. The security and squid blog is located elsewhere. Sorry, but I just have to write about this article.

A short time ago, Chuan-Chin Chiao, Kenneth Wickiser, Justine J. Allen, Brock Genter and Roger T. Hanlon published the article Hyperspectral imaging of cuttlefish camouflage indicates good color match in the eyes of fish predators. (How can you resist an article with such a fascinating title?) For those that don’t thrill to reading academic articles about the eyes of coleoid cephalopods (you weirdos), there is a more accessible press release here.

Why am I fascinated by this? Well, cuttlefish have the ability to change their patterning to blend into the background. We’re familiar with how chameleons do this, but cuttlefish are a lot better at it. Not only are they better at it, but they’re also colorblind! (Like me.) That’s right, these critters are capable of changing their own coloration when they can’t even see it. How do they do it? Well, sorry to keep you in suspense, but we still don’t know. There is some suspicion that it involves opsin transcripts, and evidence that body position may have something to do with it, but those theories are insufficient for a complete explanation. What’s interesting is the approach of the paper.

Science, as you know, is all about measurement. There’s little room for guesswork and lots of opportunity to be wrong. So if you’re going to measure camouflage, you’d better have a darn good way of doing it. What these guys did was to take hyperspectral images with a HyperScan VNIR system. Effectively, it measured the different amounts of 540 different colors to determine how well the cuttlefish blended into their background. They looked at their targets as if they were a super predator, with capabilities far beyond those of the predators we know… and the cuttlefish’s technique was still effective.

So what does this mean for us? Well, for me it means that I lost out, as I am colorblind, but am not able to perceive the polarization of light like cuttlefish can (lucky critters). However, for the rest of us as a group, it means this:

These creatures developed this ability over millions of years through a complex process of trying different ways to hide and, when they failed, being eaten. From a business perspective, there is some value in failing fast… but little advantage in being eaten. If you want to develop strong protections, you need to find a predator that lets you know when your defense is working and when it’s not, without eating you. Ideally, this would be a super-predator that is better than most of the predators out there.

We call these people penetration testers. Armed not with a HyperScan VNIR, but with tools like network mappers, vulnerability scanners and exploit frameworks, these people can assess your business and let you know if they could break through your defenses and how. You can then protect yourself better by making appropriate changes. Sadly, the industry is still young, and it’s hard to tell the super-predators from the others. There is a project to help with this, but for now, here’s a quick evaluation process. When you call a company (like mine) and ask for an evaluation, ask this handful of questions:

How much will a penetration test cost?
How much will a vulnerability assessment cost?
- Rule of thumb: Due to the time involved, penetration tests cost at least ten times what vulnerability assessments do. If they don’t, find another company.
What is the difference between a penetration test and a vulnerability assessment?
- Rule of thumb: If they only say “A penetration test tries to break in, a vulnerability assessment does not”, find another company.
What is your assessment methodology?
- Here, you should be looking for a standard and repeatable process. You don’t need to dig into the weeds, but you do want to weed out companies that come across as “We just try stuff at random”.
What problems have your tests caused in the past?
- Here’s a secret of the industry. Anyone worth their salt has broken something. If you don’t sometimes break stuff, you’re not trying hard enough. Companies that try to gloss over this and say “Oh, our tests are safe” are not super-predators.

Get the right help or get eaten.

It’s that simple.

Copyright © 2013 by Josh More