Leaked Password Analysis – 2011-06 Edition
- At June 29, 2011
- By Josh More
- In Business Security
As most of you likely know, a shift occurred several months ago in how a certain type of attack is carried out on the Internet. Instead of breaking into a website and simply stealing information, people began breaking into sites to steal information and then release it publicly. It is not my intent to discuss the choice of targets or the motivations of these groups. Others have written plenty on this topic and really, if you’re not working for a target or one of the attackers, anything you can say about their motivations is likely to be guesswork at best.
Instead, I want to talk about the passwords. I’ve been following these leaks and collecting password information. My goal is not to break into people’s accounts or to discuss whether or not the leaked data supports the claims of either side. I only have one goal in doing this. I want to find out what I can about people and passwords so I can help everyone choose better ones. So here is my initial analysis. If time permits, I hope to come back to this and do the analysis with more rigor and dive more deeply. However, since my initial rough analysis is done, I wanted to share my preliminary findings. I think they’re interesting, so I hope that you will as well.
Data Set
I’ve combed leaked data for all the cleartext passwords I could find. Realistically, this means that the passwords I’ve analyzed here fall into two categories. The first category is passwords that were stored unencrypted or with very weak protection. The second is passwords that were weak to begin with and were easily cracked by those who released the data sets or analyzed them later. So, the important takeaway here is that this is not an analysis of typical passwords used on the Internet. This is an analysis of bad passwords used on the Internet combined with passwords that were stored poorly. Still, since I want to learn what not to do, this seems like a worthy use of my time.
The data set exceeded half a million passwords… but likely involved some duplicate records. I hope to tighten up the analysis in my next go-around.
Common Passwords
Everyone starts these analyses with a list of the most common passwords. I do not wish to disappoint, so here is what I found.
So what can we learn from this? First of all, note the number of passwords that are just numbers. 123456, 12345678, 12345, 111111, 1234, 1234567, and 123456789 were seven of the top 20 bad passwords. This is ridiculous. Who on earth thinks this is a good idea? A lot of people, apparently.
Second, notice the surprisingly large number of people who thought that trustno1, baseball and superman were good choices. Perhaps choosing passwords based on popular culture is unwise.
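For anyone who wants to repeat this kind of tally on their own list, here is a minimal sketch. It is not the tooling used for this post; the function name and the tiny sample list are made up for illustration.

```python
from collections import Counter

def most_common_passwords(passwords, n=20):
    """Return the n most frequent passwords as (password, count) pairs."""
    return Counter(passwords).most_common(n)

# Tiny illustrative sample, NOT the leaked data set.
sample = ["123456", "trustno1", "123456", "baseball", "123456", "trustno1"]

top = most_common_passwords(sample, n=2)

# Which of the top entries are nothing but digits?
digits_only = [pw for pw, _count in top if pw.isdigit()]

print(top)
print(digits_only)
```

Counting exact strings like this treats "Baseball" and "baseball" as different passwords. Whether to fold case first is a judgment call.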
Password Lengths
I then looked at the average password length. There’s not much of a surprise here, but here’s the graph if you’re interested:
What I found most interesting was how relatively few passwords were seven characters long. I expected six and eight to be large, but not for seven to be so small. Also, note how quickly the count drops off after eight. Passwords of nine characters and up are ridiculously rare.
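The length counts behind a graph like this take only a few lines to produce. The sketch below is an assumption about how one might do it, with a made-up sample and a hypothetical function name.

```python
from collections import Counter

def length_histogram(passwords):
    """Map each password length to the number of passwords of that length."""
    counts = Counter(len(pw) for pw in passwords)
    return dict(sorted(counts.items()))

# Tiny illustrative sample, NOT the leaked data set.
sample = ["123456", "baseball", "superman", "trustno1", "12345"]

hist = length_histogram(sample)
for length, count in hist.items():
    # Crude text bar chart: one '#' per password of this length.
    print(f"{length:2d} {'#' * count}")
```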
Keyspaces
This is where things get interesting. We have been talking for years about how people should use a mix of lower case, upper case, numbers and symbols in their passwords. I don’t want to bore you with math, but the reason is that the more characters you have to pick from, the longer it’s going to take to guess the password. Suppose, for example, your password is one character long. If you use a lowercase letter and the attacker tries those first, it will take at most 26 tries to get it. If you use a character from any of these sets, it will take up to 26 (lower case) + 26 (upper case) + 10 (numbers) + 32 (symbols) = 94 tries. Each additional character multiplies the number of possibilities by the size of the character set, so longer passwords get harder to guess very quickly.
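For those who do want the math, the arithmetic can be sketched in a few lines. The pool sizes come from Python's standard `string` constants, which happen to match the 26/26/10/32 split above. The `keyspace` function name is made up for this example, and the worst case assumes the attacker simply tries every combination.

```python
import string

# Character pools, matching the 26 + 26 + 10 + 32 breakdown.
LOWER = len(string.ascii_lowercase)   # 26
UPPER = len(string.ascii_uppercase)   # 26
DIGITS = len(string.digits)           # 10
SYMBOLS = len(string.punctuation)     # 32

FULL_POOL = LOWER + UPPER + DIGITS + SYMBOLS

def keyspace(length, pool_size):
    """Worst-case number of guesses for a password of this length."""
    return pool_size ** length

for length in (1, 6, 8, 10):
    lower_only = keyspace(length, LOWER)
    all_sets = keyspace(length, FULL_POOL)
    print(f"length {length:2d}: lowercase only {lower_only:,} vs all sets {all_sets:,}")
```

The gap between the two columns grows with every extra character, which is why both length and variety matter.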
Let’s use a few pictures to make this easier to talk about.
This is what we’d like to think people are doing. We know that not everyone is following our advice, but at a guess, we’d expect there to be a reasonable mix of people doing it the right way and some overlaps within the other spaces.
Our ideal, of course, would be to widen the overlapping space. This way, more people are using more complex passwords and would be safer.
… and this is where we actually are today. The spaces aren’t the same size, which isn’t terribly surprising I guess. However, I didn’t expect the special characters space to be so small, and I didn’t expect the overlap to be so tiny either. In fact, of the 519,229 passwords I analyzed, only 315 had a mix of lower case letters, upper case letters, numbers and special characters. No wonder they got hacked. This means that only 0.06% of all the passwords could be considered minimally secure.
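Sorting passwords into these spaces could be done along the following lines. This is a sketch, not the script behind the numbers above. It assumes that anything other than an ASCII letter or digit counts as a "special" character, and the function names are made up.

```python
import string

def char_classes(pw):
    """Return the set of character classes that appear in a password."""
    classes = set()
    for ch in pw:
        if ch in string.ascii_lowercase:
            classes.add("lower")
        elif ch in string.ascii_uppercase:
            classes.add("upper")
        elif ch in string.digits:
            classes.add("digit")
        else:
            # Assumption: everything else (symbols, spaces, non-ASCII) is "special".
            classes.add("special")
    return classes

def uses_all_four(pw):
    """True when lower case, upper case, digits and specials all appear."""
    return len(char_classes(pw)) == 4

# Tiny illustrative sample, NOT the leaked data set.
sample = ["123456", "baseball", "Trustno1", "Sup3rman!"]
all_four = [pw for pw in sample if uses_all_four(pw)]
print(all_four)

# The share quoted in the post: 315 out of 519,229 passwords.
share = 315 / 519229 * 100
print(f"{share:.2f}%")
```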
Really… is it so hard to add an exclamation point or question mark in there somewhere? Here, I’ll even give you some you can use. I mean, really!?!?!?!?!?!?
Other Metrics of Interest
When I compared the list of passwords to itself and weeded out the duplicates, I found that 65.71% of the passwords overlapped. I must say, folks are just not as creative as I had hoped.
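One way to read that figure is as the share of entries that repeat a password already seen elsewhere in the list. The totals quoted in this post (519,229 passwords, 178,049 of them unique) are consistent with that reading. A minimal sketch, with a hypothetical function name:

```python
def duplicate_rate(total, unique):
    """Percentage of entries that repeat an already-seen password."""
    if total == 0:
        raise ValueError("cannot compute a rate for an empty list")
    return (total - unique) / total * 100

# Tiny illustrative sample, NOT the leaked data set.
sample = ["123456", "123456", "baseball", "superman"]
sample_rate = duplicate_rate(len(sample), len(set(sample)))
print(f"{sample_rate:.2f}%")

# Totals quoted in the post.
post_rate = duplicate_rate(519229, 178049)
print(f"{post_rate:.2f}%")
```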
For those that follow math, the average entropy score of the password set was 29.63. I hope to make a neat graph comparing entropy to things like length and commonality, but will apparently have to get more proficient with better graphing tools first. My existing tools found graphing 500,000+ data points somewhat challenging. :)
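This post does not say how the entropy score was calculated, so the following is only one common back-of-the-envelope estimate: password length times log2 of the character pool the password draws from. It is an assumption for illustration, uses made-up function names, and should not be expected to reproduce the 29.63 figure.

```python
import math
import string

def estimate_entropy_bits(pw):
    """Rough estimate: length * log2(size of the character pool in use).

    Assumption: the pool is the sum of the classes that appear in the
    password (26 lower, 26 upper, 10 digits, 32 symbols).
    """
    if not pw:
        return 0.0
    pool = 0
    if any(c in string.ascii_lowercase for c in pw):
        pool += 26
    if any(c in string.ascii_uppercase for c in pw):
        pool += 26
    if any(c in string.digits for c in pw):
        pool += 10
    if any(c not in string.ascii_letters + string.digits for c in pw):
        pool += 32
    return len(pw) * math.log2(pool)

def average_entropy_bits(passwords):
    """Mean of the per-password estimates."""
    return sum(estimate_entropy_bits(pw) for pw in passwords) / len(passwords)

# Tiny illustrative sample, NOT the leaked data set.
sample = ["123456", "baseball", "Trustno1"]
print(round(average_entropy_bits(sample), 2))
```

An estimate like this flatters dictionary words, since it treats "baseball" as eight random letters. That is one reason a word list check is a useful companion metric.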
When I ran the list of passwords against the standard Linux word list, I got 85,196 hits out of 178,049 unique passwords. That’s a 47.85% rate of people that aren’t even trying. Again, we’re talking about the easily-cracked passwords, so this number is inflated… but it’s still much too high.
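A word list check of this kind can be sketched as follows. The path `/usr/share/dict/words` is an assumption (it is where many Linux systems keep their word list), as is the case-insensitive matching. The function names are made up, and the demo uses a tiny in-memory word set so it runs anywhere.

```python
def load_wordlist(path="/usr/share/dict/words"):
    """Load a word list file into a lowercase set (path is an assumption)."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {line.strip().lower() for line in fh if line.strip()}

def dictionary_hits(unique_passwords, words):
    """Return the passwords that appear verbatim (ignoring case) in the word set."""
    return [pw for pw in unique_passwords if pw.lower() in words]

# Tiny illustrative data, NOT the leaked data set or a real word list.
words = {"baseball", "superman", "dragon"}
unique = ["baseball", "Superman", "trustno1", "123456"]

hits = dictionary_hits(unique, words)
sample_rate = len(hits) / len(unique) * 100
print(hits, f"{sample_rate:.2f}%")

# The rate quoted in the post: 85,196 hits out of 178,049 unique passwords.
post_rate = 85196 / 178049 * 100
print(f"{post_rate:.2f}%")
```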
Surprisingly, I did not see many passwords that were just dates. Those stories of people using their kids’ birthdays as passwords seem to have been exaggerated… or perhaps people today don’t care about their kids very much. :)
So What Do We Do?
Given that this was a set of easily broken passwords, the key to keeping your passwords from being broken is to make them not fit these patterns. This means:
- Use a mix of lower case letters, upper case letters, numbers and special characters. Use at least one of each.
- Make your passwords longer than eight characters. To lie outside of this data set, 10 would be fine. Personally, I’m going up to 16. After all, if you can remember an eight character password, you should be able to remember two of them stuck together.
- Avoid basing your password on popular culture, sequences of numbers (or keys on the keyboard) or sports. Those passwords are much more common than you’d think.
That’s it. If you do these three steps, you’ll be well outside of this data set and therefore, much less likely to get your password stolen. Of course, the one thing I couldn’t measure was how much these passwords are shared between accounts of the same person. The 65.71% overlap rate suggests that there is a lot of this going on, but I can’t prove it. Still, it’d be a good idea not to do that.
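The three rules above could be checked mechanically with something like the sketch below. The function name and the short `KNOWN_BAD` set (seeded with a few of the common passwords named in this post) are assumptions for illustration. A real check would use a much larger list. Passing it only means a password falls outside these patterns, not that it is strong.

```python
import string

# A few of the common passwords named in this post; a real list would be far longer.
KNOWN_BAD = {"123456", "12345678", "trustno1", "baseball", "superman"}

def rule_violations(pw, known_bad=KNOWN_BAD):
    """Return a list of the three rules that a password breaks."""
    problems = []

    # Rule 1: at least one lower case, upper case, digit and special character.
    has_lower = any(c in string.ascii_lowercase for c in pw)
    has_upper = any(c in string.ascii_uppercase for c in pw)
    has_digit = any(c in string.digits for c in pw)
    has_special = any(c not in string.ascii_letters + string.digits for c in pw)
    if not (has_lower and has_upper and has_digit and has_special):
        problems.append("missing a character class")

    # Rule 2: longer than eight characters.
    if len(pw) <= 8:
        problems.append("eight characters or fewer")

    # Rule 3: not a known common password or a plain run of digits.
    if pw.lower() in known_bad or pw.isdigit():
        problems.append("common password or number sequence")

    return problems

for candidate in ("baseball", "123456789", "Tr1cky-Horse-Battery"):
    print(candidate, rule_violations(candidate))
```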
Do these suggestions sound familiar? They should. If you’re still not following them, maybe you should. We don’t suggest them to be annoying or to help protect against some amorphous threat in the future. We suggest them because if you don’t follow these rules, you will be hacked.
We’ve just seen it happen.
Over half a million times in the last six months.
Gary Hinson
A better idea is to choose long pass phrases – complete sentences, with capitalization and punctuation. Once people realise that they can use, for example, an entire line of a favourite poem or song, memorising it is no longer the issue (although typing it accurately and quickly with only blobs for visual feedback can be challenging!).
An even better idea, then, is to use your long but memorable pass phrase not directly, but to open a decent password vault which securely stores and effortlessly regurgitates extremely secure and unique generated passwords for every login. Now we’re really getting somewhere! In my experience, the main limitation now relates to web applications that INSIST on us using short, weak passwords (doh!).
Finally, multi-factor authentication is the preferred option for high security logins, using for example the number from a crypto fob or sent through an out-of-band secure channel such as a cellphone. The most common form of identity theft, other than simply guessing those ridiculously weak passwords or hacking insecure database systems, uses keylogging Trojans. Hackers may hijack and exploit a single authenticated session but should not have unfettered access without ongoing access to the second factor. At least that’s the theory, and yes I am making many assumptions about the quality of the application design and programming, and our resistance to social engineering attacks.
Kind regards,
Gary
Josh
Gary,
While what you say is certainly true and matches advice that I have given in the past, I’m a little afraid that you may have missed the point of my post. The point is not to share, yet again, the latest in password choosing advice. It is to analyze specifically-leaked sets of passwords and show the bare minimum necessary to avoid being caught up in one of these leaks. I expect to add length to the mix in the next analysis I do, and that is when I was planning to bring up passphrases and vaults.
The fundamental problem with passphrases is that once people think they have a “good” password, they tend to use that everywhere so a single weak site can breach it and then attackers can extend the attack everywhere. The password vault suggestion is the solution to that… but not one that typically works well at a business level. Multi-factor is not a good solution for the average person as that is an architectural solution and not a decision that the average person can make.
Mark Hagerman
I suspect a lot of people (including myself) use a smaller character set for the sole reason that they don’t want to have to hit the shift key when typing their password. On sites where I can use Password Safe (and its auto-typing feature), I let it generate “really good” passwords; those I have to type manually are restricted to [a-z0-9.].
Kevin Smith
I’m not a big fan of Steve Gibson but Password Padding seems like a good idea.