Followup to my Facebook research

Hey all,

Some of you may have heard what I did this month. It turns out, depending on who you listen to, that I'm either an evil "Facebook hacker" or just some mischievous individual doing "unsettling" research. But, one way or the other, a huge number of people have read or heard this story, and that's pretty cool.

Although it's awesome (and humbling) that so much attention was paid (at least for a couple days) to some fairly straightforward work I did, I want to talk about this from my perspective, including why I did it and what I think this means to the community. Then, for fun, I'll end by talking about other places this research can go and open up the floor for some discussion.

Why I did it

The biggest question I get is: why? -- and it's a valid question. Why would I "expose" public data to the public? And, do I get this excited every year when I get the new phonebook?

Well, let's talk about it!

First off, as many of you know, I'm a developer for the Nmap security scanner. Among many, many other things, I've written several of the bruteforce (aka, 'auth') scripts, which are designed to test password strength on a system. The Ncrack tool, a recent addition to the Nmap suite written by Ithilgore, is primarily designed to test password strength by guessing username/password combinations, much like Hydra and Medusa.

When I joined the Nmap project, it came with a set of 8 or so common usernames and a couple hundred common passwords. The original password list, put together by Kris Katterjohn, was entirely based on some exposed MySpace passwords. Those passwords were sub-optimal because they were phished, not leaked/breached, which means passwords with messages to the phishers, like "fuckyou", are artificially common (not to mention "suck my dick" and "piss off cracker head" -- I highly suggest searching the list for swear words and body parts, it's actually really amusing).

Fortunately for us, as password researchers, there were several more password breaches around that same time. One of the most interesting from a research perspective, due to it being the biggest breach at the time (with 188,000 records), was Phpbb. The Phpbb passwords were hashed with MD5, so converting them into a useful password list was a long process (that, I'm happy to report, is over 98% done -- not by me).
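To give a feel for why that conversion is slow: recovering passwords from hashes generally means hashing candidate words and checking for matches. Here's a minimal sketch in Python, assuming plain unsalted MD5 hex digests. The function name, the wordlist, and the "leaked" hashes are all made up for illustration (the hashes are generated on the spot, not taken from any breach), and real cracking tools are far faster than this.

```python
import hashlib


def crack_md5(hashes, candidates):
    """Return {hash: plaintext} for every candidate whose MD5 is in hashes."""
    remaining = {h.lower() for h in hashes}
    recovered = {}
    for word in candidates:
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        if digest in remaining:
            recovered[digest] = word
    return recovered


# Toy data: these hashes are computed here, not copied from a real list.
leaked = [
    hashlib.md5(p.encode("utf-8")).hexdigest()
    for p in ("password", "letmein", "Tr0ub4dor&3")
]
wordlist = ["123456", "password", "qwerty", "letmein"]

found = crack_md5(leaked, wordlist)
print(f"recovered {len(found)} of {len(leaked)}")
```

Anything that isn't in the wordlist stays unrecovered, which is why the last couple of percent take so long.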

Not too long after the Phpbb breach, RockYou came along. I'm not going to link to my RockYou list directly, because of the size, but it consists of 32 million plaintext passwords and you can find it on my passwords page. From a password research perspective, we couldn't have asked for better data.

Anyway, with all these breaches, keeping track of the lists became a hassle. So, like anybody who doesn't want to do the work himself, I set up a wiki page to keep track of my lists. Since I created it, I've had exceptionally good feedback about it from researchers around the world. As far as I know, it's the best collection of breached passwords anywhere. Nmap's current password list is the result of extensive research by Nmap developers using our many lists.

Now, back to the Facebook names. There are actually two sides to the situation. The first, and most obvious, occurs when Nmap (or one of the other tools I mentioned) is performing a password-guessing audit against a host. Before it can guess a password, the program requires a high-quality list of usernames. Those names could be harvested from the site (such as an email directory), they could be created using default username lists (such as 'administrator', 'web', 'user', 'guest', etc), or they could be chosen using lists of actual names (such as 'jsmith' or 'rbowes'). That's where this list comes in -- having a list of 10, 100, or 1000 names wouldn't help us much, because there are billions of people in the world, but having a sample of 170 million names is a great cross-section that gives us great insight into the most common names and, therefore, the most common usernames (who would have thought that 'jsmith' would be the most common?).
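As a rough illustration of turning names into usernames, here's a small Python sketch. It is not the script from the torrent; the function names and the three username shapes (first initial plus last name, and so on) are my own assumptions about what "common" looks like, and the sample names are made up.

```python
from collections import Counter


def username_candidates(full_name):
    """Yield a few common username shapes for a 'First Last' style name."""
    # Keep only purely alphabetic parts; hyphenated names are skipped here.
    parts = [p for p in full_name.lower().split() if p.isalpha()]
    if len(parts) < 2:
        return
    first, last = parts[0], parts[-1]
    yield first[0] + last      # jsmith
    yield first + last         # johnsmith
    yield first + "." + last   # john.smith


def top_usernames(names, n=3):
    """Count every candidate across all names and return the n most common."""
    counts = Counter()
    for name in names:
        counts.update(username_candidates(name))
    return counts.most_common(n)


sample = ["John Smith", "Jane Smith", "Ron Bowes", "James Smith"]
top = top_usernames(sample, n=1)
print(top)
```

Even with four names, the first-initial form collapses John, Jane, and James into one username, which is the same effect that pushes 'jsmith' to the top of a big list.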

The second reason, however, is more interesting to me because it continues my research into how people choose passwords. It's a well-known fact to anybody in the security field that people choose poor passwords. By studying the most common trends in password choices, we help teach people how to choose better passwords (and hopefully, someday, we'll find a way to eliminate passwords altogether). I hope to put together some numbers showing how many people use passwords based on names. I don't have results that I'm comfortable releasing yet, so stay tuned for that!
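For the curious, one simple way such a number could be counted is sketched below in Python. This is only an assumption about a possible approach, not the analysis itself: it strips everything but letters from each password and checks whether what's left is a known name. The function name and the toy data are invented for illustration and say nothing about real-world rates.

```python
import re


def name_based_share(passwords, names):
    """Fraction of passwords whose letters-only core is a known name."""
    known = {n.lower() for n in names}
    hits = 0
    for pw in passwords:
        # "michael1" -> "michael", "Jessica!" -> "jessica", "123456" -> ""
        core = re.sub(r"[^a-z]", "", pw.lower())
        if core in known:
            hits += 1
    return hits / len(passwords) if passwords else 0.0


# Made-up inputs, purely to show the mechanics.
toy_passwords = ["michael1", "Jessica!", "qwerty", "123456"]
toy_names = ["Michael", "Jessica", "Ron"]

share = name_based_share(toy_passwords, toy_names)
print(share)
```

A real study would need to deal with names that are also ordinary words, which this exact-match check ignores.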

Getting out of control

I hope now you have some insight into my motives. It was some simple research, so how did it become so popular?

Well, the first reason is that when I wrote the original blog post on the subject, I was somewhat careless with my language. As a result, people got the wrong impression and thought I had a lot more data than I actually did. As you can see in the original story posted by the BBC, the whole situation sounded a lot more exciting, and controversial, than it actually was.

Now, at the time that these stories were running, I wasn't at home. In fact, on that particular day, I was at the Grand Canyon. Now, why would I post some interesting research the day before I was going to the Grand Canyon? Well, all I can say is that planning ahead is overrated. :)

Anyway, because I was out of town, and Canadian telcos charge ludicrous roaming fees, I wasn't in a hurry to answer phone calls or spend time on the phone doing an interview. Therefore, despite making attempts to contact me, the reporter from the BBC, Daniel Emery, ended up posting the story as he understood it at the time.

Fortunately, that night, Daniel and I had a great email conversation about the work I did. The result was an updated story that very clearly spells out my motivations and, in my opinion, is one of the best stories on the topic.

By then, though, the damage was done. Hundreds of articles were published. All for something that really wasn't a big deal.

I'm thankful, though, that Facebook's response aligned with mine, and that they didn't make any kind of an attempt to pursue legal action or request that I remove the information or anything else. That's a far better response than I'd expected, to be honest, and I have to thank Facebook for that (even if they didn't invite me to their Defcon party ;) ).

What's this data mean?

So, as I said, I collected exactly two pieces of data:

  1. The names of 170 million users
  2. The URL of those users

I did NOT collect email addresses, friends, private data, public data, or anything else. And the URL might lead to nothing but a name and whatever picture the user chose -- that's what Facebook shares at a minimum. Downloading the actual profile pages of all the users, based on some quick calculations I made, would take about 3 TB. Of course, I don't doubt that somebody is trying. :)
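As a quick sanity check on that figure, the two numbers above imply an average page size. The snippet below just does the division in Python; treating "about 3 TB" as decimal terabytes is my assumption, and the per-profile number is derived, not measured.

```python
USERS = 170_000_000
TOTAL_BYTES = 3 * 10**12  # "about 3 TB", taken as decimal terabytes

# Average size each profile page would need to be to add up to the total.
per_profile = TOTAL_BYTES / USERS
print(f"{per_profile / 1000:.1f} kB per profile")
```

So the estimate works out to somewhere under 20 kB per profile page.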

So now, I want to open up the discussion a little. I've been telling reporters (and everybody else) since it started that this data doesn't mean anything, and is only interesting as a research project into common names. My challenge to you, the readers, is: what more can be done with this data?

I've had several email (and real-world) discussions with various people, all of whom will remain unnamed. Some were from businesses, some academia, and some media. Here are some thoughts people have run by me (if you see something that you don't want publicized, please let me know and I'll remove it from this page; I tried to keep these vague enough not to upset anybody, though):

  • A business person suggested that companies who publish names for a living (eg, common baby names) might be interested in this data
  • Other social network sites might want to check overlaps and/or build links between profiles on their site and Facebook
  • In a blog comment, somebody suggested, and is working on, downloading profile pictures for facial recognition
  • On IRC, we discussed the possibility of analyzing the user IDs, included in the URLs, to see if it's possible to enumerate non-searchable accounts
  • A researcher suggested using this data to study the name-letter effect, though I haven't collected enough information for that to be useful
  • Similarly, names themselves can be indicative of race/culture -- could this be used for targeted advertising?

So, those are some ideas to expand this research, some of which are actually being worked on right now. And don't get me wrong, those are good ideas, but I'd really like to get some more. Why should we, whether we're security researchers, media, academics, etc, care about having a list of 170,000,000 names and URLs? What can we get from aggregating this data that we didn't have before? What can a good person do with it? What about an evil person?

I'd love to hear most opinions!

24 thoughts on “Followup to my Facebook research”

  1. Reply

    Robby

    Great article!

    I agree that adverizers would like to use the list of names.
    One thing it could be used for is marketing toward a group of people with the same name. Or even used by the media for naming characters in a show/movie/radio show/...

    I think if you could get the birthdays for all these users it would be much more valuable data.

  2. Reply

    Josh

    Hey Bowes do the human race a favor & stop the nonsense. U r making alot people upset & its all for attn for you so you can make more money. U dont care a damn about what U did to all those people on FB & you are a pathetic excuse for a human being. I bet you I could hack into your personal accts WITH the protection but I'm not like U. I care about other people & their privacy.
    Why dont U do something useful instead of trying to make a name for yourself using underhanded methods. DO U know HP & Disney & other big co's are using the info to send emails & adsd to FB users, for free I might add? Are you going to tell them to pay us for the right to use that info? U better. I'm going to blog all over the internet not to pay attn to U because you pose a security risk to other people & alot more than just that. we'll dig up whatever info we can on U as well & I do mean WE, meaning millions of us by the time we're done with U, you'll wish U'd not done it.

    1. Reply

      Ron Bowes Post author

      @Josh -- I WISH I made money off this. :)

  3. Reply

    rolodexter

    What’s troubling about this story are the comments that users have posted. There is a problem here, and it has to do with the fact that everybody’s face is on the web, and when you have a username that’s listed in a URL, you’re able to physically identify people. And that’s all you need to get the ball rolling on surveillance. If you wanted a conspiracy theory, you can wonder all day about how it is the United States government got 10% of the world to contribute to its CIA database of personal profiles that isn’t as much about what’s actually listed, but the relationships between what’s listed, what’s publicly available, and what’s kept private. The actual information is almost beside the point; the real gold is in the relationships between the decisions that’s made, the patterns about those decisions that makes for real signatures. Facebook is 500 million users and growing fast. If it’s not the first sole site to hit the 1 billion user mark, it’ll be the next biggest thing, but it’s bound to happen. Is that good or bad? Remember, there is no neutral.

  4. Reply

    4ud1t0r

    @Josh:

    I wholeheartedly agree with you. We need to do something about this! I have started a petition to have Ron removed from the internet. It's called :"Ron I Can Knock you off-line. Ron Only Likes Little Electronic Devices" Please join us! We need people like you, who have such insight into the internets, to join our leadership.

    Go to our petition website and sign-up.

    http://tinyurl.com/2tcnbl

    1. Reply

      Ron Bowes Post author

      @4ud1t0r - Aww, when I clicked the link I was thinking "please be a rickroll!" -- that would have been cooler. Vuvuzela is the next best thing, though! :D

  5. Reply

    4ud1t0r

    @Ron:

    Ya, I thought the traditional rickroll would go right over his head...

    I had to try and find something that was as annoying as his post.

    I'm still not sure if Josh understands the wrath that Disney's N3t N1nj4s will send his way if he tries to get in between them and their free marketing...

    ;-)

    Have a good one.

  6. Reply

    ckn

    For what it's worth, my oppinion is that you did nothing wrong. The internet is full of stupid people anyways so to be honest it's their own damn fault for not beeing carefull about their privacy settings :)

    cheers ron

  7. Reply

    ckn

    sorry for the double post... but why in god's name would there be a HD version of a 10 min. vuvuzela video ? that's just plain sick :P

  8. Reply

    Jacoba

    I don't think you did anything wrong. I find the notion that information on Facebook is private, laughable. Frankly, if I want to keep something private, I do just that. I keep it private. I don't publish it on a public site.
    That said, I have nothing to hide.
    Have fun and even though, I'm not nearly as bright as you in your particular field, I've enjoyed reading what you have to say and will be back for more. Maybe, just maybe, I'll learn something!

    1. Reply

      Ron Bowes Post author

      @Jacoba - I hate the phrase "I have nothing to hide" -- I mean, even if you don't do anything "wrong", as it were, privacy is something that's still important to having a healthy lifestyle. There's a difference between not doing anything wrong and being okay with cameras in your bathroom. There has to be a balance. :)

  9. Reply

    quinametin

    @Josh These people put their information there voluntarily. The knew that it is going to be public (or they didn't read the terms&conditions). It's not Ron that made this info public. It was and it is public. So I do not understand your problem... You don't want to be there? Do not use FB :)

  10. Reply

    Diana

    Ron I think you did a good thing, many people on fb never seem to understand that what they put on there can be seen by everyone. Now they know to hold their personal information closer to them. Thank you.

  11. Reply

    funmaker

    hi thanks for this but it will be great if u let us know how to use the files i have downloaded the torrent but i don't know how to use it i.e. how to see the passwords and ids of the face book users thanks in advance.... And please all others don't take it personal it is for fun and knowledge to make aware of the incidents going on .in the network please tell me how to use it dear ron bowes

  12. Reply

    Matt

    I just read this article, and think it's awesome!

    I didn't even know about the directory. Good thing, too, as I checked and, while some of my family members are searchable, I'm not in the list.

    Now I'll have to check the torrent (and yep, seed too!) to see if I'm there.

    Oh evil Ron. Keep up the good work :D

  13. Reply

    ex

    I was wondering if I could see the script that you used to scrape this data? It seems like a pretty strenuous exercise to look through that many profiles in terms of bandwidth!

    1. Reply

      Ron Bowes Post author

      @ex - It isn't actually too intense of a project, Facebook makes it easy with facebook.com/directory! The script is included in the .torrent file -- there's a .rb and a .nse. If you don't want to download the full torrent, just download those two files.

  14. Reply

    Goteki

    Must say i find the project intressting but the comments are plain stupid. You just compiled a list of names and url's. Anyone who want can do the same, so i dont see the harm in this. So for doing a cool project i say Great work and good luck in the future.

  15. Reply

    Christian

    I download the list and i can't open. Whit what i can read this.
    TY

  16. Reply

    MoonDiver

    Hey Ron! along time i'm teaching about the "risk facebook" at schools. U made everything right!..people don't know what they do with all this social-networks..wake them up!..go ahead..
    greetz, MeX

  17. Reply

    jake smith

    haha roflmao your a criminal lol you cot me crackin up when you said

    Why would I “expose” public data to the public?

  18. Reply

    NullVoid

    I lov nMap.
    great work dude.
    i hav an idea for those data.
    we can catagorised fb useres in many ways.
    like male female youth teenager aged etc.
    we can also find fake fb accounts by doing some reaserch. hmmm doing deep reasreach.
    this data is like u hav whole us or any other countrys citizen in one damn big hall.
    u can imagin what u can do?
    we can also do lost and found services.
    we can find out peoples of same profession or likes to make a faction kinda thing.
    or worst idea i can think is..we will mak one software tht add all fb useres as friend one by one. if possible.
    and next thing we can figure out how many user has max allowed friends.
    and and and.
    we can create somthng like newsletter.
    for that we hav to just create one soft tht sends msg to fb user.
    i hav lot more useless ideas.
    my emailid is gauravgandhi.com@gmail.com
    i am 2nd year bsc it studnts.
    and just started metasploits on back track r2 4.

  19. Reply

    CTS_AE

    Wow @Josh
    That is just a ridiculous uneducated comment
    It makes me want to bash my face in

    It's just funny that what Ron did here is completely natural of so many companies.
    Facebook already has all this data and way much more in their databases, surely they could sell out data, or at least use the data for their own use which they are most definitely doing.
    Even when you goto a store they collect demographics about you.
    Google and many search engines have indexed so much information and cached it, that you should be angry at them, and if you have or even do not have a google account, google already knows so much about you it would be scarey to know how much they did know about you.
    I think this Josh guy is either a troll, flamer, or just so uneducated and mislead about this topic it makes me sick that he could create such a post off of that kind of biased knowledge.

    As for the information, the face recognition idea is awesome yet scary
    it would be cool to just point your cell phone at someone's face and have it tell you who they are
    I bet google could do that since they have google goggles, but I'm sure they will never go to those realms because of such privacy issues.
    But the face recognition can't be done with the information you've collected, like said you just have usernames and urls, which limits what you can do, so I dont see that there's much you can do.

    Lol @ this whole controversy, it was on the news and everything right when I was trying to go through your ASM tutorial and half way through the tutorial the site went down due to this, and #bh was all talkin about how you were taken down and I was like what?! really? no way! It was cool though knowing this site and that you made it on the news, it made me feel special a bit :p like "hey man I use his site all the time and i was trying to learn from it and bam! this happened...."

    Any who keep on keepin on

  20. Reply

    Nightingale

    Hello Ron

    what do yu think today?
    Have you change the mind of the
    facebook User - to take better
    passwords? Is it possible today,
    or are the Users more sensible?

    Have a nice day!
