Return of the Facebook Snatchers

First and foremost: if you want to cut to the chase, just download the torrent. If you want the full story, please read on....

Background

Way back when I worked at Symantec, my friend Nick wrote a blog post that caused a little bit of trouble for us: Attack of the Facebook Snatchers. I was blog editor at the time, and I went through the usual sign-off process and, eventually, published it. Facebook was none too happy, but we fought for it and, in the end, we got to leave the post up in its original form.

Why do I bring this up? Well, last week @FSLabsAdvisor wrote an interesting Tweet: it turns out that, by heading to https://www.facebook.com/directory, you can get a list of every searchable user on all of Facebook!

My first idea was simple: spider the lists, generate first-initial-last-name (and similar) lists, then hand them over to @Ithilgore to use in Nmap's awesome new bruteforce tool he's working on, Ncrack.

But as I thought more about it, and talked to other people, I realized that this is a scary privacy issue. I can find the name of pretty much every person on Facebook. Facebook helpfully informs you that "[a]nyone can opt out of appearing here by changing their Search privacy settings" -- but that doesn't help much anymore considering I already have them all (and you will too, when you download the torrent). Suckers!

Once I have the name and URL of a user, I can view, by default, their picture, friends, information about them, and some other details. If the user has set their privacy higher, at the very least I can view their name and picture. So, if any searchable user has friends that are non-searchable, those friends just opted into being searched, like it or not! Oops :)

The lists

Which brings me to the next topic: the list! I wrote a quick Ruby script (which has since become a more involved Nmap script that I haven't used for harvesting yet) that I used to download the full directory. I should warn you that it isn't exactly the most user-friendly interface -- I wrote it primarily for myself, and I'm only linking to it for reference. I don't really suggest you try to recreate my spidering; it's a waste of several hundred gigs of bandwidth.

The results were spectacular. 171 million names (100 million unique). My original plan was to use this list to generate a list of the top usernames (based on first initial last name):

 129369 jsmith
  79365 ssmith
  77713 skhan
  75561 msmith
  74575 skumar
  72467 csmith
  71791 asmith
  67786 jjohnson
  66693 dsmith
  66431 akhan

Or first name last initial:

 100225 johns
  97676 johnm
  97310 michaelm
  93386 michaels
  88978 davids
  85481 michaelb
  84824 davidm
  82677 davidb
  81500 johnb
  77800 michaelc

Or even the top usernames based on first name dot last name (sorry, I can't link this one due to bandwidth concerns; but it's included in the torrent):

  17204 john.smith
   7440 david.smith
   7200 michael.smith
   6784 chris.smith
   6371 mike.smith
   6149 arun.kumar
   5980 james.smith
   5939 amit.kumar
   5926 imran.khan
   5861 jason.smith

Or even the most common first or last names:

 977014 michael
 963693 john
 924816 david
 819879 chris
 640957 mike
 602088 james
 584438 mark
 515686 jason
 503658 robert
 484403 jessica

 913465 smith
 571819 johnson
 512312 jones
 503266 williams
 471390 brown
 386764 lee
 360010 khan
 355639 singh
 343220 kumar
 324972 miller
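
The counts above boil down to deriving a few strings from each full name and tallying them. Here is a minimal Python sketch of that idea. It is not the Ruby script mentioned earlier; it assumes the input is simply a list of "First Last" strings, and the function names and the sample names are made up for illustration.

```python
from collections import Counter


def candidate_usernames(full_name):
    """Return hypothetical username forms for one 'First ... Last' name."""
    parts = full_name.lower().split()
    if len(parts) < 2:
        return {}  # single-word names give no first/last pair
    first, last = parts[0], parts[-1]
    return {
        "f.last": first[0] + last,        # e.g. jsmith
        "first.l": first + last[0],       # e.g. johns
        "first.last": f"{first}.{last}",  # e.g. john.smith
    }


def count_usernames(names):
    """Tally each username form across an iterable of full names."""
    counts = {"f.last": Counter(), "first.l": Counter(), "first.last": Counter()}
    for name in names:
        for form, username in candidate_usernames(name).items():
            counts[form][username] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample data, not taken from the real lists.
    sample = ["John Smith", "Jane Smith", "John Stone", "Arun Kumar"]
    for form, counter in count_usernames(sample).items():
        print(form)
        for username, n in counter.most_common(3):
            print(f"{n:8d} {username}")
```

Middle names are ignored here (only the first and last words are used), which is one of several choices a real run would have to make.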

So, those are the top 10 lists. But I'll bet you want everything!

The Torrent

But it occurred to me that this is public information that Facebook puts out, I'm assuming for search engines or whatever, and that it wouldn't be right for me to keep it private. Why waste Facebook's bandwidth and make everybody scrape it, right?

So, I present you with: a torrent! If you haven't downloaded it, download it now! And seed it for as long as you can.

This torrent contains:

  • The URL of every searchable Facebook user's profile
  • The name of every searchable Facebook user, both unique and by count (perfect for post-processing, datamining, etc)
  • Processed lists, including first names with count, last names with count, potential usernames with count, etc
  • The programs I used to generate everything

So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)

Limitations

So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed.

An additional limitation is that these are only users whose first characters are from the Latin charset. I plan to add non-Latin names in future releases.
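
To make "first character from the Latin charset" a bit more concrete, here is a rough Python sketch of one way to test a name for it. This is only an assumption about what such a check could look like, not how the directory was actually crawled; `starts_with_latin` is a hypothetical helper, and whether accented letters count as Latin for the directory's purposes is not something this post settles.

```python
import unicodedata


def starts_with_latin(name):
    """True if the first non-space character is a Latin-script letter."""
    stripped = name.strip()
    if not stripped:
        return False
    first = stripped[0]
    # Unicode character names for Latin letters begin with "LATIN",
    # e.g. "LATIN SMALL LETTER O WITH DIAERESIS".
    return first.isalpha() and unicodedata.name(first, "").startswith("LATIN")


if __name__ == "__main__":
    # Made-up sample names for illustration.
    for name in ["John Smith", "Иван Петров", "1337 Hacker"]:
        print(name, starts_with_latin(name))
```

Names that fail this check are the kind the current lists would be missing.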

142 thoughts on “Return of the Facebook Snatchers”

  1. Reply

    junofeeng

    With Facebook's growth, more and more hackers focus on it. Security becomes increasingly important.

  2. Reply

    Paul Evans

    Looks great as a dictionary for driving brute-force SSH/website attacks or similar. What's the betting that there's at least 10,000 users in that list whose password is some variation on their date of birth which, of course, they'll publish too?

  3. Reply

    floyd

    Nice. It's available via Google as well (even if Google doesn't like this query --> Captcha):

    inurl:"/directory/people" site:facebook.com

  4. Reply

    Karel Blinker

    Now what is left is generating passwords from the names (like your jsmith and smith) and trying these out to access the accounts. With that amount of data you will have many hits.

  5. Reply

    kyle

    Keep up the good work!

  6. Reply

    Richard Ferbe

    Good find! thanks for the share

  7. Reply

    Sean Sullivan

    Nice post. Thanks for the mention.

    171 million names (from the latin charset)... is that the A-Z but not the 1-26?

    I'm curious how often the index is updated. One of our researchers (who has had an account for a long time) isn't listed even though his options would allow it. And our CRO, Mikko Hypponen is in the index, but another Mikko Hyppönen, that can be found via Google, isn't. (And there are five M.H. overall when searching inside of Facebook.)

    In any case, if there are actually 500 million accounts, and only (only!) 171-plus million names in your torrent, does this mean that more than half of Facebook accounts have taken the time to opt out?

    Seems like a lot. Conventional wisdom holds that most people don't adjust their privacy settings. (But I never cared for that bit of CW.)

    Here's another fun index for you: http://www.facebook.com/family/

    1. Reply

      Matt Gardenghi

      Sean Sullivan: Iago probably borked his research.... ;-)

    2. Reply

      Ron Bowes Post author

      Hey Sean,

      Yes, I did A-Z but not 1-26. 1-26 offered some unique challenges, and I've started working on phase 2 where those will be harvested. I'm in Vegas for Blackhat/Defcon right now, though; when I get back I'm going to start working on 1-26 and everything else.

      It's really hard to answer your other questions. Facebook's 500 million claim may not be true, and their directory seems somewhat sketchy as to what they do/don't include. Additionally, I'm suspicious whether or not the directory updates as I'm going, because I might skip some/hit duplicates if it does.

      In any case, it's super cool. :D

      Now, I'm on a bus at the Hoover Dam right now, so I'd better sign off and enjoy my bus tour. See y'all soon!

  8. Reply

    MrMiGu

    Have you checked the phone book lately?

  9. Reply

    kats

    Seems like a very roundabout way to do this:

    for ((i = 0;; i++)); do curl "http://www.facebook.com/profile.php?id=$i" > $i.html; done

    You'll get everything that Facebook has visible publicly, friends and all. You can find Zuckerberg at i=4.

  10. Reply

    Liz

    The difference in user numbers could be because you used Romance/European language alphabet. Since most users of Facebook aren't in the U.S., you might need to try alternative alphabets (Cyrillic, Japanese, Greek, etc.) if Facebook allows for these alternatives.

  11. Reply

    wuntee

    Possibly another easier/faster way - enumerate all numbers:

    http://graph.facebook.com/4

  12. Reply

    winikeh

    I would love to see the ncrack script to go along with this.

  13. Reply

    mark

    Facebook bumped up their IDs to 100 trillion and their IDs include all objects in the "graph" now, so iterating by IDs could take a very long time.

  14. Reply

    YAY

    Great work.

    I don't know why people are talking about Ncrack though, as you can't log in with just the username/profile ID.

  15. Reply

    Ulrich

    They claim 500 M users, but how come I can only count to some 340 M?

    http://graph.facebook.com/340100101

    Can anybody find a higher userID?

  16. Reply

    Sean Sullivan

    Neil Rubenking at PCMag/Security Watch played around with the graph method. The small program that he wrote would have taken 18 years to collect all the info: http://blogs.pcmag.com/securitywatch/2010/05/facebook_id_hack_-_no_real_pro.php

  17. Reply

    Ben Further

    @Sean Sullivan

    Solution:
    -Define ID-Ranges (for 24 hours)
    -Find some "helper kittys"
    -And get crackin

    18Y -> ~6570D

    So u only need about 6570 ppl to get it done in a day :)

  18. Reply

    Gongo Bazook

    are pics also in the torrent?

    1. Reply

      Matt Gardenghi

      I haven't looked at the data myself, but from what Ron was saying, it is unlikely. It appears to just be data grepped from the results.

    2. Reply

      Ron Bowes Post author

      Hi Gongo,

      For Phase I, I haven't downloaded the pics.

  19. Reply

    Helios

    Send me an email, we'll talk bandwidth for you to use.

  20. Reply

    I'm on the list

    I'm downloading just to see if I am on the list. I had previously set my privacy settings on Facebook to be open to anyone that looked. This is a perfect way for me to see what is truly available to the public. Even though I am not listing my info in this post (I don't need the EXTRA attention) this will help me tremendously in my experiment. I must say thank you to SkullSecurity for putting this together.
    Cheers!
    (I plan on seeding this for several weeks)

  21. Reply

    Joe Crow

    Unfortunately for me, my position as the number 2 blogger, which dropped to 3 back then, has fallen off completely since those good ol' days!

    Great post.
    Joe Crow

  22. Reply

    Iam Furthest

    uhhmm, you'd need 6570 systems you mean.. not 6570 ppl?
    try 555-rentabotnet, create one yourself, or start a folding@home like screensaver :D
    this is just to superevilmindedwhoohootakeovertheworldanddontdoanythyhingwithitwhenyouredonecauseitwasonlyforthefun kinda stuff

  23. Reply

    quinametin

    Is it easy to adjust the script to collect pics also?

  24. Reply

    Anon

    Apparently the OP doesn't have any friends on Facebook..So he has to hack his way into finding people, what a sad little man.

    1. Reply

      Ron Bowes Post author

      Heh, I love the flamebait. Keep it coming!

  25. Reply

    g04t

    get a life dud. srsly.

  26. Reply

    Anonymous

    500 million user count claim is probably because of Google's top 1000 most visited sites...
    http://www.google.com/adplanner/static/top1000/

    Still, I think you should not have released this. There are other ways to get people's attention, and this will only lead to misuse in the wrong hands.

  27. Reply

    Demetry Gutierrez

    You should hack all your haters' accounts. Good work by the way, brings to light the lack of security we call security today.

  28. Reply

    Simo

    for pics, you just do graph.facebook.com/zuck/picture?type=large ..and whoomp there it is!

  29. Reply

    anon

    Why didn't you include information other than the URL and name? Would be really useful to have the other details included. As is this is kind of worthless.

  30. Reply

    Nick Fisher

    I thought about doing a similar thing for Ebay accounts (using the feedback ratings), but then I said "What's the point ?"....

  31. Reply

    Phil

    So Demetry, people should have their accounts hacked if they disagree with this childish act, and they say people on the net aren't mature...

    Highlighting a flaw which really isn't a flaw, accessible data on a social networking site, hardly Watergate is it?

    Tune in next week when our intrepid reporter discovers, some firms sell your email addresses on to third parties after telling you they do...

  32. Reply

    yeeeah

    If only more people were seeding.

  33. Reply

    Anonymous

    FB may be poor with their security settings and it does need to be addressed. However, some of the details that you have made even more high profile in your release of them and the subsequent media hype may just have taken the work out of it for a non IT literate paedophile. I am sure you know that there are misguided children on FB too. Congratulations you got traffic to your site and kudos at the expense of the very audience you claim to be protecting, well done!!!!! A little more personal responsibility would have been wise perhaps.

    1. Reply

      Matt Gardenghi

      Anonymous: Seriously? Did you read the post? Look at the data? The fact is that this is a compilation of names. Not pictures, not email, not birthdays. These are just black and white text. You can go on FB and search for "jane doe" and come up with the same results + profile pics. That's more valuable to a pedophile than this list. Please try to read a "little" bit before posting. OK? Thanks.

  34. Reply

    Hoover

    But why did you do this then make it public?

    Whose side are you on?

    1. Reply

      Matt Gardenghi

      Hoover: That's a black/white fallacy. You are assuming that posting information makes one bad. This is no different than yellowpages.com posting your phone number. The information is public and in one place. Ron simply grabbed a chunk of it for data analysis. How is the analysis of public data bad?

      You frame this like it's a part of the vulnerability disclosure debate. It's not. This data was deliberately made public for the use of Google, Bing et al. (At least as I understand it.) So that being the case, why would Ron be taking sides (good or bad) by publishing the data that is published on search engines now?

  35. Reply

    me

    it wont download

  36. Reply

    Ponty.net

    Fascinating work. Proof above all proof that FB is nowhere near as secure as it needs to be. I disagree completely with their attempts to make information more 'searchable'.
    Personal data should not be harvestable.

  37. Reply

    Anon

    Bring on the Cyber War, the most deadly war yet to begin in modern age.

    They are not prepared.

  38. Reply

    fak3r

    I'm very interested in learning more about this, I'll try to grab the torrent tonight and bring it to DEFCON - will you have any spare time over the weekend? If you're going to work/talk about this at DEFCON, I'd like to be involved. I'm following you on twitter (@fak3r) now too, so pipe up if anything is going on. See you by the pool.

    1. Reply

      Matt Gardenghi

      fak3r: Ron will be at Fyodor's nmap talk on Friday in the front of the room. He said to look for him there. Or drop into #skullsecurity and chat directly.

  39. Reply

    David Curry

    Matt already addressed this but...

    Hoover: He's obviously on the information's side.

  40. Reply

    Tracksomebody

    http://tracksomebody.com/?cat=118

    paste any url of a facebook image
    and it'll return their name and url to their myspace

  41. Reply

    Tracksomebody

    their facebook*

    sorry was working on something else at the time

  42. Reply

    Hannah

    ... and?

    I could have told you there were lots of people called "John Smith" on facebook without any effort at all. I don't know what's more worrying; the time you've spent on this or the lunatics commenting on it who have completely failed to grasp the point. Matt Gardenghi and Phil excepted.

    1. Reply

      Matt Gardenghi

      Thanks Hannah. Nice to see another person grasp this situation for what it really is. To quote a comment I saw recently regarding the growing media circus: "It's sad that this is some of your least interesting work, and it's getting such attention." That commenter was correct. Ron's other work is far more interesting and useful: nmap scripts; conficker detection; energizer backdoor detection....

      Stick around Hannah, it's not usually this nuts here.

  43. Reply

    Seth Stahlman

    An observation about the ease of building names from Facebook public data seems rather timely, considering this NYT article:
    http://www.nytimes.com/2010/07/25/magazine/25privacy-t2.html?_r=3&pagewanted=1

    Would be interesting to run Ron's program in a year and do a comparison on the new data to the handy lists in the torrent, just to see what's stale and what's changed.

  44. Reply

    dude

    I uploaded the torrent file on rapidshare: http://rapidshare.com/files/409692690/fbdata.torrent.html

  45. Reply

    loveecho

    Great work!
    Thx for share!!!!
    God bless you.

  46. Reply

    Concerned

    The following comment has me worried, in regard to the pedophilia discussion here, privacy matters, right vs wrong etc.

    "Once I have the name and URL of a user, I can view, by default, their picture, friends, information about them, and some other details. If the user has set their privacy higher, at the very least I can view their name and picture. So, if any searchable user has friends that are non-searchable, those friends just opted into being searched, like it or not! Oops :) "

    Oops indeed. Thus there is access to pictures and details. A jackpot for many groups out there. Did you put your religion in the details or your political stance? You might be a target.

    This is why you don't put any personal info on the web and why keeping max security is always smart.

    Ron's intention might be good, but he can serve as an inspiration to those who would want to do something similar for harmful intentions.

    Just some food for thought.

    1. Reply

      Ron Bowes Post author

      @Concerned:

      People with harmful intentions are no better off now that the data has been released than they were before. The data has always been there, and for all we know they've been collecting it. Raising awareness like I did can only serve to help the problem.

  47. Reply

    Muni

    Hey Ron,

    Awesome work! Really interesting. My question is how often you index the data? Every day facebook user registrations are increasing exponentially. How are you going to keep your data updated?

    Muni.

    1. Reply

      Ron Bowes Post author

      @Muni: I have no idea, really. I'm hoping to do Phase 2, with even more users, in August. We'll see how much overlap there is!

  48. Reply

    Bon

    Can anyone upload that torrent data to hotfile.com?

  49. Reply

    clyang

    Hi Ron,
    Thanks for your work! I already uploaded all the bz2 files to hotfile. If people aren't able to download with BT, please try the following links:
    http://hotfile.com/dl/58180613/b424710/facebook-f.last-withcount.txt.bz2.html
    http://hotfile.com/dl/58180632/e160598/facebook-first.l-withcount.txt.bz2.html
    http://hotfile.com/dl/58180642/9d6f4cc/facebook-firstnames-withcount.txt.bz2.html
    http://hotfile.com/dl/58180649/9b4144d/facebook-lastnames-withcount.txt.bz2.html
    http://hotfile.com/dl/58180822/5260779/facebook-names-original.txt.bz2.html
    http://hotfile.com/dl/58180985/41862da/facebook-names-unique.txt.bz2.html
    http://hotfile.com/dl/58181148/d2967b2/facebook-names-withcount.txt.bz2.html
    http://hotfile.com/dl/58181816/179517c/facebook-urls.txt.bz2.html

    Regards,

  50. Reply

    kho chi

    Cool Dude

    Good job. I would like to see this on the Apps side too.

    After downloading user email and personal information, I was able to re-create and populate the Open Graph data structure that relates a person to their friends.
