Return of the Facebook Snatchers

First and foremost: if you want to cut to the chase, just download the torrent. If you want the full story, please read on...

Background

Way back when I worked at Symantec, my friend Nick wrote a blog post that caused a little bit of trouble for us: Attack of the Facebook Snatchers. I was the blog editor at the time; I went through the usual sign-off process and, eventually, published it. Facebook was none too happy, but we fought for it and, in the end, we got to leave the post up in its original form.

Why do I bring this up? Well, last week @FSLabsAdvisor wrote an interesting tweet: it turns out that by heading to https://www.facebook.com/directory, you can get a list of every searchable user on all of Facebook!

My first idea was simple: spider the lists, generate first-initial-last-name (and similar) lists, then hand them over to @Ithilgore to use in Ncrack, Nmap's awesome new brute-force tool that he's working on.

But as I thought more about it, and talked to other people, I realized that this is a scary privacy issue. I can find the name of pretty much every person on Facebook. Facebook helpfully informs you that "[a]nyone can opt out of appearing here by changing their Search privacy settings" -- but that doesn't help much anymore considering I already have them all (and you will too, when you download the torrent). Suckers!

Once I have the name and URL of a user, I can view, by default, their picture, friends, information about them, and some other details. If the user has set their privacy higher, I can at least view their name and picture. So if any searchable user has friends who are non-searchable, those friends just opted into being searched (they show up in that user's friend list), like it or not! Oops :)

The lists

Which brings me to the next topic: the list! I wrote a quick Ruby script (which has since become a more involved Nmap script that I haven't used for harvesting yet) to download the full directory. I should warn you that it isn't exactly the most user-friendly interface: I wrote it primarily for myself, and I'm only linking to it for reference. I don't really suggest you try to recreate my spidering; it's a waste of several hundred gigs of bandwidth.

The results were spectacular: 171 million names (100 million unique). My original plan was to use this list to generate a list of the top usernames based on first initial plus last name:

 129369 jsmith
  79365 ssmith
  77713 skhan
  75561 msmith
  74575 skumar
  72467 csmith
  71791 asmith
  67786 jjohnson
  66693 dsmith
  66431 akhan
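The post doesn't show how these counts were computed (the actual programs ship in the torrent). As a minimal sketch, assuming a plain list of "First Last" strings, a first-initial-plus-last-name tally could look like this in Python. `initial_last` and `top_usernames` are hypothetical helpers, and names with middle parts simply use their first and last words.

```python
from collections import Counter

def initial_last(full_name):
    """Turn 'John Smith' into 'jsmith'; return None for one-word names."""
    parts = full_name.lower().split()
    if len(parts) < 2:
        return None
    # First letter of the first word + the whole last word.
    return parts[0][0] + parts[-1]

def top_usernames(names, n=10):
    """Count candidate usernames and return the n most common as (username, count)."""
    counts = Counter(u for u in map(initial_last, names) if u)
    return counts.most_common(n)

names = ["John Smith", "Jane Smith", "Sam Khan", "John Q Smith", "Cher"]
for username, count in top_usernames(names):
    # Right-align the count, matching the layout of the list above.
    print(f"{count:7d} {username}")
```

On the sample input, "John Smith", "Jane Smith", and "John Q Smith" all collapse to `jsmith`, which is exactly why this pattern produces so many collisions at scale.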

Or first name plus last initial:

 100225 johns
  97676 johnm
  97310 michaelm
  93386 michaels
  88978 davids
  85481 michaelb
  84824 davidm
  82677 davidb
  81500 johnb
  77800 michaelc
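Only the key function changes for the other patterns. A sketch, again assuming "First Last" input, with a hypothetical `count_pattern` helper that takes the username pattern as a function of the first and last words:

```python
from collections import Counter

def first_last_initial(first, last):
    """'john', 'smith' -> 'johns'."""
    return first + last[0]

def count_pattern(names, pattern):
    """Lowercase each name, apply pattern(first, last) to its first and last words, and count."""
    counts = Counter()
    for full_name in names:
        parts = full_name.lower().split()
        if len(parts) >= 2:  # skip one-word names
            counts[pattern(parts[0], parts[-1])] += 1
    return counts

names = ["John Smith", "John Stone", "John Miller", "Michael Moore"]
counts = count_pattern(names, first_last_initial)
print(counts.most_common(2))
```

Passing `lambda first, last: first[0] + last` instead reproduces the `jsmith`-style list, so one counting loop can serve every pattern.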

Or even the top usernames based on first name dot last name (sorry, I can't link this one due to bandwidth concerns, but it's included in the torrent):

  17204 john.smith
   7440 david.smith
   7200 michael.smith
   6784 chris.smith
   6371 mike.smith
   6149 arun.kumar
   5980 james.smith
   5939 amit.kumar
   5926 imran.khan
   5861 jason.smith
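Real names often contain accents, apostrophes, or hyphens that a username would drop. One way to handle that, which is purely an assumption about how such a list might be cleaned and not necessarily what the torrent's programs do, is to strip accents with Unicode NFKD decomposition and keep only a-z. `normalize` and `first_dot_last` are hypothetical names.

```python
import re
import unicodedata
from collections import Counter

def normalize(part):
    """Lowercase, strip accents (é -> e), and drop anything outside a-z."""
    decomposed = unicodedata.normalize("NFKD", part)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())

def first_dot_last(full_name):
    """'Seán O'Brien' -> 'sean.obrien'; None if fewer than two usable words remain."""
    parts = [normalize(p) for p in full_name.split()]
    parts = [p for p in parts if p]
    if len(parts) < 2:
        return None
    return f"{parts[0]}.{parts[-1]}"

names = ["John Smith", "john  SMITH", "Seán O'Brien", "Jean-Luc Picard", "Madonna"]
counts = Counter(u for u in map(first_dot_last, names) if u)
for username, count in counts.most_common():
    print(f"{count:7d} {username}")
```

Normalizing before counting matters: without it, "John Smith" and "john  SMITH" would be tallied as two different usernames.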

Or even the most common first names and last names:

 977014 michael
 963693 john
 924816 david
 819879 chris
 640957 mike
 602088 james
 584438 mark
 515686 jason
 503658 robert
 484403 jessica

 913465 smith
 571819 johnson
 512312 jones
 503266 williams
 471390 brown
 386764 lee
 360010 khan
 355639 singh
 343220 kumar
 324972 miller
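First and last names can be tallied in a single pass. This sketch assumes the first word is the first name and the last word is the surname, and counts a one-word name only as a first name; both are simplifications, and `name_tallies` is a hypothetical helper.

```python
from collections import Counter

def name_tallies(names):
    """Return (first_name_counts, last_name_counts) from 'First ... Last' strings."""
    firsts, lasts = Counter(), Counter()
    for full_name in names:
        parts = full_name.lower().split()
        if not parts:
            continue  # ignore blank lines
        firsts[parts[0]] += 1
        if len(parts) >= 2:
            lasts[parts[-1]] += 1
    return firsts, lasts

names = ["Michael Smith", "John Smith", "Michael Lee", "Jessica Jones", "Michael"]
firsts, lasts = name_tallies(names)
print(firsts.most_common(1))  # [('michael', 3)]
print(lasts.most_common(1))   # [('smith', 2)]
```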

So, those are the top 10 lists. But I'll bet you want everything!

The Torrent

But it occurred to me that this is public information that Facebook puts out, I'm assuming for search engines or whatever, and that it wouldn't be right for me to keep it private. Why waste Facebook's bandwidth and make everybody scrape it, right?

So, I present you with: a torrent! If you haven't downloaded it, download it now! And seed it for as long as you can.

This torrent contains:

  • The URL of every searchable Facebook user's profile
  • The name of every searchable Facebook user, both unique and by count (perfect for post-processing, data mining, etc.)
  • Processed lists, including first names with count, last names with count, potential usernames with count, etc.
  • The programs I used to generate everything
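The count lists above resemble the output of `sort | uniq -c | sort -rn`. As a hedged Python equivalent (this is not the program shipped in the torrent, and the file names are made up), producing both a unique list and a counted list might look like:

```python
import os
import tempfile
from collections import Counter

def write_lists(names, unique_path, counted_path):
    """Write distinct names (sorted) and 'count name' lines (most common first)."""
    counts = Counter(name.strip() for name in names if name.strip())
    with open(unique_path, "w", encoding="utf-8") as out:
        for name in sorted(counts):
            out.write(name + "\n")
    with open(counted_path, "w", encoding="utf-8") as out:
        # Descending count, then alphabetical, so ties come out in a stable order.
        for name, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            # Right-align the count in 7 columns, as GNU `uniq -c` does.
            out.write(f"{count:7d} {name}\n")
    return counts

# Example run in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    unique_path = os.path.join(tmp, "names.unique.txt")
    counted_path = os.path.join(tmp, "names.count.txt")
    write_lists(["john smith", "jane doe", "john smith"], unique_path, counted_path)
    with open(counted_path, encoding="utf-8") as f:
        print(f.read(), end="")
```

A `Counter` keeps every distinct name in memory, which is fine for a sample but may not be for 171 million lines; the external `sort` pipeline avoids that by spilling to disk.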

So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)

Limitations

So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed.

An additional limitation is that these are only users whose names begin with a character from the Latin character set. I plan to add non-Latin names in future releases.

142 thoughts on “Return of the Facebook Snatchers”

  1.

    Vid

    Hi Ron,

    I am a young entrepreneur. I have an easier & legal way for you to get the name, first name, last name, gender & picture of all users. I would like to discuss this with you. Let me know if you are interested.

    Thanks

  2.

    what we need

    We need people to do things like this. People do not realize what they are getting into when they open up a social networking site. I work in IT and fully support this exercise. I would say that you should try to get as much data as you can without breaking the law. I am even willing to lend you my bandwidth; I have 25/15 and would consider upgrading to 50/20.

  3.

    Jeff

    I'm feeling a little dense here... What does this prove that we didn't already know before? If I want to find someone on facebook, I can do a search for them, find them, and view their public profile info. This is just a big list of people's public profile names is it not? Or am I missing something...

  4.

    Thema3x

    @urlich, it's simple: the ID is a consecutive number. I have an ex-girlfriend whose ID is greater than 690,000,000; that's why we need to count the IDs of ex-members like me. :D
    @Ron, I hope you can provide a data dictionary for the files. It's hard to open these files without knowing how to read them.

  5.

    Frank

    It is worth remembering that many governments already trawl social network sites, or fund services that do so and then provide data collection for a fee:

    Google, CIA Invest in ‘Future’ of Web Monitoring
    http://www.wired.com/dangerroom/2010/07/exclusive-google-cia/

    ‘Project Indect’: An A.I. to police all of Europe
    http://rawstory.com/08/news/2009/09/20/project-indect-an-ai-to-police-all-of-europe/

  6.

    aiya

    I don't get it. What are you all on about? It's already public information. The only point of interest is **if someone changes their privacy settings to full, you still know "John Smith" has an account**. So what? I bet there are thousands of "John Smith"s, and how about George Bush? Loads of them. And Heidi Woodwind, I bet she exists too. In fact, I bet all name variations exist. So what you have is an ID-to-name lookup, and what does that give you? You can search Facebook for Heidi Woodwind now, and it shows 59 results with pictures. It even gives you variations (heidi woodland, heidi woodward); your text doesn't. It's also available in Google, as shown... I still don't understand what a 1.4GB text file gives you that this doesn't. This IS NOT a security breach; you have not got information that people didn't allow or set in their profile. My settings are highest, and I am NOT in the list. So... that proves this data is 100% useless. Now scan in your phone book and torrent that for everyone to wow at, and don't forget the yellow pages. Offer it to WikiLeaks; they will just reject it as already public information. USELESS

    1.

      Matt Gardenghi

      aiya: The primary purpose was to collect combinations of "real first names and last names." The fact that they are from FB is incidental. This is all about determining the frequency of real names for purposes other than FB hacking.

      Frankly, FB hacking isn't that interesting unless you are A) trying to target a company through an employee on FB or B) you are trying to exploit FB users for profit/political points/etc.

      Point A is only useful for Penetration Testing or Espionage. Point B is just illegal.

      Having a list of actual names makes brute force tools more useful. (There is the assumption that brute force tools will be used correctly within the bounds of the law by legitimate security researchers.)

  7.

    Andrew

    But I shouldn't HAVE to protect myself and secure everything. Only dull, boring, psychotic people with crap to hide secure every single little thing. Why can't I just enjoy my profile being out in the open? Why does a hacker (is this a black or a white hat site? I'd assume black, but I can't tell) have to give away my name and URL to people? It's all public information, I agree, but giving it away, especially since Apple, AT&T and TONS of other companies are downloading this, doesn't seem morally right. Keep this info for yourself; sharing it just so companies can bug the shit out of us is a bunch of crap. Good for you that you are all about security, but everyone's different, so please don't use your commie views as an excuse to post this torrent. I have mine unsecured so I can be me and express myself. I shouldn't have to worry about a fellow hacker trying to make everyone in the world be exactly like him, probably fat as fuck and no vagina to finger. I don't want to be like you, I want to be like me, and only me.

  8.

    Andrew

    Meh, and I'm not trying to bash you personally, OP; I just don't like it when people want everything to be the same about everybody. People are different for a reason.

  9.

    bodmin

    Hi, Ron.
    Nice work. I appreciate your programming skills very much; I like programming crawlers and spiders myself. But unfortunately your script has no practical value. Do you have any idea how to extract users' email addresses and interests? :-) Regards, Sergey.

  10.

    Andy

    Want to know if you were included in these files? This web page will tell you...
    http://nohasslesites.com/FacebookNames

  11.

    yonose

    Hello there

    You did a really good job here!!!

    I hope Ncrack serves you well too.

    Regards.

  12.

    BuddyGusto.com

    The real open Facebook starts with BuddyGusto.com, where people share their FB likes of their own free will with people they do not know to get new FB friends with the same likes...

    1.

      Ron Bowes Post author

      To the couple of people who are complaining about the data being useless: I know, and I completely agree. The media definitely took it the wrong way. Oh well, it's been a fun ride :)

  13.

    Madeline

    I have been trying to download via torrent.
    It seems people are not seeding. If somebody could mail it to me directly, that would be gr8.

    For a moment last week, when this site was down, I thought the big boys (read: Facebook and all) were arm-twisting to prevent the sharing of information.

  14.

    Jason

    I have a VPS that has plenty of bandwidth I am not using. I could probably donate some of that bandwidth towards this. I would first have to check with my VPS provider to make sure using my VPS for "research" purposes is allowed. I only use about 3 to 5 gigs of my bandwidth per month on my VPS, and it has 450 Gb/mo. bandwidth. Swing by my website and let me know if interested (let me know via commenting on any of the posts, or in the forums).

  15.

    Alice

    I did read somewhere once that it was against Facebook's TOS to use a parser to collect data from their site. I'm surprised they didn't actually do anything to prevent it. I mean, they could easily have prevented it by blocking the IP when it uses too much bandwidth in too little time. I guess they don't really care.

    I was wondering how long it took you to parse the whole site. I guess a few months with a 100 Mbit line? Great work.

  16.

    Anonymous

    Much ado about nothing. I certainly don't mind though. I've found a highly interesting blog to watch because of it.

  17.

    Lina

    Wow, nice work!
    I'm wondering: how can I access the different events on Facebook? Most of them are public, and I could really use their data.

    Any help?

  18.

    Ronald

    Yes, this information is public and it's our own fault, but YOU are the one who collated it to be misused by all the immoral people out there.

    Any consequences that come of this are YOUR fault, not facebook's and not even the users'.

    Someone else probably would have done it, but they didn't - you did. Congratulations, you are a dick.

    1.

      Matt Gardenghi

      Ronald,

      Take a deep breath. This is public info. I'm sorry that you don't like people's names being collated; I assume you've never seen the phone book before. If some fool needs this collation of data before they can do bad, then they are dumb enough that they will get caught. Intelligent crooks (OK, most dumb crooks) won't need the leg up that this data doesn't actually provide. Those that need this data were also voted "most likely recipient of the Darwin Award" in their high school yearbooks.

  19.

    Rick Smith

    @Ronald: I think you are making the assumption that Ron is the only person/group to have collected the FB information. The real problem is that the information is available to the world. How many others have gathered the same data (and more) and keep it to themselves?

  20.

    David Curry

    Andrew: So... Why do you get to be yourself, but we can't be ourselves? Hypocritical much? Also, ad hominem, wonderful.

  21.

    u1106

    Hmm, should I be glad or disappointed? I'm not on the list. (I have highly customized privacy settings, most fields in my profile are just empty, but I allow search engines)

    Actually, none of the 10+ of my friends I tried just now are on the list either. And I'm sure many of them have never touched their privacy settings.

    Even users that Google finds are not on the list.

    So for some reason the list is very much incomplete (these are all users with [a-z] only). So maybe the 500 million users isn't that incorrect after all.

  22.

    violated

    well...you convinced me to delete my facebook account...

  23.

    create free blogs

    Hey Ron,

    How was your security conference? Your work is quite impressive. I am researching network forensics. What about user emails along with the name list? If not, can you let me know what additions we could make to the script to fetch emails as well?

    I am ready to contribute my bandwidth - got 50 meg per sec connection ;)..

    Regards,
    Nick

  24.

    Julia

    I think this is a pretty sad reflection on what our world has become. Exactly why do you think it's ok to STEAL people's information? If we wanted something from you - we'd ask. You put up a "Spam protection" spot in the leave a reply section - and yet you are promoting spam . . . Hypocrite!

  25.

    neofutur

    @julia: define "STEAL"?
    How can you steal public information?

    If I say "Julia posted a comment here," did I steal something from you too?

  26.

    muj

    Good that I don't have facebook

    Visit

    http://www.gbay.co.cc

  27.

    4ud1t0r

    This sh1t cracks me up...

    1st - Ron :
    Interesting work. Keep it up. I say "interesting" because those of us that are in the game and who read this blog know that your other work is far more "useful"...

    2nd - Everyone else:
    Go to http://www.google.com
    Type in your name.
    Anything come up?

    THIS IS THE SAME THING!!!!

    Read my lips: PUBLIC INFO IS PUBLIC!!!! Get a life!

    If nothing comes up (which I suspect will be the case with most of the posters here)... then really GET A LIFE!

  28.

    harrel

    What the... Nobody is safe on Facebook.

  29.

    Jancis

    Wow, this is so interesting. A list of names you got. Nice... yawn. Can you get more? Why are you all so happy about this? This is boring. Man, you should get a life.

  30.

    carlo r

    hey
    do you have anything that will work for Linkedin in the same manner???

    1.

      Ron Bowes Post author

      Not yet, but if somebody else wants to do it I'll happily post it + give them free credit. I don't really have the time right now.

  31.

    r3dfish

    Hey guys,
    We gave a speech at DefCon 17 where we analyzed Facebook scrapes using Hadoop. In this video we talk about scraping timestamps from people's walls to map out the micro-marketing implications of Facebook usage. Check it out here:
    http://www.hackedexistence.com/project-facebook.html
    The complete speech can be seen here:
    http://www.hackedexistence.com/project-hadoop.html

  32.

    EnglishStan

    What database will these files open with?

    Notepad crashes!!

  33.

    PES

    Excuse me, can anyone help me open the files? I don't know how to open the database. Can someone help me open it, or tell me how? Thx.

  34.

    Jonathan Sieling

    I have all the bandwidth you need at 100 Mbps. I would $love$ to have the email address, employer, title, and location filled in.

    As for your next task: put an entire user of your choice in SQL for their entire FB history. I bet you would be able to tell how often they go to specific places, or their regular routines. I really want this for myself, to see what times I usually update my status and how many bars I frequent each weekend.

    Bottom line, capture ANY AND ALL data you can, and somebody will find a use for it.

  35.

    black shadow

    What about finding a name if you just have their picture, like TinEye? Is it possible?

    1.

      Ron Bowes Post author

      Interesting idea, but I would have to harvest the pictures first.

  36.

    Magestik

    black shadow> I'm already working on this. I'm working with a guy who knows a lot about imaging (including facial recognition). I'm going to make a big database with a little script that crawls graph.facebook.com (JSON and images). I'm going to need some help...

    Wikipedia says : "U.S. Department of State operates one of the largest face recognition systems in the world with over 75 million photographs that is actively used for visa processing."
    We can beat them ^^

  37.

    totzpalanz

    Were you able to download the emails for these 1 million FB users?

  38.

    chandan

    wow this is ridiculous.

    FB should be facing problems now.

  39.

    quinametin

    Magestik> I did the same :) I've tried to use OpenCV but the result is not very good... the error rate is very high. I've tried with 900.000 pics.

  40.

    tomas sanchez

    People are starting to find uses for it. Some guy made a username dictionary out of it.

    http://www.4shared.com/file/X-gu_-UQ/facebookusernamestxt.html
    http://www.mediafire.com/?38936aa9d3jkeva
    http://rapidshare.com/files/412403215/facebook.usernames.txt.zip
    http://www.megaupload.com/?d=7LNBFEDE

  41.

    Magestik

    quinametin> I'm training with a few pictures (40) and eigenfaces algorithm... With my recent improvement the software can check 1 picture in 0.01s ... And the results are always OK !

    The problem is 0.01s per image is too slow because it would take 22 days for 100.000.000 images ... So I have to continue to increase speed, then I'll test it on more images.

  42.

    quinametin

    @Magestik Maybe we can exchange some experiences :)

  43.

    SundayDriver

    Well, I have been using the URL table to point my own crawler to capture email, interests, etc. However, for the account I used on the crawler, email addresses are now being displayed as an image. If I use my personal account, emails are back to being text. I need help!

  44. Reply

    Asim Zeeshan

    I downloaded this torrent and uploaded it to my VPS. Anyone who wishes to download this data via http can do so from here.

    http://ash.li-node.com/fb_torrent.tar

    more locations on my blog
    http://www.asim.pk/2010/08/20/download-personal-details-of-100m-facebook-users/

  45.

    apodfuid

    Wow. Anyone who doesn't know how this is useful is missing the point, lol. One thing, though: you should really consider putting up a software suite that applies common password transformations, such as inner caps, reversing the password, and adding ! at the front and end. Your work in the password cracking field is very impressive, and tools such as these would be very helpful, especially if put in a suite.

    1.

      Ron Bowes Post author

      @apodfuid - John the Ripper already does a lot of that, so when I need to crack passwords (or generate bigger lists) that's what I do.

  46.

    shon

    Does anybody know how I can retrieve, in an automated way (without being logged in to Facebook), the list of friends for a specific ID?
    thanks,
    Shon

  47.

    bach6

    Other Downloads:
    Fileserve:
    http://fileserve.com/list/nGX8Bk3

    Filefactory:
    http://www.filefactory.com/f/ce9249c8422b750c/

    Depositfiles
    http://depositfiles.com/folders/U3NQFBQ25

    Keepfile
    http://www.keepfile.com/users/sleep9999/1302/fbprofiles

  48.

    SIFE

    Does it contain their emails?

  49.

    Captain Sarcastic

    Even more useful...

    By looking in a phone book you can not only get all of the people's names, but also their phone numbers and addresses. Creepy huh?

    There are definite privacy issues and it looks like the phone company will have to tighten security.

  50.

    jake smith

    Ron, I have to say you've outdone yourself. Nice work, and there's no damn way this can be accomplished with Win 7. Which Linux distro did you use, bro? And I'm excited to give Ncrack a try when it hits the market.
    Can't wait. Will seed the torrent for a while, thx.

    1.

      Ron Bowes Post author

      Hey Jake,

      I used to use Slackware, lately I've been into Gentoo.
