Deanonymizing Facebook Users By CSP Bruteforcing


Did you ever wish you had all the relevant information about a visitor the moment he hits your site? Think of his (full) name, gender and maybe hobbies and interests. Thanks to social networks we can at least get some of that data. All you need is the URL of that visitor’s (public) Facebook or Google+ profile – but if he doesn’t actively give it to you, you’re probably out of luck.

What if we could get that profile URL without the user even noticing it?

Inspired by the great article When Security Generates Insecurity (hat tip to Michael, who shared it and implemented the proposed login check), I discovered that it is possible to get the profile URL of a logged-in Facebook user by exploiting the Content-Security-Policy implementation in Google Chrome. It requires some preconditions, but it’s definitely possible – and I’m going to explain how it works and why it’s dangerous in really great detail 😉


In the following video I demonstrate a proof of concept (click here to try it yourself).

Note: This article is also available in German.

Related Work

A Practical Attack to De-Anonymize Social Network Users shares a lot of valuable information. The paper describes a combination of History Stealing and fingerprinting (based on group memberships) that was used to identify a user’s Xing profile. Particularly interesting are the techniques for extracting profile information from social networks (which still work). Since 2010 it has “unfortunately” no longer been possible to use the mentioned exploit to obtain the browsing history (see Preventing attacks on a user’s history through CSS :visited selectors) – at least as long as you don’t get the user to actively interact with your page. In that case, History theft with CSS Boolean algebra might be possible (though it certainly won’t scale well enough to be a real threat).

A great source of inspiration is the I know … series by Jeremiah Grossman. Some articles worth mentioning are:

I Know Your Name, and Probably a Whole Lot More
A user clicks on a transparent like button for a fan page and triggers a script that immediately fetches the newest “fan” of that page.
I Know Who You Work For
Try to load company-internal intranet URLs to check whether they resolve or return an error.
Breaking Browsers: Hacking Auto-Complete
Simulation of keyboard events to read information from auto-complete fields.

My Approach: Bruteforcing URLs via Content-Security-Policy Restrictions

My approach makes use of the browser’s behaviour when it has to load a URL that is blocked by a CSP restriction. CSP is the abbreviation for Content Security Policy. It provides a whitelist of trustworthy sources for external resources such as images, JS and CSS files, in order to prevent malicious code from being loaded. See An Introduction to Content Security Policy for a great introduction to the matter.

General functioning principle of the Content-Security-Policy directive

How does it look in the code?
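As a minimal sketch in Python, assume a server that sets the header itself. The helper `build_csp` and the host `https://cdn.example.com` are made up for illustration. A policy is just a list of directives, each followed by the sources it allows:

```python
from __future__ import annotations


def build_csp(directives: dict[str, list[str]]) -> str:
    """Serialize {directive: [sources]} into a Content-Security-Policy header value."""
    # Directives are separated by "; ", and each one reads "name source1 source2 ..."
    return "; ".join(f"{name} {' '.join(sources)}" for name, sources in directives.items())


policy = build_csp({
    "default-src": ["'self'"],
    # Scripts may only be loaded from our own origin and one trusted host (example)
    "script-src": ["'self'", "https://cdn.example.com"],
})
print("Content-Security-Policy: " + policy)
```

With such a header in place, the browser refuses to load a `<script>` from any source that is not listed. The element then fires an `error` event instead of `load`.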

So far so good – but CSP also reveals whether a certain URL can be successfully requested. This becomes especially interesting when that URL gives us some clues about the user’s identity.

Deanonymizing by checking profile URLs

Let’s assume we find a URL (“Original-URL”) that performs a redirect based on private information (e.g. login status). We can then verify the target URL (“Target-URL”) of that redirect. We just need to allow loading resources from both the Original-URL and the Target-URL by setting an appropriate CSP header.

Note: The redirect must be implemented via the Location header. With a JavaScript or meta-refresh redirect, the browser has to render the full page before redirecting, and we have no way of injecting our CSP header into a page loaded from a remote source.

Examples of URLs that redirect based on login-status:

  • https://www.facebook.com/me redirects to the Facebook profile
  • https://plus.google.com/me redirects to the Google+ profile
  • https://www.youtube.com/user/ redirects to the Youtube user channel
  • https://www.xing.com/profile redirects to the Xing profile

If a user is not logged in, the redirection targets are:

  • https://www.facebook.com/login.php
  • https://accounts.google.com/ServiceLogin (Google+ and Youtube)
  • https://login.xing.com/continue

Checking login-status

Let’s take a look at the following example to see how that technique can be used for deanonymization:

Checking login status by CSP whitelisting
  1. Allow access to https://www.facebook.com/me and https://www.facebook.com/login.php via CSP header
  2. Create a <script> element with src=https://www.facebook.com/me
    1. If I’m logged into Facebook as Pascal Landau, https://www.facebook.com/me redirects to https://www.facebook.com/pascal.landau1 and the browser reports a failed request.
    2. If I’m not logged into Facebook, https://www.facebook.com/me redirects to https://www.facebook.com/login.php?… and the browser reports a successful request.

By now I at least know whether I’m logged into Facebook (case 1) or not (case 2). What I do not know is my profile URL (I can only tell whether the request for https://www.facebook.com/login.php was successful).

Checking the profile URL

Only a small modification is required in order to obtain the profile information:

Deanonymizing the Facebook profile URL
Deanonymizing the Facebook profile URL
  1. Allow access to https://www.facebook.com/me and https://www.facebook.com/pascal.landau1 via CSP header
  2. Create a <script> element with src=https://www.facebook.com/me
    1. If I’m logged into Facebook as Pascal Landau, https://www.facebook.com/me redirects to https://www.facebook.com/pascal.landau1 and the browser reports a successful request.
    2. If I’m logged into Facebook as someone else, https://www.facebook.com/me redirects to that person’s profile URL (not https://www.facebook.com/pascal.landau1) and the browser reports a failed request.
    3. If I’m not logged into Facebook, https://www.facebook.com/me redirects to https://www.facebook.com/login.php?… and the browser reports a failed request.
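Both checks can be modelled in a few lines of Python. This is a simulation, not browser code, and it rests on several assumptions:

- Matching is simplified to path level: scheme, host and path must be identical, and the query string is ignored.
- The redirect chains are hard-coded and hypothetical. This includes the made-up profile `someone.else` and the query string `?next=example`.

```python
from __future__ import annotations
from urllib.parse import urlsplit

ME = "https://www.facebook.com/me"
LOGIN = "https://www.facebook.com/login.php"
PASCAL = "https://www.facebook.com/pascal.landau1"


def source_matches(source: str, url: str) -> bool:
    """Simplified path-level matching: scheme, host and path must be equal (query ignored)."""
    src, target = urlsplit(source), urlsplit(url)
    return (src.scheme, src.netloc, src.path) == (target.scheme, target.netloc, target.path)


def script_loads(redirect_chain: list[str], whitelist: list[str]) -> bool:
    """True if every hop of the redirect chain is whitelisted, i.e. the <script> reports success."""
    return all(any(source_matches(src, hop) for src in whitelist) for hop in redirect_chain)


# Hypothetical redirect chains of https://www.facebook.com/me in the three situations
as_pascal = [ME, PASCAL]
as_someone_else = [ME, "https://www.facebook.com/someone.else"]
logged_out = [ME, LOGIN + "?next=example"]
chains = (as_pascal, as_someone_else, logged_out)

# Login check: whitelist /me and login.php, success means "not logged in"
print([script_loads(chain, [ME, LOGIN]) for chain in chains])   # [False, False, True]
# Profile check: whitelist /me and one profile, success means "logged in as that profile"
print([script_loads(chain, [ME, PASCAL]) for chain in chains])  # [True, False, False]
```

The only thing that changes between the two checks is the whitelist. The redirect performed by https://www.facebook.com/me stays the same.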

So far so good – there’s just one big fat disadvantage: I need to know the exact URLs for the CSP whitelisting prior to the check. As a consequence, I’d have to check every single profile URL in order to identify a random user – that is 1 HTTP request per profile. Considering Facebook’s 1 billion+ users, it becomes obvious that a simple bruteforce makes no sense…

Binary search for the right profile

Fortunately, we can check more than one URL at once. If the subsequent checks are crafted carefully, I can quickly reduce a whole bunch of URLs to just one – effectively minimizing the required number of HTTP requests. Consider the following example:

Binary search on a small set of profiles

Let’s assume we want to check 6 different profile URLs. Further, I’m logged into Facebook as Pascal Landau, and my profile is located at https://www.facebook.com/pascal.landau1.

  1. Allow access to https://www.facebook.com/me and the following 6 profile URLs via CSP header
    (1) https://www.facebook.com/JohnDoe1
    (2) https://www.facebook.com/JohnDoe2
    (3) https://www.facebook.com/pascal.landau1
    (4) https://www.facebook.com/JohnDoe3
    (5) https://www.facebook.com/JohnDoe4
    (6) https://www.facebook.com/JohnDoe5
  2. Create a <script> element with src=https://www.facebook.com/me
  3. https://www.facebook.com/me redirects to https://www.facebook.com/pascal.landau1 and the browser reports a successful request.

Since the request succeeded, one of the URLs provided in the CSP header has to be correct. Therefore I’ll cut the 6 profiles in half and get 2 groups: 0 [(1),(2),(3)] and 1 [(4),(5),(6)]. The process continues:

  1. Allow access to https://www.facebook.com/me and the following 3 profile URLs via CSP header
    (1) https://www.facebook.com/JohnDoe1
    (2) https://www.facebook.com/JohnDoe2
    (3) https://www.facebook.com/pascal.landau1
  2. Create a <script> element with src=https://www.facebook.com/me
  3. https://www.facebook.com/me redirects to https://www.facebook.com/pascal.landau1 and the browser reports a successful request.

Since the request succeeded again, one of the URLs in group 0 [(1),(2),(3)] has to be correct. Therefore I cut the 3 profiles in half again and get 2 groups: 00 [(1),(2)] and 01 [(3)]. The process continues:

  1. Allow access to https://www.facebook.com/me and the following 2 profile URLs via CSP header
    (1) https://www.facebook.com/JohnDoe1
    (2) https://www.facebook.com/JohnDoe2
  2. Create a <script> element with src=https://www.facebook.com/me
  3. https://www.facebook.com/me redirects to https://www.facebook.com/pascal.landau1, which is no longer present in the CSP header. Consequence: The browser reports a failed request.

Since the request failed this time, none of the URLs in group 00 [(1),(2)] is correct. Therefore, the correct one has to be in 01 [(3)] – so let’s check that:

  1. Allow access to https://www.facebook.com/me and the following profile URL via CSP header
    (3) https://www.facebook.com/pascal.landau1
  2. Create a <script> element with src=https://www.facebook.com/me
  3. https://www.facebook.com/me redirects to https://www.facebook.com/pascal.landau1 and the browser reports a successful request.

Congratulations: I successfully identified “myself” from a group of 6 different profiles. I didn’t even have to make 6 HTTP requests, only 4. That doesn’t sound like much, but for n profiles the number of required requests i is actually given by:

$$ i = \lceil \log_2 n \rceil + 1 $$

In other words: the number of required requests grows with the binary logarithm of the total number of profiles – so we actually implemented a binary search. O(log(n)) ftw 😉
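Here is a minimal Python sketch of the reduction. It is an illustration, not the PHP function used by the tool. The CSP test is replaced by a simulated `probe` callback, and the names `find_profile` and `probe` are hypothetical.

The sketch differs from the walkthrough in one respect. When the first half fails, the profile must be in the second half, so the sketch does not spend an extra request confirming it. As a result, the six-profile example finishes after 3 requests instead of 4. The worst case remains ⌈log₂ n⌉ + 1 requests, which are the values in the table below.

```python
from __future__ import annotations
from typing import Callable, Optional


def find_profile(candidates: list[str], probe: Callable[[list[str]], bool]) -> tuple[Optional[str], int]:
    """Binary search for the visitor's profile; returns (profile or None, number of probes).
    `probe(whitelist)` stands in for one CSP test: True if /me redirected into the whitelist."""
    requests = 1
    if not probe(candidates):  # first request: is the profile in the set at all?
        return None, requests
    group = candidates
    while len(group) > 1:
        half = (len(group) + 1) // 2  # larger half first: 6 -> 3/3, 3 -> 2/1 as in the walkthrough
        first, second = group[:half], group[half:]
        requests += 1
        # A failed probe on the first half implies the profile is in the second half
        group = first if probe(first) else second
    return group[0], requests


profiles = ["JohnDoe1", "JohnDoe2", "pascal.landau1", "JohnDoe3", "JohnDoe4", "JohnDoe5"]
visitor = "pascal.landau1"  # simulated: a probe succeeds iff the whitelist contains the visitor
print(find_profile(profiles, lambda whitelist: visitor in whitelist))  # ('pascal.landau1', 3)
```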

The following table illustrates the impact of that approach:

n (number of checked profiles) | i (required HTTP requests)
1 | 1
10 | 5
100 | 8
1,000 | 11
10,000 | 15
100,000 | 18
1,000,000 | 21
1,000,000,000 | 31

That looks much better: We can easily check 1 billion URLs in about 30 HTTP requests!

The elephant in the room: High data volume

Unfortunately, this approach still has a drawback :/ Profile URLs can only be checked in the client’s browser, so we need to transfer them to it. Let’s assume we wanted to check 1 billion profile URLs for a random user. For the sake of simplicity, every URL consists of 35 characters and looks like https://www.facebook.com/0000000001, https://www.facebook.com/0000000002, etc. Each character takes up 1 byte, so we end up with 1 × 35 × 1,000,000,000 bytes = ~32.6 GiB! Even when we strip everything but the path (e.g. 0000000001 instead of https://www.facebook.com/0000000001), we’re still left with 1 × 10 × 1,000,000,000 bytes = ~9.3 GiB. That’s a freaking lot of data to transmit!
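The arithmetic can be double-checked with a tiny sketch. It assumes 1 byte per ASCII character and binary gigabytes. The helper name `raw_size_bytes` is made up.

```python
def raw_size_bytes(n_urls: int, chars_per_url: int, bytes_per_char: int = 1) -> int:
    """Uncompressed size of n_urls strings that all have the same length."""
    return bytes_per_char * chars_per_url * n_urls


GIB = 1024 ** 3  # 1 GiB = 1024^3 bytes; the ~32.6 / ~9.3 figures use binary units
full_urls = raw_size_bytes(1_000_000_000, len("https://www.facebook.com/0000000001"))
paths_only = raw_size_bytes(1_000_000_000, len("0000000001"))
print(f"{full_urls / GIB:.1f} GiB vs. {paths_only / GIB:.1f} GiB")  # 32.6 GiB vs. 9.3 GiB
```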

Fortunately, this data can be compressed further – with gzip, for instance. Modern web servers can do that out of the box for every text-based response (e.g. via mod_deflate for Apache). I wrote a small simulation in PHP:

ID count | ~Data size uncompressed | ~Data size compressed
1,000 | 9.7 KB | 2 KB
10,000 | 97 KB | 21 KB
100,000 | 977 KB | 218 KB
1,000,000 | 9.54 MB | 2.1 MB

The compression factor seems to be around 5:1 – but we used rather compression-friendly data (only 10 different characters), so expect it to be much worse in reality. Even in this very optimistic scenario, a single request containing 1 million profiles weighs in at a hefty 2 MB. Inevitably, that leads to long transmission times and a lot of traffic.
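The PHP simulation itself is not shown, but a comparable Python sketch makes the point about data dependence visible. It assumes ten-digit IDs concatenated without separators, which is consistent with the ~9.7 KB for 1,000 IDs in the table. The dataset names and the fixed random seed are arbitrary choices.

Sequential IDs compress far better than random ones. Uniformly random digits cannot get past roughly 2.4:1, because each digit carries only about 3.3 bits of information.

```python
from __future__ import annotations
import gzip
import random


def compressed_sizes(ids: list[str]) -> tuple[int, int]:
    """Return (raw, gzip) sizes in bytes for IDs concatenated without separators."""
    raw = "".join(ids).encode("ascii")
    return len(raw), len(gzip.compress(raw))


rng = random.Random(42)  # fixed seed for a reproducible run
datasets = {
    "sequential": [f"{i:010d}" for i in range(1, 10_001)],
    "random": [f"{rng.randrange(10**10):010d}" for _ in range(10_000)],
}
for label, ids in datasets.items():
    raw, packed = compressed_sizes(ids)
    print(f"{label:>10}: {raw / 1024:.1f} KB raw, {packed / 1024:.1f} KB gzip, ratio {raw / packed:.1f}:1")
```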

We’re really hitting a wall here! In my opinion, the only viable approach is to prequalify the profiles that need to be checked, so that we keep the total number low.

Prequalification of Facebook profile URLs

The solution to the aforementioned problem is really the icing on the cake of this exploit. I haven’t found the one definitive solution so far, but I still want to present some promising approaches:

Location information

First of all, every piece of information we obtain about the user upon his first request should be evaluated as a possible means of prequalification. Two things immediately come to mind: IP address and browser language.

Resolving IP address to location

The IP address can be used to identify the location of our user (using the [public and free] API of IPInfoDB, for instance) and to match it against the hometown of a Facebook user (see the next chapter for a way to get the hometown).

Extracting the browser language from the “Accept-Language” header

If the browser includes an Accept-Language HTTP header in its request, we might use it to match the locale setting of a profile (again, see the next chapter).
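Here is a rough sketch of that matching. It assumes the profile’s locale field uses the `language_REGION` form (e.g. `de_DE`). The helper names are hypothetical, and the header parsing is simplified: language tags are not validated, and tags with more than two subtags are not handled specially.

```python
from __future__ import annotations


def preferred_locales(accept_language: str) -> list[str]:
    """Turn an Accept-Language header into locales like 'de_DE', highest q-value first."""
    weighted = []
    for position, part in enumerate(accept_language.split(",")):
        tag, *params = [piece.strip() for piece in part.split(";")]
        if not tag or tag == "*":
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            weighted.append((-q, position, tag))
    weighted.sort()  # highest q first, ties keep header order
    locales = []
    for _, _, tag in weighted:
        language, _, region = tag.partition("-")
        locales.append(f"{language.lower()}_{region.upper()}" if region else language.lower())
    return locales


def locale_matches(profile_locale: str, locales: list[str]) -> bool:
    """True if a profile locale (e.g. 'en_GB') equals a preferred locale or its bare language."""
    return profile_locale in locales or profile_locale.split("_")[0].lower() in locales


locales = preferred_locales("de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4")
print(locales)  # ['de_DE', 'de', 'en_US', 'en']
```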

Applying group fingerprints

The paper A Practical Attack to De-Anonymize Social Network Users mentioned in Related Work also provides an interesting approach to prequalify a set of users: The browsing history is checked for visited (Xing-)groups so that the set of possible profiles is restricted to the members of those groups.

The problem of checking all profiles is thereby effectively reduced to checking all groups – which is a much lower number. This technique can be improved further (avoid really large groups, identify overlapping members and remove redundant groups, etc.) and possibly also used on Facebook. Again, history stealing no longer works in modern browsers (so we cannot just copy that approach), and I couldn’t find a CSP-based vulnerability for it (yet).

The best I came up with is the URL https://www.facebook.com/ajax/groups/members/add_get.php?group_id=[groupId]&email=1. It delivers status code 200 to group members and status code 404 to non-members. This information can be obtained in a way similar to the CSP exploit, although it does not scale as well: essentially, we’d create a <script> element, set its source to https://www.facebook.com/ajax/groups/members/add_get.php?group_id=[groupId]&email=1 and check whether it loads successfully. Unfortunately, we need one request per group – which severely limits the threat of that technique…
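Once the visitor’s group memberships are known, the candidate set can be narrowed by intersecting the member lists. This is a simplified sketch with made-up group data, not the paper’s exact algorithm. It assumes every detected membership is correct, since a single false positive would remove the right profile from the intersection.

```python
from __future__ import annotations


def candidates_from_groups(member_groups: list[str], members: dict[str, set[str]]) -> set[str]:
    """Profiles that belong to every group the visitor was found to be a member of."""
    if not member_groups:
        return set()
    # Start with the smallest group so the running intersection stays small
    ordered = sorted(member_groups, key=lambda group: len(members.get(group, set())))
    result = set(members.get(ordered[0], set()))  # copy, so the input sets stay untouched
    for group in ordered[1:]:
        result &= members.get(group, set())
    return result


# Hypothetical group -> member mapping (would have to be collected beforehand)
members = {
    "group-a": {"anna", "ben", "carla", "dave"},
    "group-b": {"carla", "dave", "emil"},
    "group-c": {"dave", "frida"},
}
print(sorted(candidates_from_groups(["group-a", "group-b"], members)))  # ['carla', 'dave']
```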

Predefined sets of users

Let’s assume you’re already interested in a certain group of people whom you identified on Facebook and whose profile URLs you extracted (I’m pretty sure the NSA has such a list… ;)). Using my approach, it is possible to identify a user from that set if he visits an “infected” page. Once identified and properly tracked, he can easily be recognized on every page that you control.

What’s that? That’s not a real-world scenario because he won’t ever visit an “infected” page in the first place?

Well why don’t you just target him with Facebook Ads based on his Facebook User ID?

Facebook’s Custom Audience Targeting

Black hat marketers already use that targeting technique to get extremely precise targeting at very low CPCs…

Data: Getting Facebook profile URLs and additional data

So far, I silently presumed to have the profile information of every Facebook user readily available. Of course, that’s not realistic: firstly, Facebook has no interest in giving us detailed information on everyone. Secondly, crawling 1 billion+ profiles is not that easy – and we can’t just ask Facebook for an export of their database 😉

That leaves us with two possible options: Using the Facebook Graph-API and screen-scraping Facebook’s webpage.

One word of warning: Scraping on a large scale is legally not permitted!

Facebook Graph API

Have a look at the User table. We can access a subset of that information via http://graph.facebook.com/[User ID] (Facebook UID) or http://graph.facebook.com/[Username].

Example response for the request https://graph.facebook.com/1144346144 or http://graph.facebook.com/pascal.landau1:

Note: The “locale” field allows us to perform the aforementioned matching on browser language.

But it gets better: using FQL queries, it’s possible to perform bulk requests – just use an IN statement with a list of user IDs or usernames.

Example response for the request https://graph.facebook.com/fql?q=select first_name,last_name,uid,profile_url,sex,locale,username+from+user+where+uid+in+(4,1144346144) or https://graph.facebook.com/fql?q=select first_name,last_name,uid,profile_url,sex,locale,username+from+user+where+username+in+("zuck","pascal.landau1")

The maximum number of returned profiles seems to be 5,000 per request. In theory, “only” about 1,000,000,000 / 5,000 = 200,000 requests are required to get all of Facebook’s profile URLs. Still a lot, but certainly doable and much better than 1 billion requests.
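Here is a sketch of the batching. It assumes the 5,000-profile limit observed above, and `fql_urls` and `requests_needed` are hypothetical helper names. FQL has since been deprecated by Facebook, so this only illustrates how the bulk URLs and the request count fit together.

```python
from __future__ import annotations
from urllib.parse import urlencode

FIELDS = "first_name,last_name,uid,profile_url,sex,locale,username"
BATCH_SIZE = 5000  # per-request maximum observed in the text (assumption, not documented)


def fql_urls(uids: list[int], batch_size: int = BATCH_SIZE) -> list[str]:
    """One FQL query URL per batch of at most batch_size user IDs."""
    urls = []
    for start in range(0, len(uids), batch_size):
        id_list = ",".join(str(uid) for uid in uids[start:start + batch_size])
        query = f"select {FIELDS} from user where uid in ({id_list})"
        urls.append("https://graph.facebook.com/fql?" + urlencode({"q": query}))
    return urls


def requests_needed(total_profiles: int, batch_size: int = BATCH_SIZE) -> int:
    """Ceiling division: number of bulk requests needed to cover total_profiles."""
    return -(-total_profiles // batch_size)


print(fql_urls([4, 1144346144])[0])
print(requests_needed(1_000_000_000))  # 200000
```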

The downside: we need the exact user ID or username to get a result. Unfortunately, user IDs are not in sequential order, so we can’t simply “increment”. A little googling revealed the Quora thread Facebook Company History: What is the history of Facebook’s user ID numbering system? Short summary:

In the beginning, Facebook had numerical blocks for each college with up to 100,000 possible user IDs. A user ID was created by combining a college prefix and a sequential ID. Columbia, for instance, has the prefix 1, and its first user on Facebook was Sasha Katsnelson with Facebook UID 100005. An overview of all known prefixes can be found at Facebook Company History: In what order did Facebook open to college and university campuses?. Unfortunately, this system has since been changed, according to a Facebook spokesman:

We assigned numerical blocks in the early days, but today user IDs are not issued sequentially. We draw them from a variety of number ranges.

Maybe it’s possible to reverse-engineer the current system, but I didn’t really look into that, because…

Screen-Scraping Facebook’s webpage

… Facebook provides us with a nice list of all publicly available profiles in its people directory! Now we just need to drill down to the lowest level (e.g. https://www.facebook.com/directory/people/A-1-120) and scrape all profile URLs. We could even use those to fire bulk requests against the API in order to enrich the data.

Another starting point is Facebook’s people search. We need to be logged in to use it, but we’re rewarded with the possibility of using filters to restrict the search results. The most useful filter is probably the hometown – combine that with the city obtained from the IP address lookup and we should be able to narrow down the possible profiles tremendously. The full process would look something like this:

  1. Get a list of most frequent names (see here or here)
  2. Get a list of cities (see here)
  3. Search for the first name in the list
  4. Set hometown filter to first city in the list
  5. Scrape the results
  6. Repeat with next city (GOTO 4)
  7. Repeat with next name (GOTO 3)
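Steps 3 to 7 form two nested loops, with the city loop inside the name loop. In the sketch below, `search` is a hypothetical stand-in for the actual scraping step and the profile URLs are fake. Results are collected in a set because the same profile can show up in several searches.

```python
from __future__ import annotations
from itertools import product
from typing import Callable, Iterable


def collect_profiles(names: list[str], cities: list[str],
                     search: Callable[[str, str], Iterable[str]]) -> set[str]:
    """Run one people search per (name, hometown) pair and collect unique profile URLs."""
    profiles: set[str] = set()
    # product() varies the last argument fastest: all cities for one name (step 6),
    # then the next name (step 7)
    for name, city in product(names, cities):
        profiles.update(search(name, city))
    return profiles


calls = []


def fake_search(name: str, city: str) -> list[str]:
    """Hypothetical scraper: records the call and returns made-up profile URLs."""
    calls.append((name, city))
    return [f"https://www.facebook.com/{name.lower()}.{city.lower()}",
            "https://www.facebook.com/duplicate"]


found = collect_profiles(["Anna", "Ben"], ["Berlin", "Hamburg"], fake_search)
print(calls)
print(len(found))  # 5: four distinct results plus one duplicate collapsed into a single entry
```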

Sounds easy in theory but becomes somewhat complicated because we cannot use filters in the search function of the API – which leaves us with the heavily AJAXified user interface. More complicated but not impossible – CasperJS can do that for us, although it’ll take some time 🙂

General problem: Inconsistent browser implementation of the Content-Security-Policy header

The deanonymization technique presented above only works when the browser matches the whitelist of sources in the CSP header at the path level. That does not comply with the first CSP specification of the W3C, which, according to 3.2.2 Source List, only requires validation at the host level. That would make it impossible to check for a full Facebook profile URL: we could not check for https://www.facebook.com/pascal.landau1, only for https://www.facebook.com/. Of course, that is not sufficient for deanonymization.

Luckily, CSP 1.1 (still a draft) introduces validation at the path level. Abusing this feature for privacy breaches was already discovered in January 2014 by Egor Homakov, who blogged about it in Using Content-Security-Policy for Evil. That started a discussion about a possible threat, resulting in a (provisional) special treatment in the current working draft of the CSP 1.1 specification (see 4.2.2.3 Paths and Redirects). This adjustment restricts validation after a redirect has occurred to the host level, effectively removing the possibility of recognizing full Facebook profile URLs.
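A simplified model of the three behaviours makes the difference concrete. It is a sketch under assumptions, not the specification’s matching algorithm: ports, wildcards and trailing-slash paths are ignored, and `someone.else` is a made-up profile. Only path-level matching without the redirect exception tells two logged-in users apart.

```python
from __future__ import annotations
from urllib.parse import urlsplit

ME = "https://www.facebook.com/me"
PASCAL = "https://www.facebook.com/pascal.landau1"
OTHER = "https://www.facebook.com/someone.else"


def allowed(source: str, url: str, mode: str, after_redirect: bool) -> bool:
    """mode: 'host' (CSP 1.0), 'path' (older CSP 1.1 draft),
    'path-redirect' (newer draft: paths are ignored once a redirect happened)."""
    src, target = urlsplit(source), urlsplit(url)
    if (src.scheme, src.netloc) != (target.scheme, target.netloc):
        return False
    if mode == "host" or (mode == "path-redirect" and after_redirect):
        return True  # path component is not compared
    return src.path == target.path


def probe(chain: list[str], whitelist: list[str], mode: str) -> bool:
    """True if every hop of the redirect chain passes the policy."""
    return all(any(allowed(s, hop, mode, after_redirect=i > 0) for s in whitelist)
               for i, hop in enumerate(chain))


for mode in ("host", "path", "path-redirect"):
    results = [probe(chain, [ME, PASCAL], mode) for chain in ([ME, PASCAL], [ME, OTHER])]
    verdict = "distinguishes users" if results[0] != results[1] else "cannot tell them apart"
    print(mode, results, verdict)
```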

Wait.. so all that stuff doesn’t actually work?

Nope, I did not say that 😉 The current CSP 1.1 draft has existed since 2 July 2014 – the previous version of the CSP 1.1 draft does not include the “Paths and Redirects” part, so the exploit is possible there. That previous version is currently implemented in Google Chrome (tested in Chrome 36) – there is even a bug report from October 2013 explaining exactly the approach I’m taking.

What about other browsers?

Browsers implementing CSP 1.0 are not vulnerable to this deanonymization technique. Neither are those implementing the current draft of CSP 1.1. I checked the following browsers:

  • Internet Explorer 11 seemed to ignore CSP completely…
  • Firefox (FF 30.0) implements CSP 1.0.
  • Google Chrome 36 is vulnerable, probably since about version 25.

The Can I use… overview for Content-Security-Policy does not include the implemented CSP version and cannot be used to verify exploitability.

Will it work in the future?

Well, probably not. Currently only Chrome is vulnerable and there is already an acknowledged bug report. Any other browser should implement the most current version of CSP 1.1 and won’t be vulnerable.

Any other restrictions?

Third-Party-Cookies have to be allowed – otherwise it’s not possible to check for a Facebook login. But if that’s deactivated, (legitimate) like buttons for webpages wouldn’t work either.

Seems like a rather small use case, why did you put in all that effort?

To be blunt, I didn’t know there were so many preconditions. I discovered by trial and error that path-based validation is possible in Chrome. Building a complete deanonymization POC just kind of developed from there. Plus, I did not know there already was a bug report for Chrome and a “patched” version of CSP 1.1 – that’s something I only discovered when I started writing this article 🙁

TL;DR: CSP-Deanonymization works – but not for everyone

The method employed here is only working in Google Chrome because Chrome implements an old version of the CSP 1.1 Spec.

Further, it is only practically relevant if a subset of all Facebook profile URLs can be predefined – otherwise the data size is too large and the attack takes too long to execute. In a test with 100,000 profiles, it took about 25 seconds and 2 MB of traffic to successfully deanonymize myself. That said, it should be possible to trade time for traffic to some degree by performing multiple requests in parallel.

Proof of Concept – A Tool for Deanonymization

The theory behind this attack has been explained in great detail – now it’s time to put it to work. I developed a small web application that serves as a proof of concept for deanonymization.

How to use the tool?

In the first step we need to make sure that all preconditions for the POC to work are met:

  • You’re using Google Chrome (Version 25 to 36 [current version])
  • You don’t restrict Third-Party-Cookies
  • You’re logged into Facebook

Prequalification of profiles

The second part continues with an explanation of the functionality. I’d like to extend that explanation a little further. First of all, I did not scrape and use real Facebook user profiles, out of fear of legal consequences. So you cannot just “hit a button” and deanonymize yourself.

To still be able to prove my method, I provide a text field instead, into which the correct profile has to be entered. Feel free to add some other profiles too, since we’re essentially building our own predefined set of Facebook profiles to check.

Enter profiles to be checked

The tool will use the input of the text field to start a CSP bruteforcing attack. In a real-world scenario, we would of course do that automatically by prequalifying the Facebook profile URLs. I’m aware that this setup is not optimal :/ But I still hope it is a good trade-off between “the tool is practically functioning” and “I don’t want to get sued by Facebook for illegally mass-scraping profiles” 🙂

Feel free to propose a better solution!

Functioning principle

After providing the data basis we can now move on and start the deanonymization:

Screenshot of the deanonymization tool

On a technical level, I used the dynamic insertion of iframes to implement continual CSP tests. The CSP directive is provided via a <meta> element within the iframe’s <head> section, and a <script> element is created in the body to check for https://www.facebook.com/me. The result of that <script> request is pushed back to the parent window via postMessage.
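For illustration, the markup of one such probe iframe could be rendered as shown below. This is a hypothetical Python sketch, not the tool’s actual code. The following are assumptions:

- the function name `probe_document`;
- the message format `{id, ok}`;
- the use of inline onload/onerror handlers, which require `'unsafe-inline'` in the policy.

The parent page would listen for message events and read `id` and `ok` from each one.

```python
from __future__ import annotations
import html
import json


def probe_document(whitelist: list[str], probe_url: str, probe_id: str) -> str:
    """HTML for one probe iframe: CSP via <meta>, one <script> whose outcome is
    reported to the parent window via postMessage."""
    # 'unsafe-inline' permits the onload/onerror handlers; it does not change
    # which external script URLs are allowed
    policy = " ".join(["script-src", "'unsafe-inline'", *whitelist])
    report = "parent.postMessage({id: %s, ok: %s}, '*')"
    on_load = html.escape(report % (json.dumps(probe_id), "true"))
    on_error = html.escape(report % (json.dumps(probe_id), "false"))
    return (
        "<!DOCTYPE html><html><head>"
        f'<meta http-equiv="Content-Security-Policy" content="{html.escape(policy)}">'
        "</head><body>"
        f'<script src="{html.escape(probe_url)}" onload="{on_load}" onerror="{on_error}"></script>'
        "</body></html>"
    )


doc = probe_document(["https://www.facebook.com/me", "https://www.facebook.com/pascal.landau1"],
                     "https://www.facebook.com/me", "probe-1")
print(doc)
```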

The full process has essentially 6 steps:

Step 1 – Check for browser vulnerability

See if CSP is implemented and if it’s CSP 1.0 (host level) or CSP 1.1 (path level).

Step 2 – Check if the user is logged in

This is basically the login-status check described above.

Step 3 – Simulate a “real” scenario

In this step, 1,000 fake profiles are transmitted to the browser to estimate the traffic and time a real attack would take. Of course, this check will always fail, but we’d only need to replace the fake profiles with real ones to have a ready-to-go attack. [This step is not necessary for the POC to work. It used to be 100,000 fake profiles, but that seemed to break the mobile version of Google Chrome.]

Step 4 – Check the profiles in the text field

We’re using the input of the text field to check for the correct profile URL.

Step 5 – Binary reduction

See Binary search for the right profile. The following PHP function is used:

Step 6 – Display results

In the last step the identified profile is shown to the user.

Once again the link to the tool.

Scope, How-to-fix and Responsible Disclosure

I discovered that it takes only minimal adjustment to make this POC work not only for users of Facebook, but also for Xing (fixed on 5 August 2014), Google Plus and YouTube. In fact, every website is exploitable if it has

  • a) profile URLs on a unique path level
  • b) a URL that redirects to that profile url via Location header.

I couldn’t find those for LinkedIn (profile URLs are identified by parameters instead of path) or Twitter (no redirect to profile URL found).

As a social network, there’s little you can do to prevent this attack. As long as there’s a URL that redirects to the profile of a logged-in user, that user can be deanonymized by CSP bruteforcing.

Quickfixes:

  • Replace Location-based redirects by JavaScript or meta-refresh
  • Make it as hard as possible to obtain detailed information of the user’s profiles (to prevent prequalification of profile URLs)
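Here is a sketch of the first quick fix for a hypothetical server-side handler. The function name and the (status, headers, body) return shape are assumptions, not any particular framework’s API.

```python
from __future__ import annotations
import html


def redirect_to_profile(target: str, use_location_header: bool) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for a redirect to the logged-in user's profile."""
    if use_location_header:
        # Followed by the browser at fetch level, so a probing page's CSP is checked against `target`
        return 302, {"Location": target}, ""
    # Quick fix: the redirect only happens when the response is rendered as a page,
    # so a <script> element fetching this URL never triggers a redirect to `target`
    url = html.escape(target)
    body = (f'<!DOCTYPE html><meta http-equiv="refresh" content="0; url={url}">'
            f'<a href="{url}">Continue to your profile</a>')
    return 200, {"Content-Type": "text/html; charset=utf-8"}, body


print(redirect_to_profile("https://www.facebook.com/pascal.landau1", use_location_header=False))
```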

Responsible Disclosure

The exploit currently only affects Google Chrome and will most probably be fixed in the future. But informing at least the exploitable social networks seems to be the right thing to do – especially since I’m going to publicly disclose the attack in detail. So I contacted Facebook, Chrome, Google Plus, YouTube and Xing upfront, and I’m including a copy of the communication in this article:

Facebook

Known, won’t fix – see conversation

Google

Known, won’t fix – see conversation

Xing

Fixed – see conversation

Bottom Line: Learned a lot, Achieved a little

What’s left after 4500 words, several hundred lines of code and many hours of try & error?

For me, it was really exciting to (almost) fully document an exploit and provide a working POC. A small downer is the fact that the problem is not only already known but also fixed in the current version of the CSP 1.1 spec. I still think it’s a practically relevant use case (after all, Chrome has been and still is vulnerable to this), and I at least hope to draw some attention to the exploit. Apart from that, the “spec” is one thing and the “browser’s implementation” another 😉

Any notes, questions and improvements are greatly welcome – as well as a tweet or share on a social network of your choosing 🙂

Hirnhamster

holds a Bachelor’s degree in Applied Computer Science and regularly blogs on MySEOSolution about updates in the field of search engine optimization. He is also happy to connect on Google+ 🙂
