Ghostery lists Adobe TypeKit as a privacy threat

The Internet tracker blocking program Ghostery now lists Adobe TypeKit (a very popular font service) as a privacy threat. I read about this first on WUWT:

I’ve gotten a few complaints this week from some overly paranoid people that say they can’t see WUWT anymore in Firefox, but can in Safari. The problem seems to be related solely to a browser extension called “ghostery” which is somehow flagging Adobe Typekit (used to provide custom fonts on WordPress) as some sort of malware.

Ghostery is not malware-blocking software (as you can read on Wikipedia). It is software that protects you against tracking while surfing the web, and IMHO you are not overly paranoid when you use it. In the comments somebody explains:

Font are very seductive tracking beacons. Honest people who would never consider installing a tracking beacon have no qualms about using served fonts, and there’s no difference between them. There is a lot of ignorance out there regarding data mining.

So maybe Ghostery is not listing Adobe TypeKit by accident? We see with Google Analytics that website owners are happy to pay for analytics with their visitors' privacy. The same may apply to fonts (although TypeKit is not free). But before we accuse Adobe, let’s take a look at the Adobe TypeKit privacy policy:

In order to provide the Typekit service, Adobe may collect information about the fonts being served to your website. The information is used for the purposes of billing and compliance, and may include the following: …

So, one thing is for sure: Adobe TypeKit is in fact collecting data while serving fonts. This alone may be reason enough for Ghostery to block it. I did some research and verified that, next to the font files, TypeKit loads a 1 by 1 pixel GIF image that has a URL like this:

http://p.typekit.net/p.gif?s=1&k=sgt5tia&app=&ht=tk&h=wattsupwiththat.com&f=...
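
To see what such a beacon request carries, the query string can be split into its parameters. The following is a minimal sketch in PHP, not anything Adobe provides; the function name beacon_params is made up for illustration, and the "f" value stays truncated exactly as in the URL above.

```php
<?php
// Hypothetical helper: split a beacon URL into host, path and query parameters.
function beacon_params($url) {
    $parts = parse_url($url);
    $params = array();
    // parse_str() decodes the query string into an associative array
    parse_str(isset($parts['query']) ? $parts['query'] : '', $params);
    return array(
        'host'   => $parts['host'],
        'path'   => $parts['path'],
        'params' => $params,
    );
}

// URL as observed above; the "f" value is truncated in the article and kept as-is
$beacon = beacon_params('http://p.typekit.net/p.gif?s=1&k=sgt5tia&app=&ht=tk&h=wattsupwiththat.com&f=...');

// print the endpoint followed by one "name = value" line per parameter
echo $beacon['host'], $beacon['path'], "\n";
foreach ($beacon['params'] as $name => $value) {
    echo $name, ' = ', $value, "\n";
}
```

The "h" parameter carries the hostname of the site that uses the font, so each request tells p.typekit.net which site was visited, on top of what any HTTP request reveals (IP address, user-agent string).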

In the privacy statement Adobe says they collect data “for the purposes of billing and compliance”, which seems reasonable. Also, the privacy policy has a list of data that they collect. None of the data on the list seems to be invading the privacy of the website visitor. So is this a big fuss about nothing? I’m not sure. If you pay close attention to the wording of the sentence you see that they chose to use “may include”. AFAIK “may include” does not imply “is limited to”. Also this “compliance” is not further specified. What do they need to comply with?

Can Adobe TypeKit be trusted to respect our visitors' privacy? Probably they can, but even after reading their privacy policy I’m not 100% sure. What do you think? Should I take off my tin-foil hat?


Be a pro: use font embedding, not font linking

If you want to use a font on your website you can load it by linking to an external server (using CSS or JavaScript). This is common practice and you will probably know about it if you have worked with Google Fonts or Adobe Typekit. This is what we call “font linking”. The alternative is that you host the font yourself and use @font-face in your CSS to load it. You will need to upload the font in several formats to your server. This self-hosted approach is also called “font embedding”.

What is the difference?

With font linking you add the following HTML code to your website:

<link href='http://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>

While with font embedding you add the following CSS code:

@font-face {
  font-family: 'Open Sans';
  font-style: normal;
  font-weight: 400;
  src: url('open-sans-latin-regular.eot'); /* IE9 Compat Modes */
  src: local('Open Sans'), local('OpenSans'),
       url('open-sans-latin-regular.eot?#iefix') format('embedded-opentype'), /* IE6-IE8 */
       url('open-sans-latin-regular.woff2') format('woff2'), /* Super Modern Browsers */
       url('open-sans-latin-regular.woff') format('woff'), /* Modern Browsers */
       url('open-sans-latin-regular.ttf') format('truetype'), /* Safari, Android, iOS */
       url('open-sans-latin-regular.svg#OpenSans') format('svg'); /* Legacy iOS */
}

As you can see, it is easier to link the font, as you do not have to write extensive CSS and upload the 5 font files (eot, woff2, woff, ttf & svg) that font embedding requires.

Font linking is not allowed

Font linking does not work for offline content. In contrast to font embedding, it requires requests to other services. Font linking may cause uptime worries, dependency issues (Great Firewall of China) and leaking of Personally Identifiable Information (PII). In some countries (like the Netherlands) it is even forbidden by law to share PII (like IP address and user-agent string) without explicit consent from the user to allow tracking. So, it is a simple choice, one would think, right?

Font embedding is also not allowed

Services like Fonts.com, MyFonts, Typekit, etc. do not allow font embedding; you need to link to them. The reason: they have a “pay-per-use” business model. But isn’t it a bit strange that this type of usage (enforced by the licensing model) is actually restricted by EU privacy laws? The exception is Google Fonts, as their fonts are free to use and free to embed.

It’s-a me, Mario! Let’s-a go!

Mario Ranftl (majodev) has created an extremely useful google-webfonts-helper (hosted on Heroku). If you want to know how it works, you can find the source on GitHub (collecting stars). It makes it very easy to self-host your fonts. The steps:

  1. Go to: https://google-webfonts-helper.herokuapp.com/fonts
  2. Select one of the 682 fonts from the menu on the left
  3. Copy-paste the presented CSS code into your stylesheet in the directory “css”
  4. Download the zip file using the big blue button
  5. Unzip the files and upload them to your website in the directory “fonts”

Thank you Mario, that is super! Alternatively, if you have your own fonts and need them in such a convenient zip file, you may try fontsquirrel.com’s Webfont Generator. Let me know how you like these tools (or if you know any better) using the comments. Also, check out the discussion on Hacker News!

Now let’s start using fonts responsibly!


PHP asset proxy increases website availability


Don’t you hate it when your site does not work because you linked jQuery from “code.jquery.com” and that site is suffering connection problems? This may also happen with stylesheets or with font files. To counter this problem (but not lose the convenience of remotely loaded assets) I created an “asset proxy” in PHP. It will cache the assets in a cache folder on your web server, so that you do not have to worry about downtime of other services. You can configure how often the cache should be refreshed. When the external source is not available during a refresh, the stale cache files will be used and there is no downtime at all!


Install asset-proxy.php in your webroot. Then replace all references in your HTML from:

 href="http://fonts.googleapis.com/css?family=Droid+Sans:400,700"

to:

 href="/asset-proxy.php/fonts.googleapis.com/css?family=Droid+Sans:400,700"
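
If you have many references, the replacement can be scripted. The following is a rough sketch, not part of asset-proxy.php: the function name proxy_url and its parameters are made up for illustration, and it assumes the proxy is installed as /asset-proxy.php in the webroot.

```php
<?php
// Hypothetical helper: turn an absolute asset URL into a proxied one.
// $proxy_path is assumed to be the location of asset-proxy.php in the webroot.
function proxy_url($url, $allowed_hosts, $proxy_path = '/asset-proxy.php') {
    $parts = parse_url($url);
    // leave relative URLs and hosts that are not on the allowed list untouched
    if (!isset($parts['host']) || !in_array($parts['host'], $allowed_hosts)) {
        return $url;
    }
    // the hostname becomes the first path segment after the script name
    $result = $proxy_path.'/'.$parts['host'].(isset($parts['path']) ? $parts['path'] : '/');
    if (isset($parts['query'])) {
        $result .= '?'.$parts['query'];
    }
    return $result;
}

$allowed = array('fonts.googleapis.com', 'ajax.googleapis.com');
echo proxy_url('http://fonts.googleapis.com/css?family=Droid+Sans:400,700', $allowed), "\n";
echo proxy_url('http://example.com/style.css', $allowed), "\n";
```

The first call prints the proxied form shown above; the second URL is returned unchanged because its host is not in the list.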

Make sure you edit the list of allowed hostnames in the header of the PHP file and that you set an appropriate refresh time (in seconds). If the assets are not available upon refresh the stale files are served.

// hostnames for which "GET" requests can be proxied over "HTTP" (no ssl)
$hostnames = array(
	'fonts.gstatic.com',
	'maxcdn.bootstrapcdn.com',
	'netdna.bootstrapcdn.com',
	'fonts.googleapis.com',
	'ajax.googleapis.com',
);

// maximum age of a file before being refreshed
$refresh_age = 24*3600;

// directory where the cache resides (should exist and not be served)
$cache_dir = '/tmp/cache';

// strip the leading script path (e.g. "/asset-proxy.php/") from the URL
$url = substr($_SERVER['REQUEST_URI'], strlen($_SERVER['SCRIPT_NAME'].'/'));

// if there is no URL specified show bad request error
if(!$url || !strpos($url,'/')){
	header('Bad Request', true, 400);
	exit;
}

// get the hostname which should be the first segment (until the first slash)
$hostname = substr($url, 0, strpos($url, '/'));

// if the hostname is not in the list of allowed hostnames show forbidden error
if (!in_array($hostname, $hostnames)) {
	header('Forbidden', true, 403);
	exit;
}

// calculate the cached filename and check whether it already exists
$filename = $cache_dir.'/'.md5($url);
$file_exists = file_exists($filename);

// get the file age if the file exists
if ($file_exists) {
	$file_age = time()-filemtime($filename);
}

// if cache exists and is fresh, let's read the file, else retrieve it with cURL
if ($file_exists && $file_age<$refresh_age) {
	$result = file_get_contents($filename);
} else {
	// set some headers on the cURL call to pretend we are a user
	$sent_headers = array();
	foreach (array('User-Agent','Accept','Accept-Language','Referer') as $header) {
		$key = 'HTTP_'.strtoupper(str_replace('-','_',$header));
		if (isset($_SERVER[$key])) {
			$sent_headers[] = $header.': '.$_SERVER[$key];
		}
	}

	// make sure we do not get chunked, deflated or gzipped content
	$sent_headers[] = 'Accept-Encoding: ';
	$sent_headers[] = 'Cache-Control: max-age=0';
	$sent_headers[] = 'Connection: keep-alive';

	// initialize cURL with the URL, our headers and set headers retrieval on
	$curl = curl_init('http://'.$url);
	curl_setopt_array($curl, array(
			CURLOPT_HEADER => true,
			CURLOPT_RETURNTRANSFER => true,
			CURLOPT_BINARYTRANSFER => true,
			CURLOPT_HTTPHEADER => $sent_headers
	));

	// execute cURL call and get status code
	$result = curl_exec($curl);
	$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
	curl_close($curl);

	if ($status == 200) {
		// file was successfully retrieved
		if (file_put_contents($filename, $result)===false) {
			// show error on unsuccessful write
			header('Internal Server Error', true, 500);
			exit;
		}
	} else if ($file_exists) {
		// serve stale
		$result = file_get_contents($filename);
		// reset refresh timer
		touch($filename);
	}

}

// split the message in raw headers and body
if (strpos($result,"\r\n\r\n")!==false) {
	list($raw_headers,$body) = explode("\r\n\r\n", $result, 2);
} else {
	list($raw_headers,$body) = array($result,'');
}

// convert raw headers into an array
$raw_headers = explode("\n", $raw_headers);

// parse raw headers into received headers
$received_headers = array();
foreach ($raw_headers as $h) {
	$h = explode(':', $h, 2);
	if (isset($h[1])) {
		$received_headers[$h[0]] = trim($h[1]);
	}
}

// set certain headers for the output
$headers = array('Content-Type','Content-Encoding','Cache-Control','ETag','Last-Modified','Vary');
foreach ($headers as $header) {
	if (isset($received_headers[$header])) {
		header($header.': '.$received_headers[$header]);
	}
}

// replace the absolute URL's in the output
foreach ($hostnames as $hostname) {
	$body = preg_replace('/(https?:)?\/\/'.str_replace('.','\.',$hostname).'\//',
		$_SERVER['SCRIPT_NAME'].'/'.$hostname.'/', $body);
}

// set the new content length properly
header('Content-Length: '.strlen($body));

// echo the contents of the body
echo $body;

Best thing since sliced bread.. 😉 And only 128 lines of PHP code! Source code is on GitHub:

https://github.com/mevdschee/asset-proxy.php
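
The last step of the script, replacing absolute URLs in the output, is what lets a proxied stylesheet pull its font files through the proxy as well. Here is a small sketch of that step in isolation. The wrapper function rewrite_body and the sample CSS line are made up for illustration, and it uses preg_quote() where the listing uses str_replace() to escape the dots.

```php
<?php
// Hypothetical wrapper around the URL rewrite loop of the proxy script.
function rewrite_body($body, $hostnames, $script_name) {
    foreach ($hostnames as $hostname) {
        // preg_quote() escapes the dots in the hostname (and the "/" delimiter)
        $pattern = '/(https?:)?\/\/'.preg_quote($hostname, '/').'\//';
        // matches http://host/, https://host/ and protocol-relative //host/
        $body = preg_replace($pattern, $script_name.'/'.$hostname.'/', $body);
    }
    return $body;
}

// made-up CSS line, shaped like a font stylesheet that points at a second host
$css = "src: url(http://fonts.gstatic.com/s/example/font.woff) format('woff');";
echo rewrite_body($css, array('fonts.gstatic.com'), '/asset-proxy.php'), "\n";
```

After the rewrite the font URL starts with /asset-proxy.php/fonts.gstatic.com/, so the browser requests the font file from your own server too.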

 
