PHP asset proxy increases website availability

remote_assets

Don’t you hate it when your site does not work, because you linked jQuery from “code.jquery.com” and that site is suffering connection problems? This may also happen with stylesheets or with font files. To counter this problem (but not lose the convenience of remote loaded assets) I created  an “asset proxy” in PHP. It will cache the assets in a cache folder on your web server, so that you do not have to worry about downtime of other services. You can configure how often the cache should be refreshed. When the external source is not available during a refresh the stale cache files will be used and there is no downtime at all!

proxy_assets

Install asset-proxy.php in your webroot. Then replace all references in your HTML from:

 href="http://fonts.googleapis.com/css?family=Droid+Sans:400,700"

to:

 href="/asset-proxy.php/fonts.googleapis.com/css?family=Droid+Sans:400,700"

Make sure you edit the list of allowed hostnames in the header of the PHP file and that you set an appropriate refresh time (in seconds). If the assets are not available upon refresh the stale files are served.

// hostnames for which "GET" requests can be proxied over "HTTP" (no ssl)
$hostnames = array(
	'fonts.gstatic.com',
	'maxcdn.bootstrapcdn.com',
	'netdna.bootstrapcdn.com',
	'fonts.googleapis.com',
	'ajax.googleapis.com',
);

// maximum age of a file before being refreshed
$refresh_age = 24*3600;

// directory where the cache resides (should exist and not be served)
$cache_dir = '/tmp/cache';

// strip the leading "/proxy.php/" from the URL
$url = substr($_SERVER['REQUEST_URI'], strlen($_SERVER['SCRIPT_NAME'].'/'));

// if there is no URL specified show bad request error
if(!$url || !strpos($url,'/')){
	header('Bad Request', true, 400);
	exit;
}

// get the hostname which should be the first segment (until the first slash)
$hostname = substr($url, 0, strpos($url, '/'));

// if the hostname is not in the list of allowed hostnames show forbidden error
if (!in_array($hostname, $hostnames)) {
	header('Forbidden', true, 403);
	exit;
}

// calculate the cached filename and check whether it already exists
$filename = $cache_dir.'/'.md5($url);
$file_exists = file_exists($filename);

// get the file age if the file exists
if ($file_exists) {
	$file_age = time()-filemtime($filename);
}

// if cache exists and is fresh, let's read the file, else retrieve it with cURL
if ($file_exists && $file_age<$refresh_age) {
	$result = file_get_contents($filename);
} else {
	// set some headers on the cURL call to pretend we are a user
	$sent_headers = array();
	foreach (array('User-Agent','Accept','Accept-Language','Referer') as $header) {
		$key = 'HTTP_'.strtoupper(str_replace('-','_',$header));
		if (isset($_SERVER[$key])) {
			$sent_headers[] = $header.': '.$_SERVER[$key];
		}
	}

	// make sure we do net get chunked, deflated or gzipped content
	$sent_headers[] = 'Accept-Encoding: ';
	$sent_headers[] = 'Cache-Control: max-age=0';
	$sent_headers[] = 'Connection: keep-alive';

	// initialize cURL with the URL, our headers and set headers retrieval on
	$curl = curl_init('http://'.$url);
	curl_setopt_array($curl, array(
			CURLOPT_HEADER => true,
			CURLOPT_RETURNTRANSFER => true,
			CURLOPT_BINARYTRANSFER => true,
			CURLOPT_HTTPHEADER => $sent_headers
	));

	// execute cURL call and get status code
	$result = curl_exec($curl);
	$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
	curl_close($curl);

	if ($status == 200) {
		// file was successfully retrieved
		if (file_put_contents($filename, $result)===false) {
			// show error on unsuccessful write
			header('Internal Server Error', true, 500);
			exit;
		}
	} else if ($file_exists) {
		// serve stale
		$result = file_get_contents($filename);
		// reset refresh timer
		touch($filename);
	}

}

// split the message in raw headers and body
if (strpos($result,"\r\n\r\n")!==false) {
	list($raw_headers,$body) = explode("\r\n\r\n", $result, 2);
} else {
	list($raw_headers,$body) = array($result,'');
}

// convert raw headers into an array
$raw_headers = explode("\n", $raw_headers);

// parse raw headers into received headers
$received_headers = array();
foreach ($raw_headers as $h) {
	$h = explode(':', $h, 2);
	if (isset($h[1])) {
		$received_headers[$h[0]] = trim($h[1]);
	}
}

// set certain headers for the output
$headers = array('Content-Type','Content-Encoding','Cache-Control','ETag','Last-Modified','Vary');
foreach ($headers as $header) {
	if (isset($received_headers[$header])) {
		header($header.': '.$received_headers[$header]);
	}
}

// replace the absolute URL's in the output
foreach ($hostnames as $hostname) {
	$body = preg_replace('/(https?:)?\/\/'.str_replace('.','\.',$hostname).'\//',
		$_SERVER['SCRIPT_NAME'].'/'.$hostname.'/', $body);
}

// set the new content length properly
header('Content-Length: '.strlen($body));

// echo the contents of the body
echo $body;

Best thing since sliced bread.. 😉 And only 128 lines of PHP code! Source code is on Github:

https://github.com/mevdschee/asset-proxy.php

 

Share

10 thoughts on “PHP asset proxy increases website availability”

  1. Hello Maurits,
    That’s an interesting article, but this makes no sense when thinking about CDN, doesn’t it?

  2. @Daniel: You are right when you use a CDN for serving all your pages and assets. If you use a CDN only for assets, then this does make sense, as it increases your availability.

  3. It will decrease the speed of the site and I’m sure about the availability.
    The chance your web server with the “asset proxy” to go down is much higher than the downtime of the CDN.
    The last thing is the parallel requests of the browsers. If you merge the content from a multiple domains to a single domain, the browser will send from 4 to 6 (it depends from the browser) parallel requests per host and all other will wait.
    I do NOT recommend to anybody to use this piece of code on production. Yes, it is nice to learn how the proxies and CDN’s works, but nothing more.

  4. @Boyan: Thank you for the compliments and I am glad it helps people to learn about CDN. If your website is down, it does not matter whether or not your assets on the CDN are available, but I’m sure you get that. Good point on the speed. Actually most CDN’s deliver the assets much faster than you can serve them yourself.

  5. I don’t get this either. The whole point of having a CDN is offloading the resource from the web server to a POP closer to the end user. By proxying it through your own server you effectively invalidate this. Why not just host the file yourself in this case?

    Furthermore each request to asset-proxy.php causes a file access, as the file age is checked, creating unnecessary latency. Expires headers are also apparently dropped (only Content-Type is forwared), so the browser won’t even cache the resource.

    Interesting concept, but in this use case it doesn’t make any sense.

  6. @Jari: Great comments! Thank you for your good remarks. I added some CDN cache headers to enable proper browser caching. I also agree for 100% with all your other points. It would be best to just host the files yourself or send all data through the CDN as both would improve the performance. I’m happy you think it is interesting. In case you find a use case that does make sense, then please let me know!

  7. I’m agree with other people that comment before .
    when you use it , it reduce speed of loading and reduce chance of using cached asset for another websites that contained it.
    Also you should check if data isn’t gzipped , you should gzipped it . it’s necessary for assets like jquery !

  8. @Roohollah: Thank you for your comment. I also agree with those comments. It is only in rare occasions that high availability is more important than speed.

Leave a Reply

Your email address will not be published. Required fields are marked *