Analyzing text protocols with a TCP proxy

To debug the Memcache server (memcached) on my localhost I needed an application that would log the input and output of a specific port on my local machine. In the PHP config I changed the memcache port from 11211 to 11212. Then I ran the TCP proxy software to forward all connections on port 11212 to 11211. I know I could have used Wireshark (and/or tcpdump) to just look at the traffic, but I felt like trying a more lightweight tool. I ran into two nice little TCP proxy programs: one written in Perl and one written in C++ with the Boost library.
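To make the idea concrete, here is a minimal sketch of such a logging proxy in Python. It is not the code of either project discussed in this post; the function names are made up for illustration, and the log format only imitates the "port ==> text" style of the C++ output shown further down.

```python
import socket
import threading


def format_line(client_port, outgoing, data):
    """Render one chunk of traffic, e.g. '56258 ==> quit'."""
    arrow = "==>" if outgoing else "<=="
    # latin-1 never fails to decode, which is convenient for a debugging tool
    text = data.decode("latin-1").rstrip("\r\n")
    return f"{client_port} {arrow} {text}"


def pump(src, dst, client_port, outgoing):
    """Copy bytes from src to dst until src closes, logging every chunk."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            print(format_line(client_port, outgoing, data))
            dst.sendall(data)
    except OSError:
        pass  # the other direction already shut the sockets down
    finally:
        # Shutting down both ends unblocks the thread pumping the other way.
        for sock in (src, dst):
            try:
                sock.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass


def handle(client, upstream_addr):
    """Relay one client connection to the upstream server in both directions."""
    client_port = client.getpeername()[1]
    with client, socket.create_connection(upstream_addr) as upstream:
        back = threading.Thread(target=pump, args=(upstream, client, client_port, False))
        back.start()
        pump(client, upstream, client_port, True)
        back.join()


def serve(listen_port, upstream_port, host="127.0.0.1"):
    """Accept connections on listen_port and forward them to upstream_port."""
    with socket.create_server((host, listen_port)) as server:
        while True:
            client, _ = server.accept()
            threading.Thread(target=handle, args=(client, (host, upstream_port)),
                             daemon=True).start()
```

Calling serve(11212, 11211) would reproduce the setup described above: PHP talks to port 11212 and every request and response is printed before it is passed on to memcached on 11211.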

Perl based TCP proxy

Torsten Raudssus wrote an application that has the following output:


With the following commands you can run the TCP proxy:

git clone
cd p5-app-tcpproxy/
sudo apt-get install cpanminus
sudo cpanm --installdeps .
perl bin/ 11212 11211

It is a lovely little project that definitely serves a niche. The only downside I see is that it has two CPAN dependencies. Perl is always available on Linux, but the dependencies certainly are not. This makes it a little harder to run when you are on a random machine debugging a nasty problem. If it did not have those dependencies it would become my next favorite (text) protocol analyzer.

C++ (Boost) based TCP proxy

Arash Partow also wrote a TCP proxy application in C++. It originally had no output, but I changed that. After my modifications the output looks like this:

56258 ==> delete memc.sess.key.lock.s5p5eh8fhvfe6iq06ot6nuim66

56258 <== DELETED

56258 ==> quit

56276 ==> add memc.sess.key.lock.s5p5eh8fhvfe6iq06ot6nuim66 0 1421275851 1

56276 <== STORED

56276 ==> get memc.sess.key.s5p5eh8fhvfe6iq06ot6nuim66

56276 <== END

56276 ==> set memc.sess.key.s5p5eh8fhvfe6iq06ot6nuim66 0 4 1331

With the following commands you can run the TCP proxy:

git clone
cd proxy/
sudo apt-get install build-essential
sudo apt-get install libboost-all-dev
./tcpproxy_server 11212 11211

As you can see I forked the original Github project to add some logging:


It works like a charm, but I prefer the colored and smarter output of the Perl application. If I find some time I might port the way the Perl proxy shows the output to C++. If you feel like it and you think you know how to do that, then I would be very happy with a PR!

NB: Arash Partow also wrote a variation of the code that does logging; it is on Google Code.

What is your favorite TCP proxy for analyzing text protocols? Let us know in the comments!


PHP asset proxy increases website availability


Don’t you hate it when your site does not work, because you linked jQuery from “” and that site is suffering connection problems? This may also happen with stylesheets or with font files. To counter this problem (without losing the convenience of remotely loaded assets) I created an “asset proxy” in PHP. It caches the assets in a cache folder on your web server, so that you do not have to worry about downtime of other services. You can configure how often the cache should be refreshed. When the external source is not available during a refresh, the stale cache files are used and there is no downtime at all!


Install asset-proxy.php in your webroot. Then replace all references in your HTML from:




Make sure you edit the list of allowed hostnames in the header of the PHP file and that you set an appropriate refresh time (in seconds). If the assets are not available upon refresh the stale files are served.

// hostnames for which "GET" requests can be proxied over "HTTP" (no ssl)
$hostnames = array(
	// list the allowed hostnames here, one quoted string per entry
);

// maximum age of a file before being refreshed
$refresh_age = 24*3600;

// directory where the cache resides (should exist and not be served)
$cache_dir = '/tmp/cache';

// strip the leading "/proxy.php/" from the URL
$url = substr($_SERVER['REQUEST_URI'], strlen($_SERVER['SCRIPT_NAME'].'/'));

// if there is no URL specified show bad request error
if(!$url || !strpos($url,'/')){
	header('Bad Request', true, 400);
	exit;
}

// get the hostname which should be the first segment (until the first slash)
$hostname = substr($url, 0, strpos($url, '/'));

// if the hostname is not in the list of allowed hostnames show forbidden error
if (!in_array($hostname, $hostnames)) {
	header('Forbidden', true, 403);
	exit;
}

// calculate the cached filename and check whether it already exists
$filename = $cache_dir.'/'.md5($url);
$file_exists = file_exists($filename);

// get the file age if the file exists
if ($file_exists) {
	$file_age = time()-filemtime($filename);
}

// if cache exists and is fresh, let's read the file, else retrieve it with cURL
if ($file_exists && $file_age<$refresh_age) {
	$result = file_get_contents($filename);
} else {
	// set some headers on the cURL call to pretend we are a user
	$sent_headers = array();
	foreach (array('User-Agent','Accept','Accept-Language','Referer') as $header) {
		$key = 'HTTP_'.strtoupper(str_replace('-','_',$header));
		if (isset($_SERVER[$key])) {
			$sent_headers[] = $header.': '.$_SERVER[$key];
		}
	}

	// make sure we do not get chunked, deflated or gzipped content
	$sent_headers[] = 'Accept-Encoding: ';
	$sent_headers[] = 'Cache-Control: max-age=0';
	$sent_headers[] = 'Connection: keep-alive';

	// initialize cURL with the URL, our headers and set headers retrieval on
	$curl = curl_init('http://'.$url);
	curl_setopt_array($curl, array(
			CURLOPT_HEADER => true,
			// return the response as a string instead of printing it
			CURLOPT_RETURNTRANSFER => true,
			CURLOPT_HTTPHEADER => $sent_headers
	));

	// execute cURL call and get status code
	$result = curl_exec($curl);
	$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);

	if ($status == 200) {
		// file was successfully retrieved
		if (file_put_contents($filename, $result)===false) {
			// show error on unsuccessful write
			header('Internal Server Error', true, 500);
			exit;
		}
	} else if ($file_exists) {
		// serve stale
		$result = file_get_contents($filename);
		// reset refresh timer
		touch($filename);
	}
}


// split the message in raw headers and body
if (strpos($result,"\r\n\r\n")!==false) {
	list($raw_headers,$body) = explode("\r\n\r\n", $result, 2);
} else {
	list($raw_headers,$body) = array($result,'');
}

// convert raw headers into an array
$raw_headers = explode("\n", $raw_headers);

// parse raw headers into received headers
$received_headers = array();
foreach ($raw_headers as $h) {
	$h = explode(':', $h, 2);
	if (isset($h[1])) {
		$received_headers[$h[0]] = trim($h[1]);
	}
}

// set certain headers for the output
$headers = array('Content-Type','Content-Encoding','Cache-Control','ETag','Last-Modified','Vary');
foreach ($headers as $header) {
	if (isset($received_headers[$header])) {
		header($header.': '.$received_headers[$header]);
	}
}

// replace the absolute URL's in the output
foreach ($hostnames as $hostname) {
	$body = preg_replace('/(https?:)?\/\/'.str_replace('.','\.',$hostname).'\//',
		$_SERVER['SCRIPT_NAME'].'/'.$hostname.'/', $body);
}

// set the new content length properly
header('Content-Length: '.strlen($body));

// echo the contents of the body
echo $body;
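
The caching decision (fresh, refresh, or serve stale) is the heart of the script. As an illustration only, the same logic can be condensed into a short Python sketch. The function name cached_fetch and the caller-supplied fetch callback are assumptions made for this example; they are not part of asset-proxy.php.

```python
import hashlib
import os
import time


def cached_fetch(url, cache_dir, refresh_age, fetch):
    """Return the body for url, refreshing the cache once it is older than refresh_age.

    fetch(url) is supplied by the caller and returns bytes, or None on failure.
    """
    filename = os.path.join(cache_dir, hashlib.md5(url.encode("utf-8")).hexdigest())
    exists = os.path.exists(filename)

    # Fresh cache: serve it without contacting the remote host.
    if exists and time.time() - os.path.getmtime(filename) < refresh_age:
        with open(filename, "rb") as f:
            return f.read()

    # Missing or too old: try to retrieve a new copy.
    body = fetch(url)
    if body is not None:
        with open(filename, "wb") as f:
            f.write(body)
        return body

    # Remote host unavailable: serve the stale copy and reset the refresh timer.
    if exists:
        os.utime(filename, None)
        with open(filename, "rb") as f:
            return f.read()

    return None  # nothing cached and nothing retrieved
```

Passing the download step in as a function keeps the sketch independent of any HTTP library and makes the stale-on-failure path easy to exercise.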

Best thing since sliced bread.. 😉 And only 128 lines of PHP code! Source code is on Github:



Tutorial: Apache 2.4 as reverse proxy

This post explains how to configure Apache 2.4 (the version that comes with Ubuntu 14.04) as a fully transparent reverse proxy. If you have a single website that has multiple paths that are actually run by different web applications then this tutorial may be for you.


The proxy will serve both web applications from their own virtual host configuration. These may be on the same machine, as shown below using the loop-back addresses, or on different machines if you use their (internal) IP addresses.

App1: =
App2: =

This is the directory structure in which I want to load the various web apps:

maurits@nuc:/var/www/html$ ll
total 28
drwxr-xr-x 4 root root  4096 Dec  1 21:43 ./
drwxr-xr-x 3 root root  4096 Apr 21  2014 ../
-rw-r--r-- 1 root root 11510 Apr 21  2014 index.html
drwxr-xr-x 2 root root  4096 Dec  1 21:45 app1/
drwxr-xr-x 2 root root  4096 Dec  1 21:45 app2/

In this tutorial we run the web applications on the same paths as on the proxy. This means that the web apps run in a subdirectory, even on the machines behind the proxy. This avoids the need of rewriting and thus keeps this setup simple and easy to debug.

Setting up the reverse proxy in Apache 2.4

What we are going to do is set up a reverse proxy. First we load the “proxy_http” module in Apache 2.4 using:

sudo a2enmod proxy_http
sudo service apache2 restart

Let’s set up the reverse proxy virtual host configuration in “/etc/apache2/sites-available/yourwebsite-proxy.conf” like this:

<VirtualHost *:80>
DocumentRoot /var/www/html
ProxyPreserveHost On
ProxyPass /app1
ProxyPass /app2
</VirtualHost>

The virtual host configuration of app1 in “/etc/apache2/sites-available/yourwebsite-app1.conf” looks like this:

DocumentRoot /var/www/html

And the virtual host configuration of app2 in “/etc/apache2/sites-available/yourwebsite-app2.conf” looks like this:

DocumentRoot /var/www/html

Let’s enable all sites and reload Apache using:

sudo a2ensite yourwebsite-proxy yourwebsite-app1 yourwebsite-app2
sudo service apache2 reload

Note that this works as the virtual host configurations with a specified IP address will be matched first. The “ProxyPreserveHost” will make sure the “Host” header in the request is not rewritten. The lack of a “ProxyPassReverse” will make sure that there is no rewriting done on the response.

Showing the correct remote IP address

It is important to understand that in the above setup the proxied web application will see a different “REMOTE_ADDR” environment variable (the address of the proxy), since there is absolutely no rewriting going on. The real visitor address is passed along in the “X-Forwarded-For” header. This is a comma-separated list; the last entry is the one appended by our proxy and holds the real client IP address, while any earlier entries arrived with the incoming request and cannot be trusted.
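
If an application has to pick the client address from the header itself, the selection could look like the following sketch. The function name client_ip and the trusted_proxies set are hypothetical and only illustrate the right-to-left reading of the list; this is not how Apache implements it internally.

```python
def client_ip(remote_addr, x_forwarded_for, trusted_proxies):
    """Return the address of the real client for a request received behind a proxy."""
    # A request that did not come from our proxy may carry a forged header.
    if remote_addr not in trusted_proxies:
        return remote_addr

    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]

    # Walk from the last entry backwards and stop at the first untrusted address.
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop

    return remote_addr
```

With a single proxy on 127.0.0.1, client_ip("127.0.0.1", "6.6.6.6, 203.0.113.7", {"127.0.0.1"}) picks the last entry and ignores the first one, which the visitor could have sent along to spoof an address.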

If you are on Apache 2.4, like in Ubuntu 14.04, you can correct the reported remote address by loading the “remoteip” module like this:

sudo a2enmod remoteip
sudo service apache2 restart

Add the “RemoteIPHeader” and “RemoteIPInternalProxy” directives to the virtual host configurations:

DocumentRoot /var/www/html
RemoteIPHeader X-Forwarded-For

Note that in the “RemoteIPInternalProxy” directive you must specify the internal IP address of the proxy. To test if you did it right you can run a PHP script that calls “phpinfo()”. If you see that the “REMOTE_ADDR” value is not set to the address of the proxy, then it is working.

Adding headers to the upstream request

We want to make Apache2 add upstream headers and therefore we need to load the “headers” module in Apache 2.4 using:

sudo a2enmod headers
sudo service apache2 restart

Next, we have to adjust the reverse proxy virtual host configuration in “/etc/apache2/sites-available/yourwebsite-proxy.conf” like this:

<VirtualHost *:80>
DocumentRoot /var/www/html
ProxyPreserveHost On
RewriteEngine On
RequestHeader add X-SSL off
RewriteRule ^/app1/(.*)$1 [P,L]
RewriteRule ^/app2/(.*)$1 [P,L]
</VirtualHost>

In this example we add an “X-SSL” header with the value “off” to the proxied request. If you want to add headers to the response you can use the “Header” directive.

If you have any questions, please use the comments below.


Automated tests for SOAP API in Java

Don’t think the story will be short, but it’s definitely interesting. So, I got a new project where I need to cover a SOAP API with automated tests written in Java. As a starting point I have a URL to a WSDL file to work with. The “tricky” part is in the environment configuration. Our test environment is hidden behind a SOCKS proxy server. So, if you want to work with the test environment, you need to make your requests through that proxy, otherwise you’ll “talk” to another environment that we don’t need right now. Not a usual configuration, but there were reasons for doing so. My first problem was to decide how to work with WSDL and in what direction my future framework would go. After some Googling and trials, I settled on the JAX-WS library. This is handy because with the “maven wsimport” task you can easily generate all the classes and objects in which your stubs will be held. Here’s how it looks in pom.xml:

                                <!-- Without this, multiple WSDLs won't be processed -->
                                <!-- Without this, multiple WSDLs won't be processed -->

It was a bit hard for me to get the WSDL file from the URL, so I decided to save it locally and read it from there. Since changes to the API are not frequent, I can easily accept this. This configuration allows me to add as many WSDL sources as I want and to store the stubs for each of them in the project. The first problem was solved. Now I had to figure out how to make my framework use the proxy for its calls. My first attempt was the basic one, most common in the Java world:

System.getProperties().put("proxySet", "true");
System.getProperties().put("proxyHost", getProxyHost());
System.getProperties().put("proxyPort", getProxyPort());

But I kept getting errors and no successful connection. My next guess was that the problem could be in the SSL, since we are using HTTPS. So, I started the browser, exported the security certificates (they were self-signed since it’s a test environment), added them to the Java “cacerts” keystore, and added the following to my code:

System.setProperty("", "keystore.jks");
System.setProperty("", "cacerts.jks");
System.setProperty("", "changeit");

I tried to debug and to make sure that the JVM was using my proxy settings, and it was: everything was there. What would be the next debugging step? I decided to install a local proxy and send my requests through it, so I could get more details. I installed Charles, a fantastic proxy server that helped me a lot in the past with REST API automation. It has a life-changing (for me, in this case) “External Proxy” option. I set it to proxy all requests to our SOCKS proxy and I put my Java API calls through Charles as well. And it worked! You can’t believe how happy I was. But now I had another problem: Charles costs $50 for one license. So, there were two questions that I needed to answer:

  • Why do my Java calls work fine with Charles and fail without it?
  • Where can I find a free HTTP proxy that is easy to set up?

With help from one of our developers we found out why Java kept getting authentication errors and didn’t connect to the test environment properly. Java was connecting to the SOCKS proxy properly, but it was resolving the remote DNS name to another environment, for which I didn’t have the proper certificates and credentials! I found only one article on StackOverflow regarding this issue, and from that point I knew that I needed a local HTTP proxy that:

  • is free
  • can forward my requests to the SOCKS proxy

I spent a whole day trying to find an easy-to-use and adequate proxy server for Windows (it’s the workstation that I need to use). I looked at over 15 different apps and failed. In the end, I installed VirtualBox with Ubuntu and installed the Squid3 proxy server there. It’s the most popular and common proxy server for Unix, from what I saw on Google. But this crazy application has a configuration file with more than 3000 lines! It’s not that easy to make it run the way I want, and it’s not even that easy to restart it. So, after a couple of hours I gave up on it and started looking for other solutions. Luckily, I found Polipo, a tiny proxy server that is easy to install and set up. It only has a few main options that I had to set to make everything work like a charm:

  • proxyAddress – set it to IP address of your local virtual box, or your local machine, if you use Linux or Mac
  • allowedClients – list of IP addresses from which Polipo will allow to access it and forward requests towards another proxy or web directly
  • socksParentProxy = “host name and port of your proxy to which I needed to forward my requests”
  • socksProxyType = socks5

Save changes and restart – that’s it! After that I pointed my Java framework to the local proxy and got green tests! To set the proxy for my tests I used a custom MyProxySelector class:

package Base;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class MyProxySelector extends ProxySelector {
    // Keep a reference on the previous default
    public ProxySelector defsel = null;

    /*
     * Inner class representing a Proxy and a few extra data
     */
    class InnerProxy {
        Proxy proxy;
        SocketAddress addr;
        // How many times did we fail to reach this proxy?
        int failedCount = 0;

        InnerProxy(InetSocketAddress a) {
            addr = a;
            proxy = new Proxy(Proxy.Type.HTTP, a);
        }

        SocketAddress address() {
            return addr;
        }

        Proxy toProxy() {
            return proxy;
        }

        int failed() {
            return ++failedCount;
        }
    }

    /*
     * A list of proxies, indexed by their address.
     */
    HashMap<SocketAddress, InnerProxy> proxies = new HashMap<SocketAddress, InnerProxy>();

    public MyProxySelector(ProxySelector def, String host, int port) {
        // Save the previous default
        defsel = def;

        // Populate the HashMap (List of proxies)
        InnerProxy i = new InnerProxy(new InetSocketAddress(host, port));
        proxies.put(i.address(), i);
    }

    /*
     * This is the method that the handlers will call.
     * Returns a List of proxy.
     */
    public List<Proxy> select(URI uri) {
        // Let's stick to the specs.
        if (uri == null) {
            throw new IllegalArgumentException("URI can't be null.");
        }

        /*
         * If it's a http (or https) URL, then we use our own
         * list.
         */
        String protocol = uri.getScheme();
        if ("http".equalsIgnoreCase(protocol) ||
                "https".equalsIgnoreCase(protocol)) {
            ArrayList<Proxy> l = new ArrayList<Proxy>();
            for (InnerProxy p : proxies.values()) {
                l.add(p.toProxy());
            }
            return l;
        }

        /*
         * Not HTTP or HTTPS (could be SOCKS or FTP)
         * defer to the default selector.
         */
        if (defsel != null) {
            return defsel.select(uri);
        } else {
            ArrayList<Proxy> l = new ArrayList<Proxy>();
            l.add(Proxy.NO_PROXY);
            return l;
        }
    }

    /*
     * Method called by the handlers when it failed to connect
     * to one of the proxies returned by select().
     */
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // Let's stick to the specs again.
        if (uri == null || sa == null || ioe == null) {
            throw new IllegalArgumentException("Arguments can't be null.");
        }

        /*
         * Let's lookup for the proxy
         */
        InnerProxy p = proxies.get(sa);
        if (p != null) {
            /*
             * It's one of ours, if it failed more than 3 times
             * let's remove it from the list.
             */
            if (p.failed() >= 3)
                proxies.remove(sa);
        } else {
            /*
             * Not one of ours, let's delegate to the default.
             */
            if (defsel != null)
                defsel.connectFailed(uri, sa, ioe);
        }
    }
}

And to turn the proxy on and off I wrote the following switch methods:

            private ProxySelector defaultProxy = ProxySelector.getDefault();

            public void setLocalProxy(){
                MyProxySelector ps = new MyProxySelector(ProxySelector.getDefault(),

            public void disableProxy(){

That’s it. Now I can run my tests easily, with control over when to use the proxy and when not to. In the future I’ll move my tests to the CI server (most probably Jenkins) and will set up the Polipo HTTP proxy in that environment in two minutes. It’s always nice to solve such out-of-the-ordinary problems. Most probably my solution is not very elegant or “right”, but it works right now, and for the moment that is all that matters to me, since I can start writing automated tests rather than fighting configuration issues.


POC: Flexible PHP Output Caching

I started my pet project almost a year ago and developed it in my free time, and now it is time to write an article about it. I also released this project under the Apache 2.0 license.

Last year at the Symfony conference in Paris I heard a really good quote:

“There are only two hard things in Computer Science: cache invalidation and naming things” — Phil Karlton. I agree with it and it gave me a boost to keep evolving the concept.

Addressed Audience

This article is for software developers and people who can tweak code. It is also for people who had or will have performance issues. This software will help you to solve these issues with minimal effort.

I don’t want you to get bored, so I will lead you straight to the examples. If you want to know more technical details, you can find them after the following topic.


Unfortunately I cannot cover all the features the framework provides in detail here, but I can give you a brief introduction to their usage through a few examples. Let’s take the first functional test in the framework as the first example. Please be aware though that this is a quickly evolving project and the examples given here might not work with these parameters in the near future, as the parameters are also subject to change. So please always refer to the documentation and examples shipped with the framework.

The functional tests do not cover all the implemented features but can be used to prove that the main functionality (caching) is working on the webserver. The tests can be found in the root of the project at the “/functionaltests” folder. The following example code does not contain inline comments because I am going to explain the code in detail.

Let’s have a look at cache_filecache.php

use POC\Poc;
use POC\cache\cacheimplementation\FileCache;


$poc  = new Poc(array(Poc::PARAM_CACHE => new FileCache(), Poc::PARAM_DEBUG => true));



The first thing you notice looking at the code snippet above is the way parameters are given to the constructor of the Poc class. Because many parameters can be changed in the classes shipped with the project, I decided to build a more flexible way of handling parameters than the one used in PHP. The user has to pass an array to these objects where the index of the array is the name of the parameter and the value is, of course, the value of the parameter. If a parameter is not defined in the array, the framework will use the built-in default value for that parameter. As the default cache engine is FileCache, we could have omitted the PARAM_CACHE parameter in the previous example and defined the $poc variable like this:

$poc  = new Poc(array(Poc::PARAM_DEBUG => true));

This is a really easy scenario where your application is mocked by the “/lib/text_generator.php” file. This is at the moment a Lorem Ipsum generator. We cache its contents – simply by creating the Poc object – as we can see in the example. We store the generated caches for 5 seconds – it is the default value – and we also want to see some performance information at the end of the cached text so we turned on debugging. We achieve this by adding the last parameter with a “true” value. The Hasher class is an important part of the concept. Let me describe it in the following example.

In this example we used the FileCache engine for caching, but by changing only a few characters we can use “MemcachedCache”, “RediskaCache”, “MongoCache”, etc. So, it is really easy to switch the project to another caching engine.

More complex Example

Let’s get a closer look at an everyday example. That’s why I took the Symfony Jobeet tutorial for describing the usage more closely. I have copied my framework to the lib/vendors folder and also created the poc.php file in the config folder. This file needs to be included at the very beginning of the application. For now we put it at the first line of the file:  config/ProjectConfiguration.class.php

Let me reveal to you the contents of the file poc.php.

use POC\cache\filtering\Hasher;
use POC\cache\filtering\Filter;
use POC\Poc;
use POC\cache\cacheimplementation\FileCache;

$hasher = new Hasher();

$conditions = new Filter();

$cache = new FileCache(array(FileCache::PARAM_TTL=>300,

$poc  = new Poc(array(Poc::PARAM_CACHE=>$cache, Poc::PARAM_DEBUG=>true));

As you can see, the basics of this case are really similar to the previous example. But here we utilize some other features as well.

By calling the addDistinguishVariable function we make our cache unique. The variables you pass to it as parameters are stored in an array; at some point the array is serialized and a hash key is generated from it. This hash identifies your cache. You have to find the variables that identify the state of your application and pass those to this function; it is that simple! When I examined the Jobeet application, the $_GET and Request URI values defined the state of the application on the level at which we want to use the cache. This means that authenticated pages are not cached in this case.
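
The idea of collecting distinguishing values, serializing them and hashing the result can be illustrated with a small Python sketch. The class below only mirrors the naming used in this article; the JSON serialization and the SHA-256 hash are assumptions made for the example and say nothing about what POC uses internally.

```python
import hashlib
import json


class Hasher:
    """Collects the values that distinguish one cached page from another."""

    def __init__(self):
        self._variables = []

    def add_distinguish_variable(self, value):
        # Every value added here becomes part of the identity of the cache entry.
        self._variables.append(value)

    def key(self):
        # Serialize deterministically, then hash the serialized form.
        serialized = json.dumps(self._variables, sort_keys=True)
        return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Two requests with the same query parameters and the same URI then map to the same key, while changing either value yields a different cache entry.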

Now we have reached the next piece of code, which calls the addBlacklistCondition function. This stores logical values regarding the current state of your application. If any of those are true, the page will not be involved in the caching process. Here we have defined that caching is not applied if there is a POST action or if the URL contains any of the words “backend”, “job” or “affiliate”.
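
As a rough illustration of this rule, the blacklist can be thought of as a list of booleans where a single true value disables caching. The Python names below (Filter, build_filter) are hypothetical and only echo the vocabulary of this article; they are not the POC API.

```python
class Filter:
    """Holds blacklist conditions; any true condition excludes the page from caching."""

    def __init__(self):
        self._blacklist = []

    def add_blacklist_condition(self, condition):
        self._blacklist.append(bool(condition))

    def is_cacheable(self):
        return not any(self._blacklist)


def build_filter(method, uri, words=("backend", "job", "affiliate")):
    """Build the conditions described above for one request."""
    conditions = Filter()
    conditions.add_blacklist_condition(method == "POST")
    conditions.add_blacklist_condition(any(word in uri for word in words))
    return conditions
```

A GET request for a public category page stays cacheable, while any POST request, or any URL containing one of the listed words, bypasses the cache.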

Easy, right? There is no need for more explanation; it just works out of the box. Maybe I have not covered all possible blacklist states in this example, but you get the basic idea.


The engine is really small; it does not contain more than 2400 lines of code (excluding unit tests and vendors), and it does not do any “black magic”. The total overhead should be less than one millisecond. If there is a cache hit, it is likely to get the output to your machine within a few milliseconds. If you measure the process up to the moment the output is pushed to the client, you can see that the cache engine in most cases gets you the page in under 1 millisecond. (Note that the actual performance might depend on the cache engine you employ and the environment you run the software on.) The performance is really promising, much faster than I thought it would be when I started the project. For now, all I can say is that a next article will include proper benchmarks as well, so stay tuned!

Constraints and concepts

I wanted to use the latest programming methods so I decided to support only PHP 5.3 and above. Some of the several concepts and methodologies the project uses are the following:

  • namespaces
  • Continuous integration (Jenkins)
  • Dependency injection

Of course it had to be fast, so I didn’t want to rely on external frameworks and used the built-in PHP functionality wherever possible.


With this framework you can customize the caching settings to a great extent. Let me list some of these options:

  • Output caching based on user defined criteria
  • Cache invalidation by TTL
  • Blacklisting / cache invalidation by application state
  • Blacklisting by output content
  • For caching it utilizes many interfaces, such as:
    • Memcached
    • Redis
    • MongoDb
    • Its own filesystem based engine.
    • APC (experimental, performs and works well on a webserver, but unfortunately the CLI interface does not behave like it should and it cannot be unit tested properly so I don’t include it in the master branch)
  • For cache tagging it utilizes MySQL but more database engines will be added
  • Cache Invalidation by tags
  • Minimal overhead on the performance
  • Easy to turn on/off
  • Controls the headers

Planned features

As the framework is still in an early state, many new features will be implemented in the future. These include the following:

  • Edge side includes
  • Cache templating with Twig
  • statistics stored in database
  • And many more

If you have any questions or suggestions, feel free to leave a comment!