Symfony2 Memcache session locking

In one of the previous posts we wrote about session reliability. Today we will talk about “locking session data”. This is another session reliability topic and we will look at the problems that may occur in Symfony2 and how to solve them.

Session locking

Session locking is when the web server thread acquires an exclusive lock on the session data to avoid concurrent access. Browsers use HTTP 1.1 keep-alive and would normally use a single open TCP connection and reuse it for all dynamic content. When loading images (and other static content) the browser may decide to use multiple concurrent TCP connections to get the data as fast as possible. This also happens when using AJAX. This may (and most likely will) lead to different workers (threads) on the web server answering these requests concurrently.

Each of the requests may read the session data, update it, and write it back. The last write wins, so some writes may get lost. This can be countered by applying session locking. The session lock prevents race conditions and keeps corrupted data out of the session. This can easily be understood by looking at the following two images.

[Images: session access without locking (left), session access with locking (right)]

The left image shows concurrent requests without session locking and the right shows concurrent requests with session locking. This is very well described in this post by Andy Bakun. Note that the above images are also from that post. Reading the Andy Bakun post allows you to truly understand the session locking problem (and the performance problems that AJAX may cause).
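The lost write can also be reproduced in a few lines of PHP. This is only a minimal sketch: the $store array is a hypothetical stand-in for the shared session storage, and the two "requests" are simulated by plain variables rather than real concurrent workers.

```php
<?php
// Hypothetical in-memory stand-in for the shared session storage.
$store = array('cart' => array());

// Without locking: both requests read the session before either writes it back.
$requestA = $store;
$requestB = $store;
$requestA['cart'][] = 'book';   // request A adds an item
$requestB['cart'][] = 'laptop'; // request B adds a different item
$store = $requestA;             // A writes first
$store = $requestB;             // B writes last and overwrites A's change
$withoutLocking = $store['cart'];

// With locking: B can only read after A has written and released the lock.
$store = array('cart' => array());
$requestA = $store;
$requestA['cart'][] = 'book';
$store = $requestA;
$requestB = $store;
$requestB['cart'][] = 'laptop';
$store = $requestB;
$withLocking = $store['cart'];
```

In the first half only the item added by request B survives; in the second half both items end up in the session.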

Symfony2 sessions

In Symfony2 one would normally use the NativeFileSessionHandler, which just uses the default PHP session handler. This works flawlessly in most cases. PHP uses “flock” to acquire an exclusive lock on the local filesystem. But when you scale out and run a server farm with multiple web servers you cannot use the local filesystem. You might be using a shared (NFS) filesystem and run into problems with “flock” (see the Linux NFS FAQs). If you use the database you may run into performance problems, especially when applying locking.

This leaves Memcache or Redis as options for session storage. These are fast key/value stores that can be used for session storage. Unfortunately the Symfony2 session storage implementations for Memcache (in Symfony) and Redis (in phpredis) do not implement session locking. This potentially leads to problems, especially when relying on AJAX calls as explained above. Note that other frameworks (like CakePHP) also do not implement session locking when using Memcache as session storage. Edit: This post has inspired the guys from SncRedisBundle and this Symfony2 bundle now supports session locking, which is totally awesome!

Custom save handlers

One can write “Custom Save Handlers” as described by the Symfony2 documentation:

Custom handlers are those which completely replace PHP’s built in session save handlers by providing six callback functions which PHP calls internally at various points in the session workflow. Symfony2 HttpFoundation provides some by default and these can easily serve as examples if you wish to write your own. — Symfony2 documentation

But you should be careful, since the examples do not implement session locking.

LswMemcacheBundle to the rescue

At LeaseWeb we love (to use) Memcache. Therefore, we have built session locking into our LswMemcacheBundle. It actually implements acquiring a “spin lock” with the timeout set to PHP’s “max_execution_time” (defaults to 30 seconds). The spin lock tries to acquire the lock every 150 ms (configurable). It will also hold the lock for a maximum time of the PHP “max_execution_time”. By using Memcache’s built-in key expire mechanism, we can ensure the lock is not held indefinitely.

This (spin-lock) implementation is a port of the session locking code from the memcached PECL module (written in C). Our bundle enables locking by default. If you want, you can disable the locking by setting the “locking” configuration parameter to “false” as described in the documentation.
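The idea of such a spin lock can be sketched as follows. This is not the code from LswMemcacheBundle: the InMemoryStore class, the function names and the "lock." key prefix are hypothetical and only serve to make the example self-contained. The store mimics an "add" operation that succeeds only when the key does not exist yet (or has expired), which is the behaviour a Memcache-style "add" is assumed to provide.

```php
<?php
// Hypothetical stand-in for a Memcache client: add() fails while the key is still alive.
final class InMemoryStore
{
    private $items = array();

    public function add($key, $value, $ttl)
    {
        $now = microtime(true);
        if (isset($this->items[$key]) && $this->items[$key]['expires'] > $now) {
            return false; // somebody else holds the key
        }
        $this->items[$key] = array('value' => $value, 'expires' => $now + $ttl);
        return true;
    }

    public function delete($key)
    {
        unset($this->items[$key]);
        return true;
    }
}

// Try to take the lock every $spinMs milliseconds until $maxWaitSeconds have passed.
function acquireSessionLock($store, $sessionId, $maxWaitSeconds, $spinMs = 150)
{
    $deadline = microtime(true) + $maxWaitSeconds;
    do {
        // The TTL lets the lock expire by itself if the holder never releases it.
        if ($store->add('lock.' . $sessionId, 1, $maxWaitSeconds)) {
            return true;
        }
        usleep($spinMs * 1000);
    } while (microtime(true) < $deadline);

    return false; // gave up waiting
}

function releaseSessionLock($store, $sessionId)
{
    $store->delete('lock.' . $sessionId);
}
```

A request would call acquireSessionLock() before reading the session and releaseSessionLock() after writing it; passing PHP's “max_execution_time” as $maxWaitSeconds would mirror the timeout described above.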

The session locking code from our bundle was also ported to SncRedisBundle and submitted as PR #109. LswMemcacheBundle is open-source and can be found on our GitHub account:

https://github.com/LeaseWeb/LswMemcacheBundle


POC: Flexible PHP Output Caching

I started my pet project almost a year ago and developed it in my free time; now it is time to write an article about it. I have also released the project under the Apache 2.0 license (http://github.com/tothimre/POC).

Last year at the Symfony conference in Paris I heard a really good quote:

“There are only two hard things in Computer Science: cache invalidation and naming things” — Phil Karlton. I agree with it and it gave me a boost to keep evolving the concept.

Addressed Audience

This article is for software developers and people who can tweak code. It is also for people who had or will have performance issues. This software will help you to solve these issues with minimal effort.

I don’t want you to get bored, so I will start with the examples. If you want to know more technical details, you can find them after the examples.

Examples

Unfortunately I cannot cover all the features the framework provides in detail here, but I can give you a brief introduction to their usage through a few examples. Let’s take the first functional test in the framework as the first example. Please be aware that this is a quickly evolving project: the parameters are subject to change, so the examples given here might not work unchanged in the near future. Please always refer to the documentation and examples shipped with the framework.

The functional tests do not cover all the implemented features but can be used to prove that the main functionality (caching) is working on the webserver. The tests can be found in the “/functionaltests” folder in the root of the project. The following example code does not contain inline comments because I am going to explain the code in detail.

Let’s have a look at cache_filecache.php


use POC\Poc;
use POC\cache\cacheimplementation\FileCache;

include("../framework/autoload.php");

$poc  = new Poc(array(Poc::PARAM_CACHE => new FileCache(), Poc::PARAM_DEBUG => true));

$poc->start();

include('lib/text_generator.php');

The first thing you notice looking at the code snippet above is the way parameters are given to the constructor of the Poc class. Because many parameters can be changed in the classes shipped with the project, I decided to build a more flexible way of handling parameters than the one used in PHP. The user passes an array to these objects, where the index of the array is the name of the parameter and the value is the value of the parameter. If a parameter is not defined in the array, the framework uses the built-in default value for that parameter. As the default cache engine is FileCache, we could have omitted the PARAM_CACHE parameter in the previous example and defined the $poc variable like this:

$poc  = new Poc(array(Poc::PARAM_DEBUG => true));
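The array-of-parameters idea with built-in defaults can be sketched in a few lines. The ConfigurableThing class and its constants below are hypothetical and are not part of POC; the sketch only illustrates how user-supplied values might be combined with defaults.

```php
<?php
// Hypothetical class, not part of POC: named parameters with built-in defaults.
class ConfigurableThing
{
    const PARAM_TTL = 'ttl';
    const PARAM_DEBUG = 'debug';

    private $params;

    public function __construct(array $params = array())
    {
        $defaults = array(self::PARAM_TTL => 5, self::PARAM_DEBUG => false);
        // The + operator keeps the caller's values and fills in the missing keys.
        $this->params = $params + $defaults;
    }

    public function get($name)
    {
        return $this->params[$name];
    }
}

// Only the debug flag is given; the TTL falls back to its default.
$thing = new ConfigurableThing(array(ConfigurableThing::PARAM_DEBUG => true));
```

The array union operator is used here instead of array_merge() because it never renumbers keys; for string keys like these, either would work.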

The cache_filecache.php example is a really simple scenario in which your application is mocked by the “/lib/text_generator.php” file, which at the moment is a Lorem Ipsum generator. We cache its output simply by creating the Poc object and calling start(), as we can see in the example. The generated caches are stored for 5 seconds (the default value). We also want to see some performance information at the end of the cached text, so we turned on debugging by passing the PARAM_DEBUG parameter with a “true” value. The Hasher class is an important part of the concept; I will describe it in the next example.

In this example we used the FileCache engine for caching, but by changing only a few characters we can use “MemcachedCache”, “RediskaCache”, “MongoCache”, etc. instead. It is also really easy to add new caching engines to the project.

More complex Example

Let’s take a closer look at an everyday example. I took the Symfony Jobeet tutorial to describe the usage in more detail. I copied my framework to the lib/vendor folder and created the poc.php file in the config folder. This file needs to be included at the very beginning of the application. For now we include it on the first line of the file config/ProjectConfiguration.class.php.

Let me reveal to you the contents of the file poc.php.

require(dirname(__FILE__).'/../lib/vendor/poc/framework/autoload.php');
use POC\cache\filtering\Hasher;
use POC\cache\filtering\Filter;
use POC\Poc;
use POC\cache\cacheimplementation\FileCache;

$hasher = new Hasher();
$hasher->addDistinguishVariable($_GET);
$hasher->addDistinguishVariable($_SERVER["REQUEST_URI"]);

$conditions = new Filter();
$conditions->addBlackListCondition($_POST);
$conditions->addBlackListCondition(strpos($_SERVER['REQUEST_URI'],'backend'));
$conditions->addBlackListCondition(strpos($_SERVER['REQUEST_URI'],'job'));
$conditions->addBlackListCondition(strpos($_SERVER['REQUEST_URI'],'affiliate'));

$cache = new FileCache(array(
    FileCache::PARAM_TTL    => 300,
    FileCache::PARAM_FILTER => $conditions,
    FileCache::PARAM_HASHER => $hasher,
));

$poc  = new Poc(array(Poc::PARAM_CACHE=>$cache, Poc::PARAM_DEBUG=>true));
$poc->start();

As you can see, the basics of this case are really similar to the previous example. But here we utilize some other features as well.

By calling the addDistinguishVariable function we make our cache unique. Each value you pass to it is stored in an array; at one point that array is serialized and a hash key is generated from it. This hash identifies your cache entry. You have to find the variables that identify the state of your application and add them with this function; it is that simple! When I examined the Jobeet application, I found that the $_GET and the request URI values define the state of the application at the level where we want to use the cache. This means that authenticated pages are not cached in this case.
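The "serialize, then hash" step can be illustrated with a small sketch. The buildCacheKey() helper below is hypothetical and is not POC's Hasher class; the use of md5() is an assumption, since the hash function is not named here.

```php
<?php
// Hypothetical helper, not POC's Hasher: serialize the distinguishing values, then hash them.
function buildCacheKey(array $distinguishVariables)
{
    return md5(serialize($distinguishVariables));
}

// The same request state always yields the same key; a different state yields another key.
$keyA = buildCacheKey(array(array('page' => '1'), '/jobs?page=1'));
$keyB = buildCacheKey(array(array('page' => '2'), '/jobs?page=2'));
$keyA2 = buildCacheKey(array(array('page' => '1'), '/jobs?page=1'));
```

Anything that changes the page but is left out of the array (a login state, for example) would lead to the same key and therefore to the same cached output.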

Now we have reached the next piece of code, which calls the addBlackListCondition function. This stores logical values regarding the current state of your application. If any of them is true, the page is excluded from the caching process. Here we have defined that caching is not applied if there is a POST action or if the URL contains any of the words “backend”, “job” or “affiliate”.

Easy, right? There is no need for more explanation; it just works out of the box. Maybe I have not covered all possible blacklist states in this example, but you now know the basics of the software.
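The "any condition is true" rule of the blacklist can be sketched like this. SimpleFilter and shouldCache() are hypothetical names and do not reflect how POC's Filter class is actually implemented; only the addBlackListCondition method name is borrowed from the example above.

```php
<?php
// Hypothetical filter, not POC's Filter class: caching is skipped if any condition is truthy.
class SimpleFilter
{
    private $conditions = array();

    public function addBlackListCondition($condition)
    {
        $this->conditions[] = (bool) $condition;
    }

    public function isBlackListed()
    {
        return in_array(true, $this->conditions, true);
    }
}

function shouldCache($uri, array $post)
{
    $filter = new SimpleFilter();
    $filter->addBlackListCondition($post);                             // non-empty POST data
    $filter->addBlackListCondition(strpos($uri, 'backend') !== false); // explicit boolean
    return !$filter->isBlackListed();
}
```

Comparing the result of strpos() with false is a deliberate choice in this sketch: a match at position 0 returns 0, which would otherwise be treated as "not found".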

Performance

The engine is really small: it contains no more than 2400 lines of code (excluding unit tests and vendors), and it does not do any “black magic”. The total overhead should be less than one millisecond. On a cache hit, the output is likely to reach your machine within a few milliseconds. If you measure the process up to the moment the output is pushed to the client, you can see that in most cases the cache engine produces the page in under 1 millisecond. (Note that the actual performance might depend on the cache engine you employ and the environment you run the software on.) The performance is really promising, much faster than I thought it would be when I started the project. Proper benchmarks will follow in a later article, so stay tuned!

Constraints and concepts

I wanted to use the latest programming methods so I decided to support only PHP 5.3 and above. Some of the several concepts and methodologies the project uses are the following:

  • Namespaces
  • Continuous integration (Jenkins)
  • Dependency injection

Of course it had to be fast, so I didn’t want to rely on external frameworks and used the built-in PHP functionality wherever possible.

Features

With this framework you can customize the caching settings to a great extent. Let me list some of these options:

  • Output caching based on user defined criteria
  • Cache invalidation by TTL
  • Blacklisting / cache invalidation by application state
  • Blacklisting by output content
  • For caching it utilizes many interfaces, such as:
    • Memcached
    • Redis
    • MongoDb
    • Its own filesystem based engine.
    • APC (experimental, performs and works well on a webserver, but unfortunately the CLI interface does not behave like it should and it cannot be unit tested properly so I don’t include it in the master branch)
  • For cache tagging it utilizes MySQL but more database engines will be added
  • Cache Invalidation by tags
  • Minimal overhead on the performance
  • Easy to turn on/off
  • Controls the headers

Planned features

As the framework is still in an early state, many new features will be implemented in the future. These include the following:

  • Edge side includes
  • Cache templating with Twig
  • Statistics stored in database
  • And many more

If you have any questions or suggestions, feel free to leave a comment!
