Using Correlation IDs in API Calls

Over the years, the IT industry has moved from a single domain, monolithic architecture to a microservice architecture. In a microservice architecture, complex processes are split into smaller and simpler sub-processes. While this kind of architecture has many benefits, there are also some downsides – for example, if you send one request to a Leaseweb API, it ends up in multiple requests in other backend systems [FIGURE 1]. How do you keep track of requests and responses processed by multiple systems? This is where Correlation IDs come into play.

[FIGURE 1: Example request/response flow]

Using a Correlation ID

A Correlation ID is a unique, randomly generated identifier value that is added to every request and response. In a microservice architecture, the initial Correlation ID is passed to your sub-processes. If a sub-system also makes sub-requests, it will also pass the Correlation ID to those systems.

How you pass the Correlation ID to other systems depends on your architecture. At Leaseweb we are using REST APIs a lot, with HTTP headers to pass on the Correlation ID. As a rule, we assign a Correlation ID as soon as possible, and always use a Correlation ID if it is passed on. Our public API only accepts Correlation IDs from internally trusted clients. For any other client (such as an employee or customer API clients) a new Correlation ID is generated for the request.

Real Value of Correlation IDs

The real value of Correlation IDs is realized when you also log the Correlation IDs. Debugging or tracing requests becomes much easier, as you can search all of your logs for the same Correlation ID. Combined with central logging solutions (such as the ELK stack), searching logs becomes even easier and can be done by non-technical colleagues. Providing tools to your colleagues to troubleshoot issues allows them to have more responsibility and gives you more time to work on more technical projects.

We mainly use Correlation IDs at Leaseweb for debugging purposes. When an error occurs, we provide the Correlation ID to the client/customer. If users provide the Correlation ID when submitting a support ticket, we can visualize the entire process needed to fulfil the client’s initial intent. This has significantly improved the time it takes us to fix bugs.

[FIGURE 2: Example of one Correlation ID with multiple requests]

Debugging issues is a time-consuming process if Correlation IDs are not used. When your environment scales, you will need to find solutions to group transactions happening in your systems. By using a Correlation ID, you can easily group requests and events in your systems, allowing you to spend more time fixing the problem and less time trying to find it.

Practical examples on how to implement Correlation IDs

The following examples use Symfony, a popular web application framework. These concepts can also be applied to any other framework, such as Laravel, Django, Flask or Ruby on Rails.

If you are unfamiliar with the concept of Service Containers and Dependency Injection, we recommend reading the excellent Symfony documentation about it here: https://symfony.com/doc/current/service_container.html

Using Monolog to append Correlation IDs to your application logs

When processing a HTTP request your application often logs some information – such as when an error occurred, or an important change made in your system that you want to keep track of. When using the Monolog logging library in PHP (https://seldaek.github.io/monolog/), you can use the concept of “Processors” (read more about that here on symfony.com).

One way to do this is by creating a Monolog Processor class:

<?php

namespace App\Monolog\Processor;

use Symfony\Component\HttpFoundation\RequestStack;

class CorrelationIdProcessor
{
    protected $requestStack;

    public function __construct(RequestStack $requestStack)
    { 

       $this->requestStack = $requestStack;

    }
 
    public function processRecord(array $record)
    {
        $request = $this->requestStack->getCurrentRequest();

        if (!$request) {
            return;
        }

        $correlationId = $request->headers->get(‘X-My-Correlation-ID');

        if (empty($correlationId)) {
             return;
        }

        // If we have a correlation id include it in every monolog line
        $record['extra']['correlation_id'] = $correlationId;
 
        return $record;
    }
}

Then register this class on the service container as a monolog processor in services.yml:

# app/config/services.yml

services:
  App\Monolog\Processor\CorrelationIdProcessor:
    arguments: ["@request_stack"]
    tags:
      - name: monolog.processor
        method: processRecord

Now, every time you log something in your application with Monolog:

$this->logger->info('shopping_cart_emptied', [‘cart_id’ => 123]);

You will see the Correlation ID of the HTTP Request in your log files:

$ grep ‘shopping_cart_emptied’ var/logs/prod.log

[2020-07-03 12:14:45] app.INFO: shopping_cart_emptied {“cart_id”: 123} {"correlation_id":"d135d5f1-3dd0-45fa-8f26-55d8d6a44876"}

You can utilize the same pattern to log the name of the user that is currently logged in, the remote IP address of the API client, or anything else that makes troubleshooting faster for you.

Using Guzzle to append Correlation IDs when making sub-requests

If your API makes API calls to other microservices (and you use Guzzle to do this) you can make use of Handlers and Middleware.

Some teams at Leaseweb depend on many downstream microservices, and can therefore have multiple guzzle clients as services on the service container. While each Guzzle client is configured with its own base URL and/or authentication, it is possible for all of the Guzzle clients to share the same HandlerStack.

First, create the middleware:

<?php

namespace App\Guzzle\Middleware;

use Symfony\Component\HttpFoundation\RequestStack;
use Psr\Http\Message\RequestInterface;

class CorrelationIdMiddleware
{
    protected $requestStack;
 
    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function __invoke(callable $handler)
    {
        return function (RequestInterface $request, array $options = []) use ($handler) {
            $request = $this->requestStack->getCurrentRequest();

            if (!$request) {
                return $handler($request, $options);
            }

            $correlationId = $request->headers->get(‘X-My-Correlation-ID');

            if (empty($correlationId)) {
                 return $handler($request, $options);
            } 
 
            $request = $request->withHeader(‘X-My-Correlation-ID’, $correlationId);
 
            return $handler($request, $options);
        };
    }
}

Define this middleware as service on the service container and create a HandlerStack:

# app/config/services.yml

services:
  correlation_id_middleware:
    class: App\Guzzle\Middleware:
    arguments: ["@request_stack"]

  correlation_id_handler_stack:
    class: GuzzleHttp\HandlerStack
    factory: ['GuzzleHttp\HandlerStack', 'create']
    calls:
      - [push, ["@correlation_id_middleware", "correlation_id_forwarder"]]

With these two services defined, you can now configure all your Guzzle clients using the HandlerStack so that the Correlation ID of the current HTTP request is forwarded to downstream HTTP requests:

# app/config/services.yml

services:
  my_downstream_api:
    class:
    arguments:
      - base_uri: https://my-downstream-api.example.com
        handler: "@correlation_id_handler_stack”

Now every API call that you make to https://my-downstream-api.example.com will include the HTTP request header ‘X-My-Correlation-ID’ and have the same value as the Correlation ID of the current HTTP request. You can also apply the same Monolog and Guzzle tricks described here to the downstream API.

Expose Correlation IDs in error responses

The missing link between these processes is to now expose your Correlation IDs to your users so they can also log them or use them in support cases they report to your organization.

Symfony makes this easy using Event Listeners. You can define Event Listeners in Symfony to pre-process HTTP requests as well as to post-process HTTP Responses just before they are returned by Symfony to the API caller. In this example, we will create a HTTP Response listener and add the Correlation ID of the current HTTP request as a HTTP Header in the HTTP Response.

First, we create a service on the Service Container:

<?php
 
namespace App\Listener;
 
use Symfony\Component\HttpFoundation\RequestStack;
use Symfony\Component\HttpKernel\Event\FilterResponseEvent;

class CorrelationIdResponseListener
{
    protected $requestStack;
 
    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function onKernelResponse(FilterResponseEvent $event)
    {
        $request = $this->requestStack->getCurrentRequest();

        if (!$request) {
            return;
        }

        $correlationId = $request->headers->get(‘X-My-Correlation-ID');

        if (empty($correlationId)) {
             return;
        }

        $event->getResponse()->headers->set(‘X-My-Correlation-ID’, $correlationId);
    }
}

Now configure it as a Symfony Event Listener:

# app/config/services.yml

services:
  correlation_id_response_listener:
    class: App\Listener\CorrelationIdResponseListener
    arguments: ["@request_stack"]
    tags:
      - { name: kernel.event_listener, event: kernel.response, method: onKernelResponse }

Every response that is generated by your Symfony application will now include a X-My-Correlation-ID HTTP response header with the same Correlation ID as the HTTP request.

The Value of Correlation IDs

Using Correlation IDs throughout your whole stack gives you more insight into all (sub)requests during a transaction. Using the right tools allows others to debug issues, giving your developers more time to work on new awesome features.

Implementing Correlation IDs isn’t hard to do, and can be achieved quickly depending on your software stack. At Leaseweb, the use of Correlation IDs has saved us hours of time while debugging issues on numerous occasions.

Technical Careers at Leaseweb

We are searching for the next generation of engineers and developers to help us build infrastructure to automate our global hosting services! If you are interested in finding out more, check out our Careers at Leaseweb.

Share

PHP-CRUD-API now supports authorization and validation

Another milestone is reached for the PHP-CRUD-API project. A project that aims to provide a high performance, consistent data API over REST that is easy to deploy (it is a single PHP file!) and requires minimal configuration. By popular demand we have added four important new features:

  1. Tables and the actions on them can be restricted with custom rules.
  2. Access to specific columns can be restricted using your own algorithm.
  3. You can specify “sanitizers” to, for example, strip HTML tags from input.
  4. You can specify “validators” functions to show errors on invalid input.

These features are built by allowing you to define callback functions in your configuration. These functions can then contain your application specific logic. How these function work and how you can load them is explained below.

Table authorizer

The following function can be used to authorize access to specific tables:

/**
 * @param action    'create','read','update','delete','list'
 * @param database  name of your database (e.g. 'northwind')
 * @param table     name of the table (e.g. 'customers')
 * @returns bool    indicates that access is granted  
 **/
  
$f1=function($action,$database,$table){
  return true; 
};

Column authorizer

The following function can be used to authorize access to specific columns:

/**
 * @param action    'create','read','update','delete','list'
 * @param database  name of your database (e.g. 'northwind')
 * @param table     name of the table (e.g. 'customers')
 * @param column    name of the column (e.g. 'password')
 * @returns bool    indicates that access is granted  
 **/
  
$f2=function($action,$database,$table,$column){
  return true; 
};

Input sanitizer

The following function can be used to sanitize input for specific columns:

/**
 * @param action    'create','read','update','delete','list'
 * @param database  name of your database (e.g. 'northwind')
 * @param table     name of the table (e.g. 'customers')
 * @param column    name of the column (e.g. 'username')
 * @param type      type of the column (depends on engine)
 * @param value     input from the user (e.g. 'johndoe88')
 * @returns string  sanitized value
 **/
  
$f3=function($action,$database,$table,$column,$type,$value){
  return $value; 
};

Input validator

The following function can be used to validate input for specific columns:

/**
 * @param action    'create','read','update','delete','list'
 * @param database  name of your database (e.g. 'northwind')
 * @param table     name of the table (e.g. 'customers')
 * @param column    name of the column (e.g. 'username')
 * @param type      type of the column (depends on engine)
 * @param value     input from the user (e.g. 'johndoe88')
 * @param context   all input fields in this action
 * @returns string  validation error (if any) or null
 **/
  
$f4=function($action,$database,$table,$column,$type,$value,$context){
  return null;
};

Configuration

This is an example configuration that requires the above snippets to be defined.

$api = new MySQL_CRUD_API(array(
  'hostname'=>'localhost',
  'username'=>'xxx',
  'password'=>'xxx',
  'database'=>'xxx',
  'charset'=>'utf8',
  'table_authorizer'=>$f1,
  'column_authorizer'=>$f2,
  'input_sanitizer'=>$f3,
  'input_validator'=>$f4
));
$api->executeCommand();

You can find the project on Github.

Share

PHP script to tail a log file using telnet

tail

Why would you need a PHP script to tail a log file using telnet? You don’t! But it the script is cool anyway. It allows you to connect to your web server over telnet, talk some HTTP to your web server, and run a PHP script that shows a tail of a log file. It uses ANSI sequences (colors!) to provide a nice user interface specifically to tail a log file with the “follow” option (like “tail -f”). Below you find the PHP script that you have to put on the web server:

<?php
// configuration
$file = '/var/log/apache2/access.log';
$ip = '127.';
// start of script
$title = "\033[H\033[2K$file";
if (strpos($_SERVER['REMOTE_ADDR'],$ip)!==0) die('Access Denied');
$stream = fopen($file, 'r');
if (!$stream) die("Could not open file: $file\n");
echo "\033[m\033[2J";
fseek($stream, 0, SEEK_END);
echo str_repeat("\n",4500)."\033[s$title";
flush();
while(true){
  $data = stream_get_contents($stream);
  if ($data) {
    echo "\033[32m\033[u".$data."\033[s".str_repeat("\033[m",1500)."$title";
    flush();
  }
  usleep(100000);
}
fclose($stream);

To tail (and follow) a remote file you need to talk HTTP to the web server using telnet and request to load the PHP tail script. First you connect using telnet:

$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

After connecting you have to “speak” some HTTP (just type this):

GET /tail.php HTTP/1.1
Host: localhost

NB: Make sure you end the above telnet commands with an empty line! After this the screen should be empty showing any new log lines in real-time in green on the telnet window.

You can use Ctrl + ‘]’ to get to the telnet prompt and type “quit” to exit.

If you don’t want to copy the code above, then you can also find the latest version of tail.php on Github.

Share

Static code analysis for PHP templates

Templating is cool. Everybody is using Twig today. Other popular choices are: Smarty, Mustache and Latte. You may also want to read what Fabien Potencier has written about PHP templates languages. It makes sense.

Still I can think of two reasons why we don’t want a templating language and we rather use PHP itself for templating. First reason: PHP templating is easier to learn than a PHP templating language.  Second reason: it executes faster.

PHP templating languages improve security

I tried to understand what the primary reason is that people are using a templating language. It seems to be ease of use, while keeping the application secure. The following example shows how easily you can write unsafe code:

Hello <?php echo $POST['name']; ?>!

It would only be safe to print a POST variable when using:

<?php echo htmlspecialchars($POST['name'],ENT_QUOTES,'UTF-8'); ?>

A templating language typically allows you to write something like:

Hello {{ name }}!

I agree that security is improved by using a templating language. The templating language escapes the output strings in order to prevent XSS vulnerabilities. But still I wonder: Can’t we get the same security benefits when we use native PHP for templating?

Helper function

As you have seen the PHP way of escaping is rather long. Fortunately, you can easily define a function that allows an alternative syntax, for instance:

Hello <?php e($POST['name']); ?>!

Yup, that is the “e” for “echo” :-). Now we can report all native (unescaped) echo function calls as being potentially unsafe. This can be achieved by doing static code analysis. While analyzing the code the analyzer could complain like this:

PHP Warning:  In "template.php" you should not use "echo" on line 1. Error raised  in analyzer.php on line 11

This could be limited to debug mode as static code analysis actually takes some time and may harm the performance of your application.

Static code analysis in PHP

I worked out the idea of secure PHP templating using static code analysis. In development (debug) mode it should warn the programmer when he uses a potentially non-safe construct.

The following analyzer script shows how this works:

<?php
$tokens    = array('T_ECHO', 'T_PRINT', 'T_EXIT', 'T_STRING', 'T_EVAL', 'T_OPEN_TAG_WITH_ECHO');
$functions = array('echo', 'print', 'die', 'exit', 'var_dump', 'eval', '<?=');
$filename  = 'template.php';

$all_tokens = token_get_all(file_get_contents($filename));
foreach ($all_tokens as $token) {
  if (is_array($token)) {
    if (in_array(token_name($token[0]),$tokens)) {
      if (in_array($token[1],$functions)) {
        trigger_error('In "'.$filename.'" you should not use "'.htmlentities($token[1]).'" on line '.$token[2].'. Error raised ', E_USER_WARNING);
      }
    }
  }
}

It will analyze the “template.php” file and report potentially insecure or erroneous language constructs.

This form of templating and static code analysis is fully implemented in the MindaPHP framework that you can find on my Github account. You can find the source code of the PHP static code analyzer class here.

Share

Jeff Atwood: Learning to code is NOT overrated

Jeff Atwood, author of the Coding Horror blog and creator of the Q&A site StackOverflow, wrote an article with the provocative title:

 Learning to code is overrated

He argues that we should not teach our children software development, because there are better, greater skills that we should teach our children. He seems not to realize how much of his own perspective is involved in this opinion.

One of the great achievements of modern computing is that we no longer need to be programmers to create, build and get things done

Oh, and let’s just pretend he did not write that and we did not read that.

Language vs. Mathematics

So the main reasoning in the article is this:

There’s nothing wrong with basic exposure to computer science. But it should not come at the expense of fundamental skills such as reading, writing and mathematics.

Sounds reasonable, but wait, what?! We should not teach our children (computer) science, because they might get behind on reading, writing and math? I would argue the opposite about this fantastic applied science: We should learn children to create software in order to make them want to learn English (reading and writing) and master mathematics.

I know that I was totally unmotivated at primary school for reading, writing and mathematics. On the other hand I was highly motivated to learn whatever was needed to understand IBM-Basic programming language. This challenge showed me why it was important to pay attention at school.

When Jeff Atwood says that reading and writing are more important he may also be referring to “Literature vs. Computer Science” or more generally “Language vs. Mathematics”. I do not see how you could argue that one is more important than the other. I do on the other hand recognize that people tend to lean more towards one than the other.

Programmers can’t communicate

Jeff Atwood seems to imply that the world solely consists of “Mathematical” people, when he writes:

Learning to talk to the computer is the easiest part. […] The people — well . . . you’ll spend the rest of your life figuring that out.

I think this is where Jeff is blinded by his own perspective. Children on a primary school are not guaranteed to think talking to the computer is easy and talking to people is hard. Only the people that do not need encouragement to go into computer science are like that. For most of the people the opposite is true. These people are discouraged to enter the profession, due to the lack of diversity.

… typing in pedantic command words in a programming environment

This is how Jeff Atwood describes programming. I guess somebody convinced him that programming is an inferior activity and not the creative and intellectually challenging craft that it actually is. It makes me wonder who in Jeff Atwood’s life is responsible for destroying the ability to enjoy the magic of creating software that I cherish so much. I promise that I won’t let this happen to me.

TL;DR

Jeff Atwood wrote an article to criticize the mayor de Blasio plan to teach programming to kids. His main argument seems to be that he values language skills over applied mathematics. But he seems to forget that we are not all mathematical minds that can easily learn programming. Learning kids to code may improve the popularity of the profession and hopefully also the diversity.

Share