Using Correlation IDs in API Calls

Over the years, the IT industry has moved from single-domain, monolithic architectures to microservice architectures. In a microservice architecture, complex processes are split into smaller and simpler sub-processes. While this kind of architecture has many benefits, there are also some downsides – for example, a single request to one backend system can fan out into multiple requests to other backend systems [FIGURE 1]. How do you keep track of requests and responses processed by multiple systems? This is where Correlation IDs come into play.

[FIGURE 1: Example request/response flow]

Using a Correlation ID

A Correlation ID is a unique, randomly generated identifier that is added to every request and response. In a microservice architecture, the initial Correlation ID is passed on to each sub-process. If a sub-system makes sub-requests of its own, it passes the same Correlation ID on to those systems as well.

How you pass the Correlation ID to other systems depends on your architecture. At Leaseweb we use REST APIs a lot, and we pass the Correlation ID on in an HTTP header. As a rule, we assign a Correlation ID as early as possible, and we always reuse a Correlation ID when one is passed on. Our public API only accepts Correlation IDs from internally trusted clients. For any other client (such as employee or customer API clients) a new Correlation ID is generated for the request.
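
How you generate the initial Correlation ID is up to you; a UUID is a common choice. Below is a minimal sketch of a kernel.request listener that assigns one as early as possible. It assumes the X-My-Correlation-ID header used throughout this post; the isTrustedClient() check is a hypothetical hook for your own trust rules:

<?php

namespace App\Listener;

use Symfony\Component\HttpKernel\Event\GetResponseEvent;

class CorrelationIdRequestListener
{
    public function onKernelRequest(GetResponseEvent $event)
    {
        $request = $event->getRequest();

        // Only keep a Correlation ID passed in by an internally trusted
        // client; generate a fresh one in all other cases.
        if (!$request->headers->has('X-My-Correlation-ID') || !$this->isTrustedClient($request)) {
            $request->headers->set('X-My-Correlation-ID', $this->generateUuidV4());
        }
    }

    private function isTrustedClient($request): bool
    {
        // Hypothetical: check an IP allowlist, mTLS, an API key, etc.
        return false;
    }

    private function generateUuidV4(): string
    {
        $bytes = random_bytes(16);
        $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
        $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant

        return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
    }
}

Register it as a kernel.event_listener on the kernel.request event (analogous to the response listener shown at the end of this post), and every request entering your application is then guaranteed to carry a Correlation ID.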

Real Value of Correlation IDs

The real value of Correlation IDs is realized when you also log them. Debugging or tracing requests becomes much easier, as you can search all of your logs for the same Correlation ID. Combined with a central logging solution (such as the ELK stack), searching logs becomes even easier and can be done by non-technical colleagues. Giving your colleagues the tools to troubleshoot issues themselves allows them to take on more responsibility, and gives you more time to work on technical projects.

We mainly use Correlation IDs at Leaseweb for debugging purposes. When an error occurs, we provide the Correlation ID to the client/customer. If users provide that Correlation ID when submitting a support ticket, we can visualize the entire process needed to fulfil the client’s initial intent. This has significantly reduced the time it takes us to fix bugs.

[FIGURE 2: Example of one Correlation ID with multiple requests]

Debugging issues is a time-consuming process if Correlation IDs are not used. When your environment scales, you will need to find solutions to group transactions happening in your systems. By using a Correlation ID, you can easily group requests and events in your systems, allowing you to spend more time fixing the problem and less time trying to find it.

Practical examples of how to implement Correlation IDs

The following examples use Symfony, a popular web application framework. These concepts can also be applied to any other framework, such as Laravel, Django, Flask or Ruby on Rails.

If you are unfamiliar with the concept of Service Containers and Dependency Injection, we recommend reading the excellent Symfony documentation about it here: https://symfony.com/doc/current/service_container.html

Using Monolog to append Correlation IDs to your application logs

When processing an HTTP request, your application often logs some information – such as when an error occurs, or an important change in your system that you want to keep track of. When using the Monolog logging library in PHP (https://seldaek.github.io/monolog/), you can use the concept of “Processors” (read more about them on symfony.com).

One way to do this is by creating a Monolog Processor class:

<?php

namespace App\Monolog\Processor;

use Symfony\Component\HttpFoundation\RequestStack;

class CorrelationIdProcessor
{
    protected $requestStack;

    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function processRecord(array $record)
    {
        $request = $this->requestStack->getCurrentRequest();

        // Not in an HTTP request context (e.g. a console command):
        // return the record unchanged
        if (!$request) {
            return $record;
        }

        $correlationId = $request->headers->get('X-My-Correlation-ID');

        if (empty($correlationId)) {
            return $record;
        }

        // If we have a correlation id, include it in every monolog line
        $record['extra']['correlation_id'] = $correlationId;

        return $record;
    }
}

Then register this class on the service container as a monolog processor in services.yml:

# app/config/services.yml

services:
  App\Monolog\Processor\CorrelationIdProcessor:
    arguments: ["@request_stack"]
    tags:
      - name: monolog.processor
        method: processRecord

Now, every time you log something in your application with Monolog:

$this->logger->info('shopping_cart_emptied', ['cart_id' => 123]);

You will see the Correlation ID of the HTTP Request in your log files:

$ grep 'shopping_cart_emptied' var/logs/prod.log

[2020-07-03 12:14:45] app.INFO: shopping_cart_emptied {"cart_id": 123} {"correlation_id":"d135d5f1-3dd0-45fa-8f26-55d8d6a44876"}

You can utilize the same pattern to log the name of the user that is currently logged in, the remote IP address of the API client, or anything else that makes troubleshooting faster for you.
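
For example, the processor above could be extended with a few extra lines. This is a sketch: getClientIp() is part of the Symfony Request object, while $this->tokenStorage is a hypothetical injected TokenStorageInterface from Symfony's security component:

// Additional context in processRecord(), next to the correlation id:
$record['extra']['client_ip'] = $request->getClientIp();

// With Symfony security, inject TokenStorageInterface to log the user:
$token = $this->tokenStorage->getToken();
if ($token) {
    $record['extra']['username'] = $token->getUsername();
}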

Using Guzzle to append Correlation IDs when making sub-requests

If your API makes API calls to other microservices (and you use Guzzle to do this), you can make use of Guzzle’s Handlers and Middleware.

Some teams at Leaseweb depend on many downstream microservices, and can therefore have multiple Guzzle clients registered as services on the service container. While each Guzzle client is configured with its own base URL and/or authentication, it is possible for all of the Guzzle clients to share the same HandlerStack.

First, create the middleware:

<?php

namespace App\Guzzle\Middleware;

use Symfony\Component\HttpFoundation\RequestStack;
use Psr\Http\Message\RequestInterface;

class CorrelationIdMiddleware
{
    protected $requestStack;

    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function __invoke(callable $handler)
    {
        return function (RequestInterface $request, array $options = []) use ($handler) {
            // $request is the outgoing Guzzle request; fetch the incoming
            // HTTP request separately so we do not overwrite it
            $currentRequest = $this->requestStack->getCurrentRequest();

            if (!$currentRequest) {
                return $handler($request, $options);
            }

            $correlationId = $currentRequest->headers->get('X-My-Correlation-ID');

            if (empty($correlationId)) {
                return $handler($request, $options);
            }

            // Forward the Correlation ID on the outgoing request
            $request = $request->withHeader('X-My-Correlation-ID', $correlationId);

            return $handler($request, $options);
        };
    }
}

Define this middleware as a service on the service container and create a HandlerStack:

# app/config/services.yml

services:
  correlation_id_middleware:
    class: App\Guzzle\Middleware\CorrelationIdMiddleware
    arguments: ["@request_stack"]

  correlation_id_handler_stack:
    class: GuzzleHttp\HandlerStack
    factory: ['GuzzleHttp\HandlerStack', 'create']
    calls:
      - [push, ["@correlation_id_middleware", "correlation_id_forwarder"]]
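
If the factory/calls notation looks like magic: it is simply the YAML equivalent of wiring the handler stack up in plain PHP, roughly like this sketch ($middleware stands for the CorrelationIdMiddleware service defined above):

<?php

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;

// Equivalent of the service definitions above, in plain PHP
$stack = HandlerStack::create();
$stack->push($middleware, 'correlation_id_forwarder');

$client = new Client([
    'base_uri' => 'https://my-downstream-api.example.com',
    'handler'  => $stack,
]);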

With these two services defined, you can now configure all your Guzzle clients using the HandlerStack so that the Correlation ID of the current HTTP request is forwarded to downstream HTTP requests:

# app/config/services.yml

services:
  my_downstream_api:
    class: GuzzleHttp\Client
    arguments:
      - base_uri: https://my-downstream-api.example.com
        handler: "@correlation_id_handler_stack"

Now every API call that you make to https://my-downstream-api.example.com will include an 'X-My-Correlation-ID' HTTP request header with the same value as the Correlation ID of the current HTTP request. You can also apply the same Monolog and Guzzle tricks described here in the downstream API.
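
As a usage sketch (the ServerLister class and the /v2/servers path are made up for this example), a service that receives the my_downstream_api client simply makes calls as usual; the Correlation ID is forwarded transparently by the handler stack:

<?php

namespace App\Service;

use GuzzleHttp\ClientInterface;

class ServerLister
{
    protected $downstreamApi;

    public function __construct(ClientInterface $downstreamApi)
    {
        $this->downstreamApi = $downstreamApi;
    }

    public function listServers(): array
    {
        // The handler stack adds X-My-Correlation-ID to this request
        $response = $this->downstreamApi->request('GET', '/v2/servers');

        return json_decode((string) $response->getBody(), true);
    }
}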

Expose Correlation IDs in error responses

The missing link between these processes is to expose your Correlation IDs to your users, so that they can log them too, or use them in support cases they report to your organization.

Symfony makes this easy with Event Listeners. You can define Event Listeners in Symfony to pre-process HTTP requests as well as to post-process HTTP responses just before they are returned by Symfony to the API caller. In this example, we will create an HTTP response listener that adds the Correlation ID of the current HTTP request as an HTTP header on the HTTP response.

First, we create the listener class:

<?php

namespace App\Listener;

use Symfony\Component\HttpFoundation\RequestStack;
use Symfony\Component\HttpKernel\Event\FilterResponseEvent;

class CorrelationIdResponseListener
{
    protected $requestStack;

    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function onKernelResponse(FilterResponseEvent $event)
    {
        $request = $this->requestStack->getCurrentRequest();

        if (!$request) {
            return;
        }

        $correlationId = $request->headers->get('X-My-Correlation-ID');

        if (empty($correlationId)) {
            return;
        }

        // Mirror the Correlation ID of the request on the response
        $event->getResponse()->headers->set('X-My-Correlation-ID', $correlationId);
    }
}

Now configure it as a Symfony Event Listener:

# app/config/services.yml

services:
  correlation_id_response_listener:
    class: App\Listener\CorrelationIdResponseListener
    arguments: ["@request_stack"]
    tags:
      - { name: kernel.event_listener, event: kernel.response, method: onKernelResponse }

Every response that is generated by your Symfony application will now include an X-My-Correlation-ID HTTP response header with the same Correlation ID as the HTTP request.
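
You can quickly verify this with curl (the endpoint below is just an example):

$ curl -i https://api.example.com/v2/shopping-carts/123

HTTP/1.1 200 OK
X-My-Correlation-ID: d135d5f1-3dd0-45fa-8f26-55d8d6a44876
Content-Type: application/json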

The Value of Correlation IDs

Using Correlation IDs throughout your whole stack gives you more insight into all (sub)requests during a transaction. Using the right tools allows others to debug issues, giving your developers more time to work on new awesome features.

Implementing Correlation IDs isn’t hard to do, and can be achieved quickly depending on your software stack. At Leaseweb, the use of Correlation IDs has saved us hours of time while debugging issues on numerous occasions.

Technical Careers at Leaseweb

We are searching for the next generation of engineers and developers to help us build infrastructure to automate our global hosting services! If you are interested in finding out more, check out our Careers at Leaseweb.


Why speed is more important than scalability

Software developers creating web applications like to talk about scalability as if it is totally unrelated to computing efficiency. They also like to argue that abstractions are important. They will tell you that their DBAL, Router, View Engine, Dependency Injection and even ORM do not have that much overhead (only a little). They will argue that the learning curve pays off and that the performance loss is not that bad. In this post I’ll argue that page loading speed is more important than scalability (or pretty abstractions).

Orders of magnitude of speed on the web

Just to get an idea of speed, I tried to find a typical web action for every order of magnitude:

  • 0.1 ms – A simple database lookup from memory
  • 1 ms – Serving small static content from RAM
  • 10 ms – A very simple API that only does a DB lookup
  • 100 ms – A complex page load that calls multiple APIs
  • 1000 ms – Nothing should take this long… 🙂

But I’m sure that your website has pages that take a full second or more (even this site does). How can that be?

CPUs and web farms

Most servers have 1 to 4 CPUs. Each CPU has 2 to 32 cores. If you do a single web request, then you are using (at most) a single core of a single CPU. Maybe that is why people say that page loading speed is irrelevant: if you have more visitors, they will simply use other cores or even other CPUs. This is true, but what if you have more concurrent requests than cores? Then you can simply add machines and configure a web farm, as most people do.

At some point you may have 16 servers running your popular web application with page loads that average 300 ms. If you could bring the page load time down to 20 ms (a 15x speedup), you could run this on a single box! Imagine how much simpler that is! The “one big box” strategy is also called the “big iron” strategy. Yet programmers are often not very careful with resources, because software developers tend to aim for beautiful abstractions rather than for fast software.

Programming languages matter

Only when hardware enthusiasts and software developers work together can you get efficient software. Nice frameworks may have to be removed. Toys like Dependency Injection that mainly bring theoretical value may have to be sacrificed. Languages also need to be chosen for execution speed. Languages that typically score well: C, C# (Mono), Go and even Java (OpenJDK). Languages that typically score very badly: PHP, Python, Ruby and Perl (source: benchmarksgame). This is understandable, as these are all interpreted languages. But it should be considered when building a web application.

Stop using web application frameworks

But even if you use an interpreted language (which may be 10x slower), you can still have good performance. Unless of course you build applications consisting of tens of thousands of lines of code that all need to be loaded. Normally people would not do that, as it would take too much time to write. (warning: sarcasm) In order to be able to fail – against all odds – people have created frameworks: tens of thousands of lines of code that you can install on your server, so that your small and lean application will still be slow (/sarcasm). I think it is fair to say that frameworks will typically make your web application 10-100x slower.

Now let that be exactly the approach that is seen in the industry as “best practice”. Unbelievable, right?

5 reasons your application is slow

If you are on shared hardware (a VM or shared webhosting), then you need to fix that first. I’m sure switching to dedicated hardware will give you better (and more consistent) performance. Beyond that, each of the following problems may cost you an order of magnitude of speed:

  1. Not enough RAM
  2. No SSD disks (or wrong controllers)
  3. Using an interpreted programming language
  4. A bloated web framework
  5. Not using Memcache where possible (see the sketch below)
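
For that last point, here is a minimal sketch of the cache-aside pattern using PHP’s Memcached extension (the server address, cache key and loadProductFromDatabase() are made up for this example):

<?php

// Cache-aside: try Memcached first, fall back to the expensive lookup
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$product = $memcached->get('product_123');

if ($product === false && $memcached->getResultCode() === Memcached::RES_NOTFOUND) {
    $product = loadProductFromDatabase(123); // hypothetical expensive call
    $memcached->set('product_123', $product, 300); // cache for 5 minutes
}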

How many of these apply to you? Let me know in the comments.

Finally

This post will probably be considered offensive by programmers who like VMs, frameworks and interpreted languages. Please don’t use that negative energy in the comments. Use it to “Go” and try Gorilla, I dare you! It is not a framework and it is fast, very fast! Not only is it going to be interesting and a lot of fun, it may also change your mind about this article.

NB: Go is almost as fast as the highly optimized C code of Nginx (about 10-100x faster than PHP).


The lies about the necessity of a Big Rewrite

This post is about real-world software products that make money and have required multiple man-years of development time to build. This is an industry in which quality and costs, and thus professional software development, matter. The pragmatism and realism of Joel Spolsky’s blog on this type of software development is refreshing. He is also not afraid to speak up when he believes he is right, as on the “Big Rewrite” subject. In this post, I will argue why Joel Spolsky is right. Next I will show the real reasons software developers want to do a Big Rewrite and how they justify it. Finally, Neil Gunton has a quote that will help you convince software developers that there is a different path that should be taken.

Big Rewrite: The worst strategic mistake?

Whenever a developer says a software product needs a complete rewrite, I always think of Joel Spolsky saying:

… the single worst strategic mistake that any software company can make: (They decided to) rewrite the code from scratch. – Joel Spolsky

You should definitely read the complete article, because it holds a lot of strong arguments to back the statement up, which I will not repeat here. He made this statement in the context of the Big Rewrite Netscape did that led to Mozilla Firefox. In an interesting, very well written counter-post, Adam Turoff writes:

Joel Spolsky is arguing that the Great Mozilla rewrite was a horrible decision in the short term, while Adam Wiggins is arguing that the same project was a wild success in the long term. Note that these positions do not contradict each other.

Indeed! I fully agree that these positions do not contradict. So the result was not bad, but this was the worst mistake the software company could make. Then he continues to say that:

Joel’s logic has got more holes in it than a fishing net. If you’re dealing with a big here and a long now, whatever work you do right now is completely inconsequential compared to where the project will be five years from today or five million users from now. – Adam Turoff

Wait, what? Now he chooses Netscape’s side?! And this argument makes absolutely no sense to me. Who knows what the software will require five years or five million users from now? For this to be true, the guys at Netscape must have been able to look into the future. If so, then why did they not buy Apple stock? In my opinion the observation that one cannot predict the future is enough reason to argue that deciding to do a Big Rewrite is always a mistake.

But what if you don’t want to make a leap into the future, but you are trying to catch up? What if your software product has gathered so much technical debt that a Big Rewrite is necessary? While this argument also feels logical, I will argue why it is not. Let us look at the different technical causes of technical debt and what should be done to counter them:

  • Lack of a test suite: this can easily be countered by adding tests
  • Lack of documentation: writing it is not a popular task, but it can be done
  • Lack of loosely coupled components: dependency injection can be introduced one software component at a time; your test suite will guarantee there is no regression
  • Parallel development: do not rewrite big pieces of code; keep the change sets small and merge often
  • Delayed refactoring: is refactoring much more expensive than rewriting? It may seem so due to the 80/20 rule, but it probably is not; just start doing it

And then we immediately get back to reality, which normally prevents us from doing a Big Rewrite: we need to tend the shop. We need to keep the current software from breaking down, and we need to implement critical bug fixes and features. If this takes up all our time because there is so much technical debt, then that debt may become a hurdle that seems too big to ever overcome. So realize that not being able to reserve time (or people) to get rid of technical debt can be the real reason behind a request for a Big Rewrite.

To conclude: a Big Rewrite is always a mistake, since we cannot look into the future and if there is technical debt then that should be acknowledged and countered the normal way.

The lies to justify a Big Rewrite

When a developer suggests a “complete rewrite”, this should be a red flag to you. The developer is most probably lying about the justification. The real reasons the developer is suggesting a Big Rewrite or a “build from scratch” are:

  1. Not-Invented-Here syndrome (not understanding the software)
  2. Hard-to-solve bugs (which are not fun working on)
  3. Technical debt, including debt caused by missing tests and documentation (which are not fun working on)
  4. The developer wants to work on a different technology (which is more fun working on)

The lie is that the bugs and technical debt are presented as structural/fundamental flaws in the software that cannot realistically be fixed without a Big Rewrite. Five other typical lies (according to Chad Fowler) that the developer will promise in return for a Big Rewrite include:

  1. The system will be more maintainable (less bugs)
  2. It will be easier to add features (more development speed)
  3. The system will be more scalable (lower computation time)
  4. System response time will improve for our customers (less on-demand computation)
  5. We will have greater uptime (better high availability strategy)

Any code can be replaced incrementally and all code must be replaced incrementally. Just like bugs need to be solved and technical debt needs to be removed. Even when technology migrations are needed, they need to be done incrementally, one part or component at a time and not with a Big Bang.

Conclusion

Joel Spolsky is right: you don’t need a Big Rewrite. Doing a Big Rewrite is the worst mistake a software company can make. Or, as Neil Gunton puts it more gently and positively:

If you have a very successful application, don’t look at all that old, messy code as being “stale”. Look at it as a living organism that can perhaps be healed, and can evolve. – Neil Gunton

If a software developer is arguing that a Big Rewrite is needed, then remind him that the software is alive and that he is responsible for keeping it healthy and helping it grow to full maturity.


Features of PostgreSQL

Database systems can cost a lot of money; this is fairly well known. Products like Microsoft SQL Server and Oracle Standard Edition are billed per CPU (even per core) and may also require client licenses. The costs and issues of licensing may drive people to free (not only as in beer) software. When free database systems are discussed, most people immediately think of MySQL (also owned by Oracle). But there is another, maybe even better, player in the open source market that is less well known. Its name is “PostgreSQL”.

An enterprise class database, PostgreSQL boasts sophisticated features such as Multi-Version Concurrency Control (MVCC), point in time recovery, tablespaces, asynchronous replication, nested transactions (savepoints), online/hot backups, a sophisticated query planner/optimizer, and write ahead logging for fault tolerance. It supports international character sets, multibyte character encodings, Unicode, and it is locale-aware for sorting, case-sensitivity, and formatting. It is highly scalable both in the sheer quantity of data it can manage and in the number of concurrent users it can accommodate. There are active PostgreSQL systems in production environments that manage in excess of 4 terabytes of data. — source: http://www.postgresql.org/about/

This database system is free and very powerful. It supports almost all of the features that its paid (and free) counterparts have. Some of the interesting features are: GIS support, hot standby for high availability, and index-only scans (great for big data). Still not convinced? Check out the impressive feature matrix of PostgreSQL yourself. It has excellent support on Linux systems (also OSX and Windows) and integrates well with PHP and frameworks like Symfony2.
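
As a small illustration of that PHP integration, connecting via the bundled pdo_pgsql driver works just like it does for MySQL (the DSN, credentials and table in this sketch are placeholders):

<?php

// Connect to PostgreSQL using PDO (requires the pdo_pgsql extension)
$pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'myuser', 'mypassword');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Prepared statements work exactly as with MySQL
$stmt = $pdo->prepare('SELECT id, name FROM customers WHERE country = :country');
$stmt->execute(['country' => 'NL']);

foreach ($stmt as $row) {
    echo $row['id'] . ': ' . $row['name'] . PHP_EOL;
}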

Download & try PostgreSQL (for OSX or Windows) or follow installation instructions (for Ubuntu Linux)

Note: If you want a tool like phpMyAdmin for PostgreSQL, you might consider Adminer.


Big data – do I need it?

Big Data?

Big data is one of the most recent “buzz words” on the Internet. The term is normally associated with data sets so big that they are really complicated to store, process, and search through.

Big data is known to be a three-dimensional problem (as defined by Gartner, Inc*), i.e. it has three problems associated with it:

  1. increasing volume (amount of data)
  2. velocity (speed of data in and out)
  3. variety (range of data types and sources)

Why Big Data?

The bigger a dataset grows, the more information you can extract from it, and the better the precision of your results (assuming you are using the right models, but that is not relevant for this post). Better and more diverse analysis can also be done on the data. Corporations are growing their datasets more and more to get extra “juice” out of them: some to build better business models, others to improve the user experience, others to get to know their audience better; the choices are virtually unlimited.

In the end, and in my opinion, big data analysis/management can be a competitive advantage for corporations. In some cases, a crucial one.

Big Data Management

Big data management software is not something you normally buy on the market as an “off-the-shelf” product (maybe Oracle wants to change this?). One of the biggest questions of big data management is: what do you want to do with the data? Knowing this is essential to minimize the problems related to huge data sets. Of course you can just store everything and later try to make some sense of the data you have. Again, in my opinion, that is the way to get a problem rather than a solution/advantage.

Since you cannot just buy a big data management solution, a strategy has to be designed and followed until something is found that can work as a competitive advantage for the product/company.

Internally at LeaseWeb we have a big data set, and we can work on it at real-time speed (we are using Cassandra** at the moment) and obtain the results we need. To get this working we went through several trial-and-error iterations, but in the end we got what we needed, and so far it is living up to expectations. How much hardware? How much development time? This all depends; the question you have to ask yourself is “What do I need?”, and once you have an answer to that, normal software planning/development time applies. It may even turn out that you don’t need Big Data at all, or that you can solve your problem using standard SQL technologies.

In the end, our answer to “What do I need?” provided us with all the information we needed to find out what was best for us. In this case it was a mix of technologies, one of them being a NoSQL database.

* http://www.gartner.com/it/page.jsp?id=1731916
** http://cassandra.apache.org/
