3 popular GOTO conference talks

GOTO conferences are for developers, by developers. On gotocon.com you can find the upcoming conferences:

  • GOTO London: Sep. 14 – 18, 2015
  • GOTO Copenhagen: Oct. 5 – 8, 2015
  • GOTO Berlin: Dec. 2 – 4, 2015
  • GOTO Chicago: May 2016
  • GOTO Amsterdam: June 13 – 15, 2016

There have already been many GOTO conferences, and many of the past talks are available on YouTube. Below you will find three interesting talks from the GOTO conference channel on YouTube.

1) Introduction to NoSQL


A GOTO classic from the 2012 Aarhus conference, and actually the most viewed talk on the YouTube GOTO conference channel. Martin Fowler explains what NoSQL is and when it should be applied. Like no other, he explains the advantages of SQL and the circumstances under which choosing NoSQL may make sense. Apart from the NoSQL topics, his explanation of offline locks and document-based databases is so good that it alone is enough reason to watch this video.

2) Challenges in implementing microservices


This video was recorded more recently (August 2015) in Amsterdam. Fred George explains how the Web and SOA have led to new paradigms for structuring enterprise applications. Instead of a few business-related services, he developed systems made of many small, short-lived services. This approach is called “microservices”, and Fred George talks about his experience building applications in this paradigm.

3) How Go is making us faster


In July 2015, Wilfried Schobeiri explained at the GOTO conference in Chicago why Go is a good match for microservices. I’m not sure about the factual correctness of the speed comparisons with C++ and Java, but Go is definitely very fast. The thing to take away from this talk is that Go might be a great match for microservices, since it has a nice standard library/toolset, easy parallelism/concurrency and simple deployment.

Why speed is more important than scalability

Software developers creating web applications like to talk about scalability as if it were totally unrelated to computing efficiency. They also like to argue that abstractions are important. They will tell you that their DBAL, Router, View Engine, Dependency Injection and even ORM do not have that much overhead (only a little). They will argue that the learning curve pays off and that the performance loss is not that bad. In this post I’ll argue that page loading speed is more important than scalability (or pretty abstractions).

Orders of magnitude of speed on the web

Just to get an idea of speed, I tried to find a typical web action for every order of magnitude:

  • 0.1 ms – A simple database lookup from memory
  • 1 ms – Serving small static content from RAM
  • 10 ms – A very simple API that only does a DB lookup
  • 100 ms – A complex page load that calls multiple APIs
  • 1000 ms – Nothing should take this long… 🙂

But I’m sure that your website has pages that take a full second or more (even this site does). How can that be?

CPUs and web farms

Most servers have 1 to 4 CPUs. Each CPU has 2 to 32 cores. If you do a single web request, then you are using (at most) a single core of a single CPU. Maybe that is why people say that page loading speed is irrelevant. If you have more visitors, then they will use other cores or even other CPUs. This is true, but what if you have more concurrent requests than cores? You can simply add machines and configure a web farm, as most people do.

At some point you may have 16 servers running your popular web application with page loads that average 300 ms. If you could bring the page load time down to 20 ms, you could run this on a single box! Imagine how much simpler that is! The “one big box” strategy is also called the “big iron” strategy. Often programmers are not very careful with resources. This is because software developers tend to aim for beautiful abstractions and not for fast software.
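
To make that claim concrete, here is a back-of-the-envelope sketch with hypothetical numbers (16 cores per box, 800 requests per second at peak) and the assumption that the page load time is spent almost entirely on the CPU:

    <?php
    // Back-of-the-envelope capacity estimate; all numbers are assumptions.
    function serversNeeded($requestsPerSecond, $coresPerServer, $pageLoadMs)
    {
        $requestsPerCore = 1000 / $pageLoadMs;   // requests/second one core can handle
        return (int) ceil($requestsPerSecond / ($requestsPerCore * $coresPerServer));
    }

    echo serversNeeded(800, 16, 300) . "\n";  // 15 servers at 300 ms per page load
    echo serversNeeded(800, 16, 20) . "\n";   // 1 server at 20 ms per page load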

Programming languages matter

Only when hardware enthusiasts and software developers work together can you get efficient software. Nice frameworks may have to be removed. Toys like Dependency Injection that mainly bring theoretical value may have to be sacrificed. Languages also need to be chosen for execution speed. Languages that typically score well are: C, C# (Mono), Go and even Java (OpenJDK). Languages that typically score very badly: PHP, Python, Ruby and Perl (source: benchmarksgame). This is understandable, as these are all interpreted languages. But it should be considered when building a web application.

Stop using web application frameworks

But even if you use an interpreted language (which may be 10x slower), you can still have good performance. Unless, of course, you build applications consisting of tens of thousands of lines of code that all need to be loaded. Normally people would not do that, as it would take too much time to write. (warning: sarcasm) In order to be able to fail, against all odds, people have created frameworks. These are tens of thousands of lines of code that you can install on your server. With them, your small and lean application will still be slow (/sarcasm). I think it is fair to say that frameworks will typically make your web application 10-100x slower.

Now that happens to be exactly the approach that the industry sees as “best practice”. Unbelievable, right?

5 reasons your application is slow

If you are on shared hardware (a VM or shared webhosting), then you need to fix that first. I’m sure switching to dedicated hardware will give you better (and more consistent) performance. Still, each of the following problems may cost you an order of magnitude of speed:

  1. Not enough RAM
  2. No SSD disks (or wrong controllers)
  3. Using an interpreted programming language
  4. A bloated web framework
  5. Not using Memcache where possible (see the sketch below)
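
Point 5 is often the cheapest one to fix. Below is a minimal sketch of what “using Memcache where possible” can look like in PHP, assuming the Memcached extension and a memcached daemon on localhost; loadProductFromDb() is a hypothetical placeholder for your own expensive lookup:

    <?php
    // Minimal caching sketch; server address, key format and TTL are assumptions.
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);

    function getProduct(Memcached $cache, $id)
    {
        $key     = 'product:' . $id;
        $product = $cache->get($key);

        if ($product === false) {               // cache miss: do the slow work once
            $product = loadProductFromDb($id);  // hypothetical ~10 ms database call
            $cache->set($key, $product, 300);   // keep the result in RAM for 5 minutes
        }

        return $product;                        // cache hits are served in well under 1 ms
    }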

How many of these apply to you? Let me know in the comments.

Finally

This post will probably be considered offensive by programmers who like VMs, frameworks and interpreted languages. Please, don’t use that negative energy in the comments. Use it to “Go” and try Gorilla, I dare you! It is not a framework and it is fast, very fast! It will not only be interesting and a lot of fun, it may also change your mind about this article.

NB: Go is almost as fast as the highly optimized C code of Nginx (about 10-100x faster than PHP).

Chef server API integration with PHP

In this post I will show you a quick example of how you can integrate with the Chef server API from PHP.

If you don’t know Chef, I recommend having a look at https://www.chef.io. Chef is a configuration management tool, similar to Ansible or Puppet.

Chef turns infrastructure into code. With Chef, you can automate how you build, deploy, and manage your infrastructure.

At LeaseWeb, the infrastructure that supports our business consists of many machines. For us it was a logical step to use a configuration management tool to manage all those servers, and we chose Chef. We also use Chef to automate most of our (web) application deployments.

As our Chef-managed infrastructure grew bigger and deploying fixes and features became easier and more frequent, we needed a way for our organisation to know what is being deployed and when.

PHP is the main language we use here, and we use Guzzle for quick and easy integration with REST APIs and web services.

Guzzle is a PHP HTTP client that makes it easy to send HTTP requests and trivial to integrate with web services.

Read more about Guzzle here: http://guzzle.readthedocs.org/.

We have created a plugin for Guzzle 3 that implements the Chef server authentication algorithm as described in the documentation: https://docs.chef.io/auth.html

The plugin can be found on our GitHub page: https://github.com/LeaseWeb/chefauth-guzzle-plugin.

The plugin takes care of adding all the necessary HTTP headers and signing the request, so you can make fully authenticated calls to the Chef server.

To start consuming the Chef server REST API, either check out the source code with git or add the plugin as a dependency to your project using `composer`:

    php composer.phar require "leaseweb/chef-guzzle-plugin":"1.0.0"

Once you have created a user in Chef, the two things you need to get started are the client name of this user (in this example we assume my-dashboard) and the private key of this client (in this example we assume it is stored in my-dashboard.pem):

    <?php

    use Guzzle\Http\Client;
    use LeaseWeb\ChefGuzzle\Plugin\ChefAuth\ChefAuthPlugin;

    // Supply your client name and location of the private key.
    $chefAuthPlugin = new ChefAuthPlugin("my-dashboard", "my-dashboard.pem");

    // Create a new guzzle client
    $client = new Client('https://manage.opscode.com');
    $client->addSubscriber($chefAuthPlugin);

    // Now you can make calls to the chef server
    $response = $client->get('/organizations/my-organization/nodes')->send();

    $nodes = $response->json();

There are a ton of things you can do with the Chef API; read more about it here: https://docs.chef.io/api_chef_server.html
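
For example, once the $client from the snippet above is set up, fetching the details of a single node is just another authenticated GET; the organization and node names below are placeholders:

    <?php

    // Reuses the authenticated $client from the example above;
    // 'my-organization' and 'node-001.example.com' are placeholder names.
    $response = $client->get('/organizations/my-organization/nodes/node-001.example.com')->send();

    $node = $response->json();

    echo $node['name'] . ' runs: ' . implode(', ', $node['run_list']) . PHP_EOL;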

Hopefully this plugin will make it easier to integrate your Chef-managed infrastructure into your company processes.

We are playing around with:

  • automatically generating release notes for our applications,
  • automatically updating our issue tracking systems after a Chef deployment,
  • and many more ideas.

Automatically provision your bare metal infrastructure

At LeaseWeb we are all about automating delivery processes, be it for our virtual products or our bare metal products. This post shows you one of the many things you can do with our API.

If you have a bare metal server at LeaseWeb, I encourage you to log in to our customer portal, the LeaseWeb Self Service Center, at https://secure.leaseweb.com. In the API section you can manage your API keys for accessing the LeaseWeb API. To read more about what you can do with our API, head over to the LeaseWeb Developer Portal.

Recently we published new API calls on our developer portal that allow customers to manage DHCP leases for their bare metal servers.

These API calls expose our internal DHCP infrastructure, which we use for automation, to our customers as a service.

    GET    /bareMetals/{bareMetalId}/leases                 # list all leases
    POST   /bareMetals/{bareMetalId}/leases                 # create a lease
    DELETE /bareMetals/{bareMetalId}/leases/{macAddress}    # delete a lease

Customers can use them to install operating systems that are not available in the LeaseWeb Self Service Center, or to automatically provision their bare metal infrastructure.

When you use our API to create a DHCP lease, you have the possibility to specify DHCP option 67, Bootfile Name. We chainload the open source iPXE network boot firmware, which has HTTP support (read more about iPXE on their website http://ipxe.org/). This means that you can provide a valid HTTP URL for DHCP option 67 Bootfile Name that points to a PXE script containing instructions on what the boot loader should do next.

For example, let’s say you own the web server at webserver.example.com and have placed the following iPXE script at /boot.ipxe:

    $ curl http://webserver.example.com/boot.ipxe

    #!ipxe
    dhcp
    kernel http://webserver.example.com/archiso/boot/x86_64/vmlinuz archisobasedir=archiso archiso_http_srv=http://webserver.example.com/ ip=:::::eth0:dhcp
    initrd http://webserver.example.com/archiso/boot/x86_64/archiso.img
    boot

You can now create a DHCP lease for your bare metal server using our API:

    $ curl -H 'X-Lsw-Auth: my-api-key' -X POST https://api.leaseweb.com/v1/bareMetals/{bareMetalId}/leases -d bootFileName="http://webserver.example.com/boot.ipxe"

Obviously, replace {bareMetalId} with the ID of your bare metal server. To view the DHCP lease that we just created, you can use this call:

    $ curl -H 'X-Lsw-Auth: my-api-key' https://api.leaseweb.com/v1/bareMetals/{bareMetalId}/leases
    
    {
        "_metadata": {
            "limit": 10, 
            "offset": 0, 
            "totalCount": 1
        }, 
        "leases": [
            {
                "ip": "203.0.113.1", 
                "mac": "AA:AA:AA:AA:AA:AA", 
                "options": [
                    // ...
                    {
                        "name": "Bootfile Name", 
                        "optionId": "67", 
                        "policyName": null, 
                        "type": "String", 
                        "userClass": "gPXE", 
                        "value": "http://webserver.example.com/boot.ipxe", 
                        "vendorClass": ""
                    }
                    // ...
                ], 
                "scope": "203.0.113.0"
            }
        ]
    }

Now you have to manually reboot your server, or use our API to issue a power cycle:

    $ curl -H 'X-Lsw-Auth: my-api-key' -X POST https://api.leaseweb.com/v1/bareMetals/{bareMetalId}/reboot

The server will reboot, ask for a DHCP lease and eventually read the instructions you provided in /boot.ipxe, which in this example download a kernel and the Arch Linux live CD, both served from your web server at `webserver.example.com`.

Be careful not to forget to remove the DHCP lease when you are done. Otherwise the server will boot from the network again during the next reboot.

    $ curl -H 'X-Lsw-Auth: my-api-key' -X DELETE https://api.leaseweb.com/v1/bareMetals/{bareMetalId}/leases/AA:AA:AA:AA:AA:AA

We automatically remove DHCP leases after 24 hours.

This service allows our customers to implement creative ideas that can automate their bare metal infrastructure.

Example: install Arch Linux over SSH without KVM

To continue the example, I used this service to boot my modified version of the Arch Linux live CD, which starts OpenSSH at boot and includes my public SSH keys. I use this trick to manually install an operating system that is not available through the LeaseWeb Self Service Center.

I don’t need to contact technical support or have KVM on my server; everything is done remotely over SSH. The modified live image is published on GitHub: https://github.com/nrocco/archiso-sshd.

Clone the repository from GitHub:

    $ git clone https://github.com/nrocco/archiso-sshd.git
    $ cd archiso-sshd

Add your SSH keys to the authorized_keys file of the root user:

    $ vim airootfs/root/.ssh/authorized_keys

Now build the image (you need to have the archiso package installed):

    $ make build

This might take a while. When done, copy the kernel, initramfs and other generated files to the document root of your HTTP server:

    $ cp -r work/iso/arch /var/www

Your document root might look like this now:

    $ find /var/www -type f
    /var/www/boot.ipxe
    /var/www/archiso/pkglist.x86_64.txt
    /var/www/archiso/x86_64/airootfs.md5
    /var/www/archiso/x86_64/airootfs.sfs
    /var/www/archiso/boot/x86_64/archiso.img
    /var/www/archiso/boot/x86_64/vmlinuz

That’s it. Now you can boot from the network using our service.

Refer to airootfs/root/customize_airootfs.sh and airootfs/root/.ssh/authorized_keys for the specific customizations.

What can you do with it?

This example is just the tip of the iceberg of possibilities. Let us know your ideas and use cases.

You might use it to boot into your own live image that does an automated installation of the operating system and kicks off the provisioning tool of your choice (Chef, Ansible, Puppet), so your bare metal servers join the infrastructure that supports your business.

All fully automated.

Affordable real-time Big Data with streaming analytics

In our CDN PoPs in various locations world-wide, we analyze between 10,000 and 100,000 access log lines per second. We collect this data in order to provide analytics and, even more importantly, billing information. The most important question is: how much data was served for each customer? This is information we need to collect and present in our control panel, and obviously it needs to be 100% correct. In our systems each log line holds over 20 fields, and these fields need to be analyzed in real time. Real-time is a bit of a flexible concept: some people claim that a 24 hour delay is still real-time. When we talk about real-time, we aim for an update interval of half a minute.

So effectively our clusters need to analyze 2 million values per second and turn them into a set of 30 real-time statistics that update every 30 seconds.

There is only so little time

Let’s assume you are only willing to spend 10% extra on hardware to add analytics to your product; the amount of time you can then spend per data point is limited. If at peak time each machine handles an average of 1000 requests per second, and each machine is a dual hexacore server (12 cores), then a single-threaded analytics process has a CPU-time budget of at most (1/1000) * 12 * 0.10 = 0.0012 second = 1.2 milliseconds per log line. For I/O we can do a similar kind of calculation with disks instead of cores. You must agree that we are on a tight budget, considering that a 15k SAS disk’s average seek time is 2 ms. Also note that it does not matter on which machine or tier this calculation happens, as the budget stays the same.

Traditional Big Data approach

You collect all the raw data that you have and write it to a DFS (like the Hadoop Distributed File System). Then you define MapReduce jobs to actually go over the data and aggregate it into a result format. That result is stored in a NoSQL database, on a DFS or, if sufficiently reduced, in a relational database (like MySQL). To aggregate aggregated data, we read the result from disk and aggregate once more. In a very naive approach the disk I/O is high, as the data is written to log files on the edge, then read from the logs and written to the DFS, and then read and written again for each aggregation level and/or kind of statistic.

Streaming analytics approach


Figure 1: An example of a streaming analytics infrastructure with the data flowing from left to right.

This approach is much simpler and performs better. We read the data directly from the log files just after it is written, when it is probably still memory mapped, which reduces the I/O on the nodes. Updating all counters for the resulting statistics in RAM directly after the data is read is also relatively cheap. All these statistics are removed from RAM and sent upstream to an aggregation server at a certain interval (the flush rate). On the aggregation server all counters are again updated in RAM and then written to disk in the final format. This final format is either MySQL or files on disk in the format in which they will be queried.
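
A minimal single-threaded sketch of the node side of Figure 1, assuming log lines that start with a customer identifier and a byte count; the field layout, the log path, the 30 second flush rate and the aggregation endpoint are illustrative assumptions:

    <?php
    // Node-side sketch: keep counters in RAM, flush upstream every 30 seconds.
    // Log format, file path and aggregation URL are assumptions.
    $counters  = array();   // e.g. 'customer42' => array('hits' => 123, 'bytes' => 45678)
    $lastFlush = time();
    $log       = fopen('/var/log/cdn/access.log', 'r');

    while (($line = fgets($log)) !== false) {
        $fields   = explode(' ', trim($line));
        $customer = $fields[0];                  // assumed: first field is the customer
        $bytes    = (int) $fields[1];            // assumed: second field is bytes served

        if (!isset($counters[$customer])) {
            $counters[$customer] = array('hits' => 0, 'bytes' => 0);
        }
        $counters[$customer]['hits']  += 1;      // count occurrences
        $counters[$customer]['bytes'] += $bytes; // sum bytes (sum/count gives an average)

        if (time() - $lastFlush >= 30) {         // flush rate: 30 seconds
            $ch = curl_init('http://aggregator.internal/flush');  // hypothetical endpoint
            curl_setopt($ch, CURLOPT_POST, true);
            curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($counters));
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_exec($ch);
            curl_close($ch);

            $counters  = array();                // the counters leave RAM after each flush
            $lastFlush = time();
        }
    }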

I/O is limited, plenty of CPU

Most calculations done in analytics are extremely simple: they merely consist of counting occurrences. Sometimes only occurrences of certain combinations are counted, and sometimes other fields are summed, like the number of bytes transferred or seconds spent. Note that the combination of a sum and a count also gives you an average. These simple calculations do not require much CPU, but every time a number is incremented it needs to be read and written, and we need to realize that the amount of available I/O is also limited. Faster disks are expensive, especially when there is a need for high capacity.

RAM to the rescue

The only way to reduce the I/O is to keep the numbers in RAM (memory). This way they do not have to be read from or written to disk. This has its own disadvantage: your RAM capacity needs to be high or your result data needs to be small. Keeping the result data small is actually valuable. Humans cannot process tens of thousands of data points anyway, so why not reduce the data to the most valuable information possible? And let’s do it as early in the process as possible, to not waste any expensive CPU, I/O or RAM. This is how streaming analytics works. Log files are only written to disk once (on the edge nodes) and then processed in RAM, aggregated in RAM, sent over HTTP to a central server, where they are again aggregated in RAM and finally inserted into a traditional relational database, since by then they are sufficiently reduced in size.

MySQL is not so bad

MySQL can actually handle many insert or update queries per second. On my SSD-driven i5 laptop, using a non-optimized (default) MySQL with extended queries, I got over 15,000 values updated per second, which was a lot higher than I assumed was possible. If needed, you can store large sets of values in text fields in MySQL (in the resulting JSON format) to reduce the load on MySQL, at the cost of giving up the flexibility to query them any way you like. The flush rate tells us how often the aggregated analytics are sent upstream. Lowering the flush rate increases the total delay in the analytics, but on the other hand lowers the number of writes needed on MySQL. This is easy to see, as every counter that is updated is then updated less frequently and with a larger increment.
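
A sketch of what such an update with a larger increment can look like, assuming a stats table with a unique key on (name, period); the schema, the connection details and the example values are illustrative:

    <?php
    // One query per counter per flush; assumes UNIQUE KEY (name, period) on `stats`.
    $pdo = new PDO('mysql:host=127.0.0.1;dbname=analytics', 'user', 'secret');

    $stmt = $pdo->prepare(
        'INSERT INTO stats (name, period, value) VALUES (:name, :period, :increment)
         ON DUPLICATE KEY UPDATE value = value + VALUES(value)'
    );

    // One flush from a node: 30 seconds worth of traffic arrives as a single increment.
    $stmt->execute(array(
        ':name'      => 'customer42.bytes',
        ':period'    => '2015-09-15 12:30:00',
        ':increment' => 1234567,
    ));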

High availability

When a node in the system goes down, it clearly does not serve any traffic anymore, so it also does not have to produce statistics. If the aggregation or MySQL server goes down, the nodes can stop further log analysis and only continue once the server(s) are available again. Web nodes are typically configured to hold several days of logging information on disk, so this does not cause a problem. The nodes may have a slightly higher load while they are making up for lost time, but the speed during recovery can easily be throttled to minimize this problem.

Optimizations

The most important rule is to keep the resulting statistics limited in size (relatively small). Otherwise you are using too much RAM and you have to fall back to slow disk I/O. By having a retention policy per statistic, you can make sure you delete any data that is no longer relevant. Obviously MySQL can be optimized by using (and omitting) indexes and changing its disk writing strategy. One of the most powerful knobs is making the flush rate configurable per statistic, so you can tune the performance of the system and lower either the delay or the number of writes. For top lists you can also apply length limitations that lower the correctness slightly, but greatly reduce the data transfer and RAM usage. A last resort would be to process only a sample of the log lines for less important statistics of higher-volume customers.

Conclusion

It seems that for real-time log file analysis this “Streaming Analytics” approach is much more suitable than a more traditional Big Data approach. The only real downside is that you have to keep the resulting data small enough to fit in RAM. We’ve calculated that we are able to achieve more than a factor of 10 higher performance compared to our previous implementation. Now that is what I call innovation!