Leaseweb Cloud AWS EC2 support

As you might know, some of the products LeaseWeb include in its portfolio are Public and Private Cloud based on Apache CloudStack, which supports a full API. We, LeaseWeb, are very open about this, and we try to be as much involved and participative in the community and product development as possible. You might be familiar with this if you are a Private Cloud customer. In this article we target the current and former EC2 users, who probably have already tools built upon AWS CLI, by demonstrating how you can keep using them with LeaseWeb Private Cloud solutions.

Apache CloudStack has supported EC2 API for some time in the early days, but along the way, while EC2 API evolved, CloudStack’s support has somewhat stagnated. In fact, The AWS API component from CloudStack was recently detached from the main distribution as to simplify the maintenance of the code.

While this might sound like bad news, it’s not – at all. In the meantime, another project spun off, EC2Stack, and was embraced by Apache as well. This new stack supports the latest API (at the time of writing) and is much easier to maintain both in versatility as in codebase. The fact that it is written in Python opens up the audience for further contribution while at the same time allows for quick patching/upgrade without re-compiling.

So, at some point, I thought I could share with you how to quickly setup your AWS-compatible API so you can reuse your existing scripts. On to the details.

The AWS endpoint acts as an EC2 API provider, proxying requests to LeaseWeb API, which is an extension to the native CloudStack API. And since this API is available for Private Cloud customers, EC2Stack can be installed by the customer himself.

Following is an illustration of how this can be done. For the record, I’m using Ubuntu 14.04 as my desktop, and I’ll be setting up EC2stack against LeaseWeb’s Private Cloud in the Netherlands.

First step is to gather all information for EC2stack. Go to your LeaseWeb platform console, and obtain API keys for your user (sensitive information blurred):

apikeys-blurred

Note down the values for API Key and Secret Key (you should already know the concepts from AWS and/or LeaseWeb Private Cloud).

Now, install EC2Stack and configure it:

ntavares@mylaptop:~$ pip install ec2stack 
[…]
ntavares@mylaptop:~$ ec2stack-configure 
EC2Stack bind address [0.0.0.0]: 127.0.0.1 
EC2Stack bind port [5000]: 5000 
Cloudstack host [mgt.cs01.leaseweb.net]: csrp01nl.leaseweb.com 
Cloudstack port [443]: 443 
Cloudstack protocol [https]: https 
Cloudstack path [/client/api]: /client/api 
Cloudstack custom disk offering name []: dualcore
Cloudstack default zone name [Evoswitch]: CSRP01 
Do you wish to input instance type mappings? (Yes/No): Yes 
Insert the AWS EC2 instance type you wish to map: t1.micro 
Insert the name of the instance type you wish to map this to: Debian 7 amd64 5GB 
Do you wish to add more mappings? (Yes/No): No 
Do you wish to input resource type to resource id mappings for tag support? (Yes/No): No 
INFO  [alembic.migration] Context impl SQLiteImpl. 
INFO  [alembic.migration] Will assume non-transactional DDL. 

The value for the zone name will be different if your Private Cloud is not in the Netherlands POP. The rest of the values can be obtained from the platform console:

serviceoffering-blurred

template-blurred
You will probably have different (and more) mappings to do as you go, just re-run this command later on.

At this point, your EC2stack proxy should be able to talk to your Private Cloud, so we now need to instruct it to launch it to accept EC2 API calls for your user. For the time being, just run it on a separate shell:

ntavares@mylaptop:~$ ec2stack -d DEBUG 
 * Running on http://127.0.0.1:5000/ 
 * Restarting with reloader

And now register your user using the keys you collected from the first step:

ntavares@mylaptop:~$ ec2stack-register http://localhost:5000 H5xnjfJy82a7Q0TZA_8Sxs5U-MLVrGPZgBd1E-1HunrYOWBa0zTPAzfXlXGkr-p0FGY-9BDegAREvq0DGVEZoFjsT PYDwuKWXqdBCCGE8fO341F2-0tewm2mD01rqS1uSrG1n7DQ2ADrW42LVfLsW7SFfAy7OdJfpN51eBNrH1gBd1E 
Successfully Registered!

And that’s it, as far the API service is concerned. As you’d normally do with AWS CLI, you now need to “bind” the CLI to this new credentials:

ntavares@mylaptop:~$ aws configure 
AWS Access Key ID [****************yI2g]: H5xnjfJy82a7Q0TZA_8Sxs5U-MLVrGPZgBd1E-1HunrYOWBa0zTPAzfXlXGkr-p0FGY-9BDegAREvq0DGVEZoFjsT
AWS Secret Access Key [****************L4sw]: PYDwuKWXqdBCCGE8fO341F2-0tewm2mD01rqS1uSrG1n7DQ2ADrW42LVfLsW7SFfAy7OdJfpN51eBNrH1gBd1E  Default region name [CS113]: CSRP01 
Default output format

: text

And that’s it! You’re now ready to use AWS CLI as you’re used to:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 --output json ec2 describe-images | jq ' .Images[] | .Name ' 
"Ubuntu 12.04 i386 30GB" 
"Ubuntu 12.04 amd64 5GB" 
"Ubuntu 13.04 amd64 5GB" 
"CentOS 6 amd64 5GB" 
"Debian 6 amd64 5GB" 
"CentOS 7 amd64 20140822T1151" 
"Debian 7 64 10 20141001T1343" 
"Debian 6 i386 5GB" 
"Ubuntu 14.04 64bit with docker.io" 
"Ubuntu 12.04 amd64 30GB" 
"Debian 7 i386 5GB" 
"Ubuntu 14.04 amd64 20140822T1234" 
"Ubuntu 12.04 i386 5GB" 
"Ubuntu 13.04 i386 5GB" 
"CentOS 6 i386 5GB" 
"CentOS 6 amd64 20140822T1142" 
"Ubuntu 12.04 amd64 20140822T1247" 
"Debian 7 amd64 5GB"

Please note that I only used JSON output (and JQ to parse it) for summarising the results, as any other format wouldn’t fit on the page.

To create a VM with built-in SSH keys, you should create/setup your keypair in LeaseWeb Private Cloud, if you didn’t already. In the following example I’m generating a new one, but of course you could load your existing keys.

ssh-keypairs-blurred

You will want to copy paste the generated key (in Private Key) to a file and protect it. I saved mine in $HOME/.ssh/id_ntavares.csrp01.key .

ssh-keypairs2-blurred

This key will be used later to log into the created instances and extract the administrator password. Finally, instruct the AWS CLI to use this keypair when deploying your instances:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 run-instances \
 --instance-type dualcore \
 --image-id 7c123f01-9865-4312-a198-05e2db755e6a \
 --key-name ntavares-key 
INSTANCES	KVM	7c123f01-9865-4312-a198-05e2db755e6a	a0977df5-d25e-40cb-9f78-b3a551a9c571	dualcore	ntavares-key	2014-12-04T12:03:32+0100	10.42.1.129 
PLACEMENT	CSRP01 
STATE	16	running 

Note that the image-id is taken from the previous listing (the one I simplified with JQ).

Also note that although EC2stack is fairly new, and there are still some limitations to this EC2-CS bridge – see below for a mapping of supportedAPI calls. For instance, one that you can could run into at the time of writing this article (~2015) was the inability to deploy an instance if you’re using multiple Isolated networks (or multiple VPC). Amazon shares this concept as well, so a simple patch was necessary.

For this demo, we’re actually running in an environment with multiple isolated networks, so if you ran the above command, you’d get the following output:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 run-instances \
 --instance-type dualcore \
 --image-id 7c123f01-9865-4312-a198-05e2db755e6a \
 --key-name ntavares-key
A client error (InvalidRequest) occurred when calling the RunInstances operation: More than 1 default Isolated networks are found for account Acct[47504f6c-38bf-4198-8925-991a5f801a6b-rme]; please specify networkIds

In the meantime, LeaseWeb’s patch was merged, as many others, which both demonstrates the power of Open Source and the activity on this project.

Naturally, the basic maintenance tasks are there:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 describe-instances 
RESERVATIONS	None 
INSTANCES	KVM	7c123f01-9865-4312-a198-05e2db755e6a	a0977df5-d25e-40cb-9f78-b3a551a9c571	dualcore	ntavares-key	2014-12-04T12:03:32+0100	10.42.1.129	10.42.1.129 
PLACEMENT	CSRP01	default 
STATE	16	running

I’ve highlighted some information you’ll need now to login to the instance: the instance id, and IP address, respectively. You can login either with your ssh keypair:

[root@jump ~]# ssh -i $HOME/.ssh/id_ntavares.csrp01.key root@10.42.1.129 
Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 

[...] 
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent 
permitted by applicable law. 
root@a0977df5-d25e-40cb-9f78-b3a551a9c571:~#

If you need, you can also retrieve the password the same way you do with EC2:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 get-password-data --instance-id a0977df5-d25e-40cb-9f78-b3a551a9c571 
None dX5LPdKndjsZkUo19Z3/J3ag4TFNqjGh1OfRxtzyB+eRnRw7DLKRE62a6EgNAdfwfCnWrRa0oTE1umG91bWE6lJ5iBH1xWamw4vg4whfnT4EwB/tav6WNQWMPzr/yAbse7NZHzThhtXSsqXGZtwBNvp8ZgZILEcSy5ZMqtgLh8Q=

As it happens with EC2, password is returned encrypted, so you’ll need your key to display it:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 get-password-data --instance-id a0977df5-d25e-40cb-9f78-b3a551a9c571 | awk '{print $2}' > ~/tmp.1
ntavares@mylaptop:~$ openssl enc -base64 -in tmp.1 -out tmp.2 -d -A 
ntavares@mylaptop:~$ openssl rsautl -decrypt -in tmp.2 -text -inkey $HOME/.ssh/id_ntavares.csrp01.key 
ntavares@mylaptop:~$ cat tmp.3 ; echo 
hI5wueeur
ntavares@mylaptop:~$ rm -f tmp.{1,2,3} 
[root@jump ~]# sshpass -p hI5wueeur ssh root@10.42.1.129 
Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 

[...]
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent 
permitted by applicable law. 
Last login: Thu Dec  4 13:33:07 2014 from jump.rme.leasewebcloud.com 
root@a0977df5-d25e-40cb-9f78-b3a551a9c571:~#

The multiple isolated networks scenario

If you’re already running multiple isolated networks in your target platform (be either VPC-bound or not), you’ll need to pass argument –subnet-id to the run-instances command to specify which network to deploy the instance into; otherwise CloudStack will complain about not knowing in which network to deploy the instance. I believe this is due to the fact that Amazon doesn’t allow the use the Isolated Networking as freely as LeaseWeb – LeaseWeb delivers you the full flexibility at the platform console.

Since EC2stack does not support describe-network-acls (as of December 2014) in order to allow you to determine which Isolated networks you could use, the easiest way to determine them is to go to the platform console and copy & paste the Network ID of the network you’re interested in:

Then you could use –subnet-id:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 run-instances \
 --instance-type dualcore \
 --image-id 7c123f01-9865-4312-a198-05e2db755e6a \
 --key-name ntavares-key \
 --subnet-id 5069abd3-5cf9-4511-a5a3-2201fb7070f8
PLACEMENT	CSRP01 
STATE	16	running 

I hope I demonstrated a bit of what can be done in regards to compatible EC2 API. Other funtions are avaiable for more complex tasks although, as wrote earlier, EC2stack is quite new, for which you might need community assistance if you cannot develop the fix on your own. At LeaseWeb we are very interested to know your use cases, so feel free to drop us a note.

Share

Heka monolog decoder

This post is about how to use heka to give your symfony 2 application logs the care they deserve.

Application logs are very important for the quality of the product or service you are offering.

They help you find out what went wrong so you can explain and fix a bug that was reported recently. Or maybe to gather statistics to see how often a certain feature is used. For example how many bare metal reinstallation requests were issued last month and how many of those failed. Valuable information that you could use to decide what feature you are going to work on next.

At LeaseWeb we use quite a lot of php and Seldaek’s monolog is our logging library of choice. Recently we open sourced a heka decoder on github here. For you who do not know heka yet, check out their documentation.

Heka is an open source stream processing software system developed by Mozilla. Heka is a “Swiss Army Knife” type tool for data processing, useful for a wide variety of different tasks, such as …

Heka runs as a system daemon just like logstash or fluentd. Heka is written in go and comes with an easy to use plugin system based on lua. It has almost no dependencies and is lightweight. James Turnbull has written a nice article on how to get started with heka.

send symfony 2 application logs to elastic search

How better to explain with an example.

Lets say you have a symfony 2 application and you want the application logs to be sent to an Elastic Search platform.

On a debian based os you can download one of the heka debian packages from github.

    $ wget https://github.com/mozilla-services/heka/releases/download/v0.9.2/heka_0.9.2_i386.deb
    $ sudo dpkg -i heka_0.9.2_i386.deb

To configure heka you are required to edit the configuration file located at /etc/hekad.toml.

    $ vim /etc/hekad.toml

Take your time to read the excellent documentation on heka. There are many ways of using heka but we will use it as a forwarder:

  1. Define an input, where messages come from.
  2. Tell heka how it should decode the monolog log line into a heka message
  3. Tell heka how to encode the message so Elastic Search will understand it
  4. Define an output, where should the messages be sent to.

First we define the input:

    [Symfony2MonologFileInput]
    type = "LogstreamerInput"
    log_directory = "/var/www/app/logs"
    file_match = 'prod\.log'
    decoder = "Symfony2MonologDecoder"

Adjust `log_directory` and `file_match` according to your setup. As you can see we alread told heka to use the `Symfony2MonologDecoder` to we will define that one next:

    [Symfony2MonologDecoder]
    type = "SandboxDecoder"
    filename = "/etc/symfony2_decoder.lua"

Change the `filename` with the path where you placed the lua script on your system.

Now we have defined the input we can tell heka where to output messages to:

    [ESJsonEncoder]
    index = "%{Hostname}"
    es_index_from_timestamp = true
    type_name = "%{Type}"

    [ElasticSearchOutput]
    message_matcher = "TRUE"
    server = "http://192.168.100.1:9200"
    flush_interval = 5000
    flush_count = 10
    encoder = "ESJsonEncoder"

In the above example we assume that your Elastic Search server is running at 192.168.100.1.

And thats it.

A simple log line in app/logs/prod.log:

    [2015-06-03 22:08:02] app.INFO: Dit is een test {"bareMetalId":123,"os":"centos"} {"token":"556f5ea25f6af"}

Is now sent to Elastic Search. You should now be able to query your Elastic Search for log messages, assuming the hostname of your server running symfony 2 is myapi:

    $ curl http://192.168.100.1:9200/myapi/_search | python -mjson.tool
    {
        "_shards": {
            "failed": 0,
            "successful": 5,
            "total": 5
        },
        "hits": {
            "hits": [
                {
                    "_id": "ZIV7ryZrQRmXDiB6thY_yQ",
                    "_index": "myapi",
                    "_score": 1.0,
                    "_source": {
                        "EnvVersion": "",
                        "Hostname": "myapi",
                        "Logger": "Symfony2MonologFileInput",
                        "Payload": "Dit is een test",
                        "Pid": 0,
                        "Severity": 7,
                        "Timestamp": "2015-06-03T20:08:02.000Z",
                        "Type": "logfile",
                        "Uuid": "344e7cae-6ab7-4fb2-a770-d2cbad6653c3",
                        "channel": "app",
                        "levelname": "INFO",
                        "bareMetalId": 123,
                        "os": "centos",
                        "token": "556f5ea25f6af"
                    },
                    "_type": "logfile"
                },
        // ...
        }
    }

What is important to notice is that the keys token, bareMetalId and os in the monolog log line end up in Elastic Search as an indexable fields. From your php code you can add this extra information to your monolog messages by supplying an associative array as a second argument to the default monolog log functions:

    <?php

    $logger = $this->logger;
    $logger->info('The server was reinstalled', array('bareMetalId' => 123, 'os' => 'centos'));

Happy logging!

Share

Why speed is more important than scalability

Software developers creating web applications like to talk about scalability as if it is totally unrelated to computing efficiency. They also like to argue that abstractions are important. They will tell you that their DBAL, Router, View Egine, Dependency Injection and even ORM do not have that much overhead (only a little). They will argue that the learning curve pays off and that the performance loss is not that bad. In this post I’ll argue that page loading speed is more important than scalability (or pretty abstractions).

Orders of magnitude of speed on the web

Just to get an idea of speed, I tried to search for a web action for every order of magnitude:

  • 0.1 ms – A simple database lookup from memory
  • 1 ms – Serving small static content from RAM
  • 10 ms – An very simple API that only does a DB lookup
  • 100 ms – A complex page load that calls multiple APIs
  • 1000 ms – Nothing should take this long… 🙂

But I’m sure that your website has pages that take a full second or more (even this site has). How can that be?

CPUs and web farms

Most severs have 1 to 4 CPUs. Each CPU has 2 to 32 cores. If you do a single web request, then you are using (at most) a single core of a single CPU. Maybe that is why people say that page loading speed is irrelevant. If you have more visitors, then they will use other cores or even other CPUs. This is true, but what if you have more concurrent requests than visitors? You can simply add machines and configure a web farm, as most people do.

At some point you may have 16 servers running your popular web application with page loads that average at 300 ms. If you could bring down the page load time to 20 ms, you could run this on a single box! Imagine how much simpler that is! The “one big box” strategy is also called the “big iron” strategy. Often programmers are not very careful with resources. This is because a software developers tend to aim for beautiful abstractions and not for fast software.

Programming languages matter

Only when hardware enthusiasts and software developers work together you may get efficient software. Nice frameworks may have to be removed. Toys like Dependency Injection that mainly bring theoretical value may have to be sacrificed. Also languages need to be chosen for execution speed. Languages that typically score good are: C, C# (Mono), Go and even Java (OpenJDK). Languages that typically score very bad: PHP, Python, Ruby and Perl (source: benchmarksgame). This is understandable, as these are all interpreted languages. But it should be considered when building a web application.

Stop using web application frameworks

But even if you use an interpreted language (which may be 10x slower), then you can still have good performance. Unless of course you build applications consisting of tens of thousands of lines of code that all need to be loaded. Normally people would not do that as it would take too much time to write. (warning: sarcasm) In order to be able to fail – against all odds – people have created frameworks. These are tens of thousands of lines of code that you can install on your server. With it your small and lean application will still be slow (/sarcasm). I think it is fair to say that frameworks will make your web application typically 10-100x slower.

Now let that be exactly the approach that is seen in the industry as “best practice”. Unbelievable right?

5 reasons your application is slow

If you are on shared hardware (VM or shared webhosting), then you need to fix that first. I’m sure switching to dedicated hardware will give you a better (and more consistent) performance pattern. Still, each of the following problems may give you an order of a magnitude of speed decrease:

  1. Not enough RAM
  2. No SSD disks (or wrong controllers)
  3. Using an interpreted programming language
  4. A bloated web framework
  5. Not using Memcache where possible

How many of these apply to you? Let me know in the comments.

Finally

This post will probably be considered offending by programmers that like VMs, frameworks and interpreted languages. Please, don’t use that negative energy in the comments. Use it to “Go” and try Gorilla, I dare you! It is not a framework and it is fast, very fast! It will not only going to be interesting and a lot of fun, it may also change your mind about this article.

NB: Go is almost as fast as the highly optimized C code of Nginx (about 10-100x faster than PHP).

Share

David Beazley’s concurrency demo on PyCon 2015

pycon_2015David Beazley did an amazing live demo on Python Concurrency on PyCon 2015 (Montreal, last month). Lucky for us it is available on YouTube! I must say, that once I started watching the video I was hooked! I simply could not stop watching for the entire 45 minutes.

Python concurrency methods and their performance characteristics

In the beginning of the video David types in a Fibonacci socket server and shows the audience how socket programming works. That is already quite impressive (as he talks and types at the same time). Then he evaluates several approaches to Python concurrency: threads, processes, co-routines combined with the select call and finally co-routines combined with the select call and threads.

The performance characteristics of the approaches are shown by running a concurrency test and then a “requests per second” and a “response time in seconds” performance test. David clearly explains which role the GIL plays on the concurrency of co-routines and also where you see the (operating system) overhead caused by threads and processes.

In the questions he says that you should follow the best practices on concurrency and he specifically names “don’t have shared state” and “use functions”. See the talk by clicking on the video below:


Thank you, PyCon 2015, for sharing this video with us!

Share

Best practices and the joy of programming

lazarus

When I was about 16 years old (1996) I was programming object pascal (Delphi). In that language I wrote a calculator program to convert a number from decimal to binary. While implementing this functionality I found out that I could make this code generic, such that you could give it any base (2 for binary, 8 for octal, 10 for decimal and 16 for hexadecimal) and convert the numbers back and forth. This was an extremely satisfying experience and it felt like I created a thing of beauty, like I discover a secret or maybe as they call it “a programming gem”.

Since nobody uses Delphi anymore (you may want to try Lazarus though), I rewrote the code into Python for any beginner to understand.

def convert_base(s,b,nb):
	# character set
	c = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
	# function to convert int to string
	def int2str(n,b):
		if n==0: return ''
		return int2str(n // b, b) + c[n % b]
	# function to convert string to int
	def str2int(s,b):
		if s=='': return 0
		return str2int(s[:-1],b) * b + c.index(s[-1:])
	# filter invalid characters
	s = ''.join([k for k in s if k in c[0:b]])
	# convert string and return
	if s=='': s = '0'
	s = int2str(str2int(s,b),nb)
	if s=='': s = '0'
	return s

I can’t imagine anyone will feel the same satisfaction reading this code as I did writing it in 1996. Let me explain why. In 1996 we did not have the Internet yet, or Stack Overflow, where all the programming questions you may dream of are already asked. You would typically learn everything from books, both the how and the why, and you could not cut corners by Googling the answer. So there was no guarantee, unlike now, that with proper research you find the most “Pythonic” implementation.

You could even turn this around. Nowadays we know so good how things “should” be done, that there is little room for innovation. If you come up with a different approach to a known problem it quickly dismissed as “wrong” and “not a best practice”. On one hand this may be a sign that the software industry is maturing. On the other hand it may be crippling, because we now have generation of programmers who have no clue why things are the way they are. They can’t argue the motives of their choices any further than it being a “best practice”.

I can think of plenty examples, but the most striking is when I was implementing “default routing” in Symfony2. It is considered a “bad practice” by the Symfony author, hence removed in the second version (and I implemented it again). Hardly any programmer I’ve met can properly argue why it makes sense that it was removed (but they all tend to have an opinion). I guess that proves the point.

Share