Measuring and Monitoring With Prometheus and Alertmanager Part 2

This is part two in the series about Prometheus and Alertmanager.

In the first part we installed the Prometheus server and the node exporter, and explored some of the measuring and graphing capabilities of the Prometheus web interface.

In this part, we will be looking at Grafana to expand the possibilities of graphing our metrics, and we will use Alertmanager to alert us of any metrics that are outside the boundaries we define for them. Finally, we will install a dashboard application for a nice tactical overview of our Prometheus monitoring platform.

Installing Grafana

The installation of Grafana is fairly straightforward, and all of the steps involved are described in full detail in the official documentation.

For Ubuntu, which we’re using in this series, the steps involved are:

sudo apt-get install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install grafana

At this point Grafana will be installed, but the service has not been started yet.

To start the service and verify that the service has started:

sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl status grafana-server

You should see output indicating that the service is active (running).

If you want Grafana to start at boot time, run the following:

sudo systemctl enable grafana-server.service

Grafana listens on port 3000 by default, so at this point you should be able to access your Grafana installation at http://<IP of your Grafana server>:3000
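If you are working from a terminal, you can also check that Grafana is responding by querying its health endpoint (assuming the default port):

curl -s http://localhost:3000/api/health

This should return a small JSON document containing "database": "ok" once the service is healthy.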

You will be welcomed by the login screen. The default login after installation is admin with password admin.

After successfully logging in, you will be asked to change the password for the admin user. Do this immediately!

Creating the Prometheus Data Source

Our next step is to create a new data source in Grafana that connects to our Prometheus installation. To do this, go to Configuration > Data Sources and click the blue Add data source button.

Grafana supports various time series data sources, but we will pick the top one, which is Prometheus.

Enter the URL of your Prometheus server, and that's it! Leave all the other fields untouched; they are not needed at this point.
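As an alternative to clicking through the UI, Grafana can also provision data sources from YAML files placed in /etc/grafana/provisioning/datasources/. A minimal sketch of such a file (the file name and URL are examples; adjust them to your setup):

# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true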

You should now have a Prometheus data source in Grafana, and we can start creating some dashboards!

Creating Our First Grafana Dashboard

A lot of community-created dashboards can be found at https://grafana.com/grafana/dashboards. We’re going to use one of them that will give us a very nice overview of the metrics scraped from the node exporter.

To import a dashboard click the + icon in the side menu, and then click Import.

Enter the dashboard ID 1860 in the ‘Import via grafana.com’ field and click ‘Load’.

The dashboard should be imported; the only thing we still need to do is select the Prometheus data source we just created in the dropdown at the bottom of the page and click 'Import'.

You should now have your first pretty Grafana dashboard, showing all of the important metrics offered by the node exporter.

Adding Alertmanager in the Mix

Now that we have all these metrics of our nodes flowing into Prometheus, and we have a nice way of visualising this data, it would be nice if we could also raise alerts when things don’t go as planned. Grafana offers some basic alerting functionality for Prometheus data sources, but if you want more advanced features, Alertmanager is the way to go.

Alerting rules are set up in the Prometheus server. These rules allow you to define alert conditions based on PromQL expressions. Whenever an alert expression returns one or more results, the alert is considered active.

To turn this active alert condition into an action, Alertmanager comes into play. It can send out notifications through a large variety of methods, such as email, communication platforms like Slack or Mattermost, and incident/on-call management tools like PagerDuty and OpsGenie. Alertmanager also handles summarization, aggregation, rate limiting and silencing of the alerts.

Let’s go ahead and install Alertmanager on the Prometheus server instance we installed in part one of this blog.

Installing Alertmanager

Start off by creating a separate user for alertmanager:

useradd -M -r -s /bin/false alertmanager

Next, we need a directory for the configuration:

mkdir /etc/alertmanager
chown alertmanager:alertmanager /etc/alertmanager

Then download Alertmanager and verify its integrity:

cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
wget -O - -q https://github.com/prometheus/alertmanager/releases/download/v0.21.0/sha256sums.txt | grep linux-amd64 | shasum -c -

The last command should result in alertmanager-0.21.0.linux-amd64.tar.gz: OK. If it doesn’t, the downloaded file is corrupted, and you should try again.

Next we unpack the file and move the various components into place:

tar xzf alertmanager-0.21.0.linux-amd64.tar.gz
cp alertmanager-0.21.0.linux-amd64/{alertmanager,amtool} /usr/local/bin/
chown alertmanager:alertmanager /usr/local/bin/{alertmanager,amtool}

And clean up our downloaded files in /tmp:

rm -f /tmp/alertmanager-0.21.0.linux-amd64.tar.gz
rm -rf /tmp/alertmanager-0.21.0.linux-amd64

We need to supply Alertmanager with an initial configuration. For our first test, we will configure alerting by email (Be sure to adapt this configuration for your email setup!):

global:
  smtp_from: 'AlertManager <alertmanager@example.com>'
  smtp_smarthost: 'smtp.example.com:587'
  smtp_hello: 'alertmanager'
  smtp_auth_username: 'username'
  smtp_auth_password: 'password'
  smtp_require_tls: true

route:
  group_by: ['instance', 'alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: myteam

receivers:
  - name: 'myteam'
    email_configs:
      - to: 'user@example.com'

Save this in a file called /etc/alertmanager/alertmanager.yml and set permissions:

chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml
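Before we continue, we can verify that the configuration file is syntactically valid with amtool, which we installed alongside Alertmanager:

amtool check-config /etc/alertmanager/alertmanager.yml

If the file parses correctly, amtool reports the global config, route, receivers and templates it found.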

To be able to start and stop our Alertmanager instance, we will create a systemd unit file. Use your favorite editor to create the file /etc/systemd/system/alertmanager.service and add the following to it (replacing <server IP> with the IP or resolvable FQDN of your server):

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
WorkingDirectory=/etc/alertmanager/
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --web.external-url http://<server IP>:9093

[Install]
WantedBy=multi-user.target

Activate and start the service with the following commands:

systemctl daemon-reload
systemctl start alertmanager
systemctl enable alertmanager

The command systemctl status alertmanager should now indicate that our service is up and running.

Now we need to alter the configuration of our Prometheus server to inform it about our Alertmanager instance. Edit the file /etc/prometheus/prometheus.yml. There should already be an alerting section. All we need to do is change the section so it looks like this:

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - localhost:9093

We also need to tell Prometheus where our alerting rules live. Change the rule_files section to look like this:

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/rules/*.yml"

Save the changes, and create the directory for the alert rules:

mkdir /etc/prometheus/rules
chown prometheus:prometheus /etc/prometheus/rules
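Before restarting, you can validate the updated configuration with promtool, which ships with Prometheus:

promtool check config /etc/prometheus/prometheus.yml

If promtool reports any errors, fix them before restarting.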

Restart the Prometheus server to apply the changes:

systemctl restart prometheus

Creating Our First Alert Rule

Alerting rules are written using the Prometheus expression language or PromQL. One of the easiest things to check is whether all Prometheus targets are up, and trigger an alert when a certain exporter target becomes unreachable. This is done with the simple expression up.

Let’s create our first alert by creating the file /etc/prometheus/rules/alert-rules.yml with the following content:

groups:
- name: alert-rules
  rules:
  - alert: ExporterDown
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      description: 'Metrics exporter service for {{ $labels.job }} running on {{ $labels.instance }} has been down for more than 5 minutes.'
      summary: 'Exporter down (instance {{ $labels.instance }})'

This alert will trigger as soon as any of the exporter targets in Prometheus is not reported as up for more than 5 minutes. We apply the severity label critical to it.

Restart prometheus with systemctl restart prometheus to load the new alert rule.

You should now also be able to see the alert rule in the Prometheus web interface, by going to the Alerts section.

Now the easiest way to check that this alert actually fires, and that we get our email notification, is to stop the node exporter service:

systemctl stop node_exporter

As soon as we do this, we can see that the alert status has changed in the Prometheus server dashboard. It is now marked as active, but is not yet firing, because the condition needs to persist for a minimum of 5 minutes, as specified in our alert rule.
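Active alerts are also exposed through the built-in ALERTS time series, so you can follow the state change with a regular query in the expression browser, for example:

ALERTS{alertname="ExporterDown"}

While the alert is pending or firing, this query returns a series with an alertstate label showing its current state.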

When the 5 minute mark is reached, the alert fires, and we should receive an email from Alertmanager alerting us about the situation.

We should also be able to manage the alert now in the Alertmanager web interface. Open http://<server IP>:9093 in your browser and the alert that we just triggered should be listed. We can choose to silence the alert to prevent any more notifications from being sent out.

Click silence, and you will be able to configure the duration of the silence period, add a creator and a description for some more metadata, and expand or limit the group of alerts this particular silence applies to. If, for example, I wanted to silence all ExporterDown alerts for the next 2 hours, I could remove the instance matcher.
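The same silence can also be created from the command line with amtool; a sketch, assuming Alertmanager runs on localhost (the matcher, duration and comment are just examples):

amtool silence add alertname=ExporterDown --duration=2h \
    --author="your-name" --comment="Planned maintenance" \
    --alertmanager.url=http://localhost:9093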

More Advanced Alert Examples

Since Prometheus alerts use the same powerful PromQL expressions as queries, we are able to define rules that go way beyond whether a service is up or down. For a full rundown of all the PromQL functions available, check out the Prometheus documentation or the excellent PromQL for humans.

Memory Usage

For starters, here is an example of an alert rule to check the memory usage of a node. It fires once the available memory drops below 10% of the total memory for a duration of 5 minutes:

  - alert: HostOutOfMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: 'Host out of memory (instance {{ $labels.instance }})'
      description: 'Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}'

Disk Space

We can do something similar for disk space. This alert will fire as soon as one of our target’s filesystems has less than 10% of its capacity available for a duration of 5 minutes:

  - alert: HostOutOfDiskSpace
    expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: 'Host out of disk space (instance {{ $labels.instance }})'
      description: 'Disk is almost full (< 10% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}'

CPU Usage

To alert on CPU usage, we can use the metrics available under node_cpu_seconds_total. In the previous part of this blog we already went into which specific metrics we can find there.

This alert takes the rate of idle CPU seconds over the last 5 minutes, multiplies it by 100 to get the percentage of idle CPU cycles, and subtracts that from 100 to get the CPU usage. We average by instance to include all CPUs (cores) in a single figure; otherwise we would end up with a separate percentage for each CPU in the system.
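To see the difference, you can compare both variants in the expression browser; the first returns one series per CPU core, the second a single series per instance:

# idle percentage per individual CPU core
rate(node_cpu_seconds_total{mode="idle"}[5m]) * 100

# idle percentage averaged over all cores of an instance
avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100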

The alert will fire when the average CPU usage of the system exceeds 80% for 5 minutes:

  - alert: HostHighCpuLoad
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: 'Host high CPU load (instance {{ $labels.instance }})'
      description: 'CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}'

Predictive Alerting

Using the PromQL function predict_linear we can expand on the disk space alert mentioned earlier. predict_linear can predict the value of a certain time series X seconds from now. We can use this to predict when our disk is going to fill up, if the pattern follows a linear prediction model.
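You can try the prediction itself in the expression browser before turning it into an alert; for example, to see the predicted free bytes of the root filesystem four hours from now (the mountpoint filter is just an example):

predict_linear(node_filesystem_free_bytes{mountpoint="/"}[1h], 4 * 3600)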

The following alert will trigger if the linear prediction algorithm, using disk usage patterns over the last hour, determines that the disk will fill up in the next four hours:

  - alert: DiskWillFillIn4Hours
    expr: predict_linear(node_filesystem_free_bytes[1h], 4 * 3600) < 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: 'Disk {{ $labels.device }} will fill up in the next 4 hours'
      description: |
        Based on the trend over the last hour, it looks like the disk {{ $labels.device }} on {{ $labels.mountpoint }}
        will fill up in the next 4 hours ({{ $value | humanize }}% space remaining)

Give Me More!

If you are interested in more examples of alert rules, you can find a very extensive collection at Awesome Prometheus alerts. You can also find examples there for exporters we haven't covered, such as the Blackbox or MySQL exporter.

Syntax Checking Your Alert Rule Definitions

Prometheus comes with a tool that allows you to verify the syntax of your alert rules. This will come in handy for local development of rules or in CI/CD pipelines, to make sure that no broken syntax makes it to your production Prometheus platform.

You can invoke the tool by running promtool check rules /etc/prometheus/rules/alert-rules.yml

# promtool check rules /etc/prometheus/rules/alert-rules.yml
Checking /etc/prometheus/rules/alert-rules.yml
  SUCCESS: 5 rules found
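Because promtool exits with a non-zero status when a rule file contains errors, it is straightforward to use in a CI/CD pipeline; a minimal sketch:

# fail the pipeline when any rule file is invalid
promtool check rules /etc/prometheus/rules/*.yml || exit 1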

Scraping Metrics From Alertmanager

Alertmanager has a built-in metrics endpoint that exports metrics about how many alerts are firing, resolved or silenced. Now that we have all components running, we can add Alertmanager as a target to our Prometheus server to start scraping these metrics.

On your Prometheus server, open /etc/prometheus/prometheus.yml with your favorite editor and add the following new job under the scrape_configs section (replace 192.168.0.10 with the IP of your alertmanager instance):

  - job_name: 'alertmanager'
    static_configs:
    - targets: ['192.168.0.10:9093']

Restart Prometheus, and check in the Prometheus web console if you can see the new Alertmanager section under Status > Targets. If all goes well, a query in the Prometheus web console for alertmanager_cluster_enabled should return one result with the value 1.

We can now continue with adding alert rules for Alertmanager itself:

  - alert: PrometheusNotConnectedToAlertmanager
    expr: prometheus_notifications_alertmanagers_discovered < 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: 'Prometheus not connected to alertmanager (instance {{ $labels.instance }})'
      description: 'Prometheus cannot connect the alertmanager\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}'
  - alert: PrometheusAlertmanagerNotificationFailing
    expr: rate(alertmanager_notifications_failed_total[1m]) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: 'Prometheus AlertManager notification failing (instance {{ $labels.instance }})'
      description: 'Alertmanager is failing to send notifications\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}'

The first rule will fire when Prometheus has not been connected to an Alertmanager for over 5 minutes; the second rule will fire when Alertmanager fails to send out notifications. But how will we know about the alert, if notifications are failing? That's where the next section comes in handy!

Alertmanager Dashboard Using Karma

The Alertmanager web console is useful for a basic overview of alerts and to manage silences, but it is not really suitable for use as a dashboard that gives us a tactical overview of our Prometheus monitoring platform.

For this, we will use Karma.

Karma offers a nice overview of active alerts, grouping of alerts by a certain label, silence management, alert acknowledgement and more.

We can install it on the same machine where Alertmanager is running using the following steps:

Start off by creating a separate user and configuration folder for karma:

useradd -M -r -s /bin/false karma
mkdir /etc/karma
chown karma:karma /etc/karma

Then download the file and verify its checksum:

cd /tmp
wget https://github.com/prymitive/karma/releases/download/v0.78/karma-linux-amd64.tar.gz
wget -O - -q https://github.com/prymitive/karma/releases/download/v0.78/sha512sum.txt | grep linux-amd64 | shasum -c -

Make sure the last command returns karma-linux-amd64.tar.gz: OK again. Now unpack the file and move it into place:

tar xzf karma-linux-amd64.tar.gz
mv karma-linux-amd64 /usr/local/bin/karma
rm karma-linux-amd64.tar.gz

Create the file /etc/karma/karma.yml and add the following default configuration (replace the username and password):

alertmanager:
  interval: 1m
  servers:
    - name: alertmanager
      uri: http://localhost:9093
      timeout: 20s
authentication:
  basicAuth:
    users:
      - username: cartman
        password: secret

Set the proper permissions on the config file:

chown karma:karma /etc/karma/karma.yml
chmod 640 /etc/karma/karma.yml

Create the file /etc/systemd/system/karma.service with the following content:

[Unit]
Description=Karma Alertmanager dashboard
Wants=network-online.target
After=network-online.target
After=alertmanager.service

[Service]
User=karma
Group=karma
Type=simple
WorkingDirectory=/etc/karma/
ExecStart=/usr/local/bin/karma \
    --config.file=/etc/karma/karma.yml

[Install]
WantedBy=multi-user.target

Activate and start the service with the following commands:

systemctl daemon-reload
systemctl start karma
systemctl enable karma

The command systemctl status karma should now indicate that karma is up and running.

You should be able to visit your new Karma dashboard now at http://<alertmanager server IP>:8080. When we stop the node_exporter service again and wait 5 minutes for the alert to fire, the alert shows up on the dashboard.

If you want to explore all the possibilities and configuration options of Karma, then please see the documentation.

Conclusion

In this series we've installed Prometheus, the node exporter, and Alertmanager. We've given a short introduction to PromQL and how to write Prometheus queries and alert rules, and used Grafana to graph metrics and Karma to provide an overview of triggered alerts.

If you want to explore further, check out the official documentation of the tools covered in this series.


Leaseweb Cloud AWS EC2 support

As you might know, some of the products LeaseWeb includes in its portfolio are Public and Private Cloud based on Apache CloudStack, which supports a full API. We at LeaseWeb are very open about this, and we try to be as involved and participative in the community and product development as possible. You might be familiar with this if you are a Private Cloud customer. In this article we target current and former EC2 users, who probably already have tools built upon the AWS CLI, by demonstrating how you can keep using them with LeaseWeb Private Cloud solutions.

Apache CloudStack supported the EC2 API in its early days, but along the way, while the EC2 API evolved, CloudStack's support somewhat stagnated. In fact, the AWS API component of CloudStack was recently detached from the main distribution to simplify the maintenance of the code.

While this might sound like bad news, it's not – at all. In the meantime, another project spun off, EC2Stack, and was embraced by Apache as well. This new stack supports the latest API (at the time of writing) and is much easier to maintain, both in versatility and in codebase. The fact that it is written in Python opens up the audience for further contribution, while at the same time allowing for quick patching/upgrades without recompiling.

So, at some point, I thought I could share with you how to quickly set up your AWS-compatible API so you can reuse your existing scripts. On to the details.

The AWS endpoint acts as an EC2 API provider, proxying requests to the LeaseWeb API, which is an extension of the native CloudStack API. And since this API is available for Private Cloud customers, EC2Stack can be installed by customers themselves.

Following is an illustration of how this can be done. For the record, I’m using Ubuntu 14.04 as my desktop, and I’ll be setting up EC2stack against LeaseWeb’s Private Cloud in the Netherlands.

First step is to gather all information for EC2stack. Go to your LeaseWeb platform console, and obtain API keys for your user (sensitive information blurred):

[Screenshot: API keys in the platform console]

Note down the values for API Key and Secret Key (you should already know the concepts from AWS and/or LeaseWeb Private Cloud).

Now, install EC2Stack and configure it:

ntavares@mylaptop:~$ pip install ec2stack 
[…]
ntavares@mylaptop:~$ ec2stack-configure 
EC2Stack bind address [0.0.0.0]: 127.0.0.1 
EC2Stack bind port [5000]: 5000 
Cloudstack host [mgt.cs01.leaseweb.net]: csrp01nl.leaseweb.com 
Cloudstack port [443]: 443 
Cloudstack protocol [https]: https 
Cloudstack path [/client/api]: /client/api 
Cloudstack custom disk offering name []: dualcore
Cloudstack default zone name [Evoswitch]: CSRP01 
Do you wish to input instance type mappings? (Yes/No): Yes 
Insert the AWS EC2 instance type you wish to map: t1.micro 
Insert the name of the instance type you wish to map this to: Debian 7 amd64 5GB 
Do you wish to add more mappings? (Yes/No): No 
Do you wish to input resource type to resource id mappings for tag support? (Yes/No): No 
INFO  [alembic.migration] Context impl SQLiteImpl. 
INFO  [alembic.migration] Will assume non-transactional DDL. 

The value for the zone name will be different if your Private Cloud is not in the Netherlands POP. The rest of the values can be obtained from the platform console:

[Screenshot: service offering in the platform console]

[Screenshot: template in the platform console]

You will probably have different (and more) mappings to do as you go; just re-run this command later on.

At this point, your EC2stack proxy should be able to talk to your Private Cloud, so we now need to launch it so it will accept EC2 API calls for your user. For the time being, just run it in a separate shell:

ntavares@mylaptop:~$ ec2stack -d DEBUG 
 * Running on http://127.0.0.1:5000/ 
 * Restarting with reloader

And now register your user using the keys you collected from the first step:

ntavares@mylaptop:~$ ec2stack-register http://localhost:5000 H5xnjfJy82a7Q0TZA_8Sxs5U-MLVrGPZgBd1E-1HunrYOWBa0zTPAzfXlXGkr-p0FGY-9BDegAREvq0DGVEZoFjsT PYDwuKWXqdBCCGE8fO341F2-0tewm2mD01rqS1uSrG1n7DQ2ADrW42LVfLsW7SFfAy7OdJfpN51eBNrH1gBd1E 
Successfully Registered!

And that's it, as far as the API service is concerned. As you'd normally do with the AWS CLI, you now need to "bind" the CLI to these new credentials:

ntavares@mylaptop:~$ aws configure 
AWS Access Key ID [****************yI2g]: H5xnjfJy82a7Q0TZA_8Sxs5U-MLVrGPZgBd1E-1HunrYOWBa0zTPAzfXlXGkr-p0FGY-9BDegAREvq0DGVEZoFjsT
AWS Secret Access Key [****************L4sw]: PYDwuKWXqdBCCGE8fO341F2-0tewm2mD01rqS1uSrG1n7DQ2ADrW42LVfLsW7SFfAy7OdJfpN51eBNrH1gBd1E
Default region name [CS113]: CSRP01
Default output format: text

And that’s it! You’re now ready to use AWS CLI as you’re used to:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 --output json ec2 describe-images | jq ' .Images[] | .Name ' 
"Ubuntu 12.04 i386 30GB" 
"Ubuntu 12.04 amd64 5GB" 
"Ubuntu 13.04 amd64 5GB" 
"CentOS 6 amd64 5GB" 
"Debian 6 amd64 5GB" 
"CentOS 7 amd64 20140822T1151" 
"Debian 7 64 10 20141001T1343" 
"Debian 6 i386 5GB" 
"Ubuntu 14.04 64bit with docker.io" 
"Ubuntu 12.04 amd64 30GB" 
"Debian 7 i386 5GB" 
"Ubuntu 14.04 amd64 20140822T1234" 
"Ubuntu 12.04 i386 5GB" 
"Ubuntu 13.04 i386 5GB" 
"CentOS 6 i386 5GB" 
"CentOS 6 amd64 20140822T1142" 
"Ubuntu 12.04 amd64 20140822T1247" 
"Debian 7 amd64 5GB"

Please note that I only used JSON output (and JQ to parse it) for summarising the results, as any other format wouldn’t fit on the page.

To create a VM with built-in SSH keys, you should create/set up your keypair in LeaseWeb Private Cloud, if you haven't already. In the following example I'm generating a new one, but of course you could load your existing keys.

[Screenshot: SSH key pair generation in the platform console]

You will want to copy-paste the generated key (under Private Key) to a file and protect it. I saved mine in $HOME/.ssh/id_ntavares.csrp01.key.

[Screenshot: the generated private key in the platform console]
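Restrict the permissions on the key file, otherwise SSH will refuse to use it later on:

ntavares@mylaptop:~$ chmod 600 $HOME/.ssh/id_ntavares.csrp01.key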

This key will be used later to log into the created instances and extract the administrator password. Finally, instruct the AWS CLI to use this keypair when deploying your instances:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 run-instances \
 --instance-type dualcore \
 --image-id 7c123f01-9865-4312-a198-05e2db755e6a \
 --key-name ntavares-key 
INSTANCES	KVM	7c123f01-9865-4312-a198-05e2db755e6a	a0977df5-d25e-40cb-9f78-b3a551a9c571	dualcore	ntavares-key	2014-12-04T12:03:32+0100	10.42.1.129 
PLACEMENT	CSRP01 
STATE	16	running 

Note that the image-id is taken from the previous listing (the one I simplified with JQ).

Also note that EC2stack is fairly new, and there are still some limitations to this EC2-CS bridge – see below for a mapping of supported API calls. For instance, one that you could run into at the time of writing this article (~2015) was the inability to deploy an instance if you're using multiple Isolated networks (or multiple VPCs). Amazon shares this concept as well, so a simple patch was necessary.

For this demo, we’re actually running in an environment with multiple isolated networks, so if you ran the above command, you’d get the following output:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 run-instances \
 --instance-type dualcore \
 --image-id 7c123f01-9865-4312-a198-05e2db755e6a \
 --key-name ntavares-key
A client error (InvalidRequest) occurred when calling the RunInstances operation: More than 1 default Isolated networks are found for account Acct[47504f6c-38bf-4198-8925-991a5f801a6b-rme]; please specify networkIds

In the meantime, LeaseWeb's patch was merged, as were many others, which demonstrates both the power of Open Source and the activity on this project.

Naturally, the basic maintenance tasks are there:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 describe-instances 
RESERVATIONS	None 
INSTANCES	KVM	7c123f01-9865-4312-a198-05e2db755e6a	a0977df5-d25e-40cb-9f78-b3a551a9c571	dualcore	ntavares-key	2014-12-04T12:03:32+0100	10.42.1.129	10.42.1.129 
PLACEMENT	CSRP01	default 
STATE	16	running

The output above contains the information you'll need to log in to the instance: the instance ID and the IP address. You can log in either with your SSH keypair:

[root@jump ~]# ssh -i $HOME/.ssh/id_ntavares.csrp01.key root@10.42.1.129 
Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 

[...] 
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent 
permitted by applicable law. 
root@a0977df5-d25e-40cb-9f78-b3a551a9c571:~#

If you need, you can also retrieve the password the same way you do with EC2:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 get-password-data --instance-id a0977df5-d25e-40cb-9f78-b3a551a9c571 
None dX5LPdKndjsZkUo19Z3/J3ag4TFNqjGh1OfRxtzyB+eRnRw7DLKRE62a6EgNAdfwfCnWrRa0oTE1umG91bWE6lJ5iBH1xWamw4vg4whfnT4EwB/tav6WNQWMPzr/yAbse7NZHzThhtXSsqXGZtwBNvp8ZgZILEcSy5ZMqtgLh8Q=

As it happens with EC2, password is returned encrypted, so you’ll need your key to display it:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 get-password-data --instance-id a0977df5-d25e-40cb-9f78-b3a551a9c571 | awk '{print $2}' > ~/tmp.1
ntavares@mylaptop:~$ openssl enc -base64 -in tmp.1 -out tmp.2 -d -A 
ntavares@mylaptop:~$ openssl rsautl -decrypt -in tmp.2 -out tmp.3 -text -inkey $HOME/.ssh/id_ntavares.csrp01.key 
ntavares@mylaptop:~$ cat tmp.3 ; echo 
hI5wueeur
ntavares@mylaptop:~$ rm -f tmp.{1,2,3} 
[root@jump ~]# sshpass -p hI5wueeur ssh root@10.42.1.129 
Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 

[...]
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent 
permitted by applicable law. 
Last login: Thu Dec  4 13:33:07 2014 from jump.rme.leasewebcloud.com 
root@a0977df5-d25e-40cb-9f78-b3a551a9c571:~#

The multiple isolated networks scenario

If you're already running multiple isolated networks in your target platform (be they VPC-bound or not), you'll need to pass the argument --subnet-id to the run-instances command to specify which network to deploy the instance into; otherwise CloudStack will complain about not knowing in which network to deploy the instance. I believe this is because Amazon doesn't allow you to use Isolated Networking as freely as LeaseWeb does – LeaseWeb gives you full flexibility in the platform console.

Since EC2stack does not support describe-network-acls (as of December 2014), which would allow you to determine which Isolated networks you could use, the easiest way to find them is to go to the platform console and copy & paste the Network ID of the network you're interested in.

Then you could use --subnet-id:

ntavares@mylaptop:~$ aws --endpoint=http://127.0.0.1:5000 ec2 run-instances \
 --instance-type dualcore \
 --image-id 7c123f01-9865-4312-a198-05e2db755e6a \
 --key-name ntavares-key \
 --subnet-id 5069abd3-5cf9-4511-a5a3-2201fb7070f8
PLACEMENT	CSRP01 
STATE	16	running 

I hope I have demonstrated a bit of what can be done with the compatible EC2 API. Other functions are available for more complex tasks, although, as written earlier, EC2stack is quite new, so you might need community assistance if you cannot develop a fix on your own. At LeaseWeb we are very interested to hear about your use cases, so feel free to drop us a note.


Windows compatibility broken in Linux kernel (fixed)


I'm running Ubuntu Linux and today I found that Steam did not work anymore (nor did some other Windows applications). Steam could not find my games and the store was not working. I remember installing some security updates. It turns out that Wine crashes on Ubuntu 14.04 LTS with the latest kernel update (3.13.0-59). There are several workarounds:

  1. Run “wineserver -p” in the terminal before starting Windows applications (like “steam”).
  2. Revert the kernel update with “sudo apt-get remove linux-image-3.13.0-59-generic” and “sudo update-grub”.
  3. Upgrade the kernel with “sudo apt-get install linux-image-generic-lts-vivid”.

Bug reports on the web

There are multiple places where people are discussing this bug:

I hope this helps you all… Happy gaming on Linux!

Update: A new kernel has been released, which fixes the problem: linux-image-3.13.0-61-generic (automatically installed)

 


Heka monolog decoder

This post is about how to use heka to give your symfony 2 application logs the care they deserve.

Application logs are very important for the quality of the product or service you are offering.

They help you find out what went wrong so you can explain and fix a bug that was reported recently. Or maybe to gather statistics to see how often a certain feature is used. For example how many bare metal reinstallation requests were issued last month and how many of those failed. Valuable information that you could use to decide what feature you are going to work on next.

At LeaseWeb we use quite a lot of PHP, and Seldaek's monolog is our logging library of choice. Recently we open-sourced a heka decoder on GitHub. For those who do not know heka yet, check out their documentation.

Heka is an open source stream processing software system developed by Mozilla. Heka is a “Swiss Army Knife” type tool for data processing, useful for a wide variety of different tasks, such as …

Heka runs as a system daemon, just like logstash or fluentd. Heka is written in Go and comes with an easy-to-use plugin system based on Lua. It has almost no dependencies and is lightweight. James Turnbull has written a nice article on how to get started with heka.

send symfony 2 application logs to elastic search

What better way to explain than with an example?

Let's say you have a symfony 2 application and you want the application logs to be sent to an Elastic Search platform.

On a Debian-based OS you can download one of the heka Debian packages from GitHub.

    $ wget https://github.com/mozilla-services/heka/releases/download/v0.9.2/heka_0.9.2_i386.deb
    $ sudo dpkg -i heka_0.9.2_i386.deb

To configure heka you are required to edit the configuration file located at /etc/hekad.toml.

    $ vim /etc/hekad.toml

Take your time to read the excellent documentation on heka. There are many ways of using heka but we will use it as a forwarder:

  1. Define an input, where messages come from.
  2. Tell heka how it should decode the monolog log line into a heka message
  3. Tell heka how to encode the message so Elastic Search will understand it
  4. Define an output, where should the messages be sent to.

First we define the input:

    [Symfony2MonologFileInput]
    type = "LogstreamerInput"
    log_directory = "/var/www/app/logs"
    file_match = 'prod\.log'
    decoder = "Symfony2MonologDecoder"

Adjust `log_directory` and `file_match` according to your setup. As you can see, we already told heka to use the `Symfony2MonologDecoder`, so we will define that one next:

    [Symfony2MonologDecoder]
    type = "SandboxDecoder"
    filename = "/etc/symfony2_decoder.lua"

Change the `filename` with the path where you placed the lua script on your system.

Now that we have defined the input, we can tell heka where to output messages to:

    [ESJsonEncoder]
    index = "%{Hostname}"
    es_index_from_timestamp = true
    type_name = "%{Type}"

    [ElasticSearchOutput]
    message_matcher = "TRUE"
    server = "http://192.168.100.1:9200"
    flush_interval = 5000
    flush_count = 10
    encoder = "ESJsonEncoder"

In the above example we assume that your Elastic Search server is running at 192.168.100.1.

And that's it.

A simple log line in app/logs/prod.log:

    [2015-06-03 22:08:02] app.INFO: Dit is een test {"bareMetalId":123,"os":"centos"} {"token":"556f5ea25f6af"}

Is now sent to Elastic Search. You should now be able to query your Elastic Search for log messages, assuming the hostname of your server running symfony 2 is myapi:

    $ curl http://192.168.100.1:9200/myapi/_search | python -mjson.tool
    {
        "_shards": {
            "failed": 0,
            "successful": 5,
            "total": 5
        },
        "hits": {
            "hits": [
                {
                    "_id": "ZIV7ryZrQRmXDiB6thY_yQ",
                    "_index": "myapi",
                    "_score": 1.0,
                    "_source": {
                        "EnvVersion": "",
                        "Hostname": "myapi",
                        "Logger": "Symfony2MonologFileInput",
                        "Payload": "Dit is een test",
                        "Pid": 0,
                        "Severity": 7,
                        "Timestamp": "2015-06-03T20:08:02.000Z",
                        "Type": "logfile",
                        "Uuid": "344e7cae-6ab7-4fb2-a770-d2cbad6653c3",
                        "channel": "app",
                        "levelname": "INFO",
                        "bareMetalId": 123,
                        "os": "centos",
                        "token": "556f5ea25f6af"
                    },
                    "_type": "logfile"
                },
        // ...
        }
    }

What is important to notice is that the keys token, bareMetalId and os in the monolog log line end up in Elastic Search as indexable fields. From your PHP code you can add this extra information to your monolog messages by supplying an associative array as a second argument to the default monolog log functions:

    <?php

    $logger = $this->logger;
    $logger->info('The server was reinstalled', array('bareMetalId' => 123, 'os' => 'centos'));

Happy logging!


Meet the LeaseWeb Development Team


What is it like to work at LeaseWeb? In this video, our developers share their experiences.

Would you like to join them? There's always room for more talent. Whether you're a datacenter guru or a programming prodigy, there's a place for you at LeaseWeb. Take a look at our vacancies and find out where you fit in!

About LeaseWeb:

LeaseWeb – part of the OCOM Group – is one of the world’s largest hosting brands offering a broad portfolio of public cloud servers, private clouds, bare metal servers, colocation and CDN services, all backed by a low-latency, blended global network with over 5.0 Tbps of capacity. Currently, LeaseWeb owns and operates approximately 65,000 servers which are hosted in our data centers across Asia, Europe and the U.S.

For more information visit:
http://www.leaseweb.com
