Understanding and Interpreting CPU Steal Time on Virtual Machines

Virtual machines report different types of usage metrics, such as server load, memory usage, and steal time. Customers often ask about steal time – what is it, and why is it reported on their virtual machines? Read on as we explain how steal time works and what it means for your virtual machine.

What is Steal Time? 

Steal time is the percentage of time the virtual machine process waits on the physical CPU for its share of CPU time. You can monitor processes and resource usage by running the “top” command on your Linux server. Among the usage metrics, steal time is labeled as ‘st’.
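
On a Linux guest you can also read the raw steal counter directly from /proc/stat, where the eighth value on the cpu line is the cumulative steal time in clock ticks. Below is a minimal sketch, assuming a Linux virtual machine and Python 3, that samples that counter twice and prints the steal percentage over the interval:

import time

def cpu_counters():
    """Read the aggregate CPU counters from the first line of /proc/stat."""
    with open("/proc/stat") as f:
        # Line format: cpu user nice system idle iowait irq softirq steal guest guest_nice
        return [int(value) for value in f.readline().split()[1:]]

def steal_percentage(interval=1.0):
    """Sample the counters twice and return the steal percentage over the interval."""
    before = cpu_counters()
    time.sleep(interval)
    after = cpu_counters()
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0  # index 7 is the steal counter

print(f"steal time: {steal_percentage():.1f}%")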

CPU in Virtual Environments

In cloud environments, the hypervisor acts as the interface between the physical server and its virtualized environment. The hypervisor kernel manages all these tasks by scheduling the running processes onto the physical cores of the server. Processes such as virtual machines, networking operations, and storage I/O requests are each given CPU time to process their jobs. As CPU time is allocated between these processes, priorities shift and contention arises over the physical cores.

%Idle Time

Steal time can also be visible on virtual machines alongside idle time. Idle time means that the hypervisor allocated CPU time, but the virtual machine did not use it. In this case, we can assume there was no effect on performance at all.

When the idle time percentage is 0 and steal time is present, we can assume that processes on the virtual machine are processed with a delay.

Multi-Tenant Cloud

Leaseweb cloud platforms consist of single-tenant and multi-tenant environments. Leaseweb CloudStack products allow you to develop and run a multi-tenant environment, enabling different kinds of users to run their cloud infrastructures at a lower cost. Along with not overselling virtual cores on our premium CloudStack platforms, we also do not pin virtual machines to CPU cores. This allows the hypervisor to allocate CPU time from all the server’s physical cores to any of its active processes.

Theoretically speaking, if the virtual machine has immediate access to its assigned cores 100% of the time, there would be no steal time visible. However, hypervisors are running many different tasks and are continuously performing actions such as rescheduling tasks for efficiency and processing received data from other systems. All these processes require CPU time from the hypervisor’s CPU, resulting in delayed access to the physical cores and adding steal time to the virtual machine.

Analyze Service Performance

A small amount of steal time is often unavoidable in modern hosting environments, particularly when running on shared cloud hosting. The steal time virtual machines experience is not always visible from outside the virtualized operating system.

If you see constant steal time registered by the virtual machine, try to find a correlation with the tasks you are executing. More importantly, does this steal time result in performance loss? Are you noticing any loss in performance in your applications? If so, try measuring output to discover where latency appears in the overall flow of your application and whether it correlates with steal time. Keep your hosting provider informed if you do see an impact on your application. In many situations, they can find a more suitable environment by moving your virtual machine to a different hypervisor.


Automate Your Server Platform Using The Leaseweb API

This blog is about using the Leaseweb API to automate the management of your dedicated servers running with Leaseweb – including examples of how to deploy your dedicated servers without even logging in once! While this article focuses solely on dedicated servers, we also have many API calls specifically for managing your cloud(s) at Leaseweb.

Introduction

Did you know Leaseweb provides access to more than 150 backend features to manage and control your platform? We develop our systems with an ‘API-first’ philosophy, meaning that anything you do via our Customer Portal can also be done via the API (or more!). Many of these functions will help you when deploying a new dedicated server.

Getting Started

Let's start by exploring what we can do with the API. Full disclaimer: it doesn't matter if you are a newbie in the world of API calls or an expert – we have documented everything clearly and with plenty of examples to make your life easier. Detailed API information and examples in five different programming languages (including Ruby, PHP, and Python) can be found on the Leaseweb developer website. We have also documented how to access the API on our developer website.

Once you have created your API key, you can begin exploring the different possibilities. Some of the options we provide for programmatically controlling your dedicated server platform via our API are:

  • Server management: fully manage your dedicated servers, including OS install, power cycle, hardware scan, etc.
  • Private Network: add or remove servers from the Private Network.
  • Floating IPs: control which servers the Floating IPs are routed to.
  • IP Management: see which IP is assigned to which server, null route IPs, or set reverse DNS entries.
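
As a minimal sketch of what such a call looks like from code, here is how listing your dedicated servers could be done in Python with the requests library. The base URL, the X-LSW-Auth header, and the /bareMetals/v2/servers path follow the public developer documentation, but verify them there before building on this example:

import requests

API_KEY = "your-api-key"  # generated in the Customer Portal

# List the dedicated servers in your account
response = requests.get(
    "https://api.leaseweb.com/bareMetals/v2/servers",
    headers={"X-LSW-Auth": API_KEY},
)
response.raise_for_status()

for server in response.json().get("servers", []):
    print(server["id"])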

Example

Assume I have a server farm that consists of several dedicated servers, each with its own Internet uplink public IP. The servers are interconnected on the backend using Private Networking, while the Floating IPs are used to quickly redirect traffic to a different server for disaster recovery purposes.

What are some of the basic API calls I could use to control my farm? Let me give you some examples.

(Re)Installation of an Operating System

Using a POST operation, I can request a reinstall of the OS:
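
A sketch of what such a request could look like in Python follows; the server ID and operating system ID are placeholders, and the exact payload fields should be checked against the developer documentation:

import requests

API_KEY = "your-api-key"
SERVER_ID = "12345678"  # placeholder server ID

# Launch an OS installation job on the server
response = requests.post(
    f"https://api.leaseweb.com/bareMetals/v2/servers/{SERVER_ID}/install",
    headers={"X-LSW-Auth": API_KEY},
    json={
        "operatingSystemId": "UBUNTU_20_04_64BIT",  # placeholder; list the available IDs via the API
        # "partitions": [...]  # optionally describe the partition layout here
    },
)
response.raise_for_status()
print(response.json())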


I also need to know the different parameters (payload), like serverID and OperatingSystemID, which can be retrieved via other API calls. In the payload area I can specify what the partition layout should be, so that the OS is installed to my exact specifications.

Custom Operating System Image

Alternatively, it is possible to install your own custom installation image using PXE boot while using the API to set the DHCP option. This will include it in the DHCP lease for the server.
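
A sketch of that call in Python (the image URL is a placeholder, and the lease endpoint and bootFileName field should be verified against the developer documentation):

import requests

API_KEY = "your-api-key"
SERVER_ID = "12345678"  # placeholder server ID

# Point the server's DHCP lease at a custom installation image (boot file)
response = requests.post(
    f"https://api.leaseweb.com/bareMetals/v2/servers/{SERVER_ID}/leases",
    headers={"X-LSW-Auth": API_KEY},
    json={"bootFileName": "http://example.com/boot/my-custom-installer.ipxe"},
)
response.raise_for_status()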


Once the server boots, it will get a DHCP lease and will retrieve the installation image from the URL specified in the DHCP option. For more information, check out our Knowledge Base article that explains this feature in more detail.

Other Features

Some other nice features for controlling the server farm include the ability to:

  •  Power cycle a server

This will turn the power off and on for the PDU that the server is connected to (see the sketch after this list).

  • Show which IPs (including Floating IPs) are assigned to a server or are null-routed

  • Perform a hardware scan (reboots the server) and get hardware inventory of a server


  • Inspect how much data traffic a server is generating, both upstream and downstream
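
As a sketch of the power cycle call mentioned in the first bullet (Python again, with a placeholder server ID; the endpoint path should be verified against the developer documentation):

import requests

API_KEY = "your-api-key"
SERVER_ID = "12345678"  # placeholder server ID

# Power cycle the server (power off, then on again)
response = requests.post(
    f"https://api.leaseweb.com/bareMetals/v2/servers/{SERVER_ID}/powerCycle",
    headers={"X-LSW-Auth": API_KEY},
)
response.raise_for_status()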


Private Networking

Servers in the same metro data center area (which in most cases means Leaseweb data centers that are relatively close to each other) can be connected to the Leaseweb Private Network. This backend network provides a secure and unmetered layer-2 connection between all servers, with port speeds of up to 10Gbps.

I can check whether a server is already connected to my assigned Private Network by using the Inspect Private Network operation. This gives me a list (array) of the server IDs that are connected.
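
A sketch of that inspection in Python (the private network ID is a placeholder, and the exact endpoint path should be checked against the developer documentation):

import requests

API_KEY = "your-api-key"
PRIVATE_NETWORK_ID = "100"  # placeholder private network ID

response = requests.get(
    f"https://api.leaseweb.com/bareMetals/v2/privateNetworks/{PRIVATE_NETWORK_ID}",
    headers={"X-LSW-Auth": API_KEY},
)
response.raise_for_status()

# The response lists the servers attached to this private network
print(response.json())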


If needed, I can add or remove a server from the PN or change the port speed.

Floating IPs

Floating IPs provide the possibility to dynamically reroute traffic to a different server (anchor IP). It is possible to automate this process using the API. Assuming a Floating IP definition has already been created and traffic is routed to my first server's anchor IP, I can use a PUT request to change the anchor IP to my second server.
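
A sketch of that PUT request in Python (the Floating IP range, definition, and anchor IP values are placeholders; the path layout follows the Floating IPs section of the developer documentation and should be verified there):

import requests

API_KEY = "your-api-key"

# Re-point the Floating IP definition to the anchor IP of the standby server
response = requests.put(
    "https://api.leaseweb.com/floatingIps/v2/ranges/192.0.2.0_29/floatingIpDefinitions/192.0.2.1_32",
    headers={"X-LSW-Auth": API_KEY},
    json={"anchorIp": "198.51.100.10"},  # placeholder: public IP of the second server
)
response.raise_for_status()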


A good example would be to embed this in a monitoring system. Once the monitoring system detects a ‘server unavailable’, it can automatically redirect traffic to the standby server using this API call.

Conclusion

While this blog is too short to cover all the possibilities and features, it should give you some idea of how powerful a mechanism the API can be for automating and managing your environment.


Set up Private DNS-over-TLS/HTTPS

Domain Name System (DNS) is a crucial part of Internet infrastructure. It is responsible for translating a human-readable, memorizable domain (like leaseweb.com) into a numeric IP address (such as 89.255.251.130).

In order to translate a domain into an IP address, your device sends a DNS request to a special DNS server called a resolver (which is most likely managed by your Internet provider). The DNS requests are sent in plain text so anyone who has access to your traffic stream can see which domains you visit.

There are two recent Internet standards that have been designed to solve the DNS privacy issue:

  • DNS over TLS (DoT)
  • DNS over HTTPS (DoH)

Both of them provide secure and encrypted connections to a DNS server.

DoT/DoH feature compatibility matrix:

        Firefox   Chrome   Android 9+   iOS 14+
DoT     no        no       yes          yes
DoH     yes       yes      no           yes

iOS 14 will be released later this year.

In this article, we will set up a private DoH and DoT recursor using pihole in a docker container, and dnsdist as a DNS frontend with Letsencrypt SSL certificates. As a bonus, our DNS server will block tracking and malware domains while resolving queries for us.

Installation

In this example we use Ubuntu 20.04 with docker and docker-compose installed, but you can choose your favorite distro (you might need to adapt a bit).

You may also need to disable systemd-resolved because it occupies port 53 of the server:

# Check which DNS resolvers your server is using:
systemd-resolve --status
# look for "DNS servers" field in output

# Stop systemd-resolved
systemctl stop systemd-resolved

# Then mask it to prevent it from starting again
systemctl mask systemd-resolved

# Delete the symlink systemd-resolved used to manage
rm /etc/resolv.conf

# Create /etc/resolv.conf as a regular file with nameservers you've been using:
cat <<EOF > /etc/resolv.conf
nameserver <ip of the first DNS resolver>
nameserver <ip of the second DNS resolver>
EOF

Install dnsdist and certbot (for letsencrypt certificates):

# Install dnsdist repo
echo "deb [arch=amd64] http://repo.powerdns.com/ubuntu focal-dnsdist-15 main" > /etc/apt/sources.list.d/pdns.list
cat <<EOF > /etc/apt/preferences.d/dnsdist
Package: dnsdist*
Pin: origin repo.powerdns.com
Pin-Priority: 600
EOF
curl https://repo.powerdns.com/FD380FBB-pub.asc | apt-key add -

apt update
apt install dnsdist certbot

Pihole

Now we create our docker-compose project:

mkdir ~/pihole
touch ~/pihole/docker-compose.yml

The contents of docker-compose.yml file:

version: '3'
services:
  pihole:
    container_name: pihole
    image: 'pihole/pihole:latest'
    ports:
    # The DNS server will listen on localhost only, on port 5300 (tcp/udp),
    # so queries from the Internet won't be able to reach pihole directly.
    # The admin web interface, however, will be reachable from the Internet.
      - '127.0.1.53:5300:53/tcp'
      - '127.0.1.53:5300:53/udp'
      - '8081:80/tcp'
    environment:
      TZ: Europe/Amsterdam
      VIRTUAL_HOST: dns.example.com # domain name we'll use for our DNS server
      WEBPASSWORD: super_secret # Pihole admin password
    volumes:
      - './etc-pihole/:/etc/pihole/'
      - './etc-dnsmasq.d/:/etc/dnsmasq.d/'
    restart: unless-stopped

Start the container:

docker-compose up -d

After the container is fully started (it may take several minutes) check that it is able to resolve domain names:

dig +short @127.0.1.53 -p5300 one.one.one.one
# Expected output
# 1.0.0.1
# 1.1.1.1

Letsencrypt Configuration

Issue the certificate for our dns.example.com domain:

certbot certonly

Follow the instructions on the screen (i.e. select the authentication method that suits you, and fill in the domain name).

After the certificate is issued it can be found by the following paths:

  • /etc/letsencrypt/live/dns.example.com/fullchain.pem – certificate chain
  • /etc/letsencrypt/live/dns.example.com/privkey.pem – private key

By default only the root user can read certificates and keys. Dnsdist, however, is running as user and group _dnsdist, so permissions need to be adjusted:

chgrp _dnsdist /etc/letsencrypt/live/dns.example.com/{fullchain.pem,privkey.pem}
chmod g+r /etc/letsencrypt/live/dns.example.com/{fullchain.pem,privkey.pem}

# We should also make archive and live directories readable.
# That will not expose the keys since the private key isn't world-readable
chmod 755 /etc/letsencrypt/{live,archive}

The certificates are periodically renewed by Certbot, so dnsdist should be restarted after that happens since it is not able to detect the new certificate. In order to do so, we put a so-called deploy script into /etc/letsencrypt/renewal-hooks/deploy directory:

mkdir -p /etc/letsencrypt/renewal-hooks/deploy
cat <<EOF > /etc/letsencrypt/renewal-hooks/deploy/restart-dnsdist.sh
#!/bin/sh
systemctl restart dnsdist
EOF
chmod +x /etc/letsencrypt/renewal-hooks/deploy/restart-dnsdist.sh

Dnsdist Configuration

Create dnsdist configuration file /etc/dnsdist/dnsdist.conf with the following content:

addACL('0.0.0.0/0')

-- path for certs and listen address for DoT ipv4,
-- by default listens on port 853.
-- Set X(int) for tcp fast open queue size.
addTLSLocal("0.0.0.0", "/etc/letsencrypt/live/dns.example.com/fullchain.pem", "/etc/letsencrypt/live/dns.example.com/privkey.pem", { doTCP=true, reusePort=true, tcpFastOpenSize=64 })

-- path for certs and listen address for DoH ipv4,
-- by default listens on port 443.
-- Set X(int) for tcp fast open queue size.
-- 
-- In this example we listen directly on port 443. However, since the DoH queries are simple HTTPS requests, the server can be hidden behind Nginx or Haproxy.
addDOHLocal("0.0.0.0", "/etc/letsencrypt/live/dns.example.com/fullchain.pem", "/etc/letsencrypt/live/dns.example.com/privkey.pem", "/dns-query", { doTCP=true, reusePort=true, tcpFastOpenSize=64 })

-- set X(int) for the number of queries to be allowed per second from an IP
addAction(MaxQPSIPRule(50), DropAction())

--  drop ANY queries sent over udp
addAction(AndRule({QTypeRule(DNSQType.ANY), TCPRule(false)}), DropAction())

-- set X number of entries to be in dnsdist cache by default
-- memory will be preallocated based on the X number
pc = newPacketCache(10000, {maxTTL=86400})
getPool(""):setCache(pc)

-- server policy to choose the downstream servers for recursion
setServerPolicy(leastOutstanding)

-- Here we define our backend, the pihole dns server
newServer({address="127.0.1.53:5300", name="127.0.1.53:5300"})

setMaxTCPConnectionsPerClient(1000)    -- set X(int) for number of tcp connections from a single client. Useful for rate limiting the concurrent connections.
setMaxTCPQueriesPerConnection(100)    -- set X(int), similar to addAction(MaxQPSIPRule(X), DropAction())

Checking if DoH and DoT Work

Check if DoH works using curl with doh-url flag:

curl --doh-url https://dns.example.com/dns-query https://leaseweb.com/

Check if DoT works using kdig program from the knot-dnsutils package:

apt install knot-dnsutils

kdig -d @dns.example.com +tls-ca leaseweb.com

Setting up Private DNS on Android

Currently only Android 9+ natively supports encrypted DNS queries by using DNS-over-TLS technology.

In order to use it go to: Settings -> Connections -> More connection settings -> Private DNS -> Private DNS provider hostname -> dns.example.com

Conclusion

In this article we’ve set up our own DNS resolving server with the following features:

  • Automatic TLS certificates using Letsencrypt.
  • Supports both modern encrypted protocols: DNS over TLS, and DNS over HTTPS.
  • Implements rate-limit of incoming queries to prevent abuse.
  • Automatically updated blacklist of malware, ad, and tracking domains.
  • Easily upgradeable by simply pulling a new version of the Docker image.

Using Correlation IDs in API Calls

Over the years, the IT industry has moved from a single domain, monolithic architecture to a microservice architecture. In a microservice architecture, complex processes are split into smaller and simpler sub-processes. While this kind of architecture has many benefits, there are also some downsides – for example, if you send one request to a Leaseweb API, it ends up in multiple requests in other backend systems [FIGURE 1]. How do you keep track of requests and responses processed by multiple systems? This is where Correlation IDs come into play.

[FIGURE 1: Example request/response flow]

Using a Correlation ID

A Correlation ID is a unique, randomly generated identifier value that is added to every request and response. In a microservice architecture, the initial Correlation ID is passed to your sub-processes. If a sub-system also makes sub-requests, it will also pass the Correlation ID to those systems.

How you pass the Correlation ID to other systems depends on your architecture. At Leaseweb we are using REST APIs a lot, with HTTP headers to pass on the Correlation ID. As a rule, we assign a Correlation ID as soon as possible, and always use a Correlation ID if it is passed on. Our public API only accepts Correlation IDs from internally trusted clients. For any other client (such as an employee or customer API clients) a new Correlation ID is generated for the request.
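
A framework-agnostic sketch of that rule (shown here in Python for brevity; the trusted client list and header handling are illustrative assumptions, not Leaseweb's actual implementation):

import uuid

TRUSTED_CLIENT_IPS = {"10.0.0.5"}  # illustrative list of internal, trusted clients

def resolve_correlation_id(client_ip, incoming_id):
    """Reuse the incoming Correlation ID only for trusted internal clients;
    otherwise generate a fresh one as early as possible in the request."""
    if incoming_id and client_ip in TRUSTED_CLIENT_IPS:
        return incoming_id
    return str(uuid.uuid4())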

Real Value of Correlation IDs

The real value of Correlation IDs is realized when you also log the Correlation IDs. Debugging or tracing requests becomes much easier, as you can search all of your logs for the same Correlation ID. Combined with central logging solutions (such as the ELK stack), searching logs becomes even easier and can be done by non-technical colleagues. Providing tools to your colleagues to troubleshoot issues allows them to have more responsibility and gives you more time to work on more technical projects.

We mainly use Correlation IDs at Leaseweb for debugging purposes. When an error occurs, we provide the Correlation ID to the client/customer. If users provide the Correlation ID when submitting a support ticket, we can visualize the entire process needed to fulfil the client’s initial intent. This has significantly improved the time it takes us to fix bugs.

[FIGURE 2: Example of one Correlation ID with multiple requests]

Debugging issues is a time-consuming process if Correlation IDs are not used. When your environment scales, you will need to find solutions to group transactions happening in your systems. By using a Correlation ID, you can easily group requests and events in your systems, allowing you to spend more time fixing the problem and less time trying to find it.

Practical examples on how to implement Correlation IDs

The following examples use Symfony, a popular web application framework. These concepts can also be applied to any other framework, such as Laravel, Django, Flask or Ruby on Rails.

If you are unfamiliar with the concept of Service Containers and Dependency Injection, we recommend reading the excellent Symfony documentation about it here: https://symfony.com/doc/current/service_container.html

Using Monolog to append Correlation IDs to your application logs

When processing an HTTP request, your application often logs some information – such as when an error occurs, or when an important change is made in your system that you want to keep track of. When using the Monolog logging library in PHP (https://seldaek.github.io/monolog/), you can use the concept of “Processors” (read more about that here on symfony.com).

One way to do this is by creating a Monolog Processor class:

<?php

namespace App\Monolog\Processor;

use Symfony\Component\HttpFoundation\RequestStack;

class CorrelationIdProcessor
{
    protected $requestStack;

    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function processRecord(array $record)
    {
        $request = $this->requestStack->getCurrentRequest();

        if (!$request) {
            return $record;
        }

        $correlationId = $request->headers->get('X-My-Correlation-ID');

        if (empty($correlationId)) {
            return $record;
        }

        // If we have a correlation id, include it in every monolog line
        $record['extra']['correlation_id'] = $correlationId;

        return $record;
    }
}

Then register this class on the service container as a monolog processor in services.yml:

# app/config/services.yml

services:
  App\Monolog\Processor\CorrelationIdProcessor:
    arguments: ["@request_stack"]
    tags:
      - name: monolog.processor
        method: processRecord

Now, every time you log something in your application with Monolog:

$this->logger->info('shopping_cart_emptied', ['cart_id' => 123]);

You will see the Correlation ID of the HTTP Request in your log files:

$ grep 'shopping_cart_emptied' var/logs/prod.log

[2020-07-03 12:14:45] app.INFO: shopping_cart_emptied {"cart_id": 123} {"correlation_id":"d135d5f1-3dd0-45fa-8f26-55d8d6a44876"}

You can utilize the same pattern to log the name of the user that is currently logged in, the remote IP address of the API client, or anything else that makes troubleshooting faster for you.

Using Guzzle to append Correlation IDs when making sub-requests

If your API makes API calls to other microservices (and you use Guzzle to do this) you can make use of Handlers and Middleware.

Some teams at Leaseweb depend on many downstream microservices, and can therefore have multiple guzzle clients as services on the service container. While each Guzzle client is configured with its own base URL and/or authentication, it is possible for all of the Guzzle clients to share the same HandlerStack.

First, create the middleware:

<?php

namespace App\Guzzle\Middleware;

use Symfony\Component\HttpFoundation\RequestStack;
use Psr\Http\Message\RequestInterface;

class CorrelationIdMiddleware
{
    protected $requestStack;
 
    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function __invoke(callable $handler)
    {
        return function (RequestInterface $request, array $options = []) use ($handler) {
            $currentRequest = $this->requestStack->getCurrentRequest();

            if (!$currentRequest) {
                return $handler($request, $options);
            }

            $correlationId = $currentRequest->headers->get('X-My-Correlation-ID');

            if (empty($correlationId)) {
                return $handler($request, $options);
            }

            // Forward the Correlation ID on the outgoing request
            $request = $request->withHeader('X-My-Correlation-ID', $correlationId);

            return $handler($request, $options);
        };
    }
}

Define this middleware as a service on the service container and create a HandlerStack:

# app/config/services.yml

services:
  correlation_id_middleware:
    class: App\Guzzle\Middleware\CorrelationIdMiddleware
    arguments: ["@request_stack"]

  correlation_id_handler_stack:
    class: GuzzleHttp\HandlerStack
    factory: ['GuzzleHttp\HandlerStack', 'create']
    calls:
      - [push, ["@correlation_id_middleware", "correlation_id_forwarder"]]

With these two services defined, you can now configure all your Guzzle clients using the HandlerStack so that the Correlation ID of the current HTTP request is forwarded to downstream HTTP requests:

# app/config/services.yml

services:
  my_downstream_api:
    class: GuzzleHttp\Client
    arguments:
      - base_uri: https://my-downstream-api.example.com
        handler: "@correlation_id_handler_stack"

Now every API call that you make to https://my-downstream-api.example.com will include the HTTP request header ‘X-My-Correlation-ID’ and have the same value as the Correlation ID of the current HTTP request. You can also apply the same Monolog and Guzzle tricks described here to the downstream API.

Expose Correlation IDs in error responses

The missing link between these processes is to now expose your Correlation IDs to your users so they can also log them or use them in support cases they report to your organization.

Symfony makes this easy using Event Listeners. You can define Event Listeners in Symfony to pre-process HTTP requests as well as to post-process HTTP responses just before they are returned by Symfony to the API caller. In this example, we will create an HTTP Response listener and add the Correlation ID of the current HTTP request as an HTTP header on the HTTP response.

First, we create a service on the Service Container:

<?php
 
namespace App\Listener;
 
use Symfony\Component\HttpFoundation\RequestStack;
use Symfony\Component\HttpKernel\Event\FilterResponseEvent;

class CorrelationIdResponseListener
{
    protected $requestStack;
 
    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function onKernelResponse(FilterResponseEvent $event)
    {
        $request = $this->requestStack->getCurrentRequest();

        if (!$request) {
            return;
        }

        $correlationId = $request->headers->get('X-My-Correlation-ID');

        if (empty($correlationId)) {
             return;
        }

        $event->getResponse()->headers->set('X-My-Correlation-ID', $correlationId);
    }
}

Now configure it as a Symfony Event Listener:

# app/config/services.yml

services:
  correlation_id_response_listener:
    class: App\Listener\CorrelationIdResponseListener
    arguments: ["@request_stack"]
    tags:
      - { name: kernel.event_listener, event: kernel.response, method: onKernelResponse }

Every response that is generated by your Symfony application will now include an X-My-Correlation-ID HTTP response header with the same Correlation ID as the HTTP request.

The Value of Correlation IDs

Using Correlation IDs throughout your whole stack gives you more insight into all (sub)requests during a transaction. Using the right tools allows others to debug issues, giving your developers more time to work on new awesome features.

Implementing Correlation IDs isn’t hard to do, and can be achieved quickly depending on your software stack. At Leaseweb, the use of Correlation IDs has saved us hours of time while debugging issues on numerous occasions.

Technical Careers at Leaseweb

We are searching for the next generation of engineers and developers to help us build infrastructure to automate our global hosting services! If you are interested in finding out more, check out our Careers at Leaseweb.


Measuring IP Connectivity with RIPE Atlas

One of the most important but often overlooked criteria when choosing a hosting provider is the performance of its network. Poor network performance for any kind of online service leads to low customer satisfaction. What is the best way to measure network performance in a way that most accurately simulates a customer's experience of the network? We found this question to be an interesting idea to explore during a Leaseweb hackathon!

RIPE Atlas

RIPE Atlas is the RIPE NCC's main Internet data collection system. It's a global network of devices, called probes and anchors, that actively measure Internet connectivity. Anyone can access this data via Internet traffic maps, streaming data visualizations, and an API. RIPE Atlas users can also perform customized measurements to gain valuable data about their own networks.

Due to the size and reach of the Atlas project, it's one of the most important Internet measurement initiatives internationally. What's great is that the project is run by the RIPE NCC but driven by the global community of RIPE members (and non-members!) who contribute some of their resources to help the project. This improves visibility into the inner workings of the global Internet. It's used by Internet professionals all over the world to improve the quality of their networks, debug issues, and learn. Leaseweb contributes to RIPE Atlas by hosting 7 ‘anchors’ in various data centers all over the globe.

Since Leaseweb already contributes to RIPE Atlas with these anchors, it’s an obvious choice as a source of random probes to be used against those anchors and compared with other infrastructure providers’ anchors. (By the way, if you would like to do your own measurements and contribute to the RIPE Atlas project at the same time, you can request a probe for your home or office right here!). You can read more on the structure of the Atlas network (and how probes, anchors and the RIPE backend work together) in various posts on the RIPE labs pages.

The main elements needed to use RIPE Atlas are measurements, sources and targets. These elements all have their own function to facilitate experiments.

Getting The Data

Triggering one-off measurements and fetching the results of those measurements can be done using the RIPE Atlas API. The RIPE NCC also developed a wrapper around the RIPE Atlas API that allows anyone to communicate with it using Python. It is maintained by the RIPE Atlas developers and is therefore the best choice for consuming their API. Using it requires an API key. The wrapper is open-source and available on GitHub.

Installation is very simple using pip:

$ pip install ripe.atlas.cousteau

Included classes in the Python script:

from ripe.atlas.cousteau import (
    Ping,
    Traceroute,
    AtlasSource,
    AtlasCreateRequest,
    AtlasLatestRequest,
)

Measurement Types

RIPE Atlas offers several measurement types – Ping, Traceroute, DNS, HTTP, SSL, NTP, and WiFi. Creating a measurement allows you to specify a type of test (or ‘experiment’) to perform. Typically, these have something to do with latency for a service, but there are also options to check things like resolving a domain name and checking DNS propagation. These measurements can be one-off or recurring. For our Hackathon project we used one-off measurements.

Sources

Probes defined as sources are used to trigger the measurement. They can be defined explicitly or taken from a pool (area, country, prefix, or AS number). Below, the defined source takes 50 random probes from Europe to be used as the source of the measurement.

source = AtlasSource(
    type="area",
    value= "North-Central",
    requested=50
)

Targets

Targets are the IPs that the measurements are run against. In this example, ping is run against one of Leaseweb's anchors.

pingLSW = Ping(
    af=4,
    target="5.79.112.97", 
    description="Ping nl-haa-as60781.anchors.atlas.ripe.net"
)

Create Request

Defining a request combines all of the elements together: measurements and sources. It also requires a start time, API key and – in this case – a one-off flag.

atlas_request = AtlasCreateRequest(
    start_time=datetime.utcnow(),
    key=ATLAS_API_KEY,
    measurements=[pingLSW, pingOVH, pingAzure, pingAWS, pingUni],
    sources=[source],
    is_oneoff=True
)

To summarize what this request will do:

  • We specify a number of ping tests to various endpoints
  • We specify where we want to have those request come from and how many we want
  • …and we bundle those tests into a single one-off request.

Calling the RIPE Atlas API is now simple:

is_success, response = atlas_request.create()

After calling this function, Atlas will launch tests towards 50 random probes in the area we designated, and will store the results.

The values returned in response are the IDs of the created measurements. In this case, there are 5 IDs, as there are 5 measurements defined. The actual results have to be retrieved separately, so the next step is fetching the measurement data. Here, the AtlasLatestRequest class is used:

is_success, results = AtlasLatestRequest(msm_id=measurement_id).create()

The results variable now holds all of the measurement details needed to calculate and compare latencies. It's clear to see how powerful Atlas is: in a few simple lines, we've generated latency information from 50 endpoints to multiple targets!
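
As an illustration of that last step, here is a small sketch that averages the round-trip times reported by a ping measurement. It assumes each result object carries an 'avg' field in milliseconds, with unreachable probes reporting -1, which is how RIPE Atlas ping results looked at the time of writing:

from ripe.atlas.cousteau import AtlasLatestRequest

def average_rtt(measurement_id):
    """Fetch the latest results of a ping measurement and return the mean RTT in ms."""
    is_success, results = AtlasLatestRequest(msm_id=measurement_id).create()
    if not is_success:
        raise RuntimeError("Could not fetch results for measurement %s" % measurement_id)

    rtts = [r["avg"] for r in results if r.get("avg", -1) > 0]
    return sum(rtts) / len(rtts) if rtts else None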

Visualizing Data

Visualizing the data fetched from the RIPE Atlas API was done with the pygal Python library, which supports various chart types and styles. For this Hackathon project, pygal.Bar() was used to draw the comparison results. (Pygal usage is out of scope for this blog post.) The two charts below show the data from measurements taken from Europe and from Russia.

Conclusion and Going Forward

This Hackathon project showed the basic features of RIPE Atlas and what it can accomplish. RIPE Atlas also maintains a parsing library named Sagan, available on GitHub, that handles format changes in measurement results and always returns a native Python object.

RIPE Atlas has a huge amount of functionality and can be easily used in your own experiments and measurements. Remember to always use the Atlas infrastructure responsibly.

Other tools exist for running permanent measurements, streaming data, and building dashboards. The RIPE Prometheus exporter exports RIPE Atlas measurement results as metrics to Prometheus, and Grafana is a common tool that works well with Prometheus to create dashboards with useful metrics gathered from RIPE Atlas measurements.

Comments? Questions? Other ways to use RIPE Atlas and receive measurements? I’d love to hear your feedback! 
