GoAccess is a “top” for Nginx or Apache

[Image: GoAccess dashboard]

GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on Linux systems. The name suggests that it is written in the Go language, but it actually is not (it is written in C).

Monitoring

Effectively it is like the “top” command, but instead of showing processes it gives you insight into the traffic on your web server. The tool provides all kinds of top lists that are useful for spotting irregularities. Great features include the listing of IP addresses by percentage of hits and the overview of status codes of the served responses. Typically you would use this tool when your web server is reporting high load or high bandwidth usage and you want to find out what is going on. The outcome of the analysis can then be used to adjust your firewall or caching rules.

Installing

On my Ubuntu 15.04 machine I can install version 0.8.3 with:

sudo apt-get install goaccess

If you are on any Debian-based Linux and you want to run the latest version (0.9.2), you can simply run:

wget http://tar.goaccess.io/goaccess-0.9.2.tar.gz
tar -xvzf goaccess-0.9.2.tar.gz
cd goaccess-0.9.2/
sudo apt-get install build-essential libglib2.0-dev libgeoip-dev libncursesw5-dev
./configure --enable-geoip --enable-utf8
make
sudo make install

This will install the software in “/usr/local/bin/” and the manual in “/usr/local/man/”.

Running

Running the software is as easy as:

man goaccess
goaccess -f /var/log/apache2/access.log

The software will prompt you for the log format. For me the “NCSA Combined Log Format” worked best. For Nginx I just ran:

goaccess -f /var/log/nginx/access.log
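
If your logs are rotated, you are not limited to the current log file: GoAccess can read from a pipe, and these older versions can also write a static HTML report when you redirect the output to a file. A minimal sketch (flags and paths may differ per version and distribution):

# Analyze the rotated (gzipped) Nginx logs by piping them into goaccess
zcat /var/log/nginx/access.log.*.gz | goaccess
# Generate a static HTML report instead of using the interactive viewer
goaccess -f /var/log/nginx/access.log -a > report.html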

It is really awesome, try it!


Detecting torrent traffic on a Linux box

[Image: torrent detection]

At home I am sharing my Internet connection with several other family members. Sometimes my Internet is very slow with high latencies, causing my interactive SSH connections to stutter. The problem is always the same: somebody is downloading a torrent. And although I have no objection against torrent technology (it has many good applications), I hate it when I cannot work properly on my remote servers. So I decided to take action.

Wireshark and Tshark to the rescue

Wireshark has a command-line version called “tshark”. It has a bittorrent protocol analyzer and can be used to do Deep Packet Inspection (DPI). I decided to make a simple script that runs every 5 minutes and samples the network traffic for 10 seconds. After that it sends a report (a top list, including packet counts) of the local IP addresses that generate the most torrent traffic (if there are any). It can be run using:

sudo tshark -a "duration:10" -Y bittorrent -f 'not port 80 and not port 22 and not port 443' | grep -o "192\.168\.1\.[0-9]\+" | sort | uniq -c | sort -rn | head | mail -E -s "LAN abusers" maurits@vdschee.nl

It uses Postfix to send email via the Gmail SMTP server (a Gmail account is required). I am running the above in a cron job every 5 minutes. You may simply run this script on the gateway of your network. In case you can set up a port mirror on the switch of your up-link, you can run this in promiscuous mode. Tshark will try to enable this mode by default; if it does not work, check the Wireshark FAQ.
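
As a sketch of the scheduling: the one-liner above can be wrapped in a small shell script (the file and script names below are hypothetical) and called from a cron entry:

# /etc/cron.d/torrent-report (hypothetical file): sample the network every 5 minutes
*/5 * * * * root /usr/local/sbin/torrent-report.sh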

Blocking on detection

There are several ways to block the user that is abusing your network. I feel that temporarily null-routing the IP address is the simplest way. Additionally, you may add an entry to your DHCP lease table to prevent the user from simply requesting a new IP address. Filtering the good from the bad traffic is actually much more complicated. For one, you need to find all the bad packets (the software may switch protocols to evade the block). If you really want to give it a try, you may look at netfilter string match. If you do, make sure you enter good offsets and ranges to avoid a negative performance impact on your network. Also, I would not know where to get a maintained and complete set of protocol signatures.
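
A minimal sketch of the null-route approach on the gateway (192.168.1.123 is a placeholder for the offending address):

# Drop all traffic routed to the abusing LAN client
sudo ip route add blackhole 192.168.1.123/32
# Lift the block again later
sudo ip route del blackhole 192.168.1.123/32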

[Image: uTP detection]

Detecting uTP

If you are using the “Deluge” torrent client, you will quickly be detected by the above script. When you are using “Transmission” (another client) you may get away undetected. This is caused by the Micro Transport Protocol (aka “uTP”), a UDP-based torrent protocol that cannot be recognized by Tshark yet. It is not very hard to make a custom rule that detects uTP. This is the custom filter:

sudo tshark -a "duration:10" -Y 'udp[8:5] == "\x64\x32\x3A\x69\x70" or bittorrent' -f 'not port 80 and not port 22 and not port 443' | grep -o "192\.168\.1\.[0-9]\+" | sort | uniq -c | sort -rn | head | mail -E -s "LAN abusers" maurits@vdschee.nl

The above command will also detect the “undetectable” uTP protocol. You may even extend the match a little, as there are more fixed-position bytes that can be matched.


BSOD during boot after disabling RAID in BIOS

The motherboard of my Acer M3920 has an on-board RAID controller, which is part of the Intel Rapid Storage Technology. I added a 6TB WD Green drive, which was very easy thanks to the easy-swap bay that the Acer M3920 has (see picture below). Unfortunately the RAID BIOS did not recognize the drive correctly: the normal BIOS did recognize it, but the RAID BIOS reported it as a 1.4TB disk. Also, Windows 7 Disk Management did not “see” the drive.

[Image: Acer M3920]

Normally I would have updated the BIOS (as this would likely have solved the problem), but I could not find a newer BIOS on the Acer support site. So I decided to turn the RAID support off in the BIOS (switching the SATA mode from “RAID” to “AHCI”), since I did not use the RAID capabilities. This caused Windows to give a BSOD during boot with the message “STOP: 0x0000007B” (see picture below). This actually is a cryptic way of Windows telling you “INACCESSIBLE BOOT DEVICE”.

[Image: BSOD “INACCESSIBLE BOOT DEVICE”]

The instructions in Microsoft KB article 922976 are not sufficient to avoid this error. They are sufficient when changing from AHCI to RAID, but not when changing from RAID to AHCI. Use the “regedit.exe” utility to change the “Start” value to “0” for both of the following entries in your registry (see the commands after the list):

  • HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\msahci (as mentioned in the KB)
  • HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\atapi (not mentioned in the KB)
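
If you prefer the command line over “regedit.exe”, the same change can be made from an elevated Command Prompt. This is a sketch of the equivalent “reg.exe” commands:

:: Set the Start value to 0 so the AHCI and ATAPI drivers are loaded at boot
reg add "HKLM\System\CurrentControlSet\Services\msahci" /v Start /t REG_DWORD /d 0 /f
reg add "HKLM\System\CurrentControlSet\Services\atapi" /v Start /t REG_DWORD /d 0 /f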

If you get this BSOD, reboot and set the SATA mode (in the BIOS) back to RAID. Then reboot and change the above registry settings. Finally, change the SATA mode to “AHCI” again and boot without a BSOD and with full access to your 6TB drive.

Special thanks go to M

Data recovery on a crashed RAID-5 array

In 2008 I set up a NAS with 4x750GB drives in a RAID-5 array. Back then such a setup was pretty awesome (and expensive). I used it to store my digital family pictures. Such an awesome setup with RAID-5 and a cold spare (just in case) made me feel safe about my data. I felt so safe that I totally forgot to make sure all files were also backed up somewhere else. Then in 2012 disaster struck, and it looked like this:

[Image: RAID failure]

The Promise FastTrak TX4310 RAID controller (actually a fake-RAID controller) is showing that the 2TB array is “offline”. It should have gone “critical” first when the first drive failed, but for some reason it didn’t (or it did and nobody noticed). Now two drives had failed and the array went “offline”. So, what to do next?

First rule of data recovery: don’t panic!
Second rule: do not write anything to the disks.

Repairing the server

After not panicking I decided to create a new array with a set of fresh drives and take my loss. I removed the controller and the four 750GB drives and connected two 3TB drives directly to the motherboard. The NAS had been running Windows (don’t ask why) until then, and at that moment I switched to Linux software RAID (using mdadm). This was fairly easy to set up and it would even automatically email me in case of a failure. Great, but a little too late.
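
For reference, this is roughly what such an mdadm setup looks like; a sketch assuming the two new drives show up as /dev/sdb and /dev/sdc and that you want a RAID-1 mirror with email alerts:

# Create a RAID-1 array from the two new drives and put a filesystem on it
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
sudo mkfs.ext4 /dev/md0
# Persist the array definition and have mdadm email you when the array degrades
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
echo "MAILADDR you@example.com" | sudo tee -a /etc/mdadm/mdadm.conf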

I stored the drives (together with the controller) in a box. I made sure I marked the drives with the corresponding port numbers before I disconnected them. Storing the drives allowed me to later decide whether or not the missing data was worth recovering. Last week (2015, three years after the crash) I decided to actually give recovery a try.

Analyzing the problem(s)

I had never done any RAID data recovery before, so I was expecting a rough journey, and it sure took a lot of time (2 weeks, every evening). First I connected the drives, one by one, to the second SATA port of the motherboard of my Linux box. Then I read the SMART status of the drives. The SMART information tells you whether or not a drive is healthy. I found that three of the four drives were actually healthy and one (drive 2) did not get recognized at all.
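
Reading the SMART status can be done with “smartctl” from the smartmontools package; a sketch, assuming the drive under test shows up as /dev/sdb:

sudo apt-get install smartmontools
# Quick overall health verdict
sudo smartctl -H /dev/sdb
# Full SMART attributes (watch the reallocated and pending sector counts)
sudo smartctl -a /dev/sdb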

The drive was not even ticking, which most crashed drives do; it did not make any sound at all. This led me to believe that the PCB might have been blown. I checked, but unfortunately the crashed drive did not have the exact same part number as the spare drive. Otherwise I might have tried swapping the PCB from the spare to the broken drive.

The controller also did not recognize all the drives. The second one was “missing” (which was expected) and the fourth was reported as “free” (which is strange). After some investigation I found out that this most probably means that the meta information on the position of the fourth drive in the array got corrupted. This meta information is stored in a so-called superblock, which is located at the end of the drive.

Imaging the drives

Before doing anything else I decided I needed disk images. The good SMART readings made me optimistic about the chance of successfully imaging 3 out of the 4 drives. So I bought a big 6TB drive and started imaging the drives one by one. I recommend that you do not simply use “dd”, but that you use the enhanced “ddrescue” tool instead. It allows for retrying and skipping bad blocks. Also, it stores its progress in a log file, so that it can continue after being interrupted.
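
On Debian/Ubuntu the tool is packaged as “gddrescue” (the binary is called “ddrescue”). A sketch of imaging one drive, assuming it shows up as /dev/sdb and the 6TB drive is mounted on /mnt/big:

sudo apt-get install gddrescue
# First pass: copy everything it can, skipping bad areas; the log file allows resuming
sudo ddrescue -d /dev/sdb /mnt/big/disk1.img /mnt/big/disk1.log
# Second pass: retry the bad areas a few more times
sudo ddrescue -d -r3 /dev/sdb /mnt/big/disk1.img /mnt/big/disk1.log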

After I imaged all three working drives it turned out that one of the drives (the one that was reported as “free”) had some bad blocks. Nevertheless “ddrescue” was able to make an image of the drive; it just took a little longer.

Superblock regeneration

I decided to look into the end of the drive using “dd” with a “skip” parameter and piping the output through “hd” (hex dump). I found that at the end of the images there was indeed a superblock. The one from disk 4 actually looked different from the ones from disks 1 and 3. Now all I had to do was recreate the superblock and write it to the disk. After that I expected the array to return to the “critical” state (one drive missing). In this state the array should be readable.
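
This is roughly how that inspection looks; a sketch assuming the image is called disk1.img and you want to see its last 64 KiB:

# Dump the tail of the image and inspect it as hex
SIZE=$(stat -c%s disk1.img)
dd if=disk1.img bs=512 skip=$(( (SIZE - 65536) / 512 )) 2>/dev/null | hd | less
# tail -c 65536 disk1.img | hd   # equivalent shortcut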

After imaging I decided it was time to take some risk and break rule 2 (do not write to the disks). I tried many things to recreate the superblock. I tried to use “ghex” and repair the superblock by hand (only 8 bytes were different between the superblocks of disks 1 and 3). I also tried recreating the array WITHOUT INITIALIZATION, so that it would only write new superblocks. This did not work either, neither via the Promise BIOS nor via the WebPAM software from Windows. I guess this method did not get the array’s RAID parameters exactly right.

ReclaiMe to the rescue

Then I read a positive review on the web about some proprietary (but free) Windows RAID recovery software. In order to run it I created a Windows 7 VM using KVM on my Linux box and attached the images to it using the SATA driver. Then I installed “ReclaiMe Free RAID Recovery” from www.freeraidrecovery.com and gave it a try. I was skeptical, but I should not have been. After some extensive searching on the disks the software found a RAID-5 array with a missing drive. That was music to my ears!

“ReclaiMe Free RAID Recovery” gave me the option to recreate the array to a new disk. I quickly created a sparse 2TB image on my 6TB drive and added it as another drive to the VM. It then took the software 40 hours to recreate the array into this image. But after that, even without a reboot, Windows identified the NTFS partition and I was able to access all my data again. I cannot explain how happy and amazed I was. I powered off the VM, loop-mounted the image on my Linux box using “kpartx” and was able to copy everything to the new Linux NAS server.
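
The loop-mounting step looks roughly like this; a sketch assuming the recreated array image is called recovered.img and contains a single NTFS partition:

# Map the partitions inside the image to /dev/mapper/loop* devices
sudo kpartx -av recovered.img
# Mount the NTFS partition read-only and copy the data off
sudo mkdir -p /mnt/recovered
sudo mount -o ro /dev/mapper/loop0p1 /mnt/recovered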

Success!!!

I recovered the picture below (and thousands of others).

[Image: twins_2000-12-28]

This particular picture shows me (right) and my twin brother (left) behind my PC (web-cam shot from 28th of December 2000).

Disclaimer / Warning

I do NOT recommend attempting data recovery without any experience. There is a fair chance that you will make a mistake. If you accidentally write to the (original) disks you may lose the data forever, so be aware. That said, if you can actually copy the disks to images and/or new disks, then you have some freedom to experiment. If you are really lucky and the disks are not (severely) damaged, then you may even be successful, just like I was.


Limit concurrent PHP requests using Memcache

When you run a website you may want to use an nginx reverse proxy to cache some of your static assets and also to limit the number of connections per client IP to each of your applications. Good nginx building blocks for this are the “limit_conn” and “limit_req” modules and the “proxy_cache” directive.

Many people are not running a web farm, but they still want to protect themselves against scrapers and hackers that may slow down the website (or even make it unavailable). The following script allows you to protect your PHP application from too many concurrent connections per IP address. You need to have Memcache installed and you need to be running a PHP web application that uses a front controller.

Installing Memcache for PHP

Run the following command to install Memcache for PHP on a Debian-based Linux machine (e.g. Ubuntu):

sudo apt-get install php5-memcache memcached

This is easy. You can flush your Memcache data by running:

telnet 0 11211
flush_all

You may have to restart Apache for the Memcache extension to become active.

sudo service apache2 restart
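
To verify that the extension is actually loaded you can, for example, list the installed PHP modules (a quick check, not strictly required):

php -m | grep -i memcache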

Modifying your front controller

It is as simple as opening up your “index.php” or “app.php” (Symfony) and pasting the following code at the top of the file:

<?php
function firewall($concurrency,$spinLock,$interval,$cachePrefix,$reverseProxy)
{
  $start = microtime(true);
  // determine the client IP (behind a trusted reverse proxy: the last address it appended)
  if ($reverseProxy && isset($_SERVER['HTTP_X_FORWARDED_FOR'])) {
    $ips = explode(',',$_SERVER['HTTP_X_FORWARDED_FOR']);
    $ip = trim(array_pop($ips));
  }
  else {
    $ip = $_SERVER['REMOTE_ADDR'];
  }
  $memcache=new Memcache();
  $memcache->connect('127.0.0.1', 11211);
  $key=$cachePrefix.'_'.$ip;
  // create the per-IP counter if it does not exist yet; it expires after $interval seconds
  $memcache->add($key,0,false,$interval);
  // take a slot; spin while the concurrency limit is exceeded (or fail with a 429)
  while ($memcache->increment($key)>$concurrency) {
    $memcache->decrement($key);
    if (!$spinLock || microtime(true)-$start>$interval) {
      http_response_code(429);
      die('429: Too Many Requests');
    }
    usleep($spinLock*1000000);
  }
  // release the slot when the request has finished
  register_shutdown_function(function() use ($memcache,$key){ $memcache->decrement($key); });
}
firewall(10,0.15,300,'fw_concurrency_',false);

Add these lines if you want to test the script in stand-alone mode:

session_start();
session_write_close();
usleep(3000000);

With the default settings you can protect a small WordPress blog, as it limits your visitors to 10 concurrent(!) requests per IP address. Note that this is a lot more than 10 visitors per IP address. A normal visitor does not make concurrent requests to PHP, as browsers tend to send only one request at a time. Even multiple users behind one IP may not make concurrent requests (if you are lucky). In case concurrent requests do happen, they will be delayed “x” times 150 ms until the concurrency level (from that specific IP) is below 10. Other IP addresses are not affected or slowed down.
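
You can see the limiter in action with a load-testing tool such as ApacheBench; a quick sketch (the URL and numbers are just examples) firing 20 concurrent requests from one IP:

# With the defaults, requests above the limit of 10 are delayed (or get a 429 when $spinLock is false)
ab -n 50 -c 20 http://localhost/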

If you use a reverse proxy, you can configure this (so that the correct IP address is taken from the “X-Forwarded-For” header). Also, if you set “$spinLock” to “false”, then you will serve “429: Too Many Requests” when there are too many concurrent requests, instead of stalling the connection.

This functionality is included as the “Firewall” feature of the new MindaPHP framework and also as the firewall functionality in the LeaseWeb Memcache Bundle for Symfony. Let me know what you think about it using the comments below.
