Buffered Nginx log reading using PHP and logrotate

The problem: We want to continuously read the Nginx access logs on our high-traffic web nodes for real-time analytics.

Logrotate every 10 seconds

Not possible! The minimum interval that logrotate allows is hourly. Apart from that, I do not think this would be very practical anyway. I would still prefer to keep the log files hourly, for easy disk management. But I do not like an hour of delay on my statistics either; I would prefer something like one minute before the data reaches the MySQL database.

Using a named pipe with mkfifo

The idea is that you can create a pipe with “mkfifo” and configure Nginx to log to that file. Sounds good, until you realize that the buffer size of the pipe is quite limited (a few kilobytes) and that log lines will simply be dropped silently when the buffer overflows. Also, Igor Sysoev advised against doing this, and since he knows what he is talking about, I suggest you stay away from it. 🙂
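For illustration only, a minimal sketch of what this approach would look like in PHP (the FIFO path and the Nginx access_log setting pointing at it are assumptions; posix_mkfifo requires the posix extension):

<?php
// Sketch of the named pipe approach (not recommended, see above).
// Nginx would be configured with: access_log /var/log/nginx/access.fifo;
$fifo = "/var/log/nginx/access.fifo";
if (!file_exists($fifo)) {
  posix_mkfifo($fifo, 0640); // requires the posix extension
}
$f = fopen($fifo, "r"); // blocks until Nginx opens the pipe for writing
while (($line = fgets($f)) !== false) {
  echo $line; // or send the line somewhere
}
fclose($f);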

Watch the log file with inotify

The idea is to install a watcher on the log file that triggers an event whenever the file is written to. This is a good idea, but the number of events on a busy server may be just a little too high for this to be really efficient. We would rather not be notified every time the filesystem sees a change on the Nginx access log file. EDIT: This is actually not a bad idea, as long as you enable buffered Nginx access logging; see the next post.
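A rough sketch of this approach in PHP could look like this (assuming the PECL “inotify” extension is installed; rotation handling is omitted):

<?php
// Sketch of the inotify approach, requires the PECL "inotify" extension.
$logfile = "/var/log/nginx/access.log";
$in = inotify_init();
inotify_add_watch($in, $logfile, IN_MODIFY);
$f = fopen($logfile, "r");
fseek($f, 0, SEEK_END); // start at the end, we only want new lines
while (true) {
  inotify_read($in); // blocks until the file is written to
  echo stream_get_contents($f); // read everything appended since the last event
}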

Open the file and read until EOF every x seconds

Open the file and keep it open. Sleep for x seconds and then read the file until the end. Send that data to your statistics cluster and go back to sleep. This works great until the file gets rotated: once that happens, you first need to finish reading the old file before you start reading the new one. Fortunately, logrotate moves the file by default, and a moved file that is still open can still be read using the original file descriptor. When we find that the file was moved y seconds ago and has not been written to for z seconds, we can decide to close the rotated file and open the new current file. This way we can make sure that we do not lose any log lines. The values of x, y, and z need to be found by trial and error; I think five, three, and one seconds should be a good starting point. The PHP script below implements this.

Configure logrotate to rotate hourly for 48 hours

This can be done using the following logrotate script:

/var/log/nginx/*.log {
        hourly
        missingok
        rotate 48
        compress
        delaycompress
        notifempty
        create 0640 www-data adm
        sharedscripts
        prerotate
                if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
                        run-parts /etc/logrotate.d/httpd-prerotate; \
                fi \
        endscript
        postrotate
                [ -s /run/nginx.pid ] && kill -USR1 `cat /run/nginx.pid`
        endscript
}

Make sure you also move the logrotate cron job from daily to hourly using:

sudo mv /etc/cron.daily/logrotate /etc/cron.hourly/logrotate

Read the log files

The following code is an example in PHP that demonstrates how this log reader works. Note that Python and Go may be more suitable languages for implementing these kinds of long-running programs. This script just prints the data it reads; streaming the log lines to your big data analytics cluster is left as an exercise to the reader. 😉

<?php
$logfile = "/var/log/nginx/access.log";
$var_x = 5; // seconds to sleep between reads
$var_y = 3; // seconds since the open file was moved (ctime)
$var_z = 1; // seconds since the open file was last written to (mtime)
$f = fopen($logfile, "r");
fseek($f, 0, SEEK_END); // start at the end, we only want new lines
while (true) {
  $data = stream_get_contents($f); // read until EOF
  echo $data; // or send the data somewhere
  clearstatcache();
  $s1 = fstat($f);       // status of the file we hold open (possibly already rotated)
  $s2 = @stat($logfile); // status of the current log file; false while it does not exist yet
  $renamed = $s2 === false || $s1["dev"] != $s2["dev"] || $s1["ino"] != $s2["ino"];
  if ($renamed && time()-$s1["ctime"] > $var_y && time()-$s1["mtime"] > $var_z) {
    echo "renamed\n";
    while (!file_exists($logfile)) sleep(1); // wait for logrotate to create the new file
    fclose($f);
    $f = fopen($logfile, "r"); // switch to the new file, reading from the start
  } else sleep($var_x);
}
fclose($f);

While the script above is running, you can make sure the access log is being written to by generating traffic from another terminal using:

siege -c 100 http://localhost/

Now you can force the rotation to be executed using:

sudo logrotate -f /etc/logrotate.d/nginx

And see how the script gracefully handles it.


Daily MySQL off-site backups using logrotate and rsync

For daily MySQL off-site backups I was looking for a simple approach. Scott Miller’s minimal but concise post on MySQL Backups With Logrotate led me to the unusual choice of logrotate for backing up MySQL. What Scott was not doing, and what I did need, was off-site backup. For this I had set up rsync access to the machine with public key authentication.

After setting up the daily rsync with some help from this post by Uwe Hermann, I found out that all the logrotate files were transferred every time rsync ran, not only the new ones. This turned out to be due to the default naming scheme of logrotate (appending sequential numbers). Fortunately I found a post by Eric Thern, titled “logrotate tweaks to enhance rsync backups”, which explains that one can use the “dateext” option to make the logrotate files “rsync friendly”.

The last thing I had to find out was when the logrotate task was fired, so I could choose a good time for the cron job that fires the rsync. I found a post on Server Fault that showed me how to find out when the daily cron tasks are started. It turned out they were started at 6:25 AM on my Debian-based machine. I scheduled the cron job on the remote server to rsync at 9:00 AM, to be sure the backup had finished.
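For reference, on a Debian-based machine the daily cron tasks are typically started from a line in /etc/crontab that looks roughly like this (the exact contents may differ per release), which is where the 6:25 AM comes from:

25 6 * * *  root  test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )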

Instructions

I will go through all the files I modified on the production server and the (off-site) backup server. On the production server I created the file “/etc/logrotate.d/mysql-bkup”:

/home/maurits/bak/project.sql.gz {
        daily
        dateext
        rotate 8
        nocompress
        create
        postrotate
                mysqldump -uproject -pproject --single-transaction project > /home/maurits/bak/project.sql
                gzip -9f /home/maurits/bak/project.sql
                chown maurits:maurits /home/maurits/bak/project.sql.gz
                chmod 640 /home/maurits/bak/project.sql.gz
        endscript
}

As you can see, logrotate runs daily, with the dateext option for better rsync efficiency. It keeps 8 backups (7 days) and does not apply compression (we do that ourselves in the postrotate script). The create option makes sure new files are created with the same permissions and owner as the previous ones. We finish with the “postrotate” script, which runs a mysqldump, gzips the result, and then sets the proper owner and permissions. To test-run the script you can execute:

sudo logrotate -f /etc/logrotate.d/mysql-bkup

After running it we can check the created files:

maurits@nuc:~$ ls bak -al
total 18192
drwxr-xr-x  2 maurits maurits    4096 Jun 22 15:17 .
drwxr-xr-x 10 maurits maurits    4096 Jun 22 14:54 ..
-rw-r-----  1 maurits maurits 9291115 Jun 22 15:17 project.sql.gz
-rw-r-----  1 maurits maurits 9291114 Jun 22 15:06 project.sql.gz-20130622

Now we log in to the (off-site) backup server. Here we need to configure the cron job that retrieves the logrotated MySQL backups using rsync. We add the cron job by editing the crontab with:

maurits@nuc:~$ crontab -e

Then we add the rsync command to the crontab. Mine looks like this:

# m h  dom mon dow   command
0 9 * * * rsync -a --delete-during www.project.nl:bak ~

As you can see, we configured the rsync command to run at minute 0 of hour 9, on every day of the month, in every month, on any day of the week. This is all there is to it. If you have any questions, remarks or improvements, please use the comments section below.
