MySQL inserts duplicate records despite unique index

Okay, I thought I knew quite a bit about databases, so I did not expect any surprises when I added a unique index to my MySQL table. For those who do not know what a unique index is:

A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. – MySQL manual

After adding the index the table looked like this:

CREATE TABLE `table` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  UNIQUE KEY `a_b_c` (`a`,`b`,`c`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

And I was trying to insert or increment values in it like this:

INSERT INTO `table` (a,b,c) VALUES (1,NULL,3),(1,NULL,3) ON DUPLICATE KEY UPDATE c=c+VALUES(c);

I expected to get the following result:

mysql> select * from `table`;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 | NULL |    6 |
+------+------+------+
1 row in set (0.00 sec)

But it did not do that! I got this result instead:

mysql> select * from `table`;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 | NULL |    3 |
|    1 | NULL |    3 |
+------+------+------+
2 rows in set (0.00 sec)

I was clueless until I read the documentation of MySQL a little better:

This constraint does not apply to NULL values except for the BDB storage engine. For other engines, a UNIQUE index permits multiple NULL values for columns that can contain NULL. – MySQL manual

Especially the part where it says: “does not apply to NULL values” makes things much, much clearer. 🙂 Note that Microsoft SQL Server behaves different (they way I expected). So keep this is mind when using a unique index in MySQL, because I certainly did not expect this behavior!

Share

Making a histogram with Flot in PHP

histogram

When you are creating an administrative application in Symfony, you might end up collecting some simple statistics. When doing data analysis on them you might want to create a histogram and plot it (like in the picture above). When I was looking for the definition of a histogram, I found the following:

Histograms provide a visual display of quantitative data by the use of vertical bars. The height of a bar indicates the number of data points that lie within a particular range of values. These ranges are called classes or bins. — About.com on statistics

We do software development in PHP (Symfony) and we broke down the challenge of creating a histogram in two parts:

  1. Aggregate the quantitative data in classes/bins
  2. Drawing the actual graph onto the screen

We will solve them one at a time.

Aggregate the quantitative data

When you get an array with data from your Doctrine containing either values, associative arrays or objects you will have to reorganize the data. If your data is a simple array, you will have to iterate the data and determine in which class (or bin) the value falls (based on the width of the class) and increment the counter for the corresponding class. We created a class that allows you to do so, either when you have an array of values, associative arrays or objects.

                class Histogram
                {
                    protected $values;
                    protected $width;

                    public function __construct($width = 1)
                    {
                        $this->values = array();
                        $this->width = $width;
                    }

                    public function add($value)
                    {
                        $key = round($value/$this->width)*$this->width;
                        if (!isset($this->values[$key])) $this->values[$key] = 0;
                        $this->values[$key]++;
                    }

                    public function addArray(array $array, $key = false)
                    {
                        if ($key) $array = array_filter(array_map(function($i)use($key){return isset($i[$key])?$i[$key]:false;},$array));
                        array_walk($array,array($this,'add'));
                    }

                    public function addObjects(array $array, $prop)
                    {
                        $array = array_filter(array_map(function($i)use($prop){return isset($i->$prop)?$i->$prop:false;},$array));
                        $this->addArray($array);
                    }

                    public function getValues()
                    {
                        return $this->values;
                    }

                    public function sort($inverse = false)
                    {
                        if ($inverse) arsort($this->values);
                        else asort($this->values);
                    }
                }

You either create the object (with the proper class width) and add the values one-by-one using “add” or you add them all-at-one using “addArray” or “addObjects”.

Drawing the actual graph onto the screen

For this part we use the Flot library that allows us to draw nice graphs using canvas and jQuery. It allows us to plot the data using the “bars” visualization and specifying the “barwidth”, which should be set to the histograms class width. The result is the graph above that shows response times of an intensive database driven web application.

Share

Phpredis as Redis client in Ubuntu 12.04 LTS

Redis can be run with either Predis or phpredis. This post will explain how you can install the phpredis client on Ubuntu 12.04 LTS. This post assumes you run the standard LAMP stack. We have talked about running Redis in Symfony2 before, but then we assumed you would be running Predis.

Redis is an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets. — source: redis.io

The main advantage of running phpredis over running Predis is that it is faster, because phpredis is written in C, while Predis is written in PHP. So, if you want to run the fastest client for your server than follow these simple instructions:

Instructions for installing phpredis (bleeding edge)

1) Make sure you have the dependencies installed:

sudo apt-get install redis-server php5-dev build-essential xsltproc

2) Now clone the phpredis repo, configure and make:

git clone git@github.com:nicolasff/phpredis.git
cd phpredis/
phpize
./configure
make

3a) Install the compiled files directly:

sudo cp modules/redis.so `php-config --extension-dir`
sudo cp rpm/redis.ini /etc/php5/conf.d/
sudo service apache2 restart

3b) or create, view and install the Debian package:

./mkdeb-apache2.sh
dpkg -c phpredis-x86_64.deb
sudo dpkg -i phpredis-x86_64.deb

Instructions for installing phpredis (stable)

1) Make sure you have the dependencies installed:

sudo apt-get install redis-server dh-make-php php5-dev build-essential

2) Now get Redis from PECL and make a package:

mkdir redis
cd redis
export DEBFULLNAME="Maurits van der Schee"
export DEBEMAIL="m.vanderschee@leaseweb.com"
dh-make-pecl redis

3) or create, view and install the Debian package:

cd php-redis-2.2.3/
debuild
dpkg -c ../php5-redis_2.2.3-1_amd64.deb
sudo dpkg -i ../php5-redis_2.2.3-1_amd64.deb

Test to see whether it works or not

To test the loaded module create a PHP file that you run from the command-line or via Apache2:

        <?php
        $redis = new Redis() or die("Cannot load Redis module.");
        $redis->connect('localhost');
        $redis->set('random', rand(5000,6000));
        echo $redis->get('random');

That’s it! Easy. And it performs way better than Predis.

Reliable queues in Redis

You can run Redis, for instance, if you need a fast and scalable queuing system in a distributed platform. You can implement such a queue using the LPUSH and BRPOP commands. With LPUSH you add elements to the start of the list, while the BRPOP removes elements from the list, blocking when no elements are available. But if you want that queue to be reliable, you can consider using the BRPOPLPUSH command, that can add the returned element to an ‘in progress’ queue before returning it to the client. This way you can remove the element from the ‘in progress’ queue when you successfully handled them, or put them back on the original queue if you failed.

Share

Daily MySQL off-site backups using logrotate and rsync

For daily MySQL off-site backups I was looking for a simple approach. Scott Miller’s minimal, but concise post on MySQL Backups With Logrotate  lead me to the unusual choice of logrotate to backup MySQL. What Scott was not doing, and I did need, was off-site backup. For this I had set up rsync access to the machine with public key authentication.

After setting up the daily rsync with some help from this post of Uwe Hermann I found out that the logrotate files were all transfered every time rsync ran, so not only the new ones. This turned out to be due to the default naming scheme of logrotate (appending sequential numbers). Fortunately I found a post by Eric Thern titled logrotate tweaks to enhance rsync backups that explains that one can use the “dateext” option to make the logrotate files “rsync friendly”.

The last thing I had to find out is when the logrotate task was fired to choose a good time for my cronjob that fired the rsync. I found a post on serverfault that showed me how to find out when the daily cron tasks were started. It turned out they were started at 6:25 AM on my Debian based machine. I scheduled the cronjob on the remote server to rsync at 9:00 AM to be sure the backup was finished.

Instructions

I will go through all the files I modified on the production server and the (off-site) backup server. On the production server I created the file “/etc/logrotate.d/mysql-bkup”:

/home/maurits/bak/project.sql.gz {
daily
dateext
rotate 8
nocompress
create
postrotate
mysqldump -uproject -pproject project > /home/maurits/bak/project.sql --single-transaction
gzip -9f /home/maurits/bak/project.sql
chown maurits:maurits /home/maurits/bak/project.sql.gz
chmod 640 /home/maurits/bak/project.sql.gz
endscript
}

As you can see the logrotate is run daily, with the dateext option for better rsync efficiency. It keeps 8 backups (7 days) and does not apply compression (we do that manually). The create option will make sure new files are created with the same permissions and owner as the previous ones. We finish with the “postrotate” script that does a mysqldump, gzip and then sets a proper owner and permissions. To test run the script you can execute:

sudo logrotate -f /etc/logrotate.d/mysql-bkup

After running it we can check the created files:

maurits@nuc:~$ ls bak -al
total 18192
drwxr-xr-x  2 maurits maurits    4096 Jun 22 15:17 .
drwxr-xr-x 10 maurits maurits    4096 Jun 22 14:54 ..
-rw-r-----  1 maurits maurits 9291115 Jun 22 15:17 project.sql.gz
-rw-r-----  1 maurits maurits 9291114 Jun 22 15:06 project.sql.gz-20130622

Now we login to the (off-site) backup server. Here we need to configure the cronjob to retrieve the logrotated MySQL backups using rsync. We add the cronjob by editing the crontab with:

maurits@nuc:~$ crontab -e

Then we add the rsync command to the crontab. Mine looks like this:

# m h  dom mon dow   command
0 9 * * * rsync -a --delete-during www.project.nl:bak ~

As you can see we configured that at minute 0 of hour 9 on every day of the month, of every month, on any day of the week we run the rsync command. This is all there is to it. If you have any questions, remarks or improvements, please use the comments section below.

Share

UTF-8 in PHP and MySQL under Ubuntu 12.04

UTF-8 is the de facto standard character set for PHP websites and there are but a few reasons not to use UTF-8 (utf_general_ci) as the default MySQL database collation. However, anyone arguing that UTF-16 is a better standard would probably be right, but because UTF-8 is more popular, nobody cares. Unfortunately, the guys at Ubuntu (or upstream at Debian, PHP and MySQL) still have some strange defaults configured in their software, as follows:

  1. PHP connects explicitly to MySQL with an “Latin 1” character set unless you send the “set names utf8” query.
  2. Apache does not specify a character set by default (nor does PHP), letting the browser determine which character set is used.
  3. MySQL sets the “latin1” as default character set and “latin1_swedish_ci” as default collation (for string comparison).

This is a longstanding issue. The reason for these western/Swedish defaults is that MySQL AB has a Swedish origin. Now that MySQL is the world’s most popular web database, and has been bought by Oracle (based in California/US), it seems like a strange choice. These days you would expect the following defaults:

  1. PHP connects to the server and uses the character set of the server, unless specified.
  2. Apache should assume all text content to be UTF-8 encoded.
  3. MySQL should have UTF-8 as the default character set and “utf_general_ci” as the default collation.

It is easy to make Apache/MySQL/PHP (under Ubuntu 12.04) behave the way you like. First we add the character set to Apache:

sudo echo "AddDefaultCharset utf-8" >  /etc/apache2/conf.d/utf8.conf

Now for MySQL, we open “/etc/mysql/my.cnf” and under the “[mysqld]” section we add the following 3 lines:

[mysqld]
...
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'

For a default of UTF-8 in the MySQL command line client (optional) you must add the following line in the “/etc/mysql/my.cnf” file under the “[client]” section:

[client]
...
default-character-set=utf8

Now restart the Apache and MySQL servers with the following commands:

sudo service mysql restart
sudo service apache2 restart

This is really all you have to do on a default Ubuntu 12.04. To check whether or not everything works correctly put the following “utf8.php” file on your website:

<?php
mysql_connect('localhost', 'username', 'password');
mysql_select_db('database');
$re = mysql_query('SHOW VARIABLES LIKE "c%";')or die(mysql_error());
while ($r = mysql_fetch_assoc($re))
{ echo $r&#91;"Variable_name"&#93;.': '.$r&#91;"Value"&#93;; echo "<br />";
}

The output should be:

character_set_client: utf8
character_set_connection: utf8
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8
character_set_server: utf8
character_set_system: utf8
character_sets_dir: /usr/share/mysql/charsets/
collation_connection: utf8_general_ci
collation_database: utf8_general_ci
collation_server: utf8_general_ci
completion_type: NO_CHAIN
concurrent_insert: AUTO
connect_timeout: 10

Let me know if you still have any trouble making it work. Good luck!

Share