UTF-8 in PHP and MySQL under Ubuntu 12.04

UTF-8 is the de facto standard character set for PHP websites and there are but a few reasons not to use UTF-8 (utf_general_ci) as the default MySQL database collation. However, anyone arguing that UTF-16 is a better standard would probably be right, but because UTF-8 is more popular, nobody cares. Unfortunately, the guys at Ubuntu (or upstream at Debian, PHP and MySQL) still have some strange defaults configured in their software, as follows:

  1. PHP connects explicitly to MySQL with an “Latin 1” character set unless you send the “set names utf8” query.
  2. Apache does not specify a character set by default (nor does PHP), letting the browser determine which character set is used.
  3. MySQL sets the “latin1” as default character set and “latin1_swedish_ci” as default collation (for string comparison).

This is a longstanding issue. The reason for these western/Swedish defaults is that MySQL AB has a Swedish origin. Now that MySQL is the world’s most popular web database, and has been bought by Oracle (based in California/US), it seems like a strange choice. These days you would expect the following defaults:

  1. PHP connects to the server and uses the character set of the server, unless specified.
  2. Apache should assume all text content to be UTF-8 encoded.
  3. MySQL should have UTF-8 as the default character set and “utf_general_ci” as the default collation.

It is easy to make Apache/MySQL/PHP (under Ubuntu 12.04) behave the way you like. First we add the character set to Apache:

sudo echo "AddDefaultCharset utf-8" >  /etc/apache2/conf.d/utf8.conf

Now for MySQL, we open “/etc/mysql/my.cnf” and under the “[mysqld]” section we add the following 3 lines:

[mysqld]
...
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'

For a default of UTF-8 in the MySQL command line client (optional) you must add the following line in the “/etc/mysql/my.cnf” file under the “[client]” section:

[client]
...
default-character-set=utf8

Now restart the Apache and MySQL servers with the following commands:

sudo service mysql restart
sudo service apache2 restart

This is really all you have to do on a default Ubuntu 12.04. To check whether or not everything works correctly put the following “utf8.php” file on your website:

<?php
mysql_connect('localhost', 'username', 'password');
mysql_select_db('database');
$re = mysql_query('SHOW VARIABLES LIKE "c%";')or die(mysql_error());
while ($r = mysql_fetch_assoc($re))
{ echo $r&#91;"Variable_name"&#93;.': '.$r&#91;"Value"&#93;; echo "<br />";
}

The output should be:

character_set_client: utf8
character_set_connection: utf8
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8
character_set_server: utf8
character_set_system: utf8
character_sets_dir: /usr/share/mysql/charsets/
collation_connection: utf8_general_ci
collation_database: utf8_general_ci
collation_server: utf8_general_ci
completion_type: NO_CHAIN
concurrent_insert: AUTO
connect_timeout: 10

Let me know if you still have any trouble making it work. Good luck!

Share