UTF-8 is the de facto standard character set for PHP websites, and there are only a few reasons not to use UTF-8 (utf8_general_ci) as the default MySQL database collation. Anyone arguing that UTF-16 is a better standard may well be right, but because UTF-8 is more popular, nobody cares. Unfortunately, the guys at Ubuntu (or upstream at Debian, PHP and MySQL) still ship some strange defaults in their software:
- PHP's MySQL client connects using the “latin1” character set unless you send the “SET NAMES utf8” query.
- Apache does not specify a character set by default (nor does PHP), leaving the browser to guess which character set is used.
- MySQL uses “latin1” as the default character set and “latin1_swedish_ci” as the default collation (for string comparison).
This is a longstanding issue. The reason for these western/Swedish defaults is that MySQL AB was a Swedish company. Now that MySQL is the world’s most popular web database and is owned by Oracle (headquartered in California), it seems like a strange choice. These days you would expect the following defaults:
- PHP connects to the server using the server's character set, unless specified otherwise.
- Apache should assume all text content to be UTF-8 encoded.
- MySQL should have “utf8” as the default character set and “utf8_general_ci” as the default collation.
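Why do these mismatched defaults matter? When UTF-8 bytes are treated as latin1 somewhere along the way (for example, a UTF-8 page talking to MySQL over a latin1 connection), every byte of a multi-byte character is reinterpreted as a separate character. A minimal sketch, assuming the standard iconv command-line tool is available, reproduces the classic “cafÃ©” garbling:

```shell
# Sketch: what happens when UTF-8 bytes are read as latin1 (ISO-8859-1).
# "café" in UTF-8 is the bytes 63 61 66 C3 A9 (written here as octal escapes).
utf8_cafe=$(printf 'caf\303\251')

# Reinterpret those bytes as latin1 and convert the result back to UTF-8:
# byte C3 becomes "Ã" and byte A9 becomes "©", giving "cafÃ©".
misread=$(printf '%s' "$utf8_cafe" | iconv -f ISO-8859-1 -t UTF-8)
echo "$misread"
```

This is exactly the kind of damage the settings below are meant to prevent: once every layer agrees on UTF-8, no byte gets reinterpreted.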
It is easy to make Apache/MySQL/PHP (under Ubuntu 12.04) behave the way you like. First we add the character set to Apache:
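echo "AddDefaultCharset utf-8" | sudo tee /etc/apache2/conf.d/utf8.conf
(The file is written through “sudo tee” because a shell redirection such as “> file” runs with your own, unprivileged permissions.)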
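Once Apache has been restarted (see the restart commands further down), you can confirm that it actually sends the charset. A small sketch, assuming curl is installed and the site answers on http://localhost/ (both assumptions): the hypothetical helper has_utf8_charset reads raw response headers on stdin and succeeds only when a Content-Type header carries charset=utf-8. Keep in mind that AddDefaultCharset only adds the charset to text/plain and text/html responses.

```shell
# Sketch: check that an HTTP response declares UTF-8 in its Content-Type header.
# has_utf8_charset (hypothetical helper) reads response headers on stdin and
# succeeds only if a Content-Type line carries charset=utf-8 (case-insensitive).
has_utf8_charset() {
  grep -qi '^content-type:.*charset=utf-8'
}

# Usage on the server (assumes curl is installed and Apache listens on localhost):
#   curl -sI http://localhost/ | has_utf8_charset && echo "charset header OK"
```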
Now for MySQL: open “/etc/mysql/my.cnf” and add the following three lines under the “[mysqld]” section:
[mysqld]
...
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
Optionally, to make the MySQL command-line client default to UTF-8 as well, add the following line under the “[client]” section of “/etc/mysql/my.cnf”:
[client]
...
default-character-set=utf8
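To confirm that the option files say what you think they say, you can ask my_print_defaults (a utility shipped with the MySQL packages) which options the mysqld and client groups pick up. The sketch below assumes it prints one --name=value option per line, with the quotes around 'SET NAMES utf8' already stripped; check_options is a hypothetical helper that reports every expected option missing from that output.

```shell
# Sketch: report expected options missing from my_print_defaults output.
# check_options OUTPUT WANTED...: OUTPUT holds one option per line; every
# WANTED option not found as an exact line is printed, and the function
# returns non-zero if anything is missing.
check_options() {
  opts=$1
  shift
  rc=0
  for want in "$@"; do
    # -x: whole-line match, -F: fixed string, --: options start with dashes
    if ! printf '%s\n' "$opts" | grep -qxF -- "$want"; then
      echo "missing: $want"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the server (assumes the MySQL packages are installed):
#   check_options "$(my_print_defaults mysqld)" \
#     --character-set-server=utf8 \
#     --collation-server=utf8_general_ci \
#     '--init-connect=SET NAMES utf8'
#   check_options "$(my_print_defaults client)" --default-character-set=utf8
```

The second call only matters if you made the optional client change.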
Now restart the Apache and MySQL servers with the following commands:
sudo service mysql restart
sudo service apache2 restart
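That is really all you have to do on a default Ubuntu 12.04 installation. To check whether everything works correctly, put the following “utf8.php” file on your website (fill in your own credentials, and connect as an ordinary user rather than root, because MySQL skips init-connect for accounts with the SUPER privilege):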
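<?php
// Uses the old mysql_* extension shipped with PHP 5.3 on Ubuntu 12.04 (removed in PHP 7).
mysql_connect('localhost', 'username', 'password');
mysql_select_db('database');
// List the server variables starting with "c", including character_set_* and collation_*.
$re = mysql_query('SHOW VARIABLES LIKE "c%";') or die(mysql_error());
while ($r = mysql_fetch_assoc($re)) {
    echo $r["Variable_name"] . ': ' . $r["Value"];
    echo "<br />";
}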
The output should be:
character_set_client: utf8
character_set_connection: utf8
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8
character_set_server: utf8
character_set_system: utf8
character_sets_dir: /usr/share/mysql/charsets/
collation_connection: utf8_general_ci
collation_database: utf8_general_ci
collation_server: utf8_general_ci
completion_type: NO_CHAIN
concurrent_insert: AUTO
connect_timeout: 10
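Rather than eyeballing that list, you could let a script flag anything that is not UTF-8. A minimal sketch, assuming the page prints “name: value” pairs separated by “<br />” exactly as utf8.php does; report_non_utf8 is a hypothetical helper that prints every character_set_* or collation_* variable whose value does not start with utf8 (character_set_filesystem is skipped because “binary” is expected there).

```shell
# Sketch: flag character_set_* / collation_* variables that are not UTF-8.
# report_non_utf8 (hypothetical helper) reads the utf8.php output on stdin,
# prints each offending "name: value" pair and exits non-zero if any exist.
report_non_utf8() {
  # Split the single HTML line into one "name: value" pair per line.
  awk '{ gsub(/<br \/>/, "\n"); print }' |
  awk -F': ' '
    /^(character_set_|collation_)/ {
      # character_set_filesystem is "binary" by design, so skip it.
      if ($1 == "character_set_filesystem") next
      if ($2 !~ /^utf8/) { print $1 ": " $2; bad = 1 }
    }
    END { exit bad }'
}

# Usage (assumes the page is reachable on localhost and curl is installed):
#   curl -s http://localhost/utf8.php | report_non_utf8 && echo "all UTF-8"
```

Note that character_sets_dir does not match the character_set_ prefix, so its path value is ignored.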
Let me know if you still have any trouble making it work. Good luck!