UTF-8 in PHP and MySQL under Ubuntu 12.04

UTF-8 is the de facto standard character set for PHP websites, and there are only a few reasons not to use UTF-8 (utf8_general_ci) as the default MySQL database collation. Anyone arguing that UTF-16 is a better standard may well be right, but because UTF-8 is more popular, nobody cares. Unfortunately, the guys at Ubuntu (or upstream at Debian, PHP and MySQL) still have some strange defaults configured in their software:

  1. PHP connects to MySQL with the “Latin 1” character set unless you send the “SET NAMES utf8” query.
  2. Apache does not specify a character set by default (nor does PHP), letting the browser determine which character set is used.
  3. MySQL uses “latin1” as the default character set and “latin1_swedish_ci” as the default collation (used for string comparison).
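Why does the latin1 connection default in item 1 matter? A typical symptom is double encoding: when UTF-8 text from PHP is sent over a latin1 connection into a utf8 column, MySQL treats every UTF-8 byte as a separate Latin-1 character and converts it to UTF-8 once more. The pure-PHP sketch below (no database needed) reproduces that effect; latin1_to_utf8() is a hypothetical helper written for this illustration, not a PHP or MySQL function.

```php
<?php
// Hypothetical helper: convert an ISO-8859-1 (Latin-1) byte string to UTF-8
// in plain PHP. Every Latin-1 byte maps to the Unicode code point with the
// same value, so bytes >= 0x80 become a two-byte UTF-8 sequence.
function latin1_to_utf8($bytes)
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= $bytes[$i];                 // ASCII stays as it is
        } else {
            $out .= chr(0xC0 | ($b >> 6))       // lead byte
                  . chr(0x80 | ($b & 0x3F));    // continuation byte
        }
    }
    return $out;
}

// "número" stored correctly as UTF-8: n, C3 BA (ú), m, e, r, o.
$utf8 = "n\xC3\xBAmero";

// Treating C3 and BA as two Latin-1 characters (Ã and º) and converting
// each of them to UTF-8 again gives the double-encoded form.
$double = latin1_to_utf8($utf8);

echo $double, "\n"; // shows "nÃºmero" on a UTF-8 terminal or page
```

The same function applied to genuine Latin-1 input ("n\xFAmero") yields correct UTF-8, which is what the conversion tools in the comments below do for old files.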

This is a longstanding issue. The reason for these Western/Swedish defaults is that MySQL AB was founded in Sweden. Now that MySQL is the world’s most popular web database and has been bought by Oracle (based in California, US), it seems like a strange choice. These days you would expect the following defaults:

  1. PHP connects to the server using the server’s character set, unless told otherwise.
  2. Apache should assume all text content to be UTF-8 encoded.
  3. MySQL should have UTF-8 as the default character set and “utf8_general_ci” as the default collation.

It is easy to make Apache/MySQL/PHP (under Ubuntu 12.04) behave the way you like. First we add the character set to Apache:

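echo "AddDefaultCharset utf-8" | sudo tee /etc/apache2/conf.d/utf8.conf

(Writing “sudo echo ... > /etc/apache2/conf.d/utf8.conf” fails with “Permission denied”, because the redirection is performed by your own unprivileged shell, not by sudo.)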
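If you cannot change the Apache configuration, PHP itself can announce the character set. The sketch below uses PHP's standard default_charset setting; when it is set, PHP appends “; charset=UTF-8” to the default Content-Type header it sends. This is an alternative we are suggesting, not one of the steps in this article, and it only covers pages generated by PHP, not static files served by Apache.

```php
<?php
// Sketch: per-script alternative to "AddDefaultCharset utf-8".
// Must run before any output is sent.

// With default_charset set, PHP sends "Content-Type: text/html; charset=UTF-8"
// for pages that do not set their own Content-Type header.
ini_set('default_charset', 'UTF-8');
```

The same value can also go in php.ini as default_charset = "UTF-8". Since PHP 5.6, UTF-8 is already the default value; older versions such as the PHP 5.3 on Ubuntu 12.04 typically leave it empty.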

Now for MySQL, we open “/etc/mysql/my.cnf” and under the “[mysqld]” section we add the following 3 lines:

[mysqld]
...
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'

For a default of UTF-8 in the MySQL command line client (optional) you must add the following line in the “/etc/mysql/my.cnf” file under the “[client]” section:

[client]
...
default-character-set=utf8

Now restart the Apache and MySQL servers with the following commands:

sudo service mysql restart
sudo service apache2 restart

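This is really all you have to do on a default Ubuntu 12.04. To check whether everything works correctly, put the following “utf8.php” file on your website. Connect as a regular MySQL user rather than root: MySQL does not execute init-connect for users with the SUPER privilege, so a root connection may still report latin1.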

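<?php
// Placeholder credentials and database name: replace them with your own.
mysql_connect('localhost', 'username', 'password') or die(mysql_error());
mysql_select_db('database');
// List all server variables starting with "c" (character sets, collations, ...).
$re = mysql_query('SHOW VARIABLES LIKE "c%";') or die(mysql_error());
while ($r = mysql_fetch_assoc($re)) {
    echo $r["Variable_name"] . ': ' . $r["Value"] . "<br />";
}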
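The mysql_* functions used in utf8.php were deprecated in PHP 5.5 and removed in PHP 7.0, so on newer systems the script above stops working. A minimal sketch of the same check with the mysqli extension follows; print_charset_variables() is our own hypothetical name, and the connection details in the usage comment are placeholders.

```php
<?php
// Sketch of the utf8.php check for PHP 7+, where mysql_* no longer exists.
// Requires the mysqli extension; the function name is our own.

// Print every server variable whose name starts with "c", one per line,
// in the same "name: value" format as the original script.
function print_charset_variables(mysqli $db)
{
    $result = $db->query("SHOW VARIABLES LIKE 'c%'");
    if ($result === false) {
        die($db->error);
    }
    while ($row = $result->fetch_assoc()) {
        echo $row['Variable_name'], ': ', $row['Value'], "<br />\n";
    }
    $result->free();
}

// Usage (placeholder credentials; needs a running MySQL server):
// $db = new mysqli('localhost', 'username', 'password', 'database');
// if ($db->connect_error) { die($db->connect_error); }
// print_charset_variables($db);
```

In application code you would normally call $db->set_charset('utf8') right after connecting, which the PHP manual recommends over sending SET NAMES yourself. Leave it out of this check script, though, or it will hide whether the server-side init-connect setting works.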

The output should be:

character_set_client: utf8
character_set_connection: utf8
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8
character_set_server: utf8
character_set_system: utf8
character_sets_dir: /usr/share/mysql/charsets/
collation_connection: utf8_general_ci
collation_database: utf8_general_ci
collation_server: utf8_general_ci
completion_type: NO_CHAIN
concurrent_insert: AUTO
connect_timeout: 10
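If you run this check on several servers, comparing the output by eye gets tedious. The sketch below lists every setting that differs from the expected output above; non_utf8_settings() is a hypothetical helper, and the sample input is the latin1 output one reader posted in the comments below. Adjust the expected collations if you deliberately use utf8_unicode_ci.

```php
<?php
// Sketch: list the settings that differ from the expected output above.
// $vars maps Variable_name => Value (e.g. collected from the utf8.php rows).
function non_utf8_settings(array $vars)
{
    $expected = array(
        'character_set_client'     => 'utf8',
        'character_set_connection' => 'utf8',
        'character_set_database'   => 'utf8',
        'character_set_results'    => 'utf8',
        'character_set_server'     => 'utf8',
        'character_set_system'     => 'utf8',
        'collation_connection'     => 'utf8_general_ci',
        'collation_database'       => 'utf8_general_ci',
        'collation_server'         => 'utf8_general_ci',
    );
    $wrong = array();
    foreach ($expected as $name => $value) {
        // A missing variable counts as wrong and is reported as null.
        $actual = isset($vars[$name]) ? $vars[$name] : null;
        if ($actual !== $value) {
            $wrong[$name] = $actual;
        }
    }
    return $wrong;
}

// Input: the latin1 output a reader posted in the comments below.
$before = array(
    'character_set_client'     => 'latin1',
    'character_set_connection' => 'latin1',
    'character_set_database'   => 'latin1',
    'character_set_results'    => 'latin1',
    'character_set_server'     => 'latin1',
    'character_set_system'     => 'utf8',
    'collation_connection'     => 'latin1_swedish_ci',
    'collation_database'       => 'latin1_swedish_ci',
    'collation_server'         => 'latin1_swedish_ci',
);

// Lists the eight latin1 settings; character_set_system is already utf8.
print_r(non_utf8_settings($before));
```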

Let me know if you still have any trouble making it work. Good luck!


16 thoughts on “UTF-8 in PHP and MySQL under Ubuntu 12.04”

  1. Thanks for this. I’d found sites that commented on the Apache configuration or the MySQL configuration, but not both.

    My Xubuntu 12.04 LTS installation had a file under /etc/apache2/conf.d called charset. It basically contained some comments and one commented-out line: AddDefaultCharset utf-8. I uncommented that line rather than creating a separate file called utf8.conf. It worked.

  2. @Richard: Thank you for the comment. Uncommenting is even easier, so that is a good alternative.

  3. @revanth: You are welcome. Thank you for letting us know it was useful.

  4. Hi

    Thanks for this great tutorial.
    I followed your directions and now my PHP forms work OK (i.e. I can save and retrieve accented data in the MySQL database). However, accented characters in the labels of the forms themselves, or in any HTML page, show up garbled. For example, número appears as n�mero. These same forms work OK on a Debian Squeeze server, but not on my current development box, running Debian Wheezy, which I recently had to reinstall due to a disk failure.
    I have already tried many approaches, but none seems to work.

    Any help would be very much appreciated.

  5. @Ildefonso: If you made sure you followed all the steps, then it can also be that the data you have in your MySQL table is not UTF-8. Can you reinsert the data?

  6. @Ildefonso: Try storing and retrieving some Chinese character (like 的) to be sure. And check your HTML, since it should not contain a character set meta tag. If it does, it should look like this:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    You do know you have to add this to the Apache config (apache.conf), right?

    AddDefaultCharset utf-8

    In PHP you can do this on-the-fly by executing:

    header('Content-Type: text/html; charset=utf-8');

    Let me know whether it solves the problem.

  7. Hi Maurits

    Last weekend was a very intense one, with me testing and retesting everything. I used a fresh Debian Wheezy install in order to avoid messing with my current work boxes. I used your tutorial as a guide and checked the results with your utf8.php file. Initially, I found a mix of latin1, Swedish and utf8 character sets and collations, like so:
    character_set_client: latin1
    character_set_connection: latin1
    character_set_database: latin1
    character_set_filesystem: binary
    character_set_results: latin1
    character_set_server: latin1
    character_set_system: utf8
    character_sets_dir: /usr/share/mysql/charsets/
    collation_connection: latin1_swedish_ci
    collation_database: latin1_swedish_ci
    collation_server: latin1_swedish_ci
    completion_type: NO_CHAIN
    concurrent_insert: AUTO
    connect_timeout: 10
    Now it looks exactly like yours. (On my work boxes the mix was worse, as they date from some seven years ago, when ISO-8859-1 was more prominent, and I never switched to utf8, even after each distro upgrade.)

    These are the changes I made to the configuration files:
    /etc/mysql/my.cnf (only additions are shown)
    -----------------------
    [client]
    default-character-set=utf8
    [mysqld]
    init_connect='SET collation_connection = utf8_unicode_ci; SET NAMES utf8;'
    character-set-server=utf8
    collation-server=utf8_general_ci #collation-server=utf8_unicode_ci
    skip-character-set-client-handshake

    includesdirectory/dbconnect.inc
    -------------------------------------------
    header('Content-Type: text/html; charset=utf-8');
    $dbcon = mysql_connect('localhost', 'user', 'password', FALSE);

    /etc/apache2/conf.d/charset
    -------------------------------------
    AddDefaultCharset UTF-8

    One important thing to consider is that all my PHP programs were written on non-UTF-8 boxes, so I had to convert them with iconv, as in the following example:
    iconv -f ISO_8859-1 -t UTF-8 encodetest.php -o encodetest.php
    One inconvenience is that the creation date and time of converted files is set to the date of conversion. Not good if the files' creation dates are important to keep.

    As for the databases, I exported them via:
    mysqldump -uroot -p --databases dbname_1 dbname_2 … dbname_n > /tmp/all_databases.bak

    Then, I replaced all “latin1” with “utf8” via sed:
    sed -i 's/latin1/utf8/g' /tmp/all_databases.bak

    And then I imported them into MySQL:
    mysql -uroot -p < /tmp/all_databases.bak

    mysql> select f_id,f_name,f_description from formdef2 order by f_id desc limit 10;
    +------+--------+-----------------------------+
    | f_id | f_name | f_description               |
    +------+--------+-----------------------------+
    |  789 | fpd03  | Mod fecha dia               |
    |  788 | fpd01  | Selección faltas por día 的 |

    One final thing to note is that I had to remove the following tag from my PHP file. Otherwise, the form would not submit and failed with errors.

    Tag to be omitted in the PHP file:
    <meta http-equiv="content-type" content="text/html; charset=utf-8">

    Thanks for all your great support.

  8. @Ildefonso: You are welcome and thank you for your comments. You did a thorough job documenting your moves! I hope it will help other readers that fight this problem. Kind regards, Maurits

  9. One last thought.
    I found recode more convenient than iconv for changing files’ encoding, as recode does not alter the creation date of files. Usage is as simple as:
    recode ISO-8859-1..UTF-8 filename.php

    It works for most file types. And if you have lots of files, you may issue:
    for i in *.php; do recode ISO-8859-1..UTF-8 "$i"; done

    Just for the record, it can convert both ways, e.g.:
    recode UTF-8..ISO-8859-1 filename.php

    Regards.

  10. mysql_connect('username', 'password', 'localhost');

    should be:
    mysql_connect('localhost', 'username', 'password');

    Default root with no password:
    mysql_connect('localhost', 'root', '');

    Still a problem? Try using 127.0.0.1 instead of localhost.
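Ildefonso's comments above convert old ISO-8859-1 files with iconv and recode. Running such a conversion twice on the same file double-encodes it (“número” turns into “nÃºmero”), so it can help to list only the files that are not valid UTF-8 yet. The sketch below does that with preg_match() and the /u modifier; is_valid_utf8() and files_needing_conversion() are hypothetical helper names. A file that fails the check is not proven to be ISO-8859-1, only shown not to be UTF-8, and pure ASCII files pass because they are already valid UTF-8.

```php
<?php
// Sketch: find files that are not valid UTF-8 yet, so iconv or recode is
// only run on those. Converting a file that is already UTF-8 a second time
// double-encodes it.

// preg_match() with the /u modifier returns false for invalid UTF-8,
// so an empty pattern works as a validity check.
function is_valid_utf8($bytes)
{
    return preg_match('//u', $bytes) === 1;
}

// Return the files in $dir matching $pattern whose contents are not valid
// UTF-8. Pure ASCII files are valid UTF-8 and are therefore skipped.
function files_needing_conversion($dir, $pattern = '*.php')
{
    $files = glob(rtrim($dir, '/') . '/' . $pattern);
    $todo = array();
    foreach ($files ?: array() as $file) {
        if (is_file($file) && !is_valid_utf8(file_get_contents($file))) {
            $todo[] = $file;
        }
    }
    return $todo;
}

// Usage (the directory is a hypothetical example):
// foreach (files_needing_conversion('/var/www/example') as $f) {
//     echo $f, "\n";   // candidates for recode ISO-8859-1..UTF-8
// }
```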
