PHP: strtolower and UTF-8
16.03.2009 Tags: tutorial, PHP. A French version is available.
I have a small function that I use to format strings, mainly for URL rewriting. It converts capital letters to lowercase, replaces white space with '-', removes special characters, etc. The 'capital letters' part is done by the simple yet efficient strtolower():
<?php
function urlrewrite($string) {
    $string = strtolower($string);
    // ...
    return $string;
}
?>
Working with the UTF-8 charset, I noticed that a string like '2ème TEST' comes out as a wonderful '2�me test'... not really what I wanted.
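Looking at php.net, I found that strtolower() relies on the current locale, not on the charset of your data. So whether or not you work in UTF-8 (page encoding, database connection and stored data), strtolower() doesn't care: it processes the string one byte at a time according to the locale. A multi-byte UTF-8 character such as 'è' (bytes 0xC3 0xA8) can then have one of its bytes changed, which leaves an invalid sequence that the browser displays as '�'. Since PHP 8.2, strtolower() is no longer locale-sensitive and only converts ASCII letters, so it leaves UTF-8 bytes alone, but it still does not lowercase accented capitals such as 'É'.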
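To see what goes wrong at the byte level, here is a small sketch. Rather than calling setlocale() and strtolower() (which locales are installed varies from system to system), it assumes an ISO-8859-1 locale and simulates with strtr() what such a locale does: in Latin-1, byte 0xC3 is 'Ã' and its lowercase form is 0xE3 ('ã').

```php
<?php
// 'è' (U+00E8) is stored in UTF-8 as two bytes: 0xC3 0xA8.
$utf8 = '2ème TEST';
echo bin2hex('è'), "\n"; // c3a8

// Simulated Latin-1 lowercasing (assumption: an ISO-8859-1 locale is active):
// 0xC3 ('Ã') becomes 0xE3 ('ã') and A-Z become a-z. The continuation byte
// 0xA8 is left untouched, exactly like a byte-by-byte strtolower() would do.
$broken = strtr(
    $utf8,
    "\xC3ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "\xE3abcdefghijklmnopqrstuvwxyz"
);

// 0xE3 announces a 3-byte UTF-8 sequence, but it is followed by 0xA8 and
// then 'm', so the string is no longer valid UTF-8: hence the '�'.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
```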
So what now? There are two main solutions.
The first is to convert the string from UTF-8 to ISO-8859-1 (Latin-1) with utf8_decode(), apply strtolower(), then convert the result back to UTF-8 with utf8_encode():
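<?php
function strtolower_utf8($string) {
    // UTF-8 -> ISO-8859-1 (characters outside Latin-1 become '?')
    $result = utf8_decode($string);
    // Lowercase byte by byte: in Latin-1, each character is a single byte
    $result = strtolower($result);
    // ISO-8859-1 -> UTF-8
    $result = utf8_encode($result);
    return $result;
}
?>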
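It works, but it's not really... sexy, and any character that doesn't exist in ISO-8859-1 (such as the Hungarian 'ő' or 'ű') is turned into '?' along the way.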
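Note that utf8_decode() and utf8_encode() only handle ISO-8859-1 and are deprecated as of PHP 8.2. Below is a minimal sketch of the same round trip with mb_convert_encoding(), assuming the mbstring extension is available; the name strtolower_latin1_roundtrip() is hypothetical. It also shows what happens to a character outside Latin-1.

```php
<?php
// Hypothetical helper: same idea as strtolower_utf8() above, but using
// mb_convert_encoding() instead of the deprecated utf8_decode()/utf8_encode().
function strtolower_latin1_roundtrip(string $string): string
{
    // UTF-8 -> ISO-8859-1: characters that Latin-1 cannot represent are
    // replaced by mbstring's substitute character.
    $latin1 = mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');

    // Lowercase byte by byte (ASCII letters only on PHP 8.2+).
    $latin1 = strtolower($latin1);

    // ISO-8859-1 -> UTF-8.
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

echo strtolower_latin1_roundtrip('2ème TEST'), "\n"; // 2ème test

// Hungarian 'ő' (U+0151) is not part of ISO-8859-1, so it does not survive.
echo strtolower_latin1_roundtrip('ő'), "\n";
```

On PHP 8.2 and later, an accented capital such as 'È' also comes back unchanged, since strtolower() there only touches ASCII letters.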
The other solution: use mb_strtolower(), from the mbstring extension. It also lowercases all capital letters, but lets you specify which character encoding the string is in.
So with a lighter function, we can have the same result:
<?php
function strtolower_utf8($string) {
    // Lowercase using UTF-8 rules instead of the locale
    return mb_strtolower($string, 'UTF-8');
}
?>
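A quick usage sketch, including Hungarian capitals that Latin-1 cannot even represent. The urlrewrite_utf8() function is hypothetical: it is only one way the formatting steps described at the top of this post (lowercase, spaces to '-', special characters removed) could be made UTF-8 aware, and it assumes the input is valid UTF-8 (otherwise preg_replace() fails and returns null).

```php
<?php
// Requires the mbstring extension.
function strtolower_utf8(string $string): string
{
    return mb_strtolower($string, 'UTF-8');
}

// Hypothetical UTF-8-aware version of urlrewrite().
function urlrewrite_utf8(string $string): string
{
    $string = strtolower_utf8(trim($string));
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $string = preg_replace('/\s+/u', '-', $string);
    // Keep letters (any script), digits and '-'; remove everything else.
    $string = preg_replace('/[^\p{L}\p{N}-]+/u', '', $string);
    // Collapse repeated dashes and strip them from both ends.
    $string = preg_replace('/-+/', '-', $string);
    return trim($string, '-');
}

echo strtolower_utf8('2ème TEST'), "\n";              // 2ème test
echo strtolower_utf8('ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP'), "\n"; // árvíztűrő tükörfúrógép
echo urlrewrite_utf8('2ème TEST, encore !'), "\n";    // 2ème-test-encore
```

Passing 'UTF-8' explicitly avoids depending on mb_internal_encoding(), which mb_strtolower() falls back to when the second argument is omitted.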
Comments
Panagiotis
29.04.2010, 01:31:42
Thanks man
lacus
12.11.2010, 07:48:08
Thanks for the solution!
I was stuck with Hungarian characters too, but this works fine.
flashfs
23.01.2012, 16:44:32
Thank you, that solved my problem. I was using mb_strtolower without specifying 'UTF-8'




