IT Community - Software Programming, Web Development and Technical Support

Encoding String

Posted 03-28-2007, 11:49 PM by Anandavinayagam

PHP's Problem with Character Encoding

The basic problem PHP has with character encoding is that it has a very simple notion of what a character is: one character equals one byte. More precisely, most of PHP's string-related functionality makes this assumption. But to support a wide range of characters (or every character, as Unicode aims to), you need more than one byte to represent a character.
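To make the byte/character distinction concrete, here is a minimal sketch that prints the byte length and hex bytes of one ASCII character and one non-ASCII character. The variable names are illustrative only, and the ñ is written as explicit UTF-8 byte escapes so the result does not depend on how the file is saved.

```php
<?php
$ascii = 'I';        // U+0049, one byte in UTF-8
$enye  = "\xC3\xB1"; // U+00F1 (ñ), written as its two UTF-8 bytes

echo strlen($ascii), ' ', bin2hex($ascii), "\n"; // 1 49
echo strlen($enye), ' ', bin2hex($enye), "\n";   // 2 c3b1
```

Both values are a single character to a reader, but strlen sees one byte in the first case and two in the second.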

Here is an example in code. Sam Ruby's i18n Survival Guide recommends using the string Iñtërnâtiônàlizætiøn for testing. Counting by eye, you can see it contains 20 characters:

Iñtërnâtiônàlizætiøn
12345678901234567890

But counted with PHP's strlen function...

<?php
echo strlen('Iñtërnâtiônàlizætiøn');
?>

PHP will report 27. That's because strlen counts bytes, not characters. Encoded as UTF-8, the string contains seven characters (ñ, ë, â, ô, à, æ, ø) that each take two bytes, so 13 single-byte characters plus 7 × 2 bytes gives 27.
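For comparison, a character-aware count can be sketched as follows. This assumes PHP 7 or later and a source file saved as UTF-8. The helper name utf8_char_count is hypothetical (it is not a PHP built-in); it uses mb_strlen when the mbstring extension is loaded and otherwise falls back to a PCRE match in UTF-8 mode.

```php
<?php
// Assumes this source file is saved as UTF-8.
$str = 'Iñtërnâtiônàlizætiøn';

// Hypothetical helper (not part of PHP): counts UTF-8 characters.
function utf8_char_count(string $s): int
{
    if (function_exists('mb_strlen')) {
        // mbstring extension available: count code points directly.
        return mb_strlen($s, 'UTF-8');
    }
    // Fallback: the /u modifier makes PCRE treat the subject as UTF-8,
    // so "." matches one code point; /s lets "." match newlines too.
    return (int) preg_match_all('/./us', $s);
}

echo strlen($str), "\n";          // bytes
echo utf8_char_count($str), "\n"; // characters (code points)
```

With a UTF-8 source file this should print 27 followed by 20. Note that the fallback returns 0 for input that is not valid UTF-8, which may or may not be what you want.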

Life gets even more interesting if you run the following:

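<?php
// Tell the browser the output is ISO-8859-1, although this file (and so
// the string below) is assumed to be saved as UTF-8.
header('Content-Type: text/plain; charset=ISO-8859-1');

$str = 'Iñtërnâtiônàlizætiøn';

$out = '';
$pos = '';
// $str[$i] returns a single byte, so this loop walks bytes, not characters.
for ($i = 0, $j = 1; $i < strlen($str); $i++, $j++) {
    $out .= $str[$i];
    // Wrap the ruler digit so positions print as 1..9, 0, 1..9, ...
    if ($j == 10) {
        $j = 0;
    }
    $pos .= $j;
}

echo $out."\n".$pos;
?>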

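You should see something like:

IÃ±tÃ«rnÃ¢tiÃ´nÃ lizÃ¦tiÃ¸n
123456789012345678901234567

Because the header declares ISO-8859-1, the browser renders each of the 27 bytes as its own character, so every two-byte UTF-8 character shows up as a pair such as Ã±. (The second byte of à is 0xA0, which ISO-8859-1 displays as a non-breaking space.)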
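One way to make the same loop character-aware is sketched below. It is not from the original guide: it assumes the source file is saved as UTF-8, declares UTF-8 in the header instead of ISO-8859-1, and splits the string with preg_split and the /u modifier so that each element is a whole character rather than a single byte.

```php
<?php
// Assumes this source file is saved as UTF-8.
if (!headers_sent()) {
    // Declare the encoding the bytes are actually in.
    header('Content-Type: text/plain; charset=UTF-8');
}

$str = 'Iñtërnâtiônàlizætiøn';

// With the /u modifier an empty pattern splits between code points,
// so each array element is one whole character (one or more bytes).
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

$out = '';
$pos = '';
foreach ($chars as $i => $char) {
    $out .= $char;
    $pos .= ($i + 1) % 10; // ruler digit: 1..9, then 0
}

echo $out . "\n" . $pos . "\n";
```

Here the ruler stops at 20, matching the count you get by eye, and the text displays intact because the declared charset matches the bytes sent.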

Copyright ©2004 - 2007, DiscussWeb. All Rights Reserved.