Q: mysql/php/apache and japanese characters ( No Answer,   7 Comments )
Question  
Subject: mysql/php/apache and japanese characters
Category: Computers > Programming
Asked by: philcartmell-ga
List Price: $10.00
Posted: 27 Aug 2002 08:58 PDT
Expires: 26 Sep 2002 08:58 PDT
Question ID: 59070
Hi,
I am writing a multilingual website. I've got all aspects working,
including regular single-byte languages like English and French, but I
also need to include Japanese. I've looked around the net for answers
but I'm having problems finding the info needed.

I am using MySQL, PHP, and Apache, currently on a Win2k dev
environment, but the site will be deployed on a Unix variant such as
Red Hat or FreeBSD.

I would like some code examples of how to insert and extract Japanese
characters from the database, and how to display them on the page. I
can see from the PHP documentation that it's got something to do with
the mb_* functions, but I'm still confused.

I understand there are 3 (I think) different character sets used over
there, the most popular being SJIS? Which character set should I use
for the site, i.e. what will give it maximum compatibility?

Many Thanks
Phil Cartmell
Answer  
There is no answer at this time.

Comments  
Subject: Re: mysql/php/apache and japanese characters
From: fj-ga on 27 Aug 2002 11:18 PDT
 
Have a look at http://web.shodouka.com/ - it might help?
Subject: Re: mysql/php/apache and japanese characters
From: cwrl-ga on 28 Aug 2002 04:26 PDT
 
The obvious advice would be to use UTF-8 or another Unicode
transformation format for all the pages. This way you wouldn't need to
have a different character set for different pages and could easily
mix content from different languages/code pages on each page. The two
obvious problems with this approach are (1) it increases the byte
length of any given string in Japanese over that in a `native'
character set; and (2) you may have some trouble with data submitted
by users.

The first problem is probably an acceptable tradeoff, and perhaps you
can make PHP compress data sent to clients using gzip -- all modern
web browsers support this through the Accept-Encoding mechanism. The
second problem is more difficult, but will actually apply in any case.

http://ppewww.ph.gla.ac.uk/~eflavell/charset/form-i18n.html has some
useful information on this topic. You may be OK if you use the
accept-charset attribute on all your forms.

http://www.newsisfree.com/sitenews/2002/02/14.html has some pointers
to information about PHP, MySQL and UTF-8. MySQL does not yet support
the UTF-8 character set in the database, but it may be that you can
get away without it. You can store UTF-8 byte strings as ISO-8859-1 in
the database; all lengths will be reported as bytes not characters,
and searching may not work quite right, but you will certainly be able
to store and retrieve the data OK.
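
A minimal sketch of the "bytes not characters" point above, assuming a
PHP build with the mbstring extension loaded and a source file saved as
UTF-8. The sample string is only an illustration:

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$text = "日本語";                          // three Japanese characters

$byteLength = strlen($text);              // counts bytes, as a byte-oriented column would
$charLength = mb_strlen($text, 'UTF-8');  // counts characters

echo "bytes: $byteLength, characters: $charLength\n"; // bytes: 9, characters: 3
```

So if a column's length is counted in bytes, it needs roughly three
times the width you would allow for the same number of Japanese
characters.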
Subject: Re: mysql/php/apache and japanese characters
From: philcartmell-ga on 28 Aug 2002 05:14 PDT
 
Hi,
The URL you have posted doesn't work! (404)

There may not be any forms on the Japanese version of the site - we
simply <grin> need to store Japanese characters in the database and
print them to the browser using PHP. I am looking for hard examples
(code) to insert into the database and get back into PHP. I understand
it's possible in theory, but I want examples of code I can use, as per
my question above.
Cheers
Phil
Subject: Re: mysql/php/apache and japanese characters
From: cwrl-ga on 28 Aug 2002 07:03 PDT
 
You're right-- it doesn't work. It worked this morning. Google's cache
doesn't seem to have a copy, either.

Hmm. Try http://ppewww.ph.gla.ac.uk/%7eflavell/charset/form-i18n.html
which should be the same thing, but seems to work.


I'm no PHP expert, but it looks to me as if you should be able to just
insert UTF-8 strings into the database -- ignore for the moment the
MySQL character set issues -- then emit them to the client after
having passed them through htmlentities(). Note that you'll need to
pass the string "UTF-8" to that function to identify the character set
-- see the discussion at the bottom of
http://www.php.net/manual/en/function.htmlentities.php

Everything else should, I think, just be the same as in the code for
processing ISO-8859-1 strings. You can convert the existing data from
ISO-8859-1 to UTF-8 using PHP's utf8_encode() function, described at
http://www.php.net/manual/en/function.utf8-encode.php
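
A rough sketch of those two steps, assuming a UTF-8 source file and the
mbstring extension. The sample strings are made up, and
mb_convert_encoding() stands in for utf8_encode() here because
utf8_encode() is deprecated as of PHP 8.2:

```php
<?php
// Assumes this file is saved as UTF-8; mbstring is needed for the conversion step.
$fromDatabase = "<b>日本語</b> & more";

// Escape HTML metacharacters; the third argument names the character set,
// so the multibyte characters are left intact.
$safe = htmlentities($fromDatabase, ENT_QUOTES, 'UTF-8');
echo $safe, "\n";

// Converting existing ISO-8859-1 data to UTF-8.
$latin1 = "caf\xE9";   // "café" as ISO-8859-1 bytes
$utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), "\n";   // 636166c3a9
```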
Subject: Re: mysql/php/apache and japanese characters
From: philcartmell-ga on 28 Aug 2002 08:01 PDT
 
What Japanese character set should I use for the pages? Which is the
most common one used in Japan?
Subject: Re: mysql/php/apache and japanese characters
From: cwrl-ga on 28 Aug 2002 08:19 PDT
 
Can't answer that, but

  http://web.lfw.org/text/jp.html

  http://web.lfw.org/text/jp-www.html

  http://web.lfw.org/text/jp-disp.html

seem useful. Is UTF-8 not an option?
Subject: Re: mysql/php/apache and japanese characters
From: auroraeosrose-ga on 03 Dec 2002 08:16 PST
 
I just completed a project to allow 6 languages including Chinese and
Japanese, using both MySQL and MSSQL as a database with PHP...in order
to keep myself sane I used the UTF-8 approach.

First things first, you have to set up PHP to use character sets with
multiple bytes.  That means using the mbstring module.  On Linux that
means compiling PHP with --enable-mbstring.  If you're using the new
4.3 PHP versions (rc1 or rc2), here's a blurb from the manual about
it...

the option --enable-mbstring  will be enabled by default and replaced
with --with-mbstring[=LANG]  to support Chinese, Korean and Russian
language support. Japanese character encoding is supported by default.
If --with-mbstring=cn  is used, simplified chinese encoding will be
supported. If --with-mbstring=tw  is used, traditional chinese
encoding will be supported. If --with-mbstring=kr  is used, korean
encoding will be supported. If --with-mbstring=ru  is used, russian
encoding will be supported. If --with-mbstring=all  is added, all
supported character encoding in mbstring will be enabled, but the
binary size of PHP will be maximized because of huge Unicode character
maps. Note that Chinese, Korean and Russian encoding is experimentally
supported in PHP 4.3.0.

That might be Greek to you; it just means multibyte functions will be
included automatically when you compile, you just tell it what
languages you want.  On Windows it's even easier to get the multibyte
functions, just make sure the extension=php_mbstring.dll line in
php.ini is uncommented.

now, you can read the php manual about the functions
http://us2.php.net/manual/en/ref.mbstring.php

or you can do it the easy way, use UTF-8 and do this in php.ini:
find output_buffering and make sure it's turned on or set to a value
(4096 is the default).  Just below that is a line called
output_handler.  That needs to be changed to:

output_handler = mb_output_handler

all your multibyte stuff will be handled correctly and automagically
then.  You HAVE to use multibyte functions whether you use utf-8 or
another character set.  it's just easier not to have to change
encoding all the time.  multibyte functions are one of the newer
features of php...this means you'll probably need a fairly new version
- I'd recommend at least 4.2.3

remember to restart your webserver!
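
After the restart, a quick way to confirm the setup took effect might
look like the sketch below. check_multibyte_setup() is a made-up helper
name, and setting the internal encoding to UTF-8 is an assumption that
matches the UTF-8 approach described here:

```php
<?php
// Hypothetical helper: reports whether the multibyte setup looks usable.
function check_multibyte_setup()
{
    if (!extension_loaded('mbstring')) {
        return 'mbstring is not loaded';
    }
    // Make UTF-8 the default encoding for mb_* calls in this request.
    mb_internal_encoding('UTF-8');
    return 'mbstring ok, internal encoding ' . mb_internal_encoding();
}

echo check_multibyte_setup(), "\n";
// Shows the value the running PHP actually picked up from php.ini.
echo 'output_handler: ', ini_get('output_handler'), "\n";
```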

Now that you have the module you need and the ini set up...it's time
to create a page:

First, in the <head> of every page on the website you need this line
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
that simply tells the browser what character set you're using. If
you're not using UTF-8 you'll have to change that for every language
so they get their characters right.  For your information, your
character sets are kind of dictated to you by PHP - from the manual:

Character encodings work with PHP: 
ISO-8859-*, EUC-JP, UTF-8

Character encodings do NOT work with PHP:
JIS, SJIS

EUC-JP is the Japanese one to use with PHP...
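
If you do end up with data in EUC-JP and pages in UTF-8 (or the other
way around), mbstring can convert between the two. A small sketch,
assuming a UTF-8 source file and the mbstring extension; the sample
string is only an illustration:

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$utf8 = "日本語";

// Convert to EUC-JP, e.g. for a page served with charset=EUC-JP.
$eucjp = mb_convert_encoding($utf8, 'EUC-JP', 'UTF-8');

// Convert back to UTF-8, e.g. before storing everything in one encoding.
$roundTrip = mb_convert_encoding($eucjp, 'UTF-8', 'EUC-JP');

echo strlen($utf8), " bytes as UTF-8, ", strlen($eucjp), " bytes as EUC-JP\n";
echo $roundTrip === $utf8 ? "round trip ok\n" : "round trip changed the text\n";
```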

Then you need some method of getting information into the database in
the right encoding.  I wrote a quick administration area program to do
it...if you will have information entered by users you can use the
same method...the trick is sending the right encoding in the form you
use on the webpage.

<form name="translatetext" action="text.php" method="post" accept-charset="utf-8">
<textarea rows="8" cols="32" name="text"></textarea>
<input type="submit" name="changetext" class="submit" value="Translate/Change Text" />
</form>

Notice the line on the top of the form - it will force all entered
information to be submitted in utf-8 so it doesn't matter what is put
in - from chinese to hindi, it will all be in utf-8 when you get it. 
if you're using multiple character sets, you're going to have to have
multiple forms, each with different character sets.
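
Since accept-charset is a request to the browser rather than a
guarantee, it may be worth checking what actually arrives. A minimal
sketch, assuming the mbstring extension; accept_utf8_text() is a
made-up helper name:

```php
<?php
// Hypothetical helper: returns the text if it is valid UTF-8, or null otherwise.
function accept_utf8_text($raw)
{
    if (!is_string($raw) || !mb_check_encoding($raw, 'UTF-8')) {
        return null;
    }
    return $raw;
}

// In text.php this would be called as accept_utf8_text($_POST['text']).
var_dump(accept_utf8_text("日本語") !== null);    // valid UTF-8
var_dump(accept_utf8_text("\xE6\x97") !== null);  // truncated byte sequence
```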

then you connect to your database and use a simple insert statement to
put it into the db

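    $link = mysql_connect("mysql_host", "mysql_user", "mysql_password")
        or die("Could not connect");
    mysql_select_db("my_database") or die("Could not select database");

    // Escape the submitted text so quotes in it cannot break the query
    // (mysql_real_escape_string needs PHP 4.3.0 or later)
    $text = mysql_real_escape_string($_POST['text']);
    $query = "Insert into my_table (mytext) values ('$text')";
    $result = mysql_query($query) or die("Query failed");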

I'd stick an auto_increment id column on the table for each text
value...makes getting it out easier. To get it out, you do a select,
and then a mysql_fetch_row or mysql_fetch_assoc to get the
information - a simple PHP echo will display it on the page unless
you're doing something a bit more esoteric.

    $query = "select mytext from my_table where id=1";
    $result = mysql_query($query) or die("Query failed");
    // mysql_fetch_assoc needs the result handle returned by mysql_query
    $info = mysql_fetch_assoc($result);
    echo $info['mytext'];
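
The mysql_* functions used above were removed in PHP 7, so on a
current PHP the same insert and select might be sketched with PDO as
below. This is an assumption-laden sketch, not the method from this
thread: build_dsn(), store_text() and load_text() are made-up helper
names, "utf8mb4" assumes a modern MySQL server, and the table is
assumed to have the auto_increment id column suggested above.

```php
<?php
// Hypothetical helper: builds a PDO DSN that asks for a UTF-8 connection.
// "utf8mb4" is an assumption for a modern MySQL server, not from the thread.
function build_dsn($host, $database, $charset = 'utf8mb4')
{
    return "mysql:host=$host;dbname=$database;charset=$charset";
}

// Insert one text value with a bound parameter, so no manual escaping is needed.
function store_text(PDO $pdo, $text)
{
    $stmt = $pdo->prepare('INSERT INTO my_table (mytext) VALUES (?)');
    $stmt->execute(array($text));
    return (int) $pdo->lastInsertId();
}

// Fetch one text value by id, or null if there is no such row.
function load_text(PDO $pdo, $id)
{
    $stmt = $pdo->prepare('SELECT mytext FROM my_table WHERE id = ?');
    $stmt->execute(array($id));
    $value = $stmt->fetchColumn();
    return $value === false ? null : $value;
}

echo build_dsn('mysql_host', 'my_database'), "\n";
```

With a real server you would create the connection with
new PDO(build_dsn('mysql_host', 'my_database'), 'mysql_user',
'mysql_password'), then call store_text($pdo, $text) and
echo load_text($pdo, 1).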

You can sniff browsers to get a user's set language

if (!isset($_SESSION['lang']))
{
    // Take the first two letters of the browser's preferred language
    $lang = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
    $lang = strtolower(substr($lang, 0, 2));
    // or any other languages you're gonna do
    $supported = array('en', 'it', 'es', 'zh', 'ja', 'fr', 'de');
    if (!in_array($lang, $supported))
    {$lang = 'en';}
    $_SESSION['lang'] = $lang;
}

Notice I assigned it as a session variable; remember to start a
session (session_start()) at the top of the page, then you can get to
the language from anywhere...then add another column to your text
table called lang

your query can change the where clause to "where id=1 and
lang='$_SESSION[lang]'"
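
Taking only the first two letters of the header ignores the browser's
preference weights. A slightly fuller sketch that honors the q-values
is below; pick_language() is a made-up helper name and the supported
list is just an example:

```php
<?php
// Hypothetical helper: picks the supported language with the highest q-value
// from an Accept-Language header, falling back to $default.
function pick_language($header, array $supported, $default = 'en')
{
    $best  = $default;
    $bestQ = 0.0;
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', trim($part));
        // Keep only the primary two-letter tag, e.g. "en" from "en-US".
        $tag = strtolower(substr(trim($pieces[0]), 0, 2));
        $q   = 1.0;   // a language without an explicit q-value counts as 1.0
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if (in_array($tag, $supported, true) && $q > $bestQ) {
            $best  = $tag;
            $bestQ = $q;
        }
    }
    return $best;
}

$supported = array('en', 'it', 'es', 'zh', 'ja', 'fr', 'de');
echo pick_language('ja,en-US;q=0.8,en;q=0.6', $supported), "\n"; // ja
```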

anyway, I hope this helped...took me forever to figure out that you
had to have php set up to use multibyte stuff to get it to work right.

if you have problems with mysql not holding information in multiple
character sets, try upgrading to mysql 4.0. whatever they're at now or
using utf-8 instead.

Have fun!
