Subject: Collecting Japanese data in PHP
Category: Computers > Internet
Asked by: shane43-ga
List Price: $20.00
Posted: 20 Sep 2005 22:10 PDT
Expires: 20 Oct 2005 22:10 PDT
Question ID: 570419
I'm trying to set up a PHP form that will collect data from Japanese users. More specifically, I want to target only users who speak Hiragana or Katakana, so I need to check all user input to see whether it falls under either category. I have no experience working with foreign characters, so I found a script at phpclasses.org (http://promoxy.mirrors.phpclasses.org/browse/package/1425) that has two useful methods for checking whether user data is Japanese [isHiragana(), isKatakana()], but it is not giving results that match my test cases. I'm guessing that the script is saved with the wrong encoding, or our server doesn't support foreign characters, or our PHP build doesn't support them. What needs to be set up on the server in order to handle the Japanese language? If this does not solve our problem, then perhaps I can pay more to have you diagnose my specific set of scripts. Thanks!
There is no answer at this time.
Subject: Re: Collecting Japanese data in PHP
From: thinkcomp-ga on 21 Sep 2005 21:01 PDT
On the PHP side, you should make sure that the mbstring extension is enabled. If you build PHP from source, that means compiling it with the extension: on Linux, make sure the --enable-mbstring flag is present in the "configure" command, so that it looks something like:
    ./configure --enable-mbstring
You can check whether the extension is currently enabled by calling the phpinfo() function in any PHP script and looking for an "mbstring" section in the output.
You may also want to check that the character encoding is correct for your page. The three traditional Japanese character encodings are EUC-JP, ISO-2022-JP, and Shift_JIS; UTF-8 can also represent Japanese. You can find more information on character encodings at: http://lfw.org/text/jp-www.html
If you are using a database, you should make sure that your table supports the character encoding you are using. MySQL, for example, uses Latin-1 by default.
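As a rough sketch of the checks described above (not a definitive setup), the following PHP confirms at runtime that mbstring is loaded and converts input from a declared encoding to UTF-8. The helper names mbstring_available() and to_utf8() are hypothetical, and standardising on UTF-8 is an assumption made for the example, not something the comment prescribes.

```php
<?php
// Hypothetical helpers for illustration; only extension_loaded() and the
// mb_* calls are part of PHP itself.

// True when the mbstring extension is compiled in or loaded.
function mbstring_available(): bool
{
    return extension_loaded('mbstring');
}

// Convert $input from the encoding the form declares (e.g. 'SJIS', 'EUC-JP')
// to UTF-8. Returns null when mbstring is missing or when the bytes are not
// valid in the declared encoding.
function to_utf8(string $input, string $fromEncoding): ?string
{
    if (!mbstring_available() || !mb_check_encoding($input, $fromEncoding)) {
        return null;
    }
    return mb_convert_encoding($input, 'UTF-8', $fromEncoding);
}

// The hiragana "a" given as raw Shift_JIS bytes, converted to UTF-8 and
// printed as hex so the result does not depend on the terminal encoding.
$utf8 = to_utf8("\x82\xA0", 'SJIS');
echo $utf8 === null ? "conversion failed\n" : bin2hex($utf8) . "\n";
```

In practice the page would also declare its charset (for example in the Content-Type header) so that browsers submit form data in the encoding the script expects.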
Subject: Re: Collecting Japanese data in PHP
From: eirikr_utlendi-ga on 22 Sep 2005 10:32 PDT
@shane43 -- I'm woefully ignorant of PHP, but let me clear up some things for you about the Japanese language. Hiragana and katakana are two of the four scripts commonly used in Japanese, so no one "speaks" either of these. :) The other two scripts are called kanji (lit. "Chinese characters") and romaji (lit. "Roman characters", i.e. the Latin alphabet). Have a look here (http://en.wikipedia.org/wiki/Japanese_writing_system) for a sample of what Japanese looks like. Most written Japanese includes both kanji and hiragana. Foreign words, and words with very complicated kanji, are generally rendered in katakana.
*However*, so long as you have the proper encoding (SJIS, EUC-JP, ISO-2022-JP, UTF-8, etc.), kanji and both kana systems will all appear as multi-byte characters: typically two bytes each in SJIS and EUC-JP, and three bytes each in UTF-8. Consequently, there's no need to look specifically for either form of kana; just look for multi-byte characters. This might be why your test cases are failing -- almost no Japanese text is written entirely in either hiragana or katakana, unless it's a children's book for beginning readers, or somebody doing something strange for effect, a bit like how the poet e.e. cummings used only lower case (poet Miyazawa Kenji wrote a whole piece entirely in katakana, google for "ame ni mo makezu" if you're interested).
There might also be some full-width Latin characters (often called double-byte Latin), which might make things complicated for you. MS Word and OpenOffice.org, for instance, both include a built-in UI command to switch full-width Latin to the ordinary single-byte form; PHP might have something similar.
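A minimal sketch of the checks discussed here, assuming the input has already been converted to UTF-8 and that PHP's PCRE has Unicode support. The function names are hypothetical; the ranges U+3040-U+309F and U+30A0-U+30FF are the Unicode Hiragana and Katakana blocks. For full-width Latin, PHP's mb_convert_kana() with the "a" option converts full-width letters and digits to their single-byte forms.

```php
<?php
// Assumes $text is valid UTF-8. Function names are illustrative only.

// True if the text contains at least one hiragana or katakana character.
function contains_kana(string $text): bool
{
    return preg_match('/[\x{3040}-\x{309F}\x{30A0}-\x{30FF}]/u', $text) === 1;
}

// True only if every character falls in the Hiragana block.
function is_all_hiragana(string $text): bool
{
    return preg_match('/\A[\x{3040}-\x{309F}]+\z/u', $text) === 1;
}

// True if the text contains any character that takes more than one byte
// in UTF-8 (byte length differs from character length).
function has_multibyte(string $text): bool
{
    return strlen($text) !== mb_strlen($text, 'UTF-8');
}

// Convert full-width letters and digits to ordinary ASCII.
function narrow_alnum(string $text): string
{
    return mb_convert_kana($text, 'a', 'UTF-8');
}

var_dump(contains_kana('日本語のテキスト'));
var_dump(is_all_hiragana('ひらがな'));
echo narrow_alnum('ＰＨＰ５'), "\n";
```

Note that in UTF-8 a bare multi-byte check also matches accented Latin, Cyrillic, Chinese and so on, so the kana test is the narrower of the two.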
If you're looking through lots of mixed multilingual text to find just the Japanese, the hiragana character "no" (の, which looks like a 6 turned clockwise by 90 degrees -- have a look here http://en.wikipedia.org/wiki/Hiragana for the character and its Unicode encoding) is probably the single most common character used in Japanese, as it marks the possessive, and it is also *only* used in Japanese, so you won't catch any other languages by mistake. HTH, Eirikr
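The "no" heuristic can be sketched in a few lines, again assuming UTF-8 input; count_no() is a hypothetical name. Counting の (U+306E) is one cheap signal that a passage is Japanese, though a short text may not contain it at all.

```php
<?php
// Counts occurrences of the hiragana "no" (U+306E) in UTF-8 text.
// count_no() is an illustrative helper, not a PHP built-in.
function count_no(string $text): int
{
    return mb_substr_count($text, 'の', 'UTF-8');
}

echo count_no('私の本の名前'), "\n";
```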