Converting a Japanese string to romaji

Oct 14, 2010 at 5:34pm
I'm trying to write a sync program for my MP3 player, but some of my MP3 files' names are in Japanese and I need to convert them to romaji (the sounds written in Latin letters, e.g. ka, ta, a, na, ru, etc.).
The program that I'm writing should work like this: open a file (Unicode), read a line, convert it, and repeat until the end of the file.
I want to use the wchar_t-related functions but don't know how to use them :(.
Any help appreciated.

http://www.romaji.org/ online converter
http://en.wikipedia.org/wiki/Hiragana // japanese hiragana
http://en.wikipedia.org/wiki/Katakana // japanese katakana
both for reference
Oct 14, 2010 at 5:54pm
IIRC, wchar_t doesn't necessarily mean Unicode or anything.

Personally, I've used kana and kanji without any problems in normal strings; however, I have my locale set to Japan, so it works even in non-Unicode mode.
Oct 14, 2010 at 6:29pm
Open a file(unicode)


Unicode isn't a file format. It's a system of associating characters with unique identifiers (code points), but it says nothing about how those identifiers are stored in a file.

Text in files is commonly stored in an encoding called "UTF-8", but this might not always be the case. Before you go any further, you should find out exactly what encoding you're dealing with in this file.
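
One cheap first check is to look for a byte order mark (BOM) at the start of the file. The following is only a sketch: `guess_encoding_from_bom` is a hypothetical helper name, and a missing BOM proves nothing, since UTF-8 files often have none and legacy Japanese encodings such as Shift_JIS never do.

```cpp
#include <cstddef>
#include <string>

// Guess a file's encoding from its first few bytes (the BOM, if any).
// Returns "unknown" when no BOM is present; that result does NOT rule out
// UTF-8 or a legacy encoding, it only means the file carries no marker.
std::string guess_encoding_from_bom(const std::string& head) {
    const std::size_t n = head.size();
    auto b = [&](std::size_t i) { return static_cast<unsigned char>(head[i]); };

    if (n >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF) return "UTF-8";
    // UTF-32LE must be tested before UTF-16LE: both start with FF FE.
    if (n >= 4 && b(0) == 0xFF && b(1) == 0xFE && b(2) == 0x00 && b(3) == 0x00) return "UTF-32LE";
    if (n >= 4 && b(0) == 0x00 && b(1) == 0x00 && b(2) == 0xFE && b(3) == 0xFF) return "UTF-32BE";
    if (n >= 2 && b(0) == 0xFF && b(1) == 0xFE) return "UTF-16LE";
    if (n >= 2 && b(0) == 0xFE && b(1) == 0xFF) return "UTF-16BE";
    return "unknown";
}
```

To use it, open the file in binary mode, read the first four bytes into a std::string, and pass that in.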


That said, the C++ standard libs are painfully lacking in their support for Unicode and converting between Unicode formats, but some popular libraries can handle it with ease. Are you using any libraries, or are you sticking to just the standard lib?
Oct 14, 2010 at 6:31pm
Hohoho. Boy, you're in for some fun.

My suggestion is that you use a language better suited for this, like Python, which is great for scripting and has native Unicode support. If you try to do it in C++, you'll just spend an inordinate amount of time dealing with encoding conversion.
Also, I hope for your sake that the names don't contain kanji or that you know Japanese, or you simply won't finish this.
Oct 14, 2010 at 6:55pm
The good thing is I have a friend who knows Japanese (writing, for now).
The other thing I tried is using the online converter with http://www.romaji.org/index.php?text = "Japanese String", but it all gets garbled because of the encoding: the site receives random symbols and can't convert them :(

Also, I have found myself an example, but I can't figure out how it works:
http://www.codeproject.com/KB/recipes/JapaneesTORomajiConverter.aspx
Last edited on Oct 14, 2010 at 7:01pm
Oct 16, 2010 at 11:46pm
The other thing I tried is using the online converter with http://www.romaji.org/index.php?text = "Japanese String", but it all gets garbled because of the encoding: the site receives random symbols and can't convert them


Can you post examples of where this converter fails?
Using a web converter would require the least amount of work.
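
One possible cause of the "random symbols" is that the Japanese text was put into the URL raw instead of being percent-encoded. Here is a minimal sketch of percent-encoding in C++; `percent_encode` is a hypothetical helper name, and it is an assumption (not verified against the site) that the converter expects UTF-8 in its text parameter.

```cpp
#include <string>

// Percent-encode a byte string for use as a URL query value.
// Unreserved characters (letters, digits, '-', '_', '.', '~') pass through;
// every other byte becomes %XX. Assumes `utf8` already holds UTF-8 bytes.
std::string percent_encode(const std::string& utf8) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

For example, か is the three bytes E3 81 8B in UTF-8, so it would be sent as %E3%81%8B. If the file names are in another encoding (Shift_JIS, say), they would have to be converted to UTF-8 first.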

However, if you want to code it yourself, the open-source standard for J-E dictionaries is EDICT:

http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?

Combine that with a converter like this one in Ruby and a decent parser:

http://crunchytoast.com/2010/04/01/improved-romaji-convertor-in-ruby/

and you should be all set; i.e., convert everything into hiragana first using EDICT, and then convert the result into romaji.
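
The second step (kana to romaji) can be sketched in C++ as a table lookup. This is not the Ruby converter linked above, just a minimal illustration under stated assumptions: the input is well-formed UTF-8, `kana_table` and `kana_to_romaji` are hypothetical names, and the table covers only the 46 basic kana. The EDICT lookup for kanji is not shown.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Deliberately partial table: the 46 basic hiragana, Hepburn-style.
// Voiced kana (が), small kana, digraphs (きゃ), っ and long vowels are NOT handled.
static const std::map<char32_t, const char*>& kana_table() {
    static const std::map<char32_t, const char*> table = {
        {0x3042, "a"},  {0x3044, "i"},   {0x3046, "u"},   {0x3048, "e"},  {0x304A, "o"},
        {0x304B, "ka"}, {0x304D, "ki"},  {0x304F, "ku"},  {0x3051, "ke"}, {0x3053, "ko"},
        {0x3055, "sa"}, {0x3057, "shi"}, {0x3059, "su"},  {0x305B, "se"}, {0x305D, "so"},
        {0x305F, "ta"}, {0x3061, "chi"}, {0x3064, "tsu"}, {0x3066, "te"}, {0x3068, "to"},
        {0x306A, "na"}, {0x306B, "ni"},  {0x306C, "nu"},  {0x306D, "ne"}, {0x306E, "no"},
        {0x306F, "ha"}, {0x3072, "hi"},  {0x3075, "fu"},  {0x3078, "he"}, {0x307B, "ho"},
        {0x307E, "ma"}, {0x307F, "mi"},  {0x3080, "mu"},  {0x3081, "me"}, {0x3082, "mo"},
        {0x3084, "ya"}, {0x3086, "yu"},  {0x3088, "yo"},
        {0x3089, "ra"}, {0x308A, "ri"},  {0x308B, "ru"},  {0x308C, "re"}, {0x308D, "ro"},
        {0x308F, "wa"}, {0x3092, "wo"},  {0x3093, "n"},
    };
    return table;
}

// Replace every kana found in the table with its romaji; copy all other
// bytes (ASCII, kanji, unsupported kana) through unchanged.
std::string kana_to_romaji(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        // Sequence length from the lead byte (assumes well-formed UTF-8).
        std::size_t len = 1;
        if ((lead >> 5) == 0x06) len = 2;
        else if ((lead >> 4) == 0x0E) len = 3;
        else if ((lead >> 3) == 0x1E) len = 4;
        if (i + len > utf8.size()) len = 1;  // truncated tail: copy byte as-is

        if (len == 3) {  // all kana are 3-byte sequences in UTF-8
            char32_t cp = ((lead & 0x0F) << 12) |
                          ((utf8[i + 1] & 0x3F) << 6) |
                          (utf8[i + 2] & 0x3F);
            // Fold katakana (U+30A1..U+30F6) onto hiragana: the blocks are 0x60 apart.
            if (cp >= 0x30A1 && cp <= 0x30F6) cp -= 0x60;
            const auto it = kana_table().find(cp);
            if (it != kana_table().end()) {
                out += it->second;
                i += 3;
                continue;
            }
        }
        out.append(utf8, i, len);
        i += len;
    }
    return out;
}
```

With this table, かたかな comes out as "katakana" and カ as "ka", while anything not in the table is left untouched so you can see what still needs handling.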

Of course, keep in mind that a single kanji can have many readings. Even an easy kanji like 上 shows up with four readings (うえ, うわ, かみ, じょう), one of which, うわ, I had never heard of before.