How to input Chinese characters?

Apr 14, 2010 at 1:08pm

I have a .txt file.
I want to read the Chinese characters in the file and then process them in my program.
After running the program, I want to write an output file that contains Chinese characters.
But I don't know how to read and write Chinese characters.
Please help me.

Apr 14, 2010 at 1:10pm

chrisname (7395)

You need to learn about unicode. Try this: http://evanjones.ca/unicode-in-c.html

Apr 14, 2010 at 1:20pm

wisu0420 (7)

Do I need to convert the Big5 text to Unicode before I read the characters from a file?
I tried to use Firefox to convert the file content to UTF-8/16/32, but it shows garbled characters.

Last edited on Apr 14, 2010 at 1:23pm

Apr 14, 2010 at 1:27pm

helios (17607)

> I tried to use Firefox to convert the file content to Unicode8/16/32, but it outputs some unknown words.

That's not how Firefox works. Firefox reads the file using whatever encoding it is in (or whatever encoding you tell it the file is in) and converts it to an internal representation. You can't tell it to convert the file to a specific encoding.
What you probably did is tell Firefox that the file was in BIG5.

Apr 14, 2010 at 1:30pm

wisu0420 (7)

Could you please tell me what I can do to read Chinese characters from a file into a string?
I really have no idea how to do this.

Apr 14, 2010 at 2:01pm

helios (17607)

There's no simple answer to that question. It depends on what you're reading from, what the encoding is, and what type of string it is.
The simplest answer I can give you that is still generic enough to be practical: write the character to a file encoded as UTF-8 (by hand, not programmatically), then load that file. Depending on which type of string you're writing to, either store the bytes in the string as they are, or decode the UTF-8 into an array of wchar_t (or whatever other type you prefer) and store that.
The simplest possible answer is to hardcode the character. For example:
std::string utf8string = "\xEF\xBB\xBF"; // Unicode character U+FEFF, encoded as UTF-8
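A minimal sketch of the "decode the UTF-8" step, assuming C++11. It decodes into char32_t rather than wchar_t, because wchar_t is 16 bits on Windows (where it holds UTF-16) and 32 bits on most Unix systems. The name decode_utf8 is hypothetical, and the validation is deliberately minimal.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into Unicode code points (one char32_t each).
// Minimal sketch: checks lead bytes, lengths and continuation bytes, but does
// not reject overlong encodings or surrogate code points.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)              { cp = lead;        extra = 0; }  // 0xxxxxxx
        else if ((lead >> 5) == 0x6)  { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
        else if ((lead >> 4) == 0xE)  { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }  // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (bytes.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Hardcoded UTF-8 works the same way for Chinese: 我 (U+6211) is the three bytes E6 88 91, so decode_utf8("\xE6\x88\x91") returns a one-element string holding U+6211, just as decode_utf8("\xEF\xBB\xBF") returns U+FEFF.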

Apr 14, 2010 at 2:14pm

wisu0420 (7)

Do you mean that I need to convert all the characters to UTF-8 first, and then store them in a string?

However, that raises another problem: how can I convert the Big5 characters to UTF-8?

Sorry for asking so many questions. I really appreciate your effort in answering them.

Apr 14, 2010 at 2:25pm

helios (17607)

I said that was the simplest practical answer. If you already have the characters in some encoding, it may, and I emphasize "may", be simpler to leave the encoding as it is and do the conversion from code. This depends on how Big5 works and on whether you have access to encoding conversion libraries, such as iconv or ICU. But the pipeline is always the same for any value of A:
input as bit stream in encoding A -> conversion routine -> internal structure in Unicode -> output
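As a sketch of the "conversion routine" step, assuming a POSIX system where iconv is available (it is built into glibc; on macOS you typically link with -liconv). The helper name convert_encoding is made up for this example. Note that iconv_open takes the target encoding first and the source encoding second.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Convert `input` from encoding `from` to encoding `to` using POSIX iconv.
std::string convert_encoding(const std::string& input, const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // argument order: (tocode, fromcode)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open: unsupported conversion");

    std::vector<char> in(input.begin(), input.end());  // iconv wants a non-const char*
    char* inptr = in.data();
    std::size_t inleft = in.size();
    std::string output;
    char buf[256];

    while (inleft > 0) {
        char* outptr = buf;
        std::size_t outleft = sizeof buf;
        std::size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
        output.append(buf, outptr - buf);            // keep whatever was converted
        if (rc == (std::size_t)-1 && errno != E2BIG) {  // E2BIG just means "buffer full"
            iconv_close(cd);
            throw std::runtime_error("invalid or incomplete input sequence");
        }
    }
    // Flush any final shift sequence (only matters for stateful encodings).
    char* outptr = buf;
    std::size_t outleft = sizeof buf;
    iconv(cd, nullptr, nullptr, &outptr, &outleft);
    output.append(buf, outptr - buf);

    iconv_close(cd);
    return output;
}
```

For the Big5 file in this thread, a call along the lines of convert_encoding(bytes, "BIG5", "UTF-8") would produce UTF-8. glibc's iconv accepts the name BIG5, but the set of supported names varies between implementations (on glibc, iconv -l lists them).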

Last edited on Apr 14, 2010 at 2:25pm

Apr 14, 2010 at 2:46pm

wisu0420 (7)

I think an example would help me understand.

For example,
I have a .txt file called "Chinese.txt".
Inside the file there are some Chinese characters: 我的名字是小明。 ("My name is Xiao Ming.")

Now I want to read the file and then write it to another file:

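#include <cstdio>
#include <cstdlib>
#include <cwchar>

int main()
{
    // File names must be string literals; both files use C stdio streams.
    FILE* in = std::fopen("Chinese.txt", "r");
    if (!in) {
        std::printf("file(s) cannot be opened\n");
        return EXIT_FAILURE;
    }
    FILE* out = std::fopen("output.txt", "w");
    if (!out) {
        std::printf("file(s) cannot be opened\n");
        std::fclose(in);
        return EXIT_FAILURE;
    }

    // Read one line (at most 99 wide characters) and write it out again.
    // fgetws decodes bytes using the current C locale; in the default "C"
    // locale, non-ASCII text will usually not decode correctly.
    wchar_t mystring[100];
    if (std::fgetws(mystring, 100, in))
        std::fputws(mystring, out);  // fputws, not fwprintf: the text is not a format string

    std::fclose(in);
    std::fclose(out);
    return 0;
}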

Will this read the file and then save it to another file?

I don't understand what role Unicode plays in reading and saving the file.

Apr 14, 2010 at 3:58pm

helios (17607)

If you just want to read the file and copy its contents, then neither the encoding nor the kind of data makes any difference. You can just read it as a binary file and copy the contents to another binary file.
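A minimal sketch of that binary copy using standard C++ streams; copy_file_bytes is a hypothetical helper name. Because the bytes are never decoded, it behaves the same for Big5, UTF-8, or any other encoding.

```cpp
#include <fstream>
#include <string>

// Copy a file byte for byte. Nothing is interpreted, so the text encoding
// of the file is irrelevant.
bool copy_file_bytes(const std::string& src, const std::string& dst) {
    std::ifstream in(src, std::ios::binary);
    if (!in) return false;
    std::ofstream out(dst, std::ios::binary);
    if (!out) return false;

    // Inserting an empty streambuf sets failbit, so handle empty files first.
    if (in.peek() == std::ifstream::traits_type::eof()) return true;

    out << in.rdbuf();  // stream the whole file through unchanged
    return static_cast<bool>(out);
}
```

With the files from this thread, copy_file_bytes("Chinese.txt", "output.txt") would produce an exact copy of the original bytes.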
If you want to do just about anything more complex than that, such as displaying it on the screen, then you're probably going to need Unicode, and none of the standard C or C++ functions is going to do the conversion for you.

Last edited on Apr 14, 2010 at 3:59pm

Apr 14, 2010 at 4:33pm

wisu0420 (7)

In ASCII, each number represents a character.
Is Unicode similar, in that each number in Unicode represents a character?

Below is my understanding (please correct me if I got something wrong):
If I want to read a file containing Chinese characters, I first need to convert it to Unicode.
For example: 我的名字是小明。
I need to convert it into something like XXXXX YYYYY ZZZZZ (Unicode) in the file.
Then I read the Unicode into a string.
After running some functions, I output the Unicode and save it to an output file.
Then I use software to convert it back to Chinese characters.
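Unicode does work like ASCII in this respect: every character is assigned a number (a code point), and the first 128 code points are exactly the ASCII characters. A small sketch that lists the code points of a string, assuming a C++11 compiler and a source file saved as UTF-8 so that a U"..." literal is read correctly; code_points_hex is a hypothetical helper.

```cpp
#include <cstdio>
#include <string>

// Format each Unicode code point in `text` as "U+XXXX", separated by spaces.
std::string code_points_hex(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        char buf[16];
        std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(cp));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

For the sample sentence, code_points_hex(U"我的名字是小明。") gives U+6211 U+7684 U+540D U+5B57 U+662F U+5C0F U+660E U+3002. In the pipeline helios describes, numbers like these exist in the program's memory after the conversion routine runs; the file itself does not have to be rewritten as numbers.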

Could you suggest some software that can convert between Unicode and characters?

Thank you so much for your time.

Apr 14, 2010 at 4:38pm

helios (17607)

Google iconv or ICU. iconv is a bit easier to use.

Apr 14, 2010 at 4:40pm

wisu0420 (7)

Is my understanding correct?

Thanks again!
