Text files decoding in text editors


May 11, 2012 at 9:44am

Hello guys,

I have a question about the way text editors read text files.

How does a text editor detect that a file is binary and not readable as text?

And how does it know whether a text file is written in Unicode (2-byte encoding) or some other encoding?

Such files normally have no headers at all! If a file is opened with fstream, all the fstream object sees is a series of bytes... so how can we know if this file is readable as text?

May 11, 2012 at 10:32am

coder777 (8449)

how can we know if this file is readable as text?

There are several ways.

You can check with isprint() whether the characters are printable ASCII. If the file is not ASCII, several encodings have certain characteristics that you can recognize.
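A minimal sketch of the isprint() check, assuming the bytes are already in a std::string. The function name looksLikeAsciiText is made up for illustration, and treating tab, newline and carriage return as acceptable is my own assumption about what counts as "text":

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if every byte is printable ASCII or one of
// the common whitespace characters (tab, newline, carriage return).
bool looksLikeAsciiText(const std::string& data)
{
    for (unsigned char c : data) {
        // Anything above 0x7F is outside ASCII.
        if (c > 0x7F)
            return false;
        // std::isprint must be given a value representable as unsigned char.
        if (!std::isprint(c) && c != '\t' && c != '\n' && c != '\r')
            return false;
    }
    return true;
}
```

A single NUL byte or control character makes this return false, which is a strict rule; a real editor would probably tolerate a small fraction of odd bytes.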

Unicode isn't necessarily 2 bytes per character. Microsoft keeps calling it Unicode, but it is actually UCS-2:

http://en.wikipedia.org/wiki/Unicode

For instance, in UCS-2 the majority of plain alphanumeric characters have a leading or trailing 0 byte, depending on the byte order.
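The zero-byte pattern can be turned into a rough heuristic. This is only a sketch: the names Ucs2Guess and guessUcs2 are invented here, and the "more than half of the byte pairs" threshold is an arbitrary assumption that only works well for mostly-Latin text:

```cpp
#include <cstddef>
#include <string>

enum class Ucs2Guess { None, LittleEndian, BigEndian };

// Hypothetical heuristic: look at the data as 2-byte pairs and count
// how often exactly one byte of the pair is zero.
Ucs2Guess guessUcs2(const std::string& data)
{
    const std::size_t pairs = data.size() / 2;
    if (pairs == 0)
        return Ucs2Guess::None;

    std::size_t zeroFirst = 0;   // 00 xx -> high byte first (big endian)
    std::size_t zeroSecond = 0;  // xx 00 -> low byte first (little endian)
    for (std::size_t i = 0; i + 1 < data.size(); i += 2) {
        const bool firstZero = data[i] == '\0';
        const bool secondZero = data[i + 1] == '\0';
        if (firstZero && !secondZero) ++zeroFirst;
        if (!firstZero && secondZero) ++zeroSecond;
    }

    // Assumed threshold: more than half of all pairs show the pattern.
    if (zeroSecond * 2 > pairs) return Ucs2Guess::LittleEndian;
    if (zeroFirst * 2 > pairs) return Ucs2Guess::BigEndian;
    return Ucs2Guess::None;
}
```

Plain ASCII data contains no zero bytes at all, so it falls through to None and can be handled by an ASCII check instead.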

May 11, 2012 at 10:57am

TheDestroyer (441)

Thank you for your reply.

So this check has to be done for EVERY character? Sounds pretty expensive!!!

May 11, 2012 at 11:30am

coder777 (8449)

But you need to read each and every char anyway, so why not do the check while you're at it? You can also settle for the first 1000 or so characters. It's only done once, and reading from the hard drive is likely to be much slower than the check itself.

Last edited on May 11, 2012 at 11:31am

May 11, 2012 at 1:35pm

TheDestroyer (441)

OK. Thanks a lot for the info :)

Have a nice weekend!

Topic archived. No new replies allowed.
