I have a question about the way text editors read text files.
How does a text editor detect that a text file is binary, and not readable as a text file?
And how does it know whether a text file is written in Unicode (a 2-byte encoding) or in some other encoding?
Such files normally have no headers at all! If a file is opened with fstream, all the fstream object sees is a series of bytes... how can we tell whether the file is readable as text?
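You can use isprint() to check whether the characters are printable ASCII. In the default "C" locale these are the bytes 0x20 to 0x7E, so you also need to accept whitespace such as newlines and tabs. If the text is not ASCII, several encodings have characteristic byte patterns that you can recognize.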
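A minimal sketch of the isprint() check, assuming the default "C" locale. The function name `looks_like_ascii_text` and the 95% threshold are hypothetical choices for illustration, not a standard rule. Any NUL byte rejects the buffer outright, which also means UTF-16 text (where ASCII characters carry a zero byte) is not treated as ASCII here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical heuristic: a buffer "looks like" ASCII text if it has no NUL
// bytes and at least min_ratio of its bytes are printable or common whitespace.
// The 0.95 default is an arbitrary assumption.
bool looks_like_ascii_text(const std::string& bytes, double min_ratio = 0.95) {
    assert(min_ratio >= 0.0 && min_ratio <= 1.0);
    if (bytes.empty()) return true;  // an empty file is trivially "text"
    std::size_t good = 0;
    for (char ch : bytes) {
        // Cast to unsigned char: passing a negative char to isprint() is undefined.
        unsigned char c = static_cast<unsigned char>(ch);
        if (c == 0) return false;  // NUL bytes practically never occur in ASCII text
        if (std::isprint(c) || c == '\n' || c == '\r' || c == '\t') ++good;
    }
    return static_cast<double>(good) / static_cast<double>(bytes.size()) >= min_ratio;
}
```

Using a ratio instead of requiring every byte to pass tolerates the odd stray control character, at the cost of occasionally accepting a mostly-printable binary file.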
Unicode isn't necessarily 2 bytes. Microsoft keeps calling it "Unicode", but what it means is actually UCS-2.
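Two of the recognizable characteristics, sketched under simplifying assumptions: a byte order mark (BOM) at the start of the file, and the fact that UTF-8 uses 1 to 4 bytes per code point with a strict bit pattern, so random binary data rarely validates as UTF-8. The names `Encoding`, `detect_bom` and `is_valid_utf8` are hypothetical. UTF-32 BOMs are ignored here (note that the UTF-32LE BOM starts with the same two bytes as the UTF-16LE one), and many files carry no BOM at all, so `Unknown` is a common result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Encoding { Utf8Bom, Utf16LE, Utf16BE, Unknown };

// Look for a byte order mark at the start of the buffer.
Encoding detect_bom(const std::string& b) {
    auto at = [&](std::size_t i) { return static_cast<unsigned char>(b[i]); };
    if (b.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF) return Encoding::Utf8Bom;
    if (b.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE) return Encoding::Utf16LE;
    if (b.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF) return Encoding::Utf16BE;
    return Encoding::Unknown;
}

// True if b is well-formed UTF-8. A sample cut in the middle of a
// multi-byte character at its end is reported as invalid.
bool is_valid_utf8(const std::string& b) {
    std::size_t i = 0, n = b.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(b[i]);
        std::size_t len;
        unsigned int cp;
        if (c < 0x80) { ++i; continue; }                      // 1 byte: plain ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                                    // stray continuation or invalid lead byte
        if (i + len > n) return false;                        // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(b[i + k]);
            if ((cc & 0xC0) != 0x80) return false;            // continuation bytes must be 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates, and values above U+10FFFF.
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000)) return false;
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return false;
        i += len;
    }
    return true;
}
```

A plausible order is: check for a BOM first, then try UTF-8 validation, and only then fall back to a legacy 8-bit code page guess.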
But you need to read each and every char anyway, so why not do the checking? Checking the first 1000 or so characters is usually enough. It's only done once, and reading from the hard drive is likely to be much slower than the check itself.
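A sketch of reading just that first chunk with std::ifstream. It opens the file in binary mode so that no newline translation alters the bytes being examined. The helper name `read_prefix` and the 1000-byte default are assumptions for illustration; the returned bytes could then be fed to whatever text/encoding heuristics you use.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: read at most max_bytes from the start of a file.
// Returns an empty string if the file can't be opened (indistinguishable
// from an empty file in this simple sketch).
std::string read_prefix(const std::string& path, std::size_t max_bytes = 1000) {
    assert(max_bytes > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    std::string buf(max_bytes, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(max_bytes));
    // The file may be shorter than max_bytes; keep only what was actually read.
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}
```

If you need to tell "cannot open" apart from "empty file", returning std::optional<std::string> (C++17) would be a natural variation.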