Degree symbol looks fine as console input but is written to the file incorrectly


I need to enter the degree symbol as console input, like

55 °F

This looks fine at the console. It's stored in a std::string and later written to a text file. The problem is, in the text file, the circle has a slash through it.

Here's a minimal reproducible example:

#include <iostream>
#include <string>
#include <fstream>

int main()
{
    std::string degree_sign;
    std::cout << "Enter degree sign: ";
    std::getline(std::cin, degree_sign);
    std::ofstream degree_out;
    degree_out.open("degree_symbol.txt", std::ios::out);
    degree_out << degree_sign << '\n';
    return 0;
}


Anybody know how I can fix this? I've tried entering it using both ALT+248 and ALT+0176. The result is the same. Unicode is not enabled on my system. I'm on Windows 10.

Thanks in advance for any help or suggestions. A lookalike would be fine. I haven't been able to find one.
Different character encodings.

After all, what you write to the terminal or to the file is just a sequence of bytes. How those bytes appear as text in the terminal depends entirely on which character encoding (Latin-1, UTF-8, etc.) the terminal assumes! Usually this can be configured somewhere in the settings of the terminal emulator you are using; on Windows, there is also the SetConsoleOutputCP() function. Likewise, when you open a file in a text editor, how the bytes in that file appear as text depends entirely on which character encoding that particular text editor assumes. A proper text editor will let you choose the desired encoding...
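
For example, here is a minimal sketch (Windows-specific, assuming <windows.h> is available) that switches the console output code-page to UTF-8 before printing the degree sign:

#include <windows.h>
#include <iostream>

int main()
{
    // Tell the console to interpret the bytes we print as UTF-8
    SetConsoleOutputCP(CP_UTF8);

    // "\xC2\xB0" is the UTF-8 byte sequence for the degree sign (U+00B0);
    // the literal is split so that the 'F' is not swallowed by the hex escape
    std::cout << "55 \xC2\xB0" "F" << std::endl;
    return 0;
}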

https://i.imgur.com/IjRCgxO.png
Open a command window and move to the relevant folder. If you enter
type degree_symbol.txt
then you will get your degree symbol.

On the other hand if you enter
notepad degree_symbol.txt
then you won't.
OK, thanks, I think I understand. So where does the mismatch happen?

Does my keyboard use one encoding and the Windows console a different one?

Or do the keyboard and the console use the same encoding, and C++ translates the byte differently? ...
Essentially, your program writes a sequence of bytes to the terminal, or into the file.

The terminal – or the text editor that you are using to view the file – then interprets those bytes as text. But the outcome depends entirely on which character encoding (Latin-1, UTF-8, etc.) the terminal – or the text editor viewing the file – assumes when it interprets the given bytes as text characters!

For example: in "Latin-1" encoding (Windows-1252), the character ° is encoded as a single byte with the value 0xB0. But in Code Page 437 (OEM), that very same character is encoded as the byte value 0xF8. And in UTF-8 encoding, that very same character is even encoded as the two-byte sequence 0xC2 0xB0 😏 (see the small sketch after the links below)

https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout
https://en.wikipedia.org/wiki/Code_page_437#Character_set
https://en.wikipedia.org/wiki/UTF-8#Codepage_layout
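
To make that concrete, here is a small sketch (the file names are just made up for illustration) that writes the very same degree sign as raw bytes in each of those three encodings; compare the files in a hex viewer, or open them in an editor with the matching encoding selected:

#include <fstream>

int main()
{
    // The degree sign (U+00B0) written as raw bytes in three different encodings
    std::ofstream("degree_latin1.txt", std::ios::binary) << "\xB0";      // Latin-1 / Windows-1252
    std::ofstream("degree_cp437.txt",  std::ios::binary) << "\xF8";      // OEM Code Page 437
    std::ofstream("degree_utf8.txt",   std::ios::binary) << "\xC2\xB0";  // UTF-8
    return 0;
}

Incidentally, the byte 0xF8 that ALT+248 produces in a CP437 console is the letter "ø" in Windows-1252, which would explain the "circle with a slash" that Notepad shows.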

_________

To make a long story short, be sure you write the text into the file in the desired character encoding. Then, when you view the file in a text-editor, make sure the matching character encoding is configured there.

https://i.imgur.com/IjRCgxO.png

Be aware that, without additional meta-data, it is not possible to know the character encoding of a "plain" text file.

std::string is kind of "agnostic" to character encoding; it just stores chars (bytes).
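
A quick way to see that: the "same" degree sign occupies one byte or two, depending on the encoding it was typed or stored in, and std::string only ever reports the byte count:

#include <iostream>
#include <string>

int main()
{
    std::string cp437_degree = "\xF8";      // degree sign as a single CP437 byte
    std::string utf8_degree  = "\xC2\xB0";  // the same character in UTF-8
    std::cout << cp437_degree.size() << '\n';   // prints 1
    std::cout << utf8_degree.size()  << '\n';   // prints 2
    return 0;
}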

Note: It can help to look at the "raw" bytes in your file with a hex viewer in order to avoid confusion ;-)
https://mh-nexus.de/de/hxd/
@lastchance thanks, I did that, and I see how the terminal and notepad are interpreting the data differently.
@kigar64551 thanks. After reading your reply, I think I have to take a different approach. I think all input for my program needs to come from a .txt file, rather than real-time input from the keyboard. That way there's less chance of an encoding mismatch.

Thanks for taking the time to explain this. Very helpful.
The following is Windows-specific, but on Windows you could use GetConsoleCP() to determine the actual console input code-page (character encoding) – which could be different on each machine.

Then you could use MultiByteToWideChar() to convert the char* string, which you have read from the terminal, from the console input code-page (as determined previously) to a wchar_t* (UTF-16) string.

Finally, the wchar_t* (UTF-16) string could be converted again to a char* string, in whichever code-page you like (probably UTF-8), by using WideCharToMultiByte(), and then be written to the file.
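
A rough sketch of that pipeline (Windows-specific; error handling is omitted for brevity, and the file name is just the one from the original example):

#include <windows.h>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::string input;
    std::cout << "Enter degree sign: ";
    std::getline(std::cin, input);

    // Step 1: console input code-page -> UTF-16 (wchar_t)
    const UINT console_cp = GetConsoleCP();
    const int wide_len = MultiByteToWideChar(console_cp, 0, input.c_str(), -1, nullptr, 0);
    std::vector<wchar_t> wide(wide_len);
    MultiByteToWideChar(console_cp, 0, input.c_str(), -1, wide.data(), wide_len);

    // Step 2: UTF-16 -> UTF-8
    const int utf8_len = WideCharToMultiByte(CP_UTF8, 0, wide.data(), -1, nullptr, 0, nullptr, nullptr);
    std::vector<char> utf8(utf8_len);
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), -1, utf8.data(), utf8_len, nullptr, nullptr);

    // Step 3: write the UTF-8 bytes to the file (utf8_len includes the '\0' terminator)
    std::ofstream out("degree_symbol.txt", std::ios::binary);
    out.write(utf8.data(), utf8_len - 1);
    out << '\n';
    return 0;
}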