char8_t

I am currently using the latest version of Code::Blocks, 20.03,
but I can't seem to compile this code:

 
  char8_t ch;


After a couple of searches I found out that Code::Blocks is not a compiler; it is just a fancy IDE (which I prefer, btw), and the compiler seems to be something called gcc.

I went to the Code::Blocks directory and found gcc (1.86 MB) and g++ (also 1.86 MB).

My first question: are gcc and g++ the same? You know, like cpp is the same as c++.

My second question: is anyone else having issues with char8_t too?
gcc is the C compiler, g++ is the C++ compiler.

char8_t is a C++20 type, so you have to compile with C++20 enabled: -std=c++20 for GCC/Clang (-std=c++2a on GCC 9 and earlier), /std:c++20 for MSVC.

The default installed MinGW/GCC compiler used by C:B 20.03 isn't fully C++20 compliant, it is 8.1.0. You'd have to update the compiler C:B uses to at least v9 to use char8_t.
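
One way to see what a given compiler setup actually provides is to ask the compiler itself. The sketch below relies only on the standard macros __cplusplus and __cpp_char8_t; the function names char8_support and language_standard are made up for illustration.

```cpp
#include <string>

// Reports whether this translation unit was compiled with char8_t support.
// __cpp_char8_t is the standard feature-test macro for the char8_t type.
std::string char8_support() {
#ifdef __cpp_char8_t
    return "char8_t available (__cpp_char8_t = " + std::to_string(__cpp_char8_t) + ")";
#else
    return "char8_t not available; try -std=c++20 (or -std=c++2a on GCC 9)";
#endif
}

// __cplusplus tells which language standard the compiler was asked to use.
long language_standard() {
    return __cplusplus;
}
```

Printing both values from main() once with -std=c++17 and once with -std=c++20 should show the difference; the exact numbers depend on the compiler version.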
I believe char 8 is c++ 20, so you need a *very* up to date compiler to use it.
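If you can't get support for it yet, just say int8_t or, closer still, uint8_t instead: these are 8-bit integers as well, although char8_t is unsigned (its underlying type is unsigned char) and is a distinct type.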

consider:
https://www.geeksforgeeks.org/difference-between-gcc-and-g/

@jonnin, "I believe char 8 is c++ 20": char8_t is indeed C++20. And thanks for the link.

@George P, "The default installed MinGW/GCC compiler used by C:B 20.03 isn't fully C++20 compliant, it is 8.1.0. You'd have to update the compiler C:B uses to at least v9 to use char8_t"
I was preparing to ask about this, but you beat me to it. Amazing.
I found out its version by using the command prompt.
My first question to you, @George P: how did you know the version of that compiler?

About my second question: based on your answers, I take it you don't get an "undeclared" error when using char8_t, am I right?
To use int8_t the <cstdint> header should be included, available since C++11.

https://en.cppreference.com/w/cpp/types/integer

The fixed-width types are inherited from C, which is why a C header is used; char8_t is a fundamental C++20 type.
Thanks @George P, but I wasn't asking about int8_t; that was brought up by @jonnin.

My question to you was: how did you know the version of that compiler, exactly 8.1.0?
How did I know the default version of the MinGW/GCC compiler used by C:B? The binaries download page:

http://www.codeblocks.org/downloads/binaries/

C:B is easy to change/modify/update the compiler used. https://stackoverflow.com/questions/62011663/how-to-change-compiler-in-c-ide-codeblocks-or-dev-c

I created a new compiler type with a newer MinGW version and use that when I use C:B.

Even if you get a current MinGW/GCC version it won't be 100% C++20 compliant. The only compiler/IDE available at this time that is 100% compliant is MS Visual Studio (NOT Visual Studio Code!)

The thing with int8_t availability (oops, I first wrote char8_t!) is that it is an optional feature. An implementation is not required to make it available.
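
Because the exact-width types are optional, portable code can test for them before relying on them. A minimal sketch, assuming only the rule that <cstdint> defines INT8_MAX exactly when int8_t exists; the names small_int, has_exact_int8 and small_int_bits are hypothetical.

```cpp
#include <climits>
#include <cstdint>

// INT8_MAX is defined by <cstdint> only when the exact-width type int8_t exists.
#ifdef INT8_MAX
using small_int = std::int8_t;        // exact-width type is available
constexpr bool has_exact_int8 = true;
#else
using small_int = std::int_least8_t;  // always available, at least 8 bits wide
constexpr bool has_exact_int8 = false;
#endif

// Width in bits of the chosen type (8 on mainstream platforms).
constexpr int small_int_bits() {
    return static_cast<int>(sizeof(small_int) * CHAR_BIT);
}
```

On any platform with 8-bit bytes the first branch is taken, so in practice the fallback matters only on unusual hardware.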

I have never gotten the optional C++20 fixed width types to work with std::cout, no matter what compiler I used. printf is what I had to use.
#include <iostream>
#include <cstdint>
#include <cstdio>

int main()
{
   char8_t c8t = 15;

   printf("%i\n", c8t);
}
@George P, "C:B is easy to change/modify/update the compiler used." Man, you did it again. It is like you read my mind.

This whole topic was preparation for asking how I can change that gcc so that it accepts char8_t without a problem. But I needed a couple of pieces of information before doing something stupid.

A couple of questions:

1: I guess Visual Studio is not a compiler either; MSVC is. Am I right?

2: What is MinGW?

3: Did you really create your own compiler?

George P wrote:
The thing with char8_t availability is it is an optional feature. An implementation is not required to make it available.

This is not true: char8_t is a mandatory type in C++20. Note that char8_t has the same size as unsigned char, which means that in theory it could be larger than 8 bits.
The same applies to (u)int8_t, as this is usually a typedef for signed/unsigned char, which is annoying when trying to display a (u)int8_t value: it displays as a character rather than as the number you wanted!

 
printf("%i\n", c8t);



It works here because %i makes printf read the argument as an int (c8t is promoted to int when passed to a variadic function)!

For std::cout you'd cast c8t to an int for display as an integer:

 
std::cout << static_cast<int>(c8t) << '\n';


std::cout treats char/unsigned char (and those types which are equivalent - int8_t etc) as a character and hence displays as a char. If you want a different display then you need to cast to another appropriate type.

The same applies to type char*. std::cout assumes this is a pointer to a c-style string and hence the c-string pointed to is displayed rather than the memory address.
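
The behaviour described above can be checked without touching the console by streaming into a std::ostringstream. This is only a sketch: it assumes std::uint8_t is an alias for unsigned char (true on mainstream platforms), and the print_* helper names are invented for the example.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value as-is: the unsigned char overload of operator<< is
// chosen, so the byte is written as a character.
std::string print_raw(std::uint8_t v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

// Casting to int selects the integer overload, so the number is written.
std::string print_cast(std::uint8_t v) {
    std::ostringstream os;
    os << static_cast<int>(v);
    return os.str();
}

// Unary plus promotes the value to int, with the same effect as the cast.
std::string print_plus(std::uint8_t v) {
    std::ostringstream os;
    os << +v;
    return os.str();
}

// A char pointer is treated as a C string...
std::string print_text(const char* p) {
    std::ostringstream os;
    os << p;
    return os.str();
}

// ...unless it is cast to const void*, which prints the address instead.
std::string print_address(const char* p) {
    std::ostringstream os;
    os << static_cast<const void*>(p);
    return os.str();
}
```

For the value 65, print_raw gives "A" while print_cast and print_plus both give "65"; the address format from print_address is implementation-defined.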
Declare a char8_t variable with a character, say 'a', and VS vomits up an error with std::cout << c8t << '\n';. operator<< is a deleted function for a naked char8_t. It has to be cast to work.
std::cout << static_cast<char>(c8t) << '\n';

If'n you want numeric output:
std::cout << +c8t << '\n';

Or cast to an int.

VS and MinGW/GCC exhibit the same behavior with the other fixed width char types as well: unless I cast the variable or prefix it with +, operator<< is a deleted function for all 3 of the fixed width char types.

My C:B copy uses MSYS2's MinGW 12.2, so I know this isn't a VS-only issue.

This "jumping through hoops" is one reason why I never seem to get any of the fixed width char types to work without a lot of extra effort. Effort I usually forget to make, and so I don't use the fixed width char types.

Somewhere along the line the specifications for std::cout kinda fell by the wayside and were ignored as the standard introduced the fixed width char types starting with C++11. If std::cout exhibited the same behavior with the fixed width char types as with the regular char type, with no need for a cast to display a character, that would be understandable and useful.

what is mingw

MinGW is a Windows port of the GCC compiler.
https://stackoverflow.com/questions/38252370/what-is-the-difference-between-gnu-gcc-and-mingw-arent-they-same
There are some patches done on GCC with MinGW to make it work better with Windows.

There are a couple of MinGW variants: MinGW, MinGW-w64 & TDM-GCC.

If'n you don't do Windows you don't need to worry about this.

Since I do do Windows this is of vital importance to me if I want to use the GCC toolchains.

The current release version of GCC is 12.2, released 19 August 2022.
Did you really create your own compiler?

No, I simply created a new "use this compiler" template in Code::Blocks and adjusted the template values to use the updated MinGW/GCC compiler I installed outside of C:B.

The latest release version of C:B is 20.03, released in March 2020, with an optional bundled GCC version from before that time: GCC 8.1.0. GCC has had updates since.

https://wiki.codeblocks.org/index.php/Installing_a_supported_compiler

The thing is that char already works for handling UTF-8 essentially everywhere except Windows. And given that char8_t is so badly supported by the standard library and other libraries I don't see any compelling reason for using char8_t just yet.


If you're using a C library that accepts UTF-8 strings as a const char* parameter then I think you can just cast it:
std::u8string str = u8"Hej där alla vänner!";
SDL_Surface* textSurface = TTF_RenderUTF8_Solid(font, reinterpret_cast<const char*>(str.c_str()), color);
I think this should work thanks to the fact that char is allowed to alias other types.
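
The cast can be wrapped in a small helper so it appears in one place only. The following is a sketch, not part of any library: utf8_string, as_c_str and c_api_byte_count are hypothetical names, and c_api_byte_count merely stands in for a real C function that takes UTF-8 as const char*. The #ifdef lets the same code build before C++20, where u8"" literals are plain char.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

#ifdef __cpp_char8_t
using utf8_string = std::u8string;   // C++20: elements are char8_t
#else
using utf8_string = std::string;     // older standards: u8"" yields plain char
#endif

// Views the UTF-8 bytes as const char* for a C API. Reading any object
// through a char pointer is permitted by the aliasing rules.
inline const char* as_c_str(const utf8_string& s) {
    return reinterpret_cast<const char*>(s.c_str());
}

// Stand-in for a C library function that accepts UTF-8 text.
std::size_t c_api_byte_count(const char* utf8) {
    return std::strlen(utf8);
}
```

In the SDL_ttf call, as_c_str(str) would take the place of the inline reinterpret_cast.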


P1389 is the "Standing Document for SG20: Guidelines for Teaching C++ to Beginners".
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1389r1.html

P1747 says "Don't use char8_t and std::u8string yet in P1389".
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1747r0.html
C++20 still has no tools to handle input and output with these types. Even the new {fmt} facilities doesn’t support it. There is even no good conversion tools for it (and even the existing conversions, like codecvt stuff, are deprecated since C++17).

The main usage of strings and characters is for input and output and C++20 still missing tools to do so with these types.

This paper suggests to remove the distinction between pre-C++20 and post-C++20 and reintroduce these types as soon as the proper tools are added (by SG16, hopefully for C++23).

P1747 got accepted.
https://github.com/cplusplus/papers/issues/508#issuecomment-524538538
Poll: We would like to see UTF-8 string-handling in education guidelines when there is support for UTF-8 string-handling support in C++. Unanimous consent.

I don't remember reading about any such tools that make it easier to deal with char8_t/std::u8string being added to C++23, so maybe it's best to wait...
Peter87 wrote:
given that char8_t is so badly supported by the standard library and other libraries I don't see any compelling reason for using char8_t just yet.

Amen! Preach it, Brother! :D

Not every feature added to the standard is really needed or useful, as implemented. Especially when the support isn't really all there. The fixed width char types are a good case study.

UTF-8 support is something useful, when properly done. Windows, as usual, is like the proverbial cheese.* It stands alone while the rest of the computing world marches onward.

*https://writingexplained.org/idiom-dictionary/the-cheese-stands-alone
In practice, this seems to work.

#include <iostream>
#include <string>

int main()
{
    // Caution: this cast is undefined behavior; it merely happens to work
    // on some implementations.
    auto& u8cout = reinterpret_cast< std::basic_ostream<char8_t>& >(std::cout) ;

    const std::u8string u8str( u8"In theory, there is no difference between theory and practice."
                               u8"\n\tIn practice, there is." );
    u8cout << u8"anon: " << u8str << u8'\n';
}

https://coliru.stacked-crooked.com/a/bc72379c51590fcc
Stomping on a stream with a reinterpret_cast is just so hacker-ish.

Just reading that smells somehow.

http://wiki.c2.com/?CodeSmell

char8_t is a new fundamental integral type, not an alias for another type (unlike, say, int8_t, which is typically a typedef for signed char). So std::cout etc. won't work with char8_t, as std::cout is a std::basic_ostream<char> and char8_t isn't a char (although a char8_t can be assigned/initialised from a char).
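
The distinction can be confirmed at compile time with <type_traits>. This sketch guards the char8_t checks with the __cpp_char8_t feature-test macro so it still compiles on older compilers; the function name int8_is_alias_for_char_type is hypothetical, and its result is typical rather than guaranteed.

```cpp
#include <cstdint>
#include <type_traits>

#ifdef __cpp_char8_t
// char8_t is its own type: same size as unsigned char, but a different type.
static_assert(!std::is_same<char8_t, char>::value, "distinct from char");
static_assert(!std::is_same<char8_t, unsigned char>::value, "distinct from unsigned char");
static_assert(sizeof(char8_t) == sizeof(unsigned char), "same size as unsigned char");
static_assert(std::is_unsigned<char8_t>::value, "char8_t is unsigned");
#endif

// std::int8_t, by contrast, is only another name for an existing type.
// On mainstream implementations that type is signed char.
constexpr bool int8_is_alias_for_char_type() {
    return std::is_same<std::int8_t, signed char>::value ||
           std::is_same<std::int8_t, char>::value;
}
```

This is why overload resolution treats int8_t exactly like a character type, while char8_t needs overloads of its own.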

As an alternative to defining a new stream, you can currently add new overloads of operator<< so that you can use std::cout with char8_t-based types:

#include <iostream>
#include <string>

// Print a single char8_t as the corresponding char.
std::ostream& operator<<(std::ostream& os, char8_t c8t) {
	return os << static_cast<char>(c8t);
}

// Print a u8string by viewing its bytes as a C string (stops at the first '\0').
std::ostream& operator<<(std::ostream& os, const std::u8string& u8s) {
	return os << reinterpret_cast<const char*>(u8s.c_str());
}

int main() {
	const char8_t c8t { 'z' };

	const std::u8string u8str(u8"In theory, there is no difference between theory and practice."
		u8"\n\tIn practice, there is.");

	std::cout << c8t << '\n';
	std::cout << u8str << '\n';
}


Similarly, AFAIK there's no in-built way of displaying u16string, u32string, char16_t and char32_t types.

Also char16_t/char32_t and wchar_t are different types (even though wchar_t is probably 16 bits on Windows and 32 elsewhere). Hence std::wcout which is for wchar_t based variables also can't be used for char8_t/char16_t/char32_t based output.

We have std::u8string, std::u16string, std::u32string, so why not std::u8cout, std::u16cout, std::u32cout, etc.? Once a new integral type is introduced (char8_t, char16_t, char32_t), either all the existing facilities need to work with the new types too, or new versions of them need to be introduced for these types.
C:B is easy to change/modify/update the compiler used. https://stackoverflow.com/questions/62011663/how-to-change-compiler-in-c-ide-codeblocks-or-dev-c


That's a useful link; it helped me install a new compiler in my C:B, so thank you @George. But a couple of things to add: first, it does not show how to update the compiler, only how to change/modify it. Second, it does not show how to install MinGW-w64; I got that from another source.

Anyway thank you.

One other thing. Here:
You'd have to update the compiler C:B uses to at least v9 to use char8_t.
how did you know precisely that v9 is the minimum version needed to use char8_t? Is there information somewhere along the lines of "char8_t: minimum g++ version is v9"?
https://en.cppreference.com/w/cpp/compiler_support/20

Look for "Library support for char8_t." For GCC the minimum version is v9.
BTW, https://stackoverflow.com/questions/57402464/is-c20-char8-t-the-same-as-our-old-char

Unlike everyone else here, apparently, I agree with the proposed data type. If your compiler doesn’t support it, #define it somewhere as an alias for unsigned char and your code will magically compile. Just not with all the type guarantees that char8_t supplies.
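
A sketch of that fallback, using a typedef where the post says #define and keyed on the __cpp_char8_t feature-test macro so a real char8_t is never redefined. The function utf8_code_points is a made-up example of code written against the alias; it assumes well-formed UTF-8 input.

```cpp
#include <cstddef>

// Fallback for compilers without C++20 char8_t: same size and signedness,
// but without the distinct-type guarantees of the real thing.
#ifndef __cpp_char8_t
typedef unsigned char char8_t;
#endif

// Counts code points in a zero-terminated UTF-8 string by skipping
// continuation bytes (those of the form 10xxxxxx).
std::size_t utf8_code_points(const char8_t* s) {
    std::size_t count = 0;
    for (; *s != 0; ++s) {
        if ((*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Note that before C++20 a u8"" literal is an array of char, so passing one to this function would still need a cast.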

Relatedly, I personally prefer to overload all I/O to read and write whole code points — meaning that when I read/write a single “character” (Unicode code point) I do it using char32_t.
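
As an illustration of working at the code point level, here is a minimal UTF-8 encoder for a single char32_t. It is a sketch of the general technique, not the poster's code; the name append_utf8 is hypothetical.

```cpp
#include <string>

// Appends the UTF-8 encoding of one code point to out.
// Returns false (and appends nothing) for surrogates and values above U+10FFFF.
bool append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;                  // surrogates are not valid code points
        }
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        return false;                      // beyond the Unicode range
    }
    return true;
}
```

An output routine built on this could accept char32_t values and hand the resulting bytes to std::cout, which already deals in char.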