fixed length data types

Nov 23, 2022 at 10:36am

I've read that the size of data types like int might differ between systems.
My first question is, can someone provide an example of what goes wrong when a program expects an int is 4 bytes but it is only 2 bytes on another platform?
Another inquiry I had was connected to this. I know people handle this problem with typedefs as said in this article (https://www.scaler.com/topics/cpp/data-types-in-cpp/), such as having variables like u8,u16,u32 - which are guaranteed to be 8bits, 16bits, and 32bits regardless of platform - but my question is, how is this normally accomplished? (I am not referring to types from the stdint library; rather, I am interested how one may explicitly guarantee that some type is always, say, 32 bits regardless of platform?)

Nov 23, 2022 at 11:31am

Peter87 (11251)

Mobo01 wrote:
I've read that the size of data types like int might differ between systems.

char is at least 8 bits but is as good as always exactly 8 bits.

short and int are at least 16 bits but on modern computers I don't think you'll find anything other than short being exactly 16 bits and int being exactly 32 bits. You'll have to go back to the time of 16-bit computers to find 16-bit ints.

long is at least 32 bits. It's usually either 32 or 64 bits.

long long is at least 64 bits and I doubt you'll find it being larger than that anywhere.

Mobo01 wrote:
My first question is, can someone provide an example of what goes wrong when a program expects an int is 4 bytes but it is only 2 bytes on another platform?

If you write your code with the assumption that int is 32 bits (and can store values up to 2147483647) then you might run into problems if you recompile the same program on a platform where int is only 16 bits (and can only store values up to 32767). It could easily lead to integer overflows and result in the wrong values and undefined behaviour.

Example:

int a = 27000;
int b = 15000;
int avg = (a + b) / 2;
std::cout << avg << "\n";

This code should print 21000 but if int is 16 bits the expression (a + b) will "overflow" because the sum 42000 doesn't fit in a signed 16-bit integer.
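
One possible way to make the example robust, as a minimal sketch assuming <cstdint> is available (C++11 or later), is to do the arithmetic in a type that is guaranteed to hold at least 32 bits. The helper name average is made up for illustration.

```cpp
#include <cstdint>

// std::int_least32_t is required to exist and to be at least 32 bits wide,
// so 27000 + 15000 = 42000 fits even on a platform where int is 16 bits.
// (average is a hypothetical helper name, not something from the thread.)
constexpr std::int_least32_t average(std::int_least32_t a, std::int_least32_t b)
{
    return (a + b) / 2;
}
```

Calling average(27000, 15000) gives 21000 regardless of how wide int is. The sum can still overflow for inputs close to the limits of the 32-bit range.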

It can also be a problem if you transfer binary data between programs running on different platforms (e.g. by using files or through an internet connection) and don't make sure to use the same number of bits at both the sender and receiver end.
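
A rough sketch of how both ends could agree on the layout: write the value as exactly four octets in a fixed order instead of copying the raw bytes of an int. All function names below are hypothetical, and the sketch assumes each array element carries one 8-bit octet.

```cpp
#include <cstdint>

// Octet number `index` (0 = least significant) of a 32-bit value.
constexpr unsigned char octet_of(std::uint_least32_t value, int index)
{
    return static_cast<unsigned char>((value >> (8 * index)) & 0xFFu);
}

// Rebuild the value from four octets given least significant first.
constexpr std::uint_least32_t from_octets(unsigned char b0, unsigned char b1,
                                          unsigned char b2, unsigned char b3)
{
    return  static_cast<std::uint_least32_t>(b0 & 0xFFu)
         | (static_cast<std::uint_least32_t>(b1 & 0xFFu) << 8)
         | (static_cast<std::uint_least32_t>(b2 & 0xFFu) << 16)
         | (static_cast<std::uint_least32_t>(b3 & 0xFFu) << 24);
}

// Sender side: always emit exactly four octets, whatever sizeof(int) is.
inline void store_u32_le(std::uint_least32_t value, unsigned char out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = octet_of(value, i);
}

// Receiver side: read the same four octets back in the same order.
inline std::uint_least32_t load_u32_le(const unsigned char in[4])
{
    return from_octets(in[0], in[1], in[2], in[3]);
}
```

Because the order and count of the octets are fixed by the code, the data does not depend on the size of int or on the byte order of either machine.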

Mobo01 wrote:
I am interested how one may explicitly guarantee that some type is always, say, 32 bits regardless of platform?

If you want to support all platforms (existing and non-existing) that the C++ standard supports then the simple answer is that you can't. The C++ standard is compatible with platforms where for example the smallest addressable unit is 16 bits, which forces char to be 16 bits. In that case there would be no way to represent an 8-bit integer.

If you look at the specification for the <cstdint> header you'll see that the standard says that the fixed-width integer types are optional, i.e. they do not need to be available on platforms where they are not supported. Other "platform-independent" libraries that come with such fixed-size integer typedefs often just assume that they can be supported because they know that all platforms that they care about support them.

If you are very paranoid and want to support platforms that do not use 8-bit bytes, or you know you're working with some specialized hardware that uses unusual sizes, then you can use a larger integer type if necessary. <cstdint> provides std::int_least8_t, std::int_least16_t, etc. These are not optional. You would then have to write the code in such a way that it doesn't assume the integers to be of a certain size. For example, you can no longer rely on the wraparound behaviour of unsigned integer types working the same as with a smaller fixed-size integer type without doing additional masking.
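
To illustrate the masking, here is a small sketch that emulates 32-bit wraparound on top of std::uint_least32_t. The name add_wrap32 is invented for this example, and both inputs are assumed to already be within the 32-bit range.

```cpp
#include <cstdint>

// Add two values as if they were exactly 32 bits wide.
// std::uint_least32_t may be wider than 32 bits, so the sum is masked
// down to its low 32 bits to get the usual modulo-2^32 wraparound.
// (add_wrap32 is a hypothetical helper; inputs are assumed to fit in 32 bits.)
constexpr std::uint_least32_t add_wrap32(std::uint_least32_t a,
                                         std::uint_least32_t b)
{
    return (a + b) & 0xFFFFFFFFu;
}
```

With an exactly 32-bit type the mask changes nothing; on a platform where the type is wider it is what keeps the result the same.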

Normally I think it's fine to just assume you can use the fixed-sized integer types. I don't see the advantage of using anything other than the ones in <cstdint> now that it has been standardized. The reason why so many libraries use their own typedefs probably has a lot to do with the fact that it was not standardized in C++ until C++11. In C it was standardized a bit earlier, in C99, but adoption has been slow in the past and many C libraries wanted (perhaps still want?) to support older versions of the standard.

Last edited on Nov 23, 2022 at 2:06pm

Nov 23, 2022 at 11:40am

seeplus (6627)

If your code needs to know how many bits are used for a type, then you can use sizeof. Consider:

#include <iostream>

int main() {
	std::cout << "short int is " << sizeof(short int) << '\n';
	std::cout << "int is " << sizeof(int) << '\n';
	std::cout << "long int is " << sizeof(long int) << '\n';
	std::cout << "long long int is " << sizeof(long long int) << '\n';
}

which for VS 64-bit displays:


short int is 2
int is 4
long int is 4
long long int is 8

where the value returned is the number of bytes used by the type. Note that sizeof(char) will always return 1.

https://en.cppreference.com/w/cpp/language/sizeof

Nov 23, 2022 at 11:54am

Peter87 (11251)

seeplus wrote:
If your code needs to know how many bits are used for a type, then you can use sizeof.

To get the number of bits rather than the number of bytes you can multiply the result by CHAR_BIT (defined in <climits>).

#include <iostream>
#include <climits>

int main()
{
	std::cout << "char is " << (sizeof(char) * CHAR_BIT) << " bits\n";
	std::cout << "short is " << (sizeof(short) * CHAR_BIT) << " bits\n";
	std::cout << "int is " << (sizeof(int) * CHAR_BIT) << " bits\n";
	std::cout << "long is " << (sizeof(long) * CHAR_BIT) << " bits\n";
	std::cout << "long long is " << (sizeof(long long) * CHAR_BIT) << " bits\n";
}

Note that when seeplus and I mention "byte" here we do not necessarily mean 8 bits. We mean the number of bits that are stored in a char. That's why I tried to avoid the word "byte" as much as possible in my previous reply to avoid confusion.
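
A related sketch: sizeof(T) * CHAR_BIT counts every bit of storage, including any padding bits, whereas std::numeric_limits reports only the bits that take part in the value. The helper name width_in_bits is made up for illustration.

```cpp
#include <limits>

// Number of value bits plus the sign bit (if any), ignoring padding bits.
// (width_in_bits is a hypothetical helper name, not a standard function.)
template <typename T>
constexpr int width_in_bits()
{
    return std::numeric_limits<T>::digits
         + (std::numeric_limits<T>::is_signed ? 1 : 0);
}
```

On mainstream platforms the two ways of counting agree, for example width_in_bits<int>() and sizeof(int) * CHAR_BIT would both give the same number there.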

Last edited on Nov 23, 2022 at 12:13pm

Nov 23, 2022 at 12:26pm

seeplus (6627)

As sizeof(char) is always 1, the result from sizeof() might more easily be treated as the number of 'chars' used by the type. As Peter points out above, the number of bits used in a char can be obtained from CHAR_BIT. Except for some 'exotic' hardware, this value is usually 8.

When types such as int8_t, int16_t etc are supported, they are defined by the compiler writers as typedefs for the appropriate basic type (char, short, int, long etc).

Eg for VS:

typedef signed char        int8_t;
typedef short              int16_t;
typedef int                int32_t;
typedef long long          int64_t;
typedef unsigned char      uint8_t;
typedef unsigned short     uint16_t;
typedef unsigned int       uint32_t;
typedef unsigned long long uint64_t;

For different architectures, these could be defined differently by the compiler writers to have the same meaning.

Also note that signed char, char and unsigned char are different types - that's why int8_t is defined as signed char and not just char.
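
For code that cannot rely on <cstdint>, a library could pick its own type by inspecting the limits macros from <climits>, roughly as sketched below. This is only an illustration of the general idea, not how any particular library does it, and u32 is a hypothetical name. The final static_assert needs C++11; older code used other tricks for that check.

```cpp
#include <climits>

// This sketch only handles platforms with 8-bit chars.
#if CHAR_BIT != 8
#  error "this sketch assumes 8-bit chars"
#endif

// Pick whichever basic unsigned type has exactly the 32-bit value range.
#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int u32;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long u32;
#elif USHRT_MAX == 0xFFFFFFFF
typedef unsigned short u32;
#else
#  error "no 32-bit unsigned integer type found"
#endif

// Compile-time double check: a mismatch stops the build instead of
// silently producing a type of the wrong size.
static_assert(sizeof(u32) * CHAR_BIT == 32, "u32 must be exactly 32 bits");
```

On a platform with no suitable type the build fails with an error, which matches the point made earlier in the thread that such a type cannot be guaranteed everywhere.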

Last edited on Nov 23, 2022 at 12:27pm

Nov 23, 2022 at 1:36pm

AbstractionAnon (6954)

Just to throw a curve ball,
Cray (and other super computers) support:
float complex 64 bits (each part is 32 bits)
double complex 128 bits (each part is 64 bits)
long double complex 128 bits (each part is 64 bits)
_float128 128 bits
_float128 complex 256 bits (each part is 128 bits)

Of course your chances of running into a Cray are pretty small unless you're at LLNL or LANL.

Nov 23, 2022 at 1:40pm

seeplus (6627)

yes - and the programmers that code for these types of computers are almost always well skilled in coding arithmetic operations for their computer.

Nov 23, 2022 at 1:43pm

jonnin (11494)

My first question is, can someone provide an example of what goes wrong when a program expects an int is 4 bytes but it is only 2 bytes on another platform?

it can be catastrophic:

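for(int i{}; i < 100000; i++)
{
//... with a 16-bit int, i can never reach 100000: the largest value is 32767,
//    and incrementing past it is signed overflow (undefined behaviour; in
//    practice it often wraps to -32768), so the loop never ends...
}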

--------------
but, let's look at it from a bird's-eye view. 32-bit computers were standard by 1992 or so. While the operating system was still 16 bit, the hardware was there. Windows 95 and later were all 32-bit operating systems (or 64-bit later on). However, for some reason (probably efficiency, as giant ints cost bus bandwidth, memory, etc. for no benefit) compilers have not moved to a 64-bit int by default. Your compiler basically treats int as 32 bits and gives you 64 bits when asked (by using uint64_t, for example). Even in the 486 era of 1992 a double was 64 bits, by the way; the FPU has been unchanged for a long, long time in that regard (there are other changes, so many others).

what does that mean? It means that unless you are on exotic embedded or supercomputer hardware, it's extremely likely that your compiler uses a 32-bit int, 16-bit short and 8-bit char. It is unusual for code to target both mainstream PC / phone / etc AND embedded or supercomputer hardware. Granted, I am sure some supercomputer code is grabbed from existing web code same as anything else, but hopefully the people coding for it know to go over it and tidy up for their hardware if needed. It is relatively safe to assume the 32/16/8 setup, but if you are doing something critical where it could be a big deal (mission to Mars, medical devices, dangerous machinery such as aircraft, whatever) you should verify this and fail to compile or execute. If your code needs to check, I also advise avoiding long and long long as those are more often weird than int/short/char. Also you really should use the sized types from <cstdint>. Those need care too: the exact-width ones such as int32_t are exactly that size wherever they exist but are optional, while the int_leastN_t ones are only 'at least this big'.
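
A minimal sketch of the "verify this and fail to compile" idea, assuming C++11 or later for static_assert. The particular sizes checked are just the 32/16/8 setup described above.

```cpp
#include <climits>

// Refuse to compile if the platform does not match the assumed layout.
static_assert(CHAR_BIT == 8, "char must be 8 bits");
static_assert(sizeof(short) * CHAR_BIT == 16, "short must be 16 bits");
static_assert(sizeof(int) * CHAR_BIT == 32, "int must be 32 bits");
```

Placed in a header that every source file includes, these checks cost nothing at run time and turn a silent size mismatch into a build error.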

Last edited on Nov 23, 2022 at 1:58pm

Nov 23, 2022 at 2:31pm

seeplus (6627)

Of course your chances of running into a Cray are pretty small unless you're at LLNL or LANL.

The only 'supercomputer' I've used was a CDC 7600 back in the late 1970's which had a 60-bit word size...
