Are null terminators automatically added by the compiler?

I have a C-style string for which I deliberately did not allot space for the null terminator.

I then tried comparing it to a string where a null terminator is added, but to my surprise they are equal.

It means that my compiler, MinGW, added the null terminator automatically.
Could I say that this is consistent across all compilers?

What if I used an Arduino compiler?

Based on my reading, we should always set the null terminator in any C-style string (e.g. data[lastIndex] = '\0'), but it looks like there is no need.

Can somebody please enlighten me?


#include <iostream>
#include <cstring>

int main()
{
    char data[5];

    data[0] = 'h';
    data[1] = 'e';
    data[2] = 'l';
    data[3] = 'l';
    data[4] = 'o';
    
    // std::cout << data;
    char another_string[] = {'h','e','l','l','o','\0'};
    int compare = std::strcmp(data, another_string);
    std::cout << compare << std::endl; // why 0?

    // tried printing
    int i = 0;
    while(i < 7){
        int ascii_counter = data[i];
        std::cout << data[i] << " :: " << ascii_counter << std::endl;
        i++;
    }
    return 0;
}


By the way, the output on my workstation looks like this:
0
h :: 104
e :: 101
l :: 108
l :: 108
o :: 111
o :: 111
:: 0

I set the loop limit to 7 so that it is larger than the length of the data array. The letter 'o' is printed twice, and the last value is indeed a 0, i.e. a null terminator.

So how does strcmp return 0, if this is the case?
I am really confused. Thanks!
The way you created the first char array, it isn't a C string: a null terminator ('\0') was not added.

#include <iostream>
#include <cstring>

int main()
{
   // this adds the proper C string null terminator automatically
   char c_str[] = "Hello!";

   std::cout << c_str << '\n';

   std::cout << "The sizeof c_str: " << std::strlen(c_str)
             << '\t' << sizeof(c_str) << '\n';
}
Hello!
The sizeof c_str: 6     7

In C++ code it is better to use std::string, which is much easier and less messy:
#include <iostream>
#include <string>

int main()
{
   std::string str = "Hello!";

   std::cout << str << '\n';

   std::cout << "The sizeof str: " << str.size() << '\n';
}
Hello!
The sizeof str: 6

https://www.learncpp.com/cpp-tutorial/introduction-to-stdstring/

FYI, there is an output tag that preserves how the output looks (see item 2):
https://cplusplus.com/articles/z13hAqkS/
> it means that my compiler mingw added the null terminator automatically.
The compiler didn't add a \0 automatically.

The \0 was already there in memory. Maybe by accident or maybe by design.
Many operating systems erase memory to zeros before starting a program, so if you don't write anything to some memory, chances are it will read as zero.

> Could I say that this is consistent in all compilers?
Well all compilers are consistent in that they wouldn't have written data[5] = '\0' for you.
But as for whether you'd always get lucky and find a \0 just in the right place, there is nothing consistent about that.

> What if I used an arduino compiler?
Same thing - whether it works or not is just dumb luck.

> the letter 'o' is printed twice and the last value indeed is a 0 or null terminator.
> So how does strcmp return 0, if this is the case?
Well the compiler knows that the memory associated with data[5] doesn't exist as far as your program is concerned. So it's free to use that memory for any other purpose as it sees fit.
Anything after the strcmp call (line 16 of your code) could have changed that memory location, for example by declaring additional variables, so by the time you printed that location it was no longer a \0.

> Based on my reading we should always set the null terminator in any c style string
If you create your strings with double-quoted literals, the \0 is generally added for you.
If you're doing single character assignments or single character initialisers, then you're on your own and you have to make sure you add the \0 yourself.

But even double quoted strings have a trap.
$ cat foo.c
int main ( ) {
    char a[] = "test";  // always has a \0
    char c[4] = "test"; // doesn't have a \0 (valid only in C, broken in C++)
}
$ gcc foo.c
$ g++ foo.c
foo.c: In function ‘int main()’:
foo.c:3:17: error: initializer-string for array of chars is too long [-fpermissive]
    3 |     char c[4] = "test"; // doesn't have a \0 (valid only in C, broken in C++)
      |                 ^~~~~~



This is what's called undefined behaviour (UB).

You broke the rules by passing a non-null terminated string to strcmp so all bets are off.


std::strcmp https://en.cppreference.com/w/cpp/string/byte/strcmp
> The behavior is undefined if lhs or rhs are not pointers to null-terminated strings.

Undefined behavior https://en.cppreference.com/w/cpp/language/ub
> Renders the entire program meaningless if certain rules of the language are violated.
Thank you all for such great answers! I am really learning a lot.

> This is what's called undefined behaviour (UB).
> You broke the rules by passing a non-null terminated string to strcmp so all bets are off.


Based on the answers above, I understand now that I am clearly running into undefined behavior.

My actual requirement is this: I am programming a microcontroller board that waits for input from my sensor. I cannot use C++ std::string because memory is low, so I am manipulating C-style char arrays.

I have to allot a fixed-size buffer for the data read from my sensor, and every input I receive is placed in that buffer array. Would it make sense to declare my data buffer like this, so that I don't have to take care of adding the null terminator?

// rather than this
char my_data[7];
// set the value to empty string so that the null terminator is added automatically
char my_data[7] = "";
// or set each values to 0 manually
char my_data[7] = 0;

//later in the code, need to set the value on each of my char array
//my_data[0] =  'A'
//my_data[1] =  'B'
//my_data[2] =  'C' 


I ask because after populating my character array there will be lots of string operations such as strcmp, strlen, etc.

Is my thinking correct, or is there a better alternative?

Thanks!

 
char my_data[7] = {0};


will set each element of my_data to 0.
> will set each element of my_data to 0.


Oops, thanks for the correction. I just typed it in.

So is this a better alternative, so that I won't have to worry about the null terminator at the end?

I cannot initialize my_data to some fixed value, as the values will come from my sensor reads.
You can do either
 
char my_data[7] = "";
or
 
char my_data[7] = {0};

If you're using C++ (but not C) you can leave out = and/or 0 from the second syntax if you want.
 
char my_data[7]{};
Thank you all for always being helpful!