How to know a char* is an array?

Jul 27, 2020 at 9:10pm

Hello, I am familiar with C#/Java and am reading A Tour of C++.

So far so good but I came across this :

"it is often wise to check that a pointer argument that is supposed to point to something, actually points to something:

int count_x(char* p, char x)
// count the number of occurrences of x in p[]
// p is assumed to point to a zero-terminated array of char (or to nothing)
{
    if (p==nullptr) return 0;
    int count = 0;
    for (; p!=nullptr; ++p)
        if (*p==x)
            ++count;
    return count;
}

I get that arrays are just sequential spots in memory until reaching an inevitable /0 but is this really the "best practice" way of doing things in C++?
It just seems dangerous? I would feel better passing in like char[]* p or something else instead you know?

Last edited on Jul 27, 2020 at 9:13pm

Jul 27, 2020 at 9:30pm

Ganado (6836)

Arrays in C/C++ are really primitive, and do not have as many built-in capabilities as those in C#/Java. For this reason, there are various wrappers around arrays in the C++ language's standard library.

I get that arrays are just sequential spots in memory until reaching an inevitable /0 but is this really the "best practice" way of doing things in C++?

(1) There is no "inevitable" '\0' (null character) at the end. If you pass in an array that is not null-terminated, the loop will run past the end of the array's buffer. Just wanted to make that clearer. It is the caller's responsibility to make sure the array is null-terminated.
(2) Best practice in C++ is to not have to use c-style strings, and to prefer std::strings instead under normal circumstances. See further down.

It just seems dangerous

Yes, working with pointers, in general, is error-prone. Prefer to use the standard library's string type.
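
As a rough sketch of what that could look like for the count_x example (count_char is a made-up name, not something from the book), std::string together with std::count from <algorithm> removes both the null check and the terminator scan:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// count_char is a hypothetical name; it counts occurrences of ch in text.
// std::string carries its own length, so there is no pointer to check
// against nullptr and no '\0' to search for.
std::size_t count_char(const std::string& text, char ch)
{
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), ch));
}
```

Under these assumptions, count_char("mississippi", 's') gives 4, and an empty string simply gives 0.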

char[]* p

I'm not sure what you mean by this. I guess you mean "pointer to array", but that's not how arrays work in C++, unlike C#/Java land. To answer your title question:

How to know a char* is an array?

You can't.

int func(char arr[]); and int func(char* arr); are the same function signature.
Because an array decays into a pointer when being passed into a function, there is no way to know whether p is truly just a pointer to a single char, or a pointer to a null-terminated char array.
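
If you want the compiler to confirm this, here is a small sketch using <type_traits>. The two function names are invented for illustration and are only declared, never defined:

```cpp
#include <type_traits>

// Hypothetical declarations, used only to compare their types.
int takes_array(char arr[]);
int takes_pointer(char* arr);

// Both declarations have exactly the same function type: int(char*).
static_assert(std::is_same<decltype(takes_array), decltype(takes_pointer)>::value,
              "char arr[] in a parameter list means char* arr");

// Array-to-pointer decay: the array type char[6] becomes char*,
// and the size 6 is no longer part of the type.
static_assert(std::is_same<std::decay<char[6]>::type, char*>::value,
              "char[6] decays to char*");
```

Both checks happen at compile time, so if this compiles, the claims hold for your compiler.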

However, the good news is just like how C# has a string type in its library, so does C++.
In C++, a string's length is simply string.length().

// Example program
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string str = "Hello";
    cout << str.length() << '\n';
}

If you want to make it clear that char* p should be an array, I would call it "arr" or "str" instead of "p" at the very least, and make the signature look like:

int count_x(char str[], char x)
{
    // ...
}

Now, it is expected that char str[] should be null-terminated since you're saying it's a str (string), despite the actual logic being the same.

Edit:

  for (; p!=nullptr; ++p) 
   if (*p==x)
     ++count;

This is wrong. You should not be checking if p is a nullptr in the loop. If p was not a nullptr to begin with, incrementing it will never make it a nullptr.

___________________________________________________

Example:

#include <iostream>
#include <string>
using std::cout;

// count the number of occurrences of ch in p[]  
// p is assumed to point to a zero-terminated array of char (or to nothing) 
int count(const char* p, char ch) 
{
    if (p == nullptr) return 0;

    int count = 0;
    for (; *p != '\0'; ++p) // loop until we're pointing to a null character
    {
        if (*p == ch)
            ++count;
    }

    return count; 
} 

int main()
{
    // "string literals" are implicitly null-terminated
    
    cout << count(nullptr, 'x') << '\n';
    cout << count("hello", 'l') << '\n';
    
    char arr[] = "mississippi";
    cout << count(arr, 's') << '\n';
    
    // individual characters that form an array are not implicitly null-terminated
    char arr2[] = { 'a', 'b', 'c', 'd', '\0' };
    cout << count(arr2, 'f') << '\n';  
}

Note that my signature is const char*. "const" means "read only". This means "pointer to const (read-only) char", which is what the signature needs to be if you ever pass in string literals to the function, because you cannot modify string literals.

Last edited on Jul 27, 2020 at 10:04pm

Jul 30, 2020 at 4:57pm

Havie (4)

Hey Ganado,

Thanks for the reply- I really appreciate the help.

I've got a few follow-up questions if that's okay.

This is wrong. You should not be checking if p is a nullptr in the loop. If p was not a nullptr to begin with, incrementing it will never make it a nullptr.

I'm a bit confused on this, you're saying if p had a value, incrementing it will just jump from object to object to object in memory and never find the '/0' ?

In your example you change this
from:

for (; p!=nullptr; ++p)

to :

for (; *p != '\0'; ++p)

I thought nullptr and '/0' were the same thing?
secondly,
Forgive me for this very novice question, but if p is a pointer to a char, what does your code do when you are adding a * to it again during the for loop? You are getting the value at that memory address right? Where as the original code is just getting the actual memory address itself? Can a memory address be a nullptr? Is that what you're saying can never happen?

I'm just a bit confused here, because I took this from a book I'm reading which was highly recommended. I didn't think it would have an error this early?

Thanks for your help

Last edited on Jul 30, 2020 at 4:58pm

Jul 30, 2020 at 5:44pm

jlb (4973)

I thought nullptr and '/0' were the same thing?

No they are not necessarily the same. The character '\0' is sometimes called the end of string character, because every C-string must be terminated by this character. And note that an array is not necessarily a C-string since a C-string must be properly terminated.

The keyword nullptr denotes the pointer literal. It is a prvalue of type std::nullptr_t. nullptr converts implicitly to any pointer type but not to an integer, whereas the integer literal 0 can still be used as a null pointer constant.

Can a memory address be a nullptr?

No.

I'm just a bit confused here, because I took this from a book I'm reading which was highly recommend.

What book, book version, paragraph, sub-paragraph, and Author?

Jul 30, 2020 at 5:52pm

zapshe (1983)

'\0' is the character at the end of a character array which lets you know you've gone through the array.

The issue with p!=nullptr is because incrementing p will never give you a nullptr. You'll simply eventually get a memory address out of the range of the array that your program shouldn't have access to, and then the program will crash.

The nullptr check sees if the memory address of p is valid or not (not whether or not your program should have access to it).

Look at this program:

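#include <iostream>

int main() 
{
	char *p = nullptr;

	// p compared as a pointer, then p compared against '\0'.
	// The second comparison relies on '\0' being accepted as a null pointer
	// constant; newer language rules no longer guarantee this, so some compilers reject it.
	std::cout << ((p == nullptr) ? "True" : "False") <<
		'\n' << ((p == '\0') ? "True\n" : "False\n");

	char q[] = "Black";

	// &q[5] is the address of the terminator (never null); q[5] is the terminator itself
	std::cout << ((&q[5] == nullptr) ? "True" : "False") <<
		'\n' << ((q[5] == '\0') ? "True" : "False");
}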

True
True
False
True

'\0' and nullptr are not the same thing. Notice how on the second checks, I use & for checking nullptr, since I need to get the memory address at that element to check it against nullptr.

However, when checking '\0', I actually want that element, since it sits at a valid memory address but holds the value '\0' to signal that you've already reached the last character in the array.

Jul 30, 2020 at 7:51pm

Ganado (6836)

Havie, from a search of "A Tour of C++" and "count_x", it appears that some printings of A Tour of C++ do in fact have incorrect code examples on pages 11-12.

https://stackoverflow.com/questions/22237408/buggy-code-in-a-tour-of-c-or-non-compliant-compiler
https://stackoverflow.com/questions/45390179/how-this-char-array-should-be-a-tour-for-c-example
https://www.stroustrup.com/Tour_printing2.html

I found a digital copy of A Tour of C++, 3rd printing, January 2015.
The example on page 11 correctly shows the comparison being
for (; *p!=0; ++p)
If you are using a different edition, perhaps it is mentioned in the errata for that edition.

I hope zapshe's explanation for why it's wrong is understandable for you.

____________________________________

Also, nullptr is a pointer literal (its type is std::nullptr_t), while '\0' is a char literal.
In other words, use nullptr when you're making a comparison with a pointer, and use '\0' when making a comparison with a char.
In both cases, you can also just do if (thing) instead of if (thing != nullptr) or if (thing != '\0'), as it means the same thing, but the latter is a bit more explicit.
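
To make the short forms concrete, here is a minimal sketch. The helper name count_until_terminator is invented for this post, and it assumes p is either null or points to a null-terminated array:

```cpp
#include <cstddef>

// Hypothetical helper: returns the number of chars before the terminating
// '\0', or 0 when p is a null pointer.
std::size_t count_until_terminator(const char* p)
{
    if (!p) return 0;      // pointer test, same as: if (p == nullptr)

    std::size_t n = 0;
    for (; *p; ++p)        // char test, same as: *p != '\0'
        ++n;
    return n;
}
```

The first test looks at the pointer itself, the second at the char it points to. For "hello" this should return 5, and for nullptr or "" it should return 0.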

Last edited on Jul 30, 2020 at 8:19pm

Jul 30, 2020 at 8:16pm

jonnin (11494)

I thought nullptr and '/0' were the same thing?
side note on this:
zero is defined in c++ as a constant in a dozen + places.
null and nullptr will both resolve to integer zero. so will 0, 0.0**, '\0', false, and many other things.
sure, this works:
int * ip = false;
but it's gibberish. It makes the code confusing to read to do that, and probably triggers a warning and possibly an error without a cast on strict compiler settings, but on loose settings or with a cast, it works because integer zero is integer zero at the end of the chain down at the assembler level.
I highly suggest you do not do things like this. It serves no purpose to mix and match these things most of the time (and when it does, make a comment on why). Use the constant name that applies to what you are doing: if its a pointer, set it to nullptr not false or '\0'; if its a character, set it to '\0' not nullptr, and so on.

** 0.0 if you look at the raw bytes of the double is 0x000000.... all zeros. So it can cast out as integer in raw byte format even though its a double. It also automatically casts to 0 as integer if you assign int x = 0.0 to yield integer zero. you get a warning, of course.

Last edited on Jul 30, 2020 at 8:18pm

Jul 30, 2020 at 8:22pm

Ganado (6836)

jonnin wrote:
null and nullptr will both resolve to integer zero. so will 0, 0.0**, '/0', false, and many other things.

One small correction: A benefit of nullptr is that it isn't implicitly the integer zero.
int a = nullptr; will not compile.
int a = NULL; will compile (with a warning).
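
A compile-time sketch of the same point, using only <type_traits> and <cstddef>. Nothing here comes from the book; it just asks the compiler which implicit conversions exist:

```cpp
#include <cstddef>
#include <type_traits>

// nullptr has its own type, std::nullptr_t ...
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr is a std::nullptr_t");

// ... which converts implicitly to any pointer type ...
static_assert(std::is_convertible<std::nullptr_t, char*>::value,
              "nullptr converts to char*");

// ... but not to int, which is why `int a = nullptr;` is rejected.
static_assert(!std::is_convertible<std::nullptr_t, int>::value,
              "nullptr does not convert to int");

// '\0' is an ordinary char literal.
static_assert(std::is_same<decltype('\0'), char>::value,
              "'\\0' is a char");
```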

Last edited on Jul 30, 2020 at 8:22pm

Jul 31, 2020 at 3:43am

jonnin (11494)

right. nullptr will work with a cast, though. Several of the things I said require a cast with any kind of sane compiler flags, but even so, zero is zero.

Jul 31, 2020 at 7:46am

keskiverto (10425)

Havie wrote:
In your example you change this from: `for (; p != nullptr; ++p)` to : `for (; *p != '\0'; ++p)` I thought nullptr and '/0' were the same thing?

Wrong focus/question. The problem here is that p and *p are not the same thing.

Although, you did have the right question too:

Havie wrote:
if p is a pointer to a char, what does your code do when you are adding a * to it again during the for loop? You are getting the value at that memory address right? Where as the original code is just getting the actual memory address itself? Can a memory address be a nullptr? Is that what you're saying can never happen?

The * in this context is pointer dereference operator.

The p gives us the value of the pointer: an address. That address can be nullptr. The null address is known to be invalid; it belongs to a pointer that points to nowhere.
Anything non-null is assumed to be an address, although we can't know for sure.
We can test whether address is null, but we can't tell for certain what is in non-null memory location.

The *p gives us the value of the object that the pointer points to. It is an error to dereference a null pointer.

Accessing element k of an array: p[k]
With different syntax: *(p+k)
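
A small sketch of that equivalence. The two helper names are made up, and both assume that p points into an array with at least k+1 elements:

```cpp
#include <cstddef>

// Hypothetical helpers: both return element k of the array p points into.
char element_by_index(const char* p, std::size_t k)
{
    return p[k];           // subscript syntax
}

char element_by_arithmetic(const char* p, std::size_t k)
{
    return *(p + k);       // pointer arithmetic, then dereference
}
```

For const char* s = "hello", both calls with k = 1 give 'e', and &s[3] is the same address as s + 3.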

[Edit]
One more look at for (; p != nullptr; ++p)
First, let's add names so I can refer to them: for (char* tmp = p; tmp != nullptr; ++tmp)

The ++tmp is shorthand for tmp = tmp + 1
Therefore, after N iterations: tmp == p + N

(Assuming nullptr == 0) The loop will end after N iterations, IF p + N == 0
Is there such N that p+N would equal to 0? Ever?

Last edited on Jul 31, 2020 at 9:22am

Jul 31, 2020 at 3:09pm

Havie (4)

Wow thanks for all the help,
this is making a lot more sense now.

Crazy I happened to pick the one snippet in this book that had errors in it. Thanks for finding that Ganado .

One more look at for (; p != nullptr; ++p)
(Assuming nullptr == 0) The loop will end after N iterations, IF p + N == 0
Is there such N that p+N would equal to 0? Ever?

Well no right? two reasons,
1) because as Ganado said before- if p does not start a nullptr it will never be a nullptr right? it would just keep incrementing ? (although im not sure what the end of memory looks like?)
2) cant we not assume nullptr ==0 ? While all of these different versions of null,nullptr,'/0' may somehow evaluate to 0, I think I'm going to attempt to stay away from this logic and just use nullptr when working with pointers, null when working w objects, and '/0' when working with chars/strings if that makes sense

But, I do get your point, if we are incrementing and running through this loop even once, p+ something is never going to be zero.

You'll simply eventually get a memory address out of the range of the array that your program shouldn't have access to, and then the program will crash.

The nullptr check sees if the memory address of p is valid or not (not whether or not your program should have access to it).

Did you mean to imply that the memory out of range will be for the entire program? Because this sort of sounds like 1 index past the array it will crash, whereas I'm assuming you can increment far past the end of the array before a crash.

Also, Is this saying that lets hypothetically say your entire computers memory was fixed, like 0-100, and something is keeping track of your program, limiting it to 0-20, and if you went to 21 the program crashes despite there being more memory out there?

Last edited on Jul 31, 2020 at 3:13pm

Jul 31, 2020 at 3:55pm

jonnin (11494)

1 past index may or may not crash you.
it will, however, damage another variable's value which in turn can cause a crash itself.

that aside, the worst thing that can happen in a program is not a crash. The worst bugs are the ones where it appears to work fine ... so you sell it, release it, whatever... and then it does not work anymore, randomly crashing on some systems, corrupting data on others... debug that?!

the OS checks every access to see if the memory you tamper with belongs to your program. It gives you a large block, though, so if you go one past, it may not crash etc. Nothing tracks it down to the byte level. Old OS like dos didn't track at all, you could hack other programs easier and inject virus easier or steal info easier etc. I hacked several games that way, by running a program that cut into the memory of the game while it was live.

so you are not wrong, one past is not assured to crash. But its assured to be a nightmare later if it does not.

Last edited on Jul 31, 2020 at 3:58pm

Jul 31, 2020 at 4:57pm

zapshe (1983)

If your program accesses memory that it shouldn't, its behavior is going to be undefined. It may only take going 1 index more, or 10 more, as jonnin said.

Also, Is this saying that lets hypothetically say your entire computers memory was fixed, like 0-100, and something is keeping track of your program, limiting it to 0-20, and if you went to 21 the program crashes despite there being more memory out there?

The crash will happen because the operating system will realize that the program is going out of its territory, or changing the memory caused an issue with the program somewhere.

Jul 31, 2020 at 5:51pm

mbozzi (3943)

if p does not start a nullptr it will never be a nullptr right? it would just keep incrementing ? (although im not sure what the end of memory looks like?)

You cannot even increment a pointer except in very specific conditions. In particular, you cannot keep incrementing past the end of an array. A program that does this is wrong by definition: it exhibits undefined behavior.

Only pointers within the same contiguous block of memory are totally ordered. As such, there is no way to represent the "beginning" or "end" of memory. Your program can only represent "beginning of this object" and "one-past-the-end of this object", and everything in between.
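
To illustrate "beginning of this object" and "one-past-the-end of this object", here is a sketch that stays inside those bounds. count_in_array is a hypothetical helper; it takes the array by reference so that the size N stays part of the type:

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical helper: counts occurrences of ch in a real array.
// std::begin(arr) points at the first element and std::end(arr) at
// one past the last, so the loop never leaves the array and does not
// depend on a '\0' terminator.
template <std::size_t N>
std::size_t count_in_array(const char (&arr)[N], char ch)
{
    std::size_t n = 0;
    for (const char* p = std::begin(arr); p != std::end(arr); ++p)
        if (*p == ch)
            ++n;
    return n;
}
```

This works for an unterminated array such as char raw[] = { 'a', 'b', 'a' }, where counting 'a' gives 2. It only works while the argument is still an array, though; once it has decayed to a char* the size is gone.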

A program that exhibits undefined behavior is wrong, but it does not necessarily crash. Undefined behavior exists because it allows the compiler to optimize your code under the assumption that it never occurs. This is sometimes a significant reason why C++ code executes more quickly than e.g., Java code.

cant we not assume nullptr ==0

You can assume that nullptr and 0 are neither equivalent nor typically interchangeable. Use 0 to represent zero and nullptr to represent the null pointer.

Zero should never have been used as the null pointer constant. This design issue was inherited from C, where the opportunities for this to cause problems are more rare.

Also, Is this saying that lets hypothetically say your entire computers memory was fixed, like 0-100

There exist systems like this, but on such systems there's nothing watching your program. These systems provide lots of guarantees beyond the minimums in the C++ standard, and so they represent a bit of a special case for programmers. You can rely on those special guarantees to write code that breaks C++'s rules (e.g., by writing the 21st byte), and still obtain a program that works (on that system).

Such systems are typically freestanding. Memory on desktop machines is not nearly as simple.

Last edited on Aug 1, 2020 at 4:02am

Jul 31, 2020 at 9:02pm

Ganado (6836)

can we not assume nullptr ==0

It's about expressing intent, and making code that is easily understandable. If you are working with pointers, use nullptr. If you are working with integers, use 0. Yes, the language will let you mix them in some cases, but it doesn't mean you should.

Last edited on Jul 31, 2020 at 9:08pm

Aug 4, 2020 at 4:07pm

Havie (4)

"cannot" or "should not" ?? I'm under the impression here that you can increment any pointer you want (as long as it's not the nullptr) and jump/leap around memory like a lunatic in C++. I feel like we just went over how easy it is to increment past the end of an array in C++, and the programmer is responsible for keeping track of lengths of array as there is no .length/.size function for arrays in C++ right?

However, I understand why you shouldn't do this.

Aug 4, 2020 at 4:20pm

zapshe (1983)

"cannot" or "should not"

The last part of his statement is, "do[ing] this is wrong by definition: it exhibits undefined behavior."

Therefore, clearly you can but just shouldn't.

jump/leap around memory like a lunatic in C++

Imagine me coding stack variables during function calls in Assembly ;)

Aug 4, 2020 at 4:48pm

mbozzi (3943)

"Should not" works.

You can run pointers through storage allocated to your program however you want - but you can't use arithmetic to take a pointer outside of that region.
§8.7[expr.add]/4
https://timsong-cpp.github.io/cppwp/n4659/expr.add#4

there is no .length/.size function for arrays in C++ right?

The length of an array -- excepting arrays of unknown bound -- is encoded in the type of the array. std::size(my_array) inspects the type of the array and gives you that information back. std::size didn't exist when A Tour was written:
https://en.cppreference.com/w/cpp/iterator/size

Unfortunately, C-style arrays are hard to work with. They have a habit of converting themselves into pointers to their first element. This behavior was inherited from C's predecessors, BCPL and B. C++, fortunately, has std::array, which is easier to work with (and has a .size()).
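
A sketch of both ideas, written so it does not need C++17. array_length is a hypothetical stand-in that does roughly what std::size does for arrays, and vowels_size is just an invented demo function:

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: N is deduced from the array's type, which is
// where the length of a C-style array is recorded.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N])
{
    return N;
}

// std::array keeps its size available through .size() and does not
// silently convert to a pointer.
inline std::size_t vowels_size()
{
    std::array<char, 5> vowels = {{ 'a', 'e', 'i', 'o', 'u' }};
    return vowels.size();
}
```

For char word[] = "hello", array_length(word) is 6, because the terminating '\0' is part of the array. Passing a plain char* to array_length does not compile, which is the point: the length lives in the array type, not in the pointer.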
