a[i] = i++ undefined behavior

Greetings!
Another question about language subtleties.

As I have understood until this point (also thanks to another thread I posted here a long time ago), operator precedence has nothing to do with order of evaluation. In my textbook, the following is given as an example of undefined behavior:
  int i = 0;
  a[i] = i++;  // assume the array exists 


My reasoning:
[ ] and ++ have higher precedence than =, so it is clear that the expression can be understood as (a[i]) = (i++).
The = operator works as follows.
1. The return value of the expressions on both sides is computed. The return value of a[i] is a reference to the array element stored at a[0]. The return value of i++ is 0.
2. The return value of the right-hand side is written into a[i].
3. A reference to a[i] is returned as the value of the = expression.

The problem is that the side effect of i++ can be applied either between 1. and 2., between 2. and 3., or after 3. In other words, I don't know whether 0 was written to a[0] or a[1], and I don't know whether the return value of = gives a[0] or a[1].

Is this correct?
> Is this correct?
In short, no.
http://c-faq.com/expr/evalorder4.html
In C and C++, assignment is a side effect. (The primary effect is to return the value of the LHS.)

As the expression is a single value sequence, the question you should be asking is: which value of i is being used for a[i]?

The answer is: it is not defined; the expression has “undefined behavior”.


It could be argued that such an expression is a natural consequence of the language’s design and therefore should have its behavior defined somewhere, but, alas, the age of the C language (and, consequently, the massive existing codebase) makes doing that impractical.
I don’t like GeeksForGeeks — they often get things very wrong or oversimplify stuff. I’ve been filtering them out of my own search results for a while now.

Case in point: the referenced information about pre- and post-increment has a factually incorrect section:

Quoting from https://www.geeksforgeeks.org/pre-increment-and-post-increment-in-c/
Special Case for Post-increment operator: If we assign the post-incremented value to the same variable then the value of that variable will not get incremented i.e. it will remain the same like it was before.

No, it is UB.

Just because it works on whichever compiler the author of that article tried it on does not mean it is valid or that it will work on any other compiler, even compilers of the same family on the same OS.
Here's the similar question that we discussed 4 years ago:
http://www.cplusplus.com/forum/beginner/225215/#msg1029272

Tl;dr: Within an expression, operator precedence and operator associativity determine which operands attach to which operators. They don't have anything to do with which parts of an expression are evaluated first. There is an additional set of rules, maybe called the sequenced-before rules, that constrain the evaluation order of subexpressions so that the results make sense. In brief, the sequenced-before rules specify that certain expressions are required to be evaluated (value computation, initiation of side effects, or both) before others.
Duthomhas wrote:

the question you should be asking is: which value of i is being used for a[i]?


But is that not what I described in my post?
In other words, I don't know whether 0 was written to a[0] or a[1], and I don't know whether the return value of = gives a[0] or a[1].

The problem is I don't know when the side effect of ++ is executed, isn't it?
What I used is called a “rhetorical question” — one which is designed to guide your thinking. You are focused on when the side effect is executed — which is exactly the kind of thinking that gets people to make mistakes, like they did at GeeksForGeeks.

You should be focused on what is observable from the syntactic point of view, where it becomes clear that the question of «what i’s value is when it is accessed at subdivisions in time not quantified by the language standard» is not well-formed and thus invalid.

During the evaluation of a sequence, the implementation may access i at any time it likes, any way it likes, and as often or rarely as it likes. We, the programmers, are given no guarantee inside of that computational sequence, and we should not be trying to guess what the computer is doing during that computation. The only guarantees we have are:
  • some side effects are performed before the sequence is completely computed
  • some side effects are performed after the sequence is completely computed
All other side effects may occur at any time during the computation of the sequence.

Asking for further detail moves you past the language standard and into the internals of your particular compiler (“the implementation”), and is consequently worthless if you are not a compiler writer.

Hope this helps.
The problem is I don't know when the side effect of ++ is executed, isn't it?

And therein lies the problem for humans, compilers, even computing in general. That's why it's called UB - undefined behavior. And that's why you need to accept the facts, don't program the conditional in such a silly way, and above all move on to your next weird thread where you think there is something of value in progressing by using the tools the wrong way.
Another way to think about this: one of the steps in compilation is the construction of an Abstract Syntax Tree (AST). As a small example, for the operation a + b, the + is a parent node and the operands a and b are child nodes in the tree. There is no guarantee as to the order in which a and b are evaluated, except for the sequencing rules that mbozzi mentioned.

https://en.cppreference.com/w/cpp/language/eval_order

Note that the UB section there contains the following, which was the motivation for your question and which contradicts some of the advice given so far (it depends on which standard the code is compiled against):

a[i] = i++; // undefined behavior (until C++17)

The rules changed with C++17.

However, when testing the above code with -std=c++20 -Wall -Wextra, g++ (11.2) warns about possible UB because of sequence points:

https://godbolt.org/z/9MGqx46c1
#include <iostream>

int main()
{ 
    constexpr int SIZE {3};
    int i{0};
    int a[SIZE] = {1,7,13};
    a[i] = i++;
    for(int n{0}; n<SIZE;n++) {
        std::cout << a[n] << "\n";
    }
}


<source>:9:13: warning: operation on 'i' may be undefined [-Wsequence-point]
    9 |     a[i] = i++;
      |            ~^~


Program returned: 0
Program stdout

1
0
13


Note that when compiling against C++11, the same diagnostic is issued, and there it really is UB. So IMO it's best not to go down this road fraught with danger.
I don’t like GeeksForGeeks


But isn't GeeksForGeeks correct for C++17, in view of TheIdeasMan's post above? That Geeks post was last updated 19 Oct 2021, so it covers C++17/20.

The current standard is C++20, so unless stated otherwise, any info about C++ on the web that doesn't specify a version (and wasn't posted before that version was released) is assumed to be correct for the current version.
Assumption is the mother of ...

Isn't it safer to assume that posts are written by someone who hasn't had time to read the current standard or understand it?
:) :)
Thank you guys, in particular Duthomhas and TheIdeasMan.
It is clearer now!

PiF