Behavior for pointer comparison with fun

Forum

Forum
General C++ Programming
Behavior for pointer comparison with fun

Behavior for pointer comparison with functors?

Dec 27, 2010 at 5:00pm

I would like to know how to implement the contains function from the following code in a portable manner.

#include <stddef.h>
#include <assert.h>

struct A {
    int x;
} a;

struct B {
    int y;
} b;

bool contains(const void *outer, size_t size, const void *inner);

int main()
{
    assert(contains(&a, sizeof(a), &a.x));
    assert(contains(&b, sizeof(b), &b.y));
    assert(!contains(&a, sizeof(a), &b.y));
    assert(!contains(&b, sizeof(b), &a.x));
}

Edit & run on cpp.sh

It can be used in the GC implementation in the following post:
http://lists.boost.org/Archives/boost/2009/04/150748.php.

I am not affiliated with the development effort. I just explore the code. Tentatively at this point, primarily for educational purposes. I imagine such functionality can be useful for other custom structures. Particularly allocators.

Now the prototypical solution for most practical purposes I believe is:

bool contains(const void *outer, size_t size, const void *inner)
{
    return (outer <= inner) &&
            (inner < (const char *)outer + size);
}

(The C-style cast is for readability in my snippets.)

The problem with the above code is that according to the standard, pointers neither pointing in the same array, nor pointing at instance members of the same object, are not comparable with predictable behavior. To take the imagination to its extreme (beyond rational thought), it may cause system reset or something.

Now, I have already asked a question in another forum regarding portable comparison of pointers, also in reference to their usage in STL. And I was pointed to the fact that the functors from <functional> provide total order on all primitive data types, including pointers. Indeed, the case of pointers is even treated exclusively by the standard.

So, the above function can be rewritten like this:

bool contains(const void *outer, size_t size, const void *inner)
{
    return less_equal<const char*>()((const char *)outer,
                                     (const char *)inner) &&
            less<const char*>()((const char *)inner,
                                (const char *)outer + size);
}

I use char* for a reason. I try to exploit the guarantee provided in the section Basic concepts/Types (from the standard). It is said, in summary, that the representation of an object x must be contained in the characters in the memory range between &x and (char*)&x + sizeof(x). Now, to me this implies, that if x has a member x.y, then its address (&x.y) must also be somewhere in this range. As a matter of fact, &x.y to (char*)&x.y + sizeof(x.y) must be a sub-range. What do you think of such interpretation?

Anyway, I come to my second issue. How do we interpret the anti-symmetry requirement of less_equal in its its total order? I guess it could be something like:
(*) ... less_equal<...>()(a,b) and less_equal<...>()(b,a) implies equal_to<...>()(a,b).

This is not saying much on its own. I mean, the equivalence classes of equal_to can take any shape, like cause collisions of everything in an STL container. But then, functors are required to be consistent with the built-in relational operators (<, <=). So in (*), equal_to<...>()(a,b) can be roughly approximated with a == b.

For pointers though, a == b is not always defined. How do we know for sure, that &a and &b from my snippet at the top are considered distinct according to equal_to? If you create a set<..> of pointers to objects allocated on the heap, for example, that you want to be compared with respect to their location in memory, how do you know for certain that the comparison will not turn out to be completely non-discriminating?

I know that in practice C++ is not used in such manner without platform knowledge. I am just asking if the portability for the second contains function can be judged clearly by the standard in its present condition. And if so, do you think that it is portable or not?

Thanks, and regards

Dec 27, 2010 at 11:52pm

ne555 (10692)

equal_to<...>()(a,b) can be roughly approximated with a == b.

http://www.cplusplus.com/reference/std/functional/equal_to/
It is operator==

For pointers though, a == b is not always defined

? If the types are distinct is compile error. else is like a xor b

pointers neither pointing in the same array, nor pointing at instance members of the same object, are not comparable with predictable behaviour. To take the imagination to its extreme (beyond rational thought), it may cause system reset or something.

IIRC it means that the result is useless, not that it could cause crash. (the results will vary between different runs of the program)

Dec 28, 2010 at 7:21am

simeonz (490)

@ne555:
Thanks for the response. You wrote:

http://www.cplusplus.com/reference/std/functional/equal_to/
It is operator==

And the standard asserts this also, but in General Utilities Library/Function Objects/Comparison there is clarification at the end:

For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not.

If the operators and functors were identical for pointers, then this clarification would not be. The way I interpret it, the functors are something like extensions of the respective operators. You also wrote:

For pointers though, a == b is not always defined
? If the types are distinct is compile error.

I agree.

else is like a xor b

Depends on how you look at the notion. Do you mean that two pointers are equal exactly when their binary representations are identical? Is this formal?

IIRC it means that the result is useless, not that it could cause crash. (the results will vary between different runs of the program)

May be you are right. But it is a bit hard for me to judge. The actual statement is:

If two pointers p and q of the same type point to different objects that are not members of the same object or elements of the same array or to different functions, or if only one of them is null, the results of p<q, p>q, p<=q, and p>=q are unspecified.

I guess saying "unspecified" result is different from saying undefined behavior. The precise meaning is probably given somewhere in the text's introduction. I guess I'll have to make a search to find out at some point.

The point is that p == q and equal_to<..>()(p, q) are not identical for pointers. Further, I am curios if any of the following assertions are true according to the standard:

1
2
3

int a; int b;
assert( &a != &b );
assert( not_equal_to<int *>()(&a, &b) );

I know that objects have non-zero size. And I think, there might have been something about their locations being distinct in some sense. But I have to dig deeper to see how this works exactly. I want to know if something is guaranteed about the operators and the functors.

Regards

Dec 28, 2010 at 12:30pm

Duthomhas (13276)

There really isn't any truly portable way, because what you are trying to do is non-portable.

Dec 28, 2010 at 12:57pm

simeonz (490)

@Duoas
In principle, a test for containment of some memory (object) within some other memory (object) should have, I think, portable semantics. As for whether C++ provides facilities for such test is a different matter. Apparently most people think that there is no such facility in the language, even implied by the standard.
I again state my main reason to suspect otherwise: the specification of the memory layout in the section Basic concepts/Types.

I don't want to sound stubborn as a mule, but is the conclusion of all that the standard clearly implies no such facility, accidentally or deliberately?

Dec 28, 2010 at 7:21pm

ne555 (10692)

s: a == b is not always defined
n: is like a xor b
s: Depends on how you look at the notion. Do you mean that two pointers are equal exactly when their binary representations are identical? Is this formal?

Sorry, it was a bad choice of words. I mean that it will yield a result (that will be the same as long as they don't change) (true if the stored values represents the same memory address)
I guess that the pointer represent as segment:offset. You could point to the same address with different segments, so the bit to bit comparison will be incorrect

About unspecified result: http://stackoverflow.com/questions/3455706/the-order-of-data-in-memory

So while you don't know if std::less<void*>(&gFirst, &gSecond) is true or false, you are guaranteed:

std::less<void*>(&gFirst, &gSecond) ==
    std::greater<void*>(&gSecond, &gFirst);
std::less<void*>(&Data::First, &Data::Second) ==
    std::greater<void*>(&Data::Second, &Data::First);

I'm confused about the operator< be different with less<>()

Last edited on Dec 28, 2010 at 7:22pm

Dec 28, 2010 at 9:37pm

simeonz (490)

ne555, I am glad to see that there has been some more public interest.

I agree with your last comments. The segment:offset interpretation is excellent and practical. Suppose that the code uses only the offset to compare the pointers, because the objects will be in the same segment if they are constituents of the same composite structure and the standard demands they should be in order to have predictable results. Then pointers to different objects may turn out equal as long as their offset in the respective segment is the same.

This is exactly why I fail to understand the requirement for "total order" for functors. Total order usually implies that the relation for equivalence has singleton equivalence classes. That is, we expect that equal_to(&a, &b) is true, only when &a exactly coincides with &b. But, what does it mean for two pointers to coincide. To be binary identical, or to point at identical locations in memory (the same objects), or to be identical according to the == operator, which we know is sometimes unspecified? I don't know how to interpret the total order thing effectively in practice and I think that this point actually deserves some clarification.

I will try now to be more specific about my other point, i.e. the type representation stuff that I try to exploit in my solution. First, I want to share two code snippets taken directly from the standard:

#define N sizeof(T)
char buf[N];
T obj; // obj initialized to its original value
std::memcpy(buf, &obj, N); // between these two calls to std::memcpy,
                           // obj might be modified
std::memcpy(&obj, buf, N); // at this point, each subobject of obj of scalar type
                           // holds its original value

, which basically says that a snapshot of an object can be stored and restored by extracting and overwriting a memory range associated with it. And the second:

T* t1p;
T* t2p;
// provided that t2p points to an initialized object ...
std::memcpy(t1p, t2p, sizeof(T));
// at this point, every subobject of trivially copyable type in *t1p contains
// the same value as the corresponding subobject in *t2p

, says that swapping the contents of objects can be performed by exchanging the contents of their assigned memory ranges.

The important point is that the bounds of those memory ranges are computable, as defined by the standard.
Now, say I have a structure:

struct A {
    X x;
    Y y;
    Z z;
} a;

The standard says (quite explicitly) that the memory for a is situated between &a and (char*)&a + sizeof(a). This memory contains the representation of a and therefore also the representations of a.x, a.y, and a.z. But the memory where a.y is stored is similarly situated in the range &a.y to (char*)&a.y + sizeof(a.y). This means that the latter is sub-range of the former in the sense:

1
2

(char*)&a.y == (char*)&a + m
(char*)&a + sizeof(a) == (char*)&a.y + sizeof(a.y) + n

, where m and n are suitable positive integers. Now, it is simple mathematical fact that if p = q + m, m > 0, then p > q. But I am not sure if this holds for C++ pointers. In practice, it probably does. But than why is the standard not explicit about the result from comparing pointer to some structure and pointer to element of the structure? More than this, as it is, it leaves the case explicitly unspecified result.

Otherwise, this would mean that (char*)&a.y > (char*)&a and (char*)&a.y < (char*)&a + sizeof(a). Than, the code for my contains function would work. At least with functors, it would.

Last edited on Dec 28, 2010 at 9:53pm

Dec 29, 2010 at 12:05am

Duthomhas (13276)

(My brain is on overload tonight so I haven't read much more here... sorry.)

You are only guaranteed to have pointer comparisons work when the pointers are already known to point to the same section of memory.

In practice, however, you can typically assume that your client's hardware will accept pointer casts friendly to your purposes.

Remember, there is the Standard which does not change, and hardware, which does. The standard leaves some things not explicitly stated to accomodate hardware variations. (Whether doing that was useful and successful is a different matter.)

On glancing through, I saw something about segmented memory. When using memory, it is incumbent upon the client (program) to handle it properly. Hence, if you are using offset values only, then assume that they point to the same segment. Otherwise the user is applying to a larger memory model and should be using properly managed large or huge FAR pointers.

I could be wrong.

Dec 29, 2010 at 8:49am

simeonz (490)

@Duoas, I understand that the topic of pointer comparison is designed flexible by necessity. But my questions concern specific parts from the standard. In summary:

1) The standard says that an object should be represented in memory starting from the location returned from the address-of operator and allowing of up to sizeof characters. This includes, for structures and classes, all member variables. (This, I think, is stated clearly.) Does this mean, that if some object is member of another, the former's address will be between the boundaries of the latter's memory (with respect to pointer comparison)? Here, I can allow detecting containment when there isn't, but not reporting no containment when there actually is.

2) The standard says that functors provide "total order". Does this guarantee that pointers to different objects are considered different (not equal_to), or just binds the various functors in certain axiomatic ways? The latter (on its own) would be somewhat less useful in practice. Like the axiom, when both lower_equal and greater_equal are true, than you are equal_to. Ok, that guarantees some correctness. But if you allocate many objects on the heap, and their addresses are all equal_to you can not effectively use them (the pointers) in ordered STL containers.

Point 1 will not be false positive anymore when combined with point 2. Then the code for contains can be made portable (with functors, address-of and sizeof). Otherwise, IMHO it can not.

Regards

Last edited on Dec 29, 2010 at 9:03am

Topic archived. No new replies allowed.