Efficient bounds checking for signed integers

The use of unsigned integers has many problems but if we use signed integers for our indices and want to check that the index is in bounds we need to add another check to see if it is not negative. I don't mind the extra code but I would have hoped the compiler to be able to optimize this but apperantly that is not the case.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
bool in_bounds(int a, int b)
{
    return a >= 0 && a < b;
}






bool in_bounds(unsigned a, unsigned b)
{
    return a < b;
}
in_bounds(int, int):
  mov eax, edi
  not eax
  shr eax, 31
  cmp edi, esi
  mov edx, eax
  setl al
  and eax, edx
  ret

in_bounds(unsigned int, unsigned int):
  cmp edi, esi
  setb al
  ret
https://godbolt.org/g/tMCC7U

The problem seems to be that b could be negative. If b was known to be positive it should be safe to treat a and b as unsigned and do a single comparision.

1
2
3
4
5
6
7
bool in_bounds(int a, int b)
{
    if (b >= 0)
    {
        return a >= 0 && a < b;
    }
}
in_bounds(int, int):
  cmp edi, esi
  setb al
  ret
https://godbolt.org/g/613E1Z

The above assembly output is for GCC which seems to understand this. Clang doesn't.

Unfortunatly it seems to be very reluctant to do this optimization. If I make the b < 0 branch not UB it doesn't do this optimization even if b >= 0 is guaranteed to be true in the branch that does the comparison.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
bool in_bounds(int a, int b)
{
    if (b >= 0)
    {
        return a >= 0 && a < b;
    }
    return false;
}
in_bounds(int, int):
  xor eax, eax
  test esi, esi
  jns .L5
  ret
.L5:
  mov eax, edi
  not eax
  shr eax, 31
  cmp esi, edi
  mov edx, eax
  setg al
  and eax, edx
  ret
https://godbolt.org/g/FUVvUQ

Assuming I have a good reason for wanting the code to be as efficient as possible, what is the best way of writing this? I don't really like the way that I wrote it above because it might give a compiler warning and it doesn't work on all compilers. Using assert and casting to unsigned is probably better because it makes the intention clear and errors easier to find.

1
2
3
4
5
bool in_bounds(int a, int b)
{
    assert(b >= 0);
    return static_cast<unsigned>(a) < static_cast<unsigned>(b);
}

I think this is one of the things that the contracts proposal aims to fix, but then compilers need to be more brave and allow themselves to optimize based on the preconditions, otherwise you'll still have to do hacks like this.
Does this checking need to be done in normal operation or only in development?

The nice features about assert for checks are that (a) (as you say), it makes intention obvious; and (b) it can simply be turned off when required by putting
#define NDEBUG
in a header.

All mainstream fortran compilers have a (non-default) operation to check array bounds, but my experience there is that it significantly slows down code execution. I can't find comparable options for the c/c++ compilers that I use (though that may be my lack of knowledge).
I'm curious why b could be negative when checking bounds... if b is representative of the upper bound, wouldn't it usually be sourced from a const that you have full control over, or from a something.size() ? You might even be able to play these meta-games further and conclude b is never 0.

lastchance wrote:
Does this checking need to be done in normal operation or only in development?

That's a good point. Most cases were I would write such bounds checking would be in assert macros where performance isn't important.

The nice features about assert for checks are that (a) (as you say), it makes intention obvious; and (b) it can simply be turned off when required by putting
#define NDEBUG
in a header.

Yes, but unfortunately it doesn't allow the compiler to assume the assertion always holds.

All mainstream fortran compilers have a (non-default) operation to check array bounds, but my experience there is that it significantly slows down code execution.

It's not like I want every access to be checked. But when I do, it would be nice if the code for signed indices was as efficient as for unsigned indices. An argument for using signed integers that I have heard quite a few times is that it can make the code faster because the compiler can assume it doesn't overflow/wrap. I don't think it's the most important argument but it still ...

Anyhow, I would love if the standard library transitioned into using signed indices, perhaps in std2. It would make C++ a much simpler language to learn and teach.

icy1 wrote:
I'm curious why b could be negative when checking bounds...

That's the point. I know that b can't be negative, but the compiler doesn't, and therefore it generates code that is not as fast as it could have been.
what do you get for alternate versions?

bool in_bounds(int a, const register int b) //the variable suggestions sometimes help.
{
return a && a < b;
}

bool in_bounds(int a, const register int b)
{
return (b-a) < 0; //if a > b, this is false, if a = b, this is false, is this cheaper?
}

try coding it a couple of other ways too, see what you can think up. You may stumble over a gem. Also, how would YOU code this in assembler?

Peter87 wrote:
That's the point. I know that b can't be negative, but the compiler doesn't, and therefore it generates code that is not as fast as it could have been.


What I mean here is a sort of hybrid -- if you're always passing in size_t for b, for example, and never doing something like double indices from each side of the array:
1
2
3
4
bool in_bounds3(int a, unsigned b)
{
    return static_cast<unsigned>(a) < b;
}


Edit: this site doesn't support nested quotes?
Last edited on
jonnin wrote:
return (b-a) < 0; //if a > b, this is false, if a = b, this is false, is this cheaper?

Yes, it generates less assembly instructions. With Clang it's actually identical to my unsigned version.

icy1 wrote:
What I mean here is a sort of hybrid -- if you're always passing in size_t for b, for example, and never doing something like double indices from each side of the array

Not sure what you mean by "double indices", but if I use signed indices I would also want to use signed sizes because mixing signed and unsigned is problematic. https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es100-dont-mix-signed-and-unsigned-arithmetic


I realize that I have probably been worrying too much about this "issue". Bounds checking like this is not something you would normally want to do very often outside debug checks anyway. If you need it in time critical code you should probably rewrite it in such a way that it isn't needed. Most of the time it wouldn't be worth thinking about, but for something as widely used as std::vector, if they started to use signed indices, I would expect the at() function to use casts or something internally to make it no slower than for unsigned indices.
Last edited on
signed index is really useful now and then. consider... lets say you had a small integer, say +- 127, but you had 10 to the 1000 of the things to sort.
hmm.

long long sorted [300]; //bigger than needed, feeling lazy today
long long * magic = &sorted[150];


for(all the data)
magic[datavalue]++; //signed bucket sort. magic[-100] is ok!

This used to work. I haven't had to do it in a while, but I suspect it sets off all kinds of warnings today, but hopefully it does still work? IMHO all array types (including vector) should support this kind of thing -- its not handy often, but when its what you need, it should be there for you.

Not really a fan of automatic bounds checking though. Even if its 1 clock cycle, its a waste of time for most code. You need it in just a few places usually, user input hack defenses / buffer overflow protection as one good use of it.
Last edited on
Topic archived. No new replies allowed.