UB in STL

It bugs me that parts of the Ranges library allows UB, or doesn't have bounds checking. Given the recent criticism of C++ being an "unsafe language", I wonder why this UB and other problems exists in such a recent thing as Ranges. Sure, it is easily possible to avoid it, but correct behavior is not enforced.

We all know that uninitialised variables are a bad thing, so we don't do it. It would be better if we were forced into initialising variables, like what is done in Herb Sutter's cppfront aka cpp2, and other languages like Rust.

I guess that the lack of bounds checking and other safety measures may be a performance thing, and perhaps the use of static analysers maybe used to catch bad usage, but I can't help thinking that is not enough for the C++ critics: they want it to be impossible to do badly. Also the use of static analysers runs the risk of: "I knew that I thought that I had done it, and I thought that I knew that I had done it." :+)

Of course the usage of C++ is not going to go away for a whole lot of reasons, but there is a risk of loss of "market share", IMO.

I also wonder about the amount of time it takes the C++ committees to adopt things. For example, contract programming and reflection have been proposed and kicked around for years, it seems. But Herb has implemented both of these in Cpp2 in fairly quick time, they weren't there last time I looked a few months ago. I know the committees want wording for the standard, and there is competition for what is going to make the next standard, and that there is a shed load of really good stuff in Boost and other libraries.

Anyway that is my rant for now, looking forward to thoughts from others :+)
I don't think the safety discussion really picked up speed until after std::ranges were already done.

But could you give some examples of where the the algorithm functions/std::ranges could have done "bounds checking" but doesn't? It's generally not possible to do bounds checking if all you've got is an iterator. Passing an integer to std::views::take that is larger than the range that it is applied to is already perfectly safe, it just means you will extract fewer elements than you specified.

Isn't one of the goals of the algorithm functions to be highly efficient? It's nice if I can use them and be be relatively certain that they will compile down to something that is not slower than what I could write myself. I don't know if that is the case with std::ranges when using all these range adaptors and stuff but if you want it to be a replacement for the original algorithm functions it would be a poor selling point if the equivalent usage would have been slower. A lot of people would not have used them.

Remember the old saying "don't pay for what you don't use". Even a null check can be controversial and not everyone think exceptions are acceptable which begs the question what to actually do if you do detect an error.

I can't help thinking that is not enough for the C++ critics: they want it to be impossible to do badly.

Don't destroy the language just to satisfy people who don't use it. People will complain whatever you do. Let C++ be C++ and if you need a perfectly "safe" language then use something else. That's how I see it.
Last edited on
The philosophy behind c and C++ is to be run-time efficient. Thus things like bounds checking etc aren't performed as they are in other languages. The basic assumption with c/C++ is that the programmer knows what they are doing so let them get on with it.
Well I may have been a little too eager to post, while the documentation mentions UB, the implementation does have span checking and concepts to provide compiler warnings. And there is a warning note:

https://en.cppreference.com/w/cpp/ranges/view_counted
cppreference wrote:
Notes

views::counted does not check if the range is long enough to provide all count elements: use views::take if that check is necessary.


I altered the example to make the count 10 which puts it out of bounds, and it did produce 2 warnings. But it did produce erroneous output:

cppreferenceOutput wrote:
Output:

1 2 3 4 5 6 7 0 181290128 32764

2 3 4 5 0 4196509 0 1 2 3


This is the thing C++ critics worry about. In my OP I mentioned that we all know about uninitialised variables, so we don't do it: we fix the warnings; do code reviews; use static analysers. The thing is, if some code somehow falls through the cracks.

With ranges::subrange:
https://en.cppreference.com/w/cpp/ranges/subrange/subrange

The descriptions of constructors 2, 3 and 5 mention UB, but it looks like these would produce compiler warnings as well. Why can't they be hard errors? Obviously I know about -Werror in g++ and clang++ :+)

seeplus wrote:
The philosophy behind c and C++ is to be run-time efficient. Thus things like bounds checking etc aren't performed as they are in other languages. The basic assumption with c/C++ is that the programmer knows what they are doing so let them get on with it.


Well that has been the philosophy for the last 30 years, but IMO I think it might be changing to prefer code that is safe by default, and unsafe code is an opt-in like Rust and Cpp2 for example.

Here is Herb Sutter:
https://herbsutter.com/2024/03/11/safety-in-context/

Near the start, he agrees there needs to be improvement and provides evidence and reasoning throughout.
it might be changing to prefer code that is safe by default


It's certainly the case that more compile-time checks are implemented - which is a good thing. The more issues found at compile time the better - even at the cost of slower compile times. However I contend that for c/c++ run-time safety is a programmer issue and not for the compiler. Although I would not disagree with more run-time checks when code is compiled as a 'debug' build, but 'release' build shouldn't have these IMO.
I think warnings and static analysers can only catch simple cases because if you calculate things or pass it through different functions it might be too difficult (or even impossible) to prove.

Since views::counted takes only an iterator there is no way to bounds check that without knowing what the "full range" is.

Some of the UBs that are mentioned for ranges::subrange is about invalid ranged. There is generally no way to check if two iterators is a valid range. You just have to assume.

It also mentions the following UB situations:

3) [...] n is not equal to ranges::distance(i, s) explicitly converted to its type.
5) [...] n is not equal to ranges::distance(ranges::begin(r), ranges::end(r)) explicitly converted to its type.

I don't fully understand the purpose of n. It seems like you don't need to supply it. So maybe the purpose is just for optimization purposes? If you do supply it it needs to be correct. If it would have to verify that it was correct it would probably defeat its purpose.
Last edited on
Peter87 wrote:
Since views::counted takes only an iterator there is no way to bounds check that without knowing what the "full range" is.


views::counted does take a count, see the example on cppreference. IMO the count n is the whole idea of a counted range. I gather that it works by adding n to the iterator and seeing if the result is valid. I mentioned earlier that I altered the example to be out of bounds with a count that was too large, it did produce warnings. That is all fine, but it does produce erroneous output.

Peter87 wrote:
There is generally no way to check if two iterators is a valid range. You just have to assume.


I am not sure what you mean there? If I is some iterator between begin() and end() , and n is the unsigned count, then std::distance(begin, I + n) would have to <= to std::distance(begin, end) , otherwise I + n is invalid. Maybe I am misunderstanding that somehow, I haven't written any code to test that.

Peter87 wrote:
I don't fully understand the purpose of n. It seems like you don't need to supply it. So maybe the purpose is just for optimization purposes? If you do supply it it needs to be correct. If it would have to verify that it was correct it would probably defeat its purpose.


n is an argument for the constructors 3 and 5, it doesn't seem to be defaulted. I think the purpose of ranges::subrange is also to be able to specify a new range somewhere inside an existing range, with a particular size. And it does seem to check if that spec is invalid, and produces a warning.

In the C++23 draft standard that I have, it mentions the precondition [i, s) is a valid range. The C++20 draft standard I have didn't even have a chapter on ranges, maybe I have an early revision.

With the cost of doing checks, It seems to me that these checks are at compile time, done once, and are only simple pointer arithmetic internally (std::distance and the like). Also worth it if it proves invalidity of a range to start with.

Anyway, my whole point about all this is: despite their being safe alternatives; there being bounds checking; and warnings produced; one can turn on -Werror, it is still possible to produce erroneous output. I am saying that this becoming more of a problem these days; more people are wanting this to be disallowed, or at least opt in to unsafe code.

Also, I am not saying C++ should be made 100% bullet proof from a language point of view, in Herb's article I linked earlier, he says that is not worth it.

seeplus wrote:
It's certainly the case that more compile-time checks are implemented - which is a good thing. The more issues found at compile time the better - even at the cost of slower compile times. However I contend that for c/c++ run-time safety is a programmer issue and not for the compiler. Although I would not disagree with more run-time checks when code is compiled as a 'debug' build, but 'release' build shouldn't have these IMO.


I agree.

One can do runtime checks, unit testing against known correct values, and probably a whole list of other things, but if someone forgets one thing ..... Better to not allow the unsafe thing in some circumstances, or at least make it opt in, and to know what is safe and what isn't? Some projects are quite strict on what they allow: No exceptions, because they take too long; do everything in stack memory because it removes the possibility of errors that come from using dynamic memory.

Finally, I get the sense that I am being boring. Hopefully more fun and excitement tomorrow :+D
TheIdeasMan wrote:
views::counted does take a count, see the example on cppreference. IMO the count n is the whole idea of a counted range.

Yeah, I didn't mean it literally only takes an iterator. I just meant it wasn't enough to be able to validate anything.

TheIdeasMan wrote:
I gather that it works by adding n to the iterator and seeing if the result is valid.

How would it see that it was valid?

TheIdeasMan wrote:
If I is some iterator between begin() and end() , and n is the unsigned count, then std::distance(begin, I + n) would have to <= to std::distance(begin, end) , otherwise I + n is invalid.

This assumes the user passed in a valid range to begin with. There is no way to check that.

Also note that some iterators are "single-pass" so you wouldn't be able to actually call std::distance on it without affecting things.

TheIdeasMan wrote:
n is an argument for the constructors 3 and 5, it doesn't seem to be defaulted.

No, but constructors 2 and 4 seems to take the exact same arguments except that n is missing. I notice that the requires clauses are different but I don't quite understand what it means so I could very well be missing something.

TheIdeasMan wrote:
With the cost of doing checks, It seems to me that these checks are at compile time, done once, and are only simple pointer arithmetic internally (std::distance and the like).

I admit that my experience with std::ranges is limited, and I don't understand exactly how it works and what checks that it performs at compile-time, but I doubt it can do all checks at compile time (because some things are not known until runtime) and even if std::distance can be used it's not necessarily cheap. It's cheap for random-access iterators like std::vector<T>::iterator but for something like std::list::iterator it will have to call the increment operator until the first iterator reaches the second iterator which means it's an O(n) operation.
Last edited on
Peter87 wrote:
It's cheap for random-access iterators like std::vector<T>::iterator but for something like std::list::iterator it will have to call the increment operator until the first iterator reaches the second iterator which means it's an O(n) operation.


Ah, I can see that you are right, I was mainly thinking about random access. I can see that could be a problem for other containers which require actual traversal rather than math.

However, it seems able to do the check, cppreference states:

cppreference wrote:
2) Constructs a subrange from an iterator-sentinel pair. Initializes the stored iterator and sentinel with std::move(i) and s respectively. The behavior is undefined if [i, s) is not a valid range.


Emphasis mine.

I am not sure how it works exactly either, maybe some clever TMP?

With cost, maybe it is worth it, especially at compile time, if the iterator/range is invalid to start with? And it does do it because it produces a compiler warning.

But I go back to the main problem again: despite the compiler doing all it's things, the coder doing a whole list of things to prevent problems, it is still possible to have UB and produce bad output. It's like the analogy of the Swiss Cheese for disasters: several non-ideal things happen that are not big problems by themselves, but if they all line up -> disaster.

All this discussion so far has been about compile time UB, in my mind that is a problem because a program could have a const container. Obviously run time is a different ball game.

Well I am being boring again. Will have to write some code to see what I can discover.
as long as the STL uses pointers, its going to be possible to UB it, is my take on all this. A simplified answer, sure, but out of bounds (array or pointer either one, same difference) or pointer fubars are easy to do. We have the tools to not do them (eg at instead of [] access) but it has a big penalty attached to it. I mean, you can also get the raw (not const) char* of a std string and jack it off into C and back, and if that changed its size or screwed it up, its on you for doing it :) --- the limitations required to prevent every possible screw up come at too high a price.
jonnin wrote:
as long as the STL uses pointers, its going to be possible to UB it, is my take on all this. A simplified answer, sure, but out of bounds (array or pointer either one, same difference) or pointer fubars are easy to do. ....


HerbSutter wrote:
The immediate problem “is” that it’s Too Easy By Default™ to write security and safety vulnerabilities in C++ that would have been caught by stricter enforcement of known rules for type, bounds, initialization, and lifetime language safety


This is what I am trying to get at. Despite all the ways to prevent it, the UB is still there, it will be a vulnerability/bug/crash for someone. I hope that Herb's Cpp2 gets up, it will be a massive improvement IMO.

I have another question: What checks happen at runtime? I know concepts are a compile time thing, but other things like std::distance et al is shown in the implementation of some of the constructors and elsewhere. Do they happen at runtime? I am in the middle of writing some code to investigate these things.
TheIdeasMan wrote:
However, it seems able to do the check, cppreference states:

>2) Constructs a subrange from an iterator-sentinel pair. Initializes the stored iterator and sentinel with std::move(i) and s respectively. The behavior is undefined if [i, s) is not a valid range.

Emphasis mine.

The emphasised section does not imply any checking. Rather the opposite. I.e. it assumes [i, s) is a valid range and if that is not the case the behaviour is undefined because it cannot guarantee what will happen in that case.

TheIdeasMan wrote:
With cost, maybe it is worth it, especially at compile time, if the iterator/range is invalid to start with? And it does do it because it produces a compiler warning.

But the compiler warning is from the compiler, right? It's not the subrange code that is generating it.

I guess it's similar to how modern compilers warn about incorrect type specifiers when using printf (assuming the format string is a literal and not a variable). There is no way printf itself could check that (not unless it were reimplemented to use variadic templates).

I think this is about "quality of implementation" (QoI). Compilers might warn about these things if they can but it's not required by the standard.

TheIdeasMan wrote:
All this discussion so far has been about compile time UB, in my mind that is a problem because a program could have a const container.

And by "compile time UB" you mean code that can be proved, at compile-time, to generate UB if it was executed?

If you're talking about evaluating constexpr functions at compile-time then UB in the language (e.g. dereferencing a null pointer) should be detected and rejected. Unfortunately the same isn't guaranteed for UB that comes from the standard library but I expect it will still be able to catch a lot of it (at least if it causes language-level UB internally in the implementation).

jonnin wrote:
as long as the STL uses pointers, its going to be possible to UB it, is my take on all this.

Yeah, and iterators are very similar to pointers. Pointers to array elements fulfil all the requirements for being random-access iterators. That's why iterators are not necessarily any safer than using pointers.
Last edited on
IMO, I think standard data structures and algorithms should have by default more safety checks, and it should be the job of the compiler to optimize redundant checks away. It's not the '80s anymore. Compilers are full-on theorem provers and should be more than capable of handling the task. If you call std::vector::front() and the compiler can't prove that the vector is not empty such that it can't remove the internal check, is it more likely that you simply forgot to add a check (or accidentally removed it during a refactor), or that you know something the compiler doesn't?
Registered users can post here. Sign in or register to post.