Was std::set suppose to model mathematical sets?

I apologize in advance. This reads like a half-rant/half-question trying to understand why C++ was designed a certain way.

std::set seems redundant and behaves a lot like a std::map, except without the "map property". It seems like you can do everything a std::set can do with a std::map such as element lookup, utilize element-uniqueness property, and their APIs are practically identical, so why would anyone use std::set at all?

What's strange is that the (mathematical) set operations that would distinguish a set from a map aren't built-in:
1. Union(A,B): Returns a set of non-duplicate elements that are from both A and B.
2. Intersection(A,B): Returns a set of elements that are in both A and B.
3. Difference(A,B): Implements set subtraction: A-B; returns elements from A that are not in B.
4. IsSubset(A,B): Returns true if B is a proper subset of A.
5. IsSuperset(A,B): Returns true if B is a proper superset of B.
6. Complement(A,B): Implements A'. A is a std::set, B may be another container e.g., vector. Returns all elements from B that aren't in A (where B is typically a superset of A).

Python and C# (HashSet) sets provide you with some of these operations. So what was the rationale for C++ skimping out on these operations?

Did the committee try to generalize this container so that it captures just the basic properties of a mathematical set (associative-ness, unique keys) and expect the user to implement the operations they need?

This seems to be a common theme in C++ design: "define basic tools to allow you implement just the features you want (of a container)" (i.e., std::algorithms provides std::includes so everyone can implement their own version of math sets, leaving out whatever they don't use).

I can see how this allows C++ to be "lightweight" and "portable" but I can also imagine it's frustrating for those who want to use C++ as a general-purpose language but can't rely on things like containers to have intuitive operations (and types ... I'm looking at you, std::string).

[1] SO: C++ set and mathematical set comparison
https://stackoverflow.com/a/25250264/5972766
Last edited on
these methods were made generic in c++ so they will work with other containers in case the set isn't your best choice. I know that seems weird, but a C array or C++ vector or other containers can sometimes be what you have/wanted and yet you need that union etc.
see std::set_union or std::set_intersection for those.
subset can be done via std::includes, yes the name is weird, but its what you seek.

in short I believe everything you need is floating around, but perhaps not where you expected it to be, for a variety of reasons from sheer age of the language and tools to generic useage.

Its really the other way around. The set is a generic container, not constrained to the math concept of a 'set', similar to how a c++ vector is very unlike a math vector. The choice of using the math names is unfortunate, IMHO, because they don't match up. I would have used another name and let people understand that a specific use of the generic tools is the math use, but there are other uses too. Specifically, I use sets simply to deduplicate data; they excel at that.

string is unfortunate; the need for 3-4 string objects to do "everything" is a frustration. They do well, but you frequently need a view, stream, or character array for various purposes. The c++ naming conventions are frequently a complete failure: map isn't a hash map exactly, set isnt a math set, vector isnt a math vector, cmath isnt C code, and there are more esp when trying to do mathy things.

c# is the grandchild of c++. You would expect the evolution of language to improve a bit across 2 generations. Python is its own thing, and I will simply state that a simple CRC I did was *TEN TIMES* slower in py than c++. I havent touched it since; it can't even do basic integer math efficiently. I even had 4 expert py coders look to see if it could be better, and got a no can do.

If it really bothers you, you can reassemble the existing stuff into a new object in about 10 min that has the names you want, or as operator overloads.
Last edited on
As jonnin mentions, set_intersection, set_union, set_difference & includes provide all the functionality you mentioned.

These operations are separate from the container to support generic programming.
A user can compute the set intersection of any data, as long as their data meets the requirements imposed by the algorithm. In turn the user benefits from the library's implementation.

Take a look at Stepanov & McJones's Elements of Programming, which discusses in general, the principles Stepanov used to design the STL's containers, iterators, and algorithms. There is a PDF here:
http://elementsofprogramming.com/

Last edited on
jonnin wrote:
I know that seems weird, but a C array or C++ vector or other containers can sometimes be what you have/wanted and yet you need that union etc. see std::set_union or std::set_intersection for those.

In short I believe everything you need is floating around, but perhaps not where you expected it to be, for a variety of reasons from sheer age of the language and tools to generic usage.

If it really bothers you, you can reassemble the existing stuff into a new object in about 10 min that has the names you want, or as operator overloads.

Thanks for the context, jonnin. I can see why it'd be reasonable to extract these operations so they can be used with other STL containers or user-defined classes that contain data. It gives users more flexibility when designing these classes.

I guess the price we have to pay is that there isn't an "obvious class" to use to do something.
mbozzi wrote:
These operations are separate from the container to support generic programming ... as long as their data meets the requirements imposed by the algorithm.
Take a look at Stepanov & McJones's Elements of Programming

Mm. I see thanks for the reference.
Last edited on
ElusiveTau wrote:
std::set seems redundant and behaves a lot like a std::map, except without the "map property".

It's true. You could just use a std::map, but if you don't need a "value" for each "key" then it's often more convenient to use a std::set.

Despite the similarities I often find the usage to be a bit different (at least conceptually).

I use a std::map when I want to look up an object by key.

I use a std::set when I want to keep track of if a value is in a "set" (similar to how you would ask if an element is in a mathematical set or not).


ElusiveTau wrote:
Was std::set suppose to model mathematical sets?

I don't think it was invented to be used for mathematical programming. It was invented because it's a useful container and the name "set" makes sense because it has many of the same properties as a mathematical set.


ElusiveTau wrote:
What's strange is that the (mathematical) set operations that would distinguish a set from a map aren't built-in

Those are the things you can do with a set, but what actually is a set?

Reading the first paragraph on Wikipedia that describes what a set is...
https://en.wikipedia.org/wiki/Set_(mathematics)

Wikipedia wrote:
A set is the mathematical model for a collection of different things;

Okay...

Wikipedia wrote:
a set contains elements or members, which can be mathematical objects of any kind: numbers, symbols, points in space, lines, other geometrical shapes, variables, or even other sets.

You can store any type of object in std::set (as long as it can be compared using < or if you provide a custom comparator).

Wikipedia wrote:
The set with no element is the empty set;

A std::set can be empty and it's even got an empty() member function that will return true if this is the case.

Wikipedia wrote:
a set with a single element is a singleton.

Okay, a std::set containing 5 is not the same as 5.

Wikipedia wrote:
A set may have a finite number of elements or be an infinite set.

std::set can only have a finite number of elements for obvious reasons.

Wikipedia wrote:
Two sets are equal if they have precisely the same elements.

This is true for std::set.


jonnin wrote:
see std::set_union or std::set_intersection for those.
subset can be done via std::includes, yes the name is weird, but its what you seek.

Note that these functions work only on sorted ranges. This means they will indeed work for std::set but not for std::unordered_set.

The name "includes" is not that weird because it returns true if the elements of the second set is included in the first set.

Wikipedia wrote:
If every element of set A is also in B, then A is described as being a subset of B, or contained in B, written A ⊆ B, or B ⊇ A. The latter notation may be read B contains A, B includes A, or B is a superset of A.
https://en.wikipedia.org/wiki/Set_(mathematics)#Subsets (Bolding by me)

Example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
#include <set>
#include <algorithm>
#include <iostream>

int main()
{
	std::set<int> set1 = {1, 2, 3, 4, 5};
	std::set<int> set2 = {2, 4};
	
	bool set1_includes_set2 = std::includes(set1.begin(), set1.end(), set2.begin(), set2.end());
	
	std::cout << std::boolalpha;
	std::cout << set1_includes_set2 << "\n";
}
Output:
true

The thing that surprises me is that it hasn't got "set" in its name like the other "set operations" in the <algorithm> header.
https://en.cppreference.com/w/cpp/header/algorithm#Set_operations_.28on_sorted_ranges.29_2


jonnin wrote:
The c++ naming conventions are frequently a complete failure: map isn't a hash map exactly, set isnt a math set, vector isnt a math vector, cmath isnt C code, and there are more esp when trying to do mathy things.

Personally I'm not bothered by the name "set".

Why would you associate the word "map" with a hash map/hash table specifically?
Both std::unordered_map (implemented as a hash table) and std::map (implemented as a binary search tree) map values to other values/objects.
Similarly, Java (another programming language) has both HashMap and TreeMap (which both implement the Map interface).
Note that the word "map" is also used in mathematics: https://en.wikipedia.org/wiki/Map_(mathematics)

I agree that "vector" is unfortunate, but mainly because it's relatively common to want to use real mathematical vectors in programming. Not everyone use them but the people that do often use them a lot.

The c in "cmath" means that it's based on the C header named "math.h". The same naming convention is used for all headers that come from C. This decision was done a long time ago. For many of these headers there exist C++ alternatives that are often recommended to be used instead but that's not the case for cmath.


jonnin wrote:
c# is the grandchild of c++.

Is it? Ignoring the name, what makes C# more related to C++ than for example Java? (you don't need to answer)
Last edited on
It depends on your perspective, I suppose.

in short to the above, nothing you said do I really disagree with, BUT a math educated person new to computing (this is typical: a student has usually had at least 2 years of highschool math before starting to code (say around age 15-16?), though of course there are the exceptions who start in elementary school (below age 12) and so on. So by the time the average programmer gets started, they have ideas what a "set" is, what a vector is (maybe, but I knew about these at age 14 or so from simple general science "a few days of physics" and early math where they visually add arrows or the like to explain it simply) and so on. I would have used another word to ensure NO confusion with math concepts, just personally, if I were bruce almighty or something. Collection, some word like that for the 'not a math set' or 'resizeable array' etc for the 'not a math vector' and so on. Map .. the concept of mapping in math exists but its not a 'thing' you operate on, but an acknowledgment that this space maps to that space via a transform or whatever. I don't like "map" because I want to know what its doing to me without having to check a reference, not because of math concepts, its an outlier.

Java nailed it here (you probably won't see me saying that again!). Hashmap and treemap work great for me. Map and unordered map are, IMHO, crappy names. What a 'map' without more description? A way to find the walmart? A transform between spaces? A lookup table? something else?

Why do I associate map with hash? Not sure, honestly. Probably because I expect a lookup table to be instant, not fiddly. And to me, map means lookup (that could just be a flaw in my mentality, though. Plenty of those to pick from).

I understand the C in Cmath but it implies that its on par with more or less obsolete/undesirable things. Almost all the other C headers are things you would pontificate about not using a whole lot of in modern code, c-strings, c-memory(memcpy and all the kablooies that can cause), cstdlib (oh-no, not that again), and so on. Cmath though, is kinda useful even in modern c++ code. The C headers by and large feel dirty, like reverting to C++98.

Going deeper> L0osely, a matrix can be a 2d array of strings, too -- a businessman might say as much about his store's locations vs diversity 'matrix' but if the c++ guys added a generic 2d vector and named it matrix without a math matrix multiply, I would be ready to slap them silly over it. It may be correct, but its also hair pulling.
Last edited on
Topic archived. No new replies allowed.