Well, recently I saw someone write their own custom library meant to replace the standard library, as a lightweight cross-platform alternative. They hated the very idea of OO (object orientation) and so used a more procedural approach built on free functions. They ended up using namespaces to separate all the functions into basically what you would do if you were programming using OO, except that all member variables were exposed, as he wasn't using private.
So as much as he hates OO, he is essentially using it, just in a roundabout way and without the conveniences that C++ provides for OOP. I just don't understand some people's hatred of OOP. It has some limitations, but in C++ you can use either style. For this usage of implementing a standard library, almost nothing benefits from being data oriented as it all has to be general purpose and thus no additional optimizations could be made. All the string functions only interact with the string itself, so there is no benefit to having a string's data exposed so that it could be used with something else.
They ended up using namespaces to separate all the functions into basically what you would do if you were programming using OO, except that all member variables were exposed, as he wasn't using private.
Can you link it? This seems pretty stupid.
For this usage of implementing a standard library, almost nothing benefits from being data oriented as it all has to be general purpose and thus no additional optimizations could be made.
You're correct. He may either be developing for a platform with a shitty implementation of a standard library or is just obsessed with making everything himself. He also might just want to remove some of the standard library's bloat.
All the string functions only interact with the string itself, so there is no benefit to having a string's data exposed so that it could be used with something else.
The advantage of the data-oriented style is that all of the internal data is exposed. In OOP, if two "objects/data-sets" need to interact with each other, you would need to declare some friend relationships in order to expose that data, as well as decide which class would hold the function. But for a string object this doesn't really matter, and the same goes for a lot of standard library classes. They all need to be generalized, so there isn't really going to be any benefit. He also hides the allocation details in a header stored in front of the actual character data, so his String implementation is actually like this:
struct StringImpl
{
    Allocator* allocator;
    int size;
    int capacity;
    char data[]; // flexible array member: standard in C99, only a compiler extension in C++
};
He just passes around a char* that points at the data member, and when you call the library's functions they read the fields stored in memory just before it. So basically he is doing that just to avoid needing a c_str()-like function.
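For anyone who hasn't seen the trick, here is a minimal sketch of the "header in front of the characters" layout. It is not the library's actual code: StringHeader, string_make, string_header, string_size and string_free are hypothetical names, the Allocator* field is left out, and plain malloc is assumed. It also avoids the flexible array member so that it stays within standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header (the original struct also carries an Allocator*).
struct StringHeader {
    int size;      // number of characters, not counting the terminator
    int capacity;  // characters that fit without reallocating
};

// Allocates header + characters in one block and returns a pointer to the
// characters, so the result can be handed to anything expecting a C string.
inline char* string_make(const char* text) {
    assert(text != nullptr);
    const std::size_t len = std::strlen(text);
    void* block = std::malloc(sizeof(StringHeader) + len + 1);
    if (block == nullptr) return nullptr;
    StringHeader* header =
        new (block) StringHeader{static_cast<int>(len), static_cast<int>(len)};
    char* data = reinterpret_cast<char*>(header + 1);  // first byte after the header
    std::memcpy(data, text, len + 1);                  // copies the '\0' as well
    return data;
}

// Recovers the header by stepping back one header-size from the characters.
inline StringHeader* string_header(char* s) {
    assert(s != nullptr);
    return reinterpret_cast<StringHeader*>(s) - 1;
}

inline int string_size(char* s) { return string_header(s)->size; }

// The block must be freed through the header address, not the char pointer.
inline void string_free(char* s) {
    if (s != nullptr) std::free(string_header(s));
}
```

The catch with this design is that such a pointer looks like any other char*, so passing a string that did not come from string_make to string_size reads garbage.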
My understanding is that the point of DOP vs. OOP is basically improved cache performance in soft real time applications. For example, suppose you have a list of Persons and each Person consists of a date, a string, and a bitmap. If you want to, say, get a histogram of all ages, the memory layout date-string-bitmap-date-string-bitmap-date-string-bitmap-... has worse cache performance than the layout date-date-date-...-string-string-string-...-bitmap-bitmap-bitmap-... because the latter can take advantage of data locality. The latter layout is not really idiomatic in OOP.
I vaguely remember reading about the stb library in game development circles, the philosophy of which this codebase supposedly follows.
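The two Person layouts described above can be sketched roughly like this. The names (Person, People, age_histogram) and the use of a birth year as the "date" are assumptions made for the example. The histogram itself is a std::map for brevity; the point is only which memory the loop has to walk through.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// date-string-bitmap-date-string-bitmap-... (one record per person)
struct Person {
    int birth_year;                    // stands in for the "date"
    std::string name;
    std::vector<std::uint8_t> bitmap;
};

// date-date-...-string-string-...-bitmap-bitmap-... (one array per field)
struct People {
    std::vector<int> birth_years;
    std::vector<std::string> names;
    std::vector<std::vector<std::uint8_t>> bitmaps;
};

// Record layout: every step of the loop skips over a name and a bitmap
// object that the histogram never looks at.
std::map<int, int> age_histogram(const std::vector<Person>& people, int current_year) {
    std::map<int, int> histogram;  // age -> number of people
    for (const Person& p : people) ++histogram[current_year - p.birth_year];
    return histogram;
}

// Split layout: the loop reads one contiguous array of ints and nothing else.
std::map<int, int> age_histogram(const People& people, int current_year) {
    assert(people.birth_years.size() == people.names.size());
    assert(people.birth_years.size() == people.bitmaps.size());
    std::map<int, int> histogram;
    for (int year : people.birth_years) ++histogram[current_year - year];
    return histogram;
}
```

Both overloads compute the same result; how much faster the second one is depends on the record size and the hardware, so measure before restructuring anything.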
I use what helios describes as DOP all the time, but it doesn't necessarily contradict OOP design guidelines. You can still place all the data in class Persons, and even have a method Person GetFirstPersonThatMatchesName(const std::string& name); or std::vector<Person> GetAllPeopleWithName(const std::string& name);
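As a rough illustration of what I mean, here is a class that keeps the split layout private and still offers the second method above. Everything except the method name GetAllPeopleWithName is made up for the example (the Add method, the birth_year field, the member names).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record handed back to callers.
struct Person {
    std::string name;
    int birth_year;
};

class Persons {
public:
    void Add(std::string name, int birth_year) {
        names_.push_back(std::move(name));
        birth_years_.push_back(birth_year);
        assert(names_.size() == birth_years_.size());  // parallel arrays stay in sync
    }

    // Scans only the names array, then assembles Person records for the matches.
    std::vector<Person> GetAllPeopleWithName(const std::string& name) const {
        std::vector<Person> result;
        for (std::size_t i = 0; i < names_.size(); ++i) {
            if (names_[i] == name) result.push_back(Person{names_[i], birth_years_[i]});
        }
        return result;
    }

private:
    // One array per field: the data-oriented layout, hidden behind the class.
    std::vector<std::string> names_;
    std::vector<int> birth_years_;
};
```

Callers only ever see Person values, so the storage could be switched back to a single vector of records without touching them.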
One place where splitting pairs is naturally useful: sparse polynomials (imagine storing the polynomial 3x^10 + 5x^2000 + 1). You have two options: keep the monomials and their coefficients together as pairs, or keep them separate. I tried both approaches, starting first with storing pairs (coefficient, monomial). For me personally, that turned out to be the worse design choice. I did much better when I refactored my code to store polynomials as List<coefficient>, List<Monomial>.
Reason: if you store monomials/coefficients separately, you have no zero terms and each polynomial has a unique representation. With the split monomial/coefficient approach, the zero polynomial is an empty collection of coefficients/monomials. If you use the pair approach, you immediately run into problems: the polynomial storing the pair (0, x^5) has to be identified with the polynomial storing the pair (0, x^3). You then have multiple representations of the 0 polynomial. You can select one as the "canonical one", but then you have to clean up zero monomials every time you do something non-trivial with a polynomial. I remember having hundreds of bugs due to forgetting to clean up zero monomials, including nasty ones such as two equivalent polynomials having different hash values (an extremely nasty and difficult bug to catch).
I no longer recall the exact reasons (may have been personal coding style) but the separate monomial/coefficient approach was far less bug-prone.
Oops you are right about that. I will fix my post to reflect it.
Anyways, for some reason which I can no longer recall - it may have been my personal coding style - the separate monomial/coefficient approach was less bug-prone.
The solution to ([0], [5]) == ([0], [3]) was to in fact never store monomials with zero coefficients (those get "cleaned up"). For some reason I was failing to do the clean-up more often when storing monomials as pairs.
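To make the clean-up rule concrete, here is a small sketch of a single-variable sparse polynomial that stores exponents and coefficients in two parallel vectors and never keeps a zero coefficient. It is not my original code; SparsePoly and its members are names invented for this example, and the integer coefficient type is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class SparsePoly {
public:
    // Adds coeff * x^exp and drops the term if its coefficient becomes zero.
    void AddTerm(long coeff, unsigned exp) {
        if (coeff == 0) return;  // never store a zero term in the first place
        auto it = std::lower_bound(exps_.begin(), exps_.end(), exp);
        const std::ptrdiff_t i = it - exps_.begin();
        if (it != exps_.end() && *it == exp) {
            coeffs_[static_cast<std::size_t>(i)] += coeff;
            if (coeffs_[static_cast<std::size_t>(i)] == 0) {  // the clean-up step
                exps_.erase(it);
                coeffs_.erase(coeffs_.begin() + i);
            }
        } else {
            exps_.insert(it, exp);  // keeps exponents sorted
            coeffs_.insert(coeffs_.begin() + i, coeff);
        }
        assert(exps_.size() == coeffs_.size());
    }

    bool IsZero() const { return exps_.empty(); }
    std::size_t TermCount() const { return exps_.size(); }

    // Because zero terms are never stored and exponents are sorted, equal
    // polynomials have identical vectors and can be compared member-wise.
    friend bool operator==(const SparsePoly& a, const SparsePoly& b) {
        return a.exps_ == b.exps_ && a.coeffs_ == b.coeffs_;
    }

private:
    std::vector<unsigned> exps_;  // sorted, no duplicates
    std::vector<long> coeffs_;    // coeffs_[i] belongs to exps_[i], never zero
};
```

With this invariant, 4x^5 - 4x^5 and 7x^3 - 7x^3 both end up as the same empty polynomial, so a hash computed from the two vectors would agree for them as well.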
Yeah, there isn't really a difference between OOP and DOP when looking at his implementations, which is why I don't get why he'd hate OOP so much when it's practically the same, just the semantics of f.foo() vs foo(f). The OOP way also makes it easier to hide information, using private/public. For example, he exposes the "allocator" variable in his array implementation, so anyone using it might think "oh, I can change the allocator without making a function call". But you do have to make the call, so it might as well be declared private, except that then all the array functions would have to be declared friends because they aren't defined in the class. It's a huge mess.
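Here is a stripped-down sketch of the allocator problem. None of it is his code: Allocator, RawArray, raw_array::set_allocator and Array are hypothetical, and the migrations counter is only a stand-in for the real work of moving the elements to the new allocator.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's allocator type.
struct Allocator {
    int id;
};

// foo(f) style: every field is public, so a caller can assign to
// allocator directly and silently skip the bookkeeping.
struct RawArray {
    Allocator* allocator;
    int size;
    int migrations;  // how many times the contents were moved to a new allocator
};

namespace raw_array {
// The intended way to switch allocators.
inline void set_allocator(RawArray& a, Allocator* next) {
    assert(next != nullptr);
    a.allocator = next;
    ++a.migrations;
}
}  // namespace raw_array

// f.foo() style: the field is private, so the member function is the only
// way to change it and the bookkeeping cannot be skipped.
class Array {
public:
    explicit Array(Allocator* allocator) : allocator_(allocator) {
        assert(allocator != nullptr);
    }
    void SetAllocator(Allocator* next) {
        assert(next != nullptr);
        allocator_ = next;
        ++migrations_;
    }
    Allocator* allocator() const { return allocator_; }
    int migrations() const { return migrations_; }

private:
    Allocator* allocator_;
    int migrations_ = 0;
};
```

With RawArray, writing raw.allocator = &other compiles fine and leaves migrations untouched; with Array the same mistake is a compile error.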
Use it where appropriate. For game programming, yeah, ECS (entity component system) is popular and is a data driven approach but that doesn't mean your string or array implementation has to be as well. You can use ECS with OOP for your array and string and it doesn't make that big of a difference. I don't know, I just hate that the author hates OOP when really it just provides conveniences compared to the way he is using DOP.
that doesn't mean your string or array implementation has to be as well
I would tentatively agree with this. I can't see a situation where one would need, for example, to access all the capacities of a sequence of strings while maximizing cache performance. It doesn't really make sense to me to use DOP patterns with the basic data structures.
@cire: In C++ you have to go to an extreme and cast pointers with undefined behavior, whereas in Python it is perfectly legal to access those "private" members. It's just a convention - Python seems to be primarily DOP by design.
My understanding is that the point of DOP vs. OOP is basically improved cache performance in soft real time applications. For example, suppose you have a list of Persons and each Person consists of a date, a string, and a bitmap. If you want to, say, get a histogram of all ages, the memory layout date-string-bitmap-date-string-bitmap-date-string-bitmap-... has worse cache performance than the layout date-date-date-...-string-string-string-...-bitmap-bitmap-bitmap-... because the latter can take advantage of data locality. The latter layout is not really idiomatic in OOP.
I vaguely remember reading about the stb library in game development circles, the philosophy of which this codebase supposedly follows.
I'd say you're about spot on with the cache performance deal. I think it has to do with the fact that this manner of storing data allows the CPU to spend less time waiting to receive data from the system's RAM, as nowadays CPUs tend to spend a lot of their time idling in programs that aren't cache friendly. With the -string-string-string...-bitmap-bitmap-bitmap... style, you can just store each type in a contiguous array, which makes it much easier for the CPU to bull-rush through everything.
Now that you've linked it, I believe the main point of the implementation wasn't to replace the string library with a "more efficient" version, but to create a C-compatible string library.