'Native' C++ string performance

I am not a fan of string classes, even though I have written several of my own. Especially when programming Windows I feel the hassle of having to expose the char array for integration with API functions and then re-encapsulate any external modifications is more bother than automated allocation/concatenation/deallocation is worth. However, in the project I am currently working on the sheer number of char arrays, array lengths and buffer lengths I have to keep track of is starting to wear me down!

I wonder if any of you chaps have carried out a performance analysis on the string class which comes with the standard C++ library? Is it a huge execution hit when compared to using raw char arrays? Does the convenience outweigh the losses?

I would imagine something distributed with the core libraries must be pretty much as optimized and debugged as it is possible to make it? The MFC implementation of a string was bloated and plodding in my opinion. Likewise, I understand you can use several of the Hewlett Packard templates to simulate a string data type, but I don't understand how to use templates generally and certainly not the STL! At the end of the day I think 'std::string' is probably the way to go, but I am still somewhat reluctant to follow in that direction without more information.
> performance analysis on the string class which comes with the standard C++ library?

Which library? There are several implementations (basically one per major compiler) and they all use tricks to increase performance.

> Is it a huge execution hit when compared to using raw char arrays?

Depends on how you are using them. Usually they are at least comparable with raw arrays, and could be even faster, as they only reallocate memory when they actually need to.
Strings generally have fast move operations, so you should not worry about returning a string by value.

> Does the convenience outweigh the losses?

Think for yourself: strings have error proofing, automatic memory management, standard library integration and a load of utility functions. And they are templates, so features you do not use are never even compiled.

However, you should not use strings blindly and assume that they solve all your problems automatically. Learn about move/copy semantics and try to avoid excess copying of your strings.
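
To make the point about move operations concrete, here is a minimal sketch, assuming a C++11 (or later) compiler. The names make_greeting, Message and demo_move are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: builds a string locally and returns it by value.
// The local is moved (or the copy is elided entirely), not deep-copied.
std::string make_greeting(const std::string& name) {
    std::string result = "Hello, ";
    result += name;
    return result;
}

// Hypothetical type: takes its argument by value and moves it into place,
// so a temporary passed in is not deep-copied a second time.
struct Message {
    std::string text;
    explicit Message(std::string t) : text(std::move(t)) {}
};

// Usage sketch.
void demo_move() {
    Message m(make_greeting("world"));
    assert(m.text == "Hello, world");

    std::string a = "some long string that would be expensive to copy";
    std::string b = std::move(a);  // b takes over a's contents; a stays valid but unspecified
    assert(!b.empty());
}
```

Passing by const reference (as make_greeting does for its parameter) is still the usual way to avoid a copy when the function only needs to read the string.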
I use the MSVC compiler, although I feel I should really start using the GNU offering as - at least intellectually - I fully support the Open Software movement.

You mention there is integration of std::string with other standard library functions. That is very interesting and not something I had noticed before. I will certainly look that feature up.
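
As a rough illustration of that integration: std::string exposes iterators, so the generic algorithms and the stream classes work on it directly. This is only a sketch assuming C++11; the helper names to_upper, count_char, make_pair_text and demo_integration are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: upper-cases a copy of the string with std::transform.
std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}

// Hypothetical helper: counts occurrences of a character with std::count.
std::size_t count_char(const std::string& s, char c) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), c));
}

// Hypothetical helper: formats "key=value" through a string stream,
// with no manual buffer sizing.
std::string make_pair_text(const std::string& key, int value) {
    std::ostringstream out;
    out << key << '=' << value;
    return out.str();
}

// Usage sketch.
void demo_integration() {
    assert(to_upper("abc1") == "ABC1");
    assert(count_char("banana", 'a') == 3);
    assert(make_pair_text("n", 42) == "n=42");
}
```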

When I have written my own string classes in the past I have used reference counting to implement copy-on-write and minimize the need for expensive deep copying if the string object is passed as a function parameter or return value. However as I understand it, passing/returning by reference rather than by value eliminates this issue anyway.

You also mention templates... They are a very dark area for me as I do not understand them or the Hewlett Packard STL which is based on them. I really, REALLY should learn how to use templates. However the implementation in MSVC means you have to both declare and define them inside the header file which is such a messy solution it puts me off experimenting and learning every time I try. Again - a good reason to change to the GNU compiler!
> I do not understand them or the Hewlett Packard STL which is based on them. I really, REALLY should learn how to use templates.

The whole C++ standard library consists mainly of templates.
std::string? A typedef for std::basic_string<char>, which is a template. std::vector? A template too. std::rotate, std::reference_wrapper, std::pair? All templates. Any time you see something like <int>, it is a sure sign that you are using a template.
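
A small sketch of what that looks like in practice, assuming C++11. The static_assert lines check the typedef claim at compile time; the function longest and demo_containers are made-up examples showing std::vector and std::pair used together.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// std::string really is just a name for one instantiation of the template.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is std::basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is std::basic_string<wchar_t>");

// Hypothetical helper: returns the longest word and its length.
// std::vector<std::string> is a resizable array of strings;
// std::pair bundles two values into one return value.
std::pair<std::string, std::size_t> longest(const std::vector<std::string>& words) {
    std::pair<std::string, std::size_t> best("", 0);
    for (const std::string& w : words) {
        if (w.size() > best.second) {
            best = std::make_pair(w, w.size());
        }
    }
    return best;
}

// Usage sketch.
void demo_containers() {
    std::vector<std::string> words;
    words.push_back("a");
    words.push_back("abc");
    words.push_back("ab");
    assert(longest(words).first == "abc");
    assert(longest(words).second == 3);
}
```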

> the implementation in MSVC means you have to both declare and define them inside the header file

It is a language requirement. Templates are not functions; they generate functions/classes with the needed parameters when used. Because of that, and because of the separate compilation model C++ uses, the implementation has to be available in each compilation unit that uses the template. On the bright side, it allows more aggressive optimisation at compile time.

In fact it is possible to have the implementation in a separate file, but that opens a whole new can of worms.
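
A minimal sketch of the idea, shown as one file for brevity. The template largest is a made-up example; in a real project its full definition would typically sit in a header so that every source file using it can see the body.

```cpp
#include <cassert>
#include <string>

// Hypothetical function template. This is not a function yet: it is a
// recipe. The compiler generates a concrete function for each type T it
// is used with, which is why the body must be visible at the point of use.
template <typename T>
T largest(const T& a, const T& b) {
    return (a < b) ? b : a;
}

// Usage sketch: three different functions are generated from the one template.
void demo_template() {
    assert(largest(3, 7) == 7);                    // largest<int>
    assert(largest(2.5, 1.5) == 2.5);              // largest<double>
    assert(largest(std::string("apple"),
                   std::string("pear")) == "pear"); // largest<std::string>
}
```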
Definitely something to look at then - at some point! Things like vectors and pairs go completely over my head. I am completely self-taught and programme for my own entertainment, so lack certain pieces of knowledge in very specific places that are no doubt very basic to someone who had been to university to study IT and programming. I think part of the reason I have a problem with templates is because I lack this knowledge of the ideas they are being used to implement - like vectors and so on.

I also need to get to grips with COM programming for Windows - but that is an entirely new topic again!!!

So far as std::string goes though... I am pretty much sold on using it in my current application at least.
> When I have written my own string classes in the past I have used reference counting to implement copy-on-write and minimize the need for expensive deep copying if the string object is passed as a function parameter or return value. However as I understand it, passing/returning by reference rather than by value eliminates this issue anyway.


It's also worth noting that doing this makes the class completely unsuitable for multithreading environments... so std::string likely does not do this [anymore].

It also might not be faster in all cases...since the extra indirection required for shared ownership on every single string access might be more expensive than the one-time copy of the string data. EDIT: actually you wouldn't need an extra indirection for that now that I think about it. Still... it'd be worth profiling to see if this really matters.


That said... some operations are almost certainly faster with std::string than with C-style strxxx functions. Specifically, anything that requires knowing the length of the string:

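#include <cstring>
#include <string>

int main() {
    // assume equivalent setup:
    char cstr[100] = "kjsdljalsdjljwljlsjdflsjdlfjsldfjskdlasljdfs";

    std::string sstr;
    sstr.reserve(100);
    sstr = cstr;

    //////////////
    std::strlen(cstr);  // walks the array until it finds the terminating null
    sstr.size();        // <- almost certainly faster: returns a stored length

    //////////////
    std::strcat(cstr, "foo");  // must find the end of cstr before appending
    sstr += "foo";             // almost certainly faster: the end is already known
    return 0;
}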


The reason is that std::string can retain an end-of-string position as one of its members... whereas the C-style functions have to walk through the entire string each time looking for the terminating null. This makes strcat especially painful when used in repeated succession (though a clever optimizer might address that).
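
To illustrate the repeated-strcat case, here is a rough sketch comparing three ways of building the same result. All function names are hypothetical, and no timings are claimed; it would still be worth profiling on your own compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Plain strcat: every call rescans dest from the start to find the null,
// so the total work grows with the square of the final length.
// Assumption: the caller guarantees dest is large enough.
void repeat_strcat(char* dest, const char* piece, std::size_t times) {
    dest[0] = '\0';
    for (std::size_t i = 0; i < times; ++i) {
        std::strcat(dest, piece);
    }
}

// Raw arrays with a hand-tracked end pointer: no rescanning, but the
// bookkeeping is now the caller's job (same buffer-size assumption).
void repeat_tracked(char* dest, const char* piece, std::size_t times) {
    const std::size_t len = std::strlen(piece);
    char* end = dest;
    for (std::size_t i = 0; i < times; ++i) {
        std::memcpy(end, piece, len);
        end += len;
    }
    *end = '\0';
}

// std::string: the class tracks its own length, so each append starts
// at the known end, and the buffer grows automatically if needed.
std::string repeat_string(const std::string& piece, std::size_t times) {
    std::string out;
    out.reserve(piece.size() * times);
    for (std::size_t i = 0; i < times; ++i) {
        out += piece;
    }
    return out;
}

// Usage sketch: all three produce the same text.
void demo_concat() {
    char a[64];
    char b[64];
    repeat_strcat(a, "ab", 3);
    repeat_tracked(b, "ab", 3);
    assert(std::strcmp(a, "ababab") == 0);
    assert(std::strcmp(b, "ababab") == 0);
    assert(repeat_string("ab", 3) == "ababab");
}
```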
An interesting point!

I guess the problem with multi-threading is either thread might try to access the shared string data at the same time?
Right. Each thread might have its own copy of the string thinking it can manipulate each of them independently... but behind the scenes the string class would be sharing a reference counter... so you get race conditions if they both access it at the same time.
As an aside - you mention the race condition, Disch. This IS something I have come across despite not studying programming academically. A race condition in the software which controlled a medical radiotherapy device actually led to several deaths back in the mid-eighties.

It was a Canadian machine called the 'Therac-25' and the government commission which investigated the problem published an absolutely fascinating - and truly chilling - paper on their findings. WELL, well worth reading by anyone interested in programming, as it depicts very well how a tiny problem in code can have devastating and tragic consequences in the outside world.
> Each thread might have its own copy of the string thinking it can manipulate each of them independently...
> but behind the scenes the string class would be sharing a reference counter

C++11 makes a conforming reference counted implementation of std::string a virtual impossibility.