Learn from others' mistakes

Never use iterators for traversing strings in too tight loops, particularly if you expect them to get quite large. I just found my nice function using iterators took around 5 seconds to convert ~5 MiB from UTF-8 to std::wstring, with optimizations. Prohibitively expensive.
Iterators are fine if incrementing them and dereferencing takes much less time than each loop. Otherwise, try to avoid using them if you're using a structure with random access.
Last edited on
I didn't understand what happened... what does using iterators for traversing strings mean?
std::string str=/*...*/;
for (std::string::iterator i=str.begin(),end=str.end();i!=end;i++){/*...*/}
A bit off topic... is there a way of finding the size of an array from the system (i.e. without keeping a "size" variable)?
1
2
3
4
int *x;
x = new int[10];
delete [] x; //how does the system know how many cells of memory to release?
//it should store this information somewhere... that means it should be accessible! 



Yes, that's quite off-topic, actually.

There's no practical way of knowing the size of a dynamic array, or of a static array created in a different function. The size of a dynamic array is probably stored somewhere, but where exactly will depend on the specific implementation of malloc(). Programmers have been passing array size parameters for years. Are you saying you can't be bothered? If that's the case, then just don't use them, and use vectors, instead, which implicitly pass the parameter for you.
The fastest access may be using a pointer on str.c_str() (if your intent is to not modify the original string).

1
2
for (const char* p = str.c_str(); *p; ++p)
    // process *p 


As for the off-topic question, the implementation doesn't necessarily store the size. The implementors can use whatever technique they want. Apparently it doesn't store whether or not it's an array, since you have to use the delete [] xyz; form.
Ok, let me tell you the story of my question. I was computing a N by N matrix of laurent polynomials (polynomials with the variable x and x^{-1}). That meant I needed two arrays to store the "polynomial" and the "rational" part of my expressions. My array template class has [two pointers+3 ints + the pointer to the object itself]. Times 2 makes on my 32 bit machine 12*4=48 bytes. Now since for my computations N is 1920, that means about 190MB memory just on the allocation part. Something more, since the system actually remembers the sizes of my arrays (or does it? what does it do?), my actual memory usage is even bigger.

Now the degree of the polynomials would be ranging from x^{-10} to x^{10}, which makes about 20 coefficients, which is only 80 bytes. This means only 2/3 of the memory are effectively used by my program. So, I was thinking of rewriting my array template (I must do that carefully since I am using its full functionality with other classes). I was thinking of cutting its size down to one pointer and one-short size counter, and I was wondering can I go lower?
Last edited on
The fastest access may be using a pointer on str.c_str()
No doubt about that. The conversion is an order of magnitude faster since I moved to string-pointer hybrid access.

Something more, since the system actually remembers the sizes of my arrays (or does it? what does it do?), my actual memory usage is even bigger.
Not the system, the allocator provided by the compiler (if at all). The system only keeps track of how much memory each process has allocated.

I was thinking of cutting its size down to one pointer and one-short size counter, and I was wondering can I go lower?
The lowest you could go is a pointer and a byte that stores the size of the array as a divisor of the real size (real size={1|2|4}*n) or as an exponent of two (real size=2^n), depending on your size requirements.
How does the memory allocator provided by the compiler work? When using delete [] x; what is happenning? Any link you would recommend? (my first google attempt was uninformative).
How does the memory allocator provided by the compiler work?
There's no way to answer that question. All the standard says is that malloc() takes a size in bytes and returns a pointer to an array of that size allocated in the heap. Other than that, it's up to the compiler writer/s.

Any link you would recommend? (my first google attempt was uninformative).

Google malloc implementation. You could also take a look at GCC's malloc().

Unless you're specific targeting a specific compiler-platform pair, knowing how malloc() works is quite useless. In fact, I'll actively discourage you from finding out.
another question on the original topic... How is Hammurabi's solution much faster than yours? I dont get it:

1
2
3
4
5
std::string str/*=...*/;
for (std::string::iterator i=str.begin(),end=str.end();i!=end;i++)
{ str[i]='x'; //this calls the computation i*sizeof(char), and perhaps depends on the 
//implementation of the operator[]
/*...*/}


1
2
3
for (const char* p = str.c_str(); *p; ++p)
//this one saves you from the i*sizeof(char) computation. 
// does it provide any other advantage? 

Actually, there's one small problem with his solution: std::strings allow nuls in the middle.

You're not using properly the iterator:
1
2
for (std::string::iterator i=str.begin(),end=str.end();i!=end;i++)
    *i='x';

Iterators are not primitives, they're objects. Each assignment, each comparison, each increment, and each dereference performed on an iterator is a function call. A function call implies quite a few things. Pushing various things onto the stack, jumping to a completely different location, then popping the stack and jumping back. It's one thing if that's the body of your loop, and it's another if more time is spent checking if the loop is supposed to continue than performing the loop itself.
In this example, the overhead is particularly large. All the loop does is assign integers to consecutive chararacters in memory, but in order to do that with iterators, there are three function calls per loop: one to compare, one to dereference, and one to increment. This makes the loop take several times longer.

Compare that to the simple task of incrementing an integer, dereferencing it twice, and comparing an integer. A single level of dereference carries almost no penalty; only the time it takes the processor to load the variable from RAM. Incrementing an integer takes three instructions, and comparing two integers (not counting the jump) takes three instructions. The body itself takes only two instructions.
Is there a way to avoid the stack manipulations that you described by using the "inline" trick I once got as a hint in this forum (on one of my previous posts)?

I have many times had to deal with optimizations like the one you just described. What I would usually do is do by hand the "inline" part by effectively repeating the code I have nicely packaged in functions. Now imagine what a mess that creates when making changes... Does the "inlining" functions do operations with the stack?

Thanks!
You could have just written a macro.
http://www.cplusplus.com/forum/beginner/11457/

The compiler takes inline as a suggestion to inline. It will inline or not depending on the case. Long functions are less likely to be inlined than short functions. The compiler gives some weight to the function call. The weight of the function call is more or less a constant plus the sum of the sizes of the parameters. Passing an object would give the call more weight than passing by reference; there isn't much difference between passing a variable by value or by reference. If the weight of the function itself is larger by some margin >=0 than the function call, then it's unlikely to inline. Loops probably increase the weight of the call while maintaining the weight of the function. Another factor is the code size. A compiler optimizing for size will inline less than a compiler optimizing for speed.
The syntax "delete [] var;" causes the compiler to treat var as a pointer to an array of elements and call the destructor of each element before freeing the memory block. This means that necessarily the compiler must store the number of elements dynamically allocated to the array somewhere. This is implementation specific, though I can tell you that GCC writes the number of elements dynamically allocated in the couple of bytes immediately before the pointer that is actually returned to the user.


helios: Actually, there's one small problem with his solution: std::strings allow nuls in the middle.

True, if you need to deal with nuls as data then you'd do something like below. About the same performance, I'd say. Has a subtraction, but avoids dereferencing p.

1
2
3
4
const char* start = str.data();
size_t len = str.size();
for (const char* p = start; p - start < len; ++p)
   //... 

The overall weakness of this (and the previous) technique is that you can't modify the original string. So it might be best to simply memcpy the data to a new string before the loop and operate directly on it, modifying it in place. If your algorithm allows that, it could potentially speed it up quite a bit since you're only accessing one array in the loop instead of two different arrays, possibly avoiding cache misses.

jsmith: GCC writes the number of elements dynamically allocated in the couple of bytes immediately before the pointer that is actually returned to the user.

That seems to obviate the [] flag to delete. Just put a 1 there if it's a single object. That and the (dynamic) pointer type is all you need, it seems to me. I suppose the [] flag's existence allows the impl to store the total bytes (or a pointer to the end) instead of the number of objects, even for an array.
The overall weakness of this (and the previous) technique is that you can't modify the original string.
char *p=&str[0];
Yup. I didn't believe it the first time I saw it, either.

Yeah, I never understood why there's delete and delete[], yet there's only one free().
Topic archived. No new replies allowed.