I have this code, which performs the analysis part of the discrete wavelet transform. It works pretty well. However, I wish to reduce its running time even further. I did use reserve(), and it saved a few milliseconds. Any further suggestions would be appreciated.
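Using a profiler would probably help the most, since it shows where the time is actually spent. You could also store items by reference or pointer instead of by copy (though for references you'd need std::reference_wrapper, because standard containers cannot hold plain references).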
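As a rough illustration of the store-by-reference idea, here is a minimal sketch. The function name select_even_rows and the row layout are hypothetical and not taken from the posted code; it only shows how a vector of std::reference_wrapper can refer to existing rows without copying them.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not from the original code): build a "view" of every
// second row. Each element refers to a row in `rows`; no row data is copied.
std::vector<std::reference_wrapper<const std::vector<double>>>
select_even_rows(const std::vector<std::vector<double>>& rows) {
    std::vector<std::reference_wrapper<const std::vector<double>>> view;
    view.reserve((rows.size() + 1) / 2);  // one allocation up front
    for (std::size_t i = 0; i < rows.size(); i += 2) {
        view.push_back(std::cref(rows[i]));  // stores a reference, not a copy
    }
    return view;
}
```

The referenced rows must outlive the view, and anything that reallocates or destroys them invalidates it. Whether this is faster than copying depends on the row sizes, so it is worth measuring both variants.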
As I alluded to earlier, temp_row doesn't seem to need to be a vector; a single floating-point number should be all that is needed for that variable. However, I have no idea what you're doing with the vector oup. This is probably a bigger bottleneck than temp_row, but without seeing the code for branch_lb_dn() I'm only guessing.
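To make the scalar-accumulator point concrete, here is a sketch under stated assumptions. The function filter_and_downsample is a hypothetical stand-in for one analysis branch (convolve with a filter, keep every second sample); it is not the poster's branch_lb_dn(), whose code isn't shown. The point is only that each output sample can be accumulated in a single double rather than in a temporary vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one analysis branch: full convolution of `signal`
// with `filter`, keeping only the even-indexed output samples.
std::vector<double> filter_and_downsample(const std::vector<double>& signal,
                                          const std::vector<double>& filter) {
    std::vector<double> out;
    if (signal.empty() || filter.empty()) {
        return out;
    }
    const std::size_t full = signal.size() + filter.size() - 1;
    out.reserve((full + 1) / 2);  // number of even indices in [0, full)
    for (std::size_t n = 0; n < full; n += 2) {
        double acc = 0.0;  // single scalar instead of a temporary vector
        for (std::size_t k = 0; k < filter.size(); ++k) {
            // Skip filter taps that fall outside the signal.
            if (n >= k && n - k < signal.size()) {
                acc += filter[k] * signal[n - k];
            }
        }
        out.push_back(acc);
    }
    return out;
}
```

Skipping the odd-indexed samples in the loop also avoids computing values that downsampling would discard. Whether this applies depends on how the original code orders filtering and downsampling.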