Visual Studio - See Optimizations?

Jun 20, 2021 at 4:07am
Hey, I was coding a bit in Visual Studio. I wrote the code there, but it is meant to run somewhere else without Visual Studio's optimizations.

On Visual Studio with optimizations, the code runs in about 0.3 seconds; turning them off makes it run in about 1.4 seconds. That extra second is important in this case, and I'm wondering what optimizations are being done.

Do any of you know how to figure out what Visual Studio is doing behind the scenes? I'm sure some of the optimizations can't be reflected in the code, but it probably has some code optimizations that were important.

Any clue? Not Possible?


EDIT: The only options I'm seeing would be to try and dismantle the executable that the code creates. That seems like a pain and a half.
Last edited on Jun 20, 2021 at 4:15am
Jun 20, 2021 at 4:45am
I don't get it, your post seems contradictory: run elsewhere without optimisations, but you still want optimisations?

Do you mean build somewhere else?

Once you have a binary, optimisations have already been done.
Jun 20, 2021 at 4:50am
Use Compiler Explorer, godbolt.org
Jun 20, 2021 at 5:13am
@TheIdeasMan

I won't be running the executable in the other place, I'll only be running the code. So I want to know what the optimizations are to see what I could integrate into the code I already have.


@Ganado Thanks for that, I completely forgot that existed. This'll be a big help!
Jun 20, 2021 at 9:25am

I won't be running the executable in the other place, I'll only be running the code.


Sorry, that doesn't make sense either. I am sure you well know that: code is compiled; object files are linked; executable binaries are run. What do you mean run the code?

So I want to know what the optimizations are to see what I could integrate into the code I already have.


Not making sense either :+|)

What advantages are you expecting by looking at or manipulating the assembly?
Jun 20, 2021 at 9:30am
Forget the compiled code.

I want to have RAW optimized code that can run quickly without compiler optimizations. I have code that WHEN optimized is fast, I want to alter the code so that when compiled WITHOUT optimizations runs at least somewhat as fast.


I'm not going to manipulate the assembly - but the assembly that godbolt shows will show me the difference between what is generated when my code is and isn't optimized - so I can see if I can make similar changes in the original code.


Does that make sense?
Jun 20, 2021 at 10:31am
zapshe wrote:
Does that make sense?


No.

If you code up an algorithm as O(N^3) no compiler is going to magically figure out that a better O(N log N) algorithm exists. You need to attend lectures and learn about algorithms.
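
To make the complexity point concrete, here is a minimal sketch, not taken from the thread's code: two hypothetical functions that answer the same question (does a vector contain a duplicate?), one comparing every pair and one sorting first. No optimizer setting turns the first into the second.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Quadratic: compares every pair of elements.
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// O(N log N): sort a copy, then look for equal neighbours.
bool has_duplicate_sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}
```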

Compiler optimisations are often made to take account of your specific platform, processor and operating system. They might reflect the size of the cache, or the ability to multithread, or use vectorisable operations. You simply aren't going to be able to code for that, partly because what may work on one platform won't work on another.

Two things I have found to benefit speed:
(1) If a std:: library routine exists that will do exactly what you want, then use it; those library routines, particularly those in <algorithm>, will have been honed and polished to be fast.
(2) If you have numerical array operations then, where possible, avoid writing explicit loops. Either use a std:: algorithm if one is available, or use std::valarray (especially if you can use the intel compiler). Then the compiler - not you - can do tricks like re-ordering the calculations, or multi-threading, or vectorising.
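
As a rough illustration of points (1) and (2), the sketch below writes one dot product three ways: an explicit loop, std::inner_product, and std::valarray. The function names are made up for the example, and whether the library versions are actually faster depends on the compiler and its settings.

```cpp
#include <cstddef>
#include <numeric>
#include <valarray>
#include <vector>

// Explicit loop version of a dot product.
double dot_loop(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Same calculation via a standard algorithm.
double dot_algorithm(const std::vector<double>& a, const std::vector<double>& b) {
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

// Same calculation with std::valarray's element-wise multiplication.
double dot_valarray(const std::valarray<double>& a, const std::valarray<double>& b) {
    const std::valarray<double> product = a * b;
    return product.sum();
}
```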
Last edited on Jun 20, 2021 at 10:49am
Jun 20, 2021 at 10:51am
@zapshe Do you mean that your C++ source code is going to be compiled with different compilers and that the non-Microsoft one used doesn't provide the same low-level code optimisations?

Also, what version of C++/compiler(s) are you using? Changes to the C++ language (such as move semantics and copy elision) if used properly can have a significant effect on run-time performance.
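
For anyone unsure what that looks like in practice, here is a small sketch assuming C++11 or later. make_names and PathStore are hypothetical names for illustration only: the first returns a container by value (the copy is elided or replaced by a move), the second moves its argument into storage instead of copying it.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returned by value: the result is moved or constructed in place, not copied.
std::vector<std::string> make_names(std::size_t n) {
    std::vector<std::string> names;
    names.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        names.push_back("node" + std::to_string(i));
    return names;
}

// Takes the string by value and moves it into the container,
// so a caller passing an rvalue pays for no character copy.
class PathStore {
public:
    void add(std::string path) { paths_.push_back(std::move(path)); }
    std::size_t size() const { return paths_.size(); }
private:
    std::vector<std::string> paths_;
};
```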

If you want suggestions for your code, it would be helpful if you posted it.
Jun 20, 2021 at 11:17am
If you code up an algorithm as O(N^3) no compiler is going to magically figure out that a better O(N log N) algorithm exists. You need to attend lectures and learn about algorithms.

That's specifically why I said "I'm sure some of the optimizations can't be reflected in the code, but it probably has some code optimizations that were important."

One such optimization I found in the assembly on godbolt was the removal of a line that did work whose result would have been discarded anyway. In this way, some optimizations are in fact changing your code.
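
A made-up example of the kind of thing I mean (this is not my actual code): the first function computes a total that is never used, which an optimizer is free to drop; the second is the same function with that work removed by hand.

```cpp
#include <vector>

// As first written: 'total' is computed but its value is never used.
int count_positive_wasteful(const std::vector<int>& v) {
    int count = 0;
    long long total = 0;
    for (int x : v) {
        total += x;            // result discarded; an optimizer may remove this
        if (x > 0) ++count;
    }
    return count;
}

// Hand-cleaned version: same result, no discarded work.
int count_positive(const std::vector<int>& v) {
    int count = 0;
    for (int x : v)
        if (x > 0) ++count;
    return count;
}
```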


You need to attend lectures and learn about algorithms.

The algorithm I'm using can't be changed - I'm only changing minor details to gain performance. So far, I've shaved about .5 seconds - still trying to get more performance out of it.

From previous experience, little things you wouldn't think matter can make huge performance gains. Many times that doesn't matter once optimizations are turned on though.

(1) If a std:: library routine exists that will do exactly what you want, then use it; those library routines, particularly those in <algorithm>, will have been honed and polished to be fast.

I've been doing the opposite - getting rid of std:: functions I conveniently used and creating my own where appropriate. There are a lot of functions that are just too slow.

Then the compiler - not you - can do tricks like re-ordering the calculations, or multi-threading, or vectorising.

The whole point is that I'm trying to optimize the code myself for maximum performance when it is run somewhere I don't have access to such optimizations.

Do you mean that your c++ source code is going to be compiled with different compilers and that the non-Microsoft one used doesn't provide the same low-level code optimisations?

Yes, exactly. And I need to run the code in the other environment and I can't simply bring the executable over.

The part of the code I'm trying to optimize is a recursive function doing some path-finding. I was trying to decrease the amount of recursive calls needed. That started to dry out - then I found that replacing some of those std:: functions gave a lot of performance boosts so I've been working on that.

I'm sure many std:: functions may give better performance than coding them myself if optimizations were turned on, but that's just how it is.



EDIT: The weakest link that I'm trying to fix right now is my string use. I have a string that gets edited (I add and erase from it) every recursive call. Removing that improves my speed 2x. Once I have a better alternative for editing this string, the code should be as fast as I need it to be.
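
Roughly the shape of what I'm describing, as a simplified sketch rather than my real function (the Graph type, the node labels, and find_path are stand-ins): one shared string gets a character appended on the way down and removed on the way back up.

```cpp
#include <string>
#include <vector>

// Hypothetical graph: g[i] lists the nodes reachable from node i.
using Graph = std::vector<std::vector<int>>;

// Depth-first search that keeps the current path in one shared string.
// Each call appends one character and, on a dead end, removes it again,
// so no string is copied per recursive call.
bool find_path(const Graph& g, int node, int goal,
               std::vector<bool>& visited, std::string& path) {
    visited[node] = true;
    path.push_back(static_cast<char>('A' + node));
    if (node == goal) return true;
    for (int next : g[node]) {
        if (!visited[next] && find_path(g, next, goal, visited, path))
            return true;
    }
    path.pop_back();  // dead end: undo this step
    return false;
}
```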
Last edited on Jun 20, 2021 at 11:28am
Jun 20, 2021 at 11:26am
So post the code and let us have a look at it so that we can provide advice.
Jun 20, 2021 at 11:31am
Thanks for the offer, but it's honestly a mess.

I'll figure it out myself - it shouldn't take me too much longer. I really was just looking for a way to see what optimizations the compiler does, and I got plenty of that from godbolt.
Jun 20, 2021 at 11:44am
There's also another potential issue. The CRT and the standard library are implementation dependent. There are different versions for different compilers/OSes. They do not all have the same performance.

Re editing string. I don't know what you're doing - but if it involves removing from beginning/end then possibly std::string_view?

PS Don't forget about .reserve() for strings! It can save some re-allocation.
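
A minimal sketch of both suggestions, assuming C++17 for std::string_view. The function names are invented for the example and say nothing about what your code actually does.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Build a string with one up-front allocation instead of repeated regrowth.
std::string repeat_step(std::string_view step, std::size_t times) {
    std::string out;
    out.reserve(step.size() * times);
    for (std::size_t i = 0; i < times; ++i) out.append(step);
    return out;
}

// Drop characters from both ends without modifying or copying the string.
// The counts are clamped so the view never shrinks past empty.
std::string_view trim_ends(std::string_view s, std::size_t front, std::size_t back) {
    s.remove_prefix(front < s.size() ? front : s.size());
    s.remove_suffix(back < s.size() ? back : s.size());
    return s;
}
```

Note that a string_view only refers to the original characters, so the string it views has to outlive it.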

Last edited on Jun 20, 2021 at 11:52am
Jun 20, 2021 at 12:03pm
They do not all have the same performance.

I can't measure how long the code takes when running in the other environment, so I have to just hope that isn't an issue.

I do remove from it a lot, but I also add to it. Its job is to basically have a path from one point to another by the time the function ends.

The string doesn't get big, it only holds about 2-3 sentences worth.

Thanks for the recommendations - I'm thinking maybe a linked list would work best. Still considering.


EDIT: Screw me, it wasn't the strings. I had two things on one line and that made it seem like the strings were the slowest thing.

I don't know why I'm doing this at 5am, I'm going to bed!
Last edited on Jun 20, 2021 at 12:19pm
Jun 20, 2021 at 12:42pm
We used to call this 'doing an all nighter' - with caffeinated coffee on tap and the local 24-hour pizza delivery place at the top of speed-dial!

:) :)
Last edited on Jun 20, 2021 at 12:42pm
Jun 20, 2021 at 4:16pm
@zapshe

Please don't tell us that, this:

zapshe wrote:
I really was just looking for a way to see what optimizations the compiler does, and I got plenty of that from godbolt.


is the summary of the whole topic. If so, the whole thing has been a waste of time, IMO.

If not, I feel I could question / argue with everything you have said, but I am not sure if it would be worth it.

Dare I say the whole thing sounds like a massive case of premature optimisation?

How do you know that the MS compiler has optimisations that the target environment compiler doesn't have?

Why wouldn't you just compile release builds on both systems, then compare? I would be surprised if there was a lot of material difference between them, in terms of speed. MS has LTCG, clang has LTO, I don't know if gcc has anything similar - so maybe there is a difference between MS and gcc. But that is not what you are doing, at all.

Very smart folks have been working on tweaking optimisations with compilers for at least 30 years. Equally smart people have been working on the STL to make it as good as it can be. I seriously doubt you could do better while pulling an all nighter on a Saturday. Were you at the pub on Saturday evening then came home with a "bright idea" ?

The whole idea of trying to optimise un-optimised code seems totally crazy, IMO. Un-optimised code is like a car towing a heavy trailer: saying that we will throw these things off the trailer, it will go faster, is obviously disingenuous - it will never be as fast as the car without the trailer.

Last edited on Jun 20, 2021 at 4:17pm
Jun 20, 2021 at 5:13pm
TheIdeasMan wrote:
...a massive case of premature optimisation

Heh, that made me chuckle more than a bit.

It's a "Guy Thing."
Jun 20, 2021 at 7:18pm
did you happen to set it in debug mode when you 'turned off the optimizations'?
there is a bunch of junk injected into debug mode code.
and optimize does a ton of stuff. so debug should be slower than release without optimize which is slower than release with optimize (and at each optimization flag that matters as well).

anyway, build it with what you will use for the released program. Using something else can be handy to develop it or get a second warning opinion or code validation, or even just using an ide you like, but when you start timing it, it's time to move to the real tools.

the number of things modern optimizer does is .. amazing. You can't replicate it, not really. You can profile the code and attack the slowest parts, though.
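
if you don't have a real profiler handy, a crude stand-in is to time each suspect on its own with std::chrono. a rough sketch (time_ms is just a name made up here, not a library function):

```cpp
#include <chrono>

// Run one callable and return the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

wrap one operation per call, not two on the same line, or you can't tell which of them is the slow one.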
Last edited on Jun 20, 2021 at 7:22pm
Jun 20, 2021 at 9:36pm
jonnin wrote:
did you happen to set it in debug mode when you 'turned off the optimizations'?

I've also noticed the "bitness" of the .exe can be a factor in speed. Running an x86 app on a 64-bit machine can be (marginally) slower than a 64-bit app.
Jun 20, 2021 at 10:06pm
We used to call this 'doing an all nighter' - with caffeinated coffee on tap and the local 24-hour pizza delivery place at the top of speed-dial!

Pizza would have been great, but coffee tends to make me sleepy ;( Sometimes it makes my heart feel like it's beating harder but it never makes me feel energized.

is the summary of the whole topic. If so, the whole thing has been a waste of time, IMO.

If not, I feel I could question / argue with everything you have said, but I am not sure if it would be worth it.

I literally asked that originally. I don't know what you were expecting - I just found that my code ran literally 40x faster on release than debug mode. Currently, my code runs in 0.02 seconds on release mode, but I can't get near that speed in the system I'm actually trying to run it on (specs aren't the issue).

Why wouldn't you just compile release builds on both systems, then compare?

Because I can't. That's literally what I've been saying the whole time - I can't do an apples to apples comparison.

Very smart folks have been working on tweaking optimisations with compilers for at least 30 years. Equally smart people have been working on the STL to make it as good as it can be. I seriously doubt you could do better while pulling an all nighter on a Saturday. Were you at the pub on Saturday evening then came home with a "bright idea" ?

You know, I wouldn't take it so seriously. It's just a little project. There are many library implementations that can slow down code when called hundreds of thousands of times, compared with making your own version.

Obviously I'm not doing anything better - but many of those STL functions are meant to take advantage of optimization techniques which my code won't be doing.

but when you start timing it, it's time to move to the real tools.

I do just set it to debug mode. I'm timing it just for a general idea of whether or not my changes are going in the right direction. I'm pretty sure the time it takes to run on the other system is completely different.

the number of things modern optimizer does is .. amazing. You can't replicate it, not really. You can profile the code and attack the slowest parts, though.

Yea :( Right now I'm trying to find more conditions that can reduce the number of recursive calls needed.
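
One kind of condition I mean, shown on a toy shortest-path search and not on my real code (the Graph type and the search function are placeholders): give up on any branch that is already as long as the best path found so far.

```cpp
#include <vector>

// Hypothetical graph: g[i] lists the nodes reachable from node i.
using Graph = std::vector<std::vector<int>>;

// Exhaustive depth-first search for the shortest path length (in edges).
// 'best' holds the shortest length found so far; 'calls' counts invocations.
void search(const Graph& g, int node, int goal, int depth,
            std::vector<bool>& on_path, int& best, long& calls) {
    ++calls;
    if (depth >= best) return;  // prune: this branch cannot beat the best
    if (node == goal) { best = depth; return; }
    on_path[node] = true;
    for (int next : g[node])
        if (!on_path[next])
            search(g, next, goal, depth + 1, on_path, best, calls);
    on_path[node] = false;      // allow other branches to pass through here
}
```

Start it with best set to something larger than any possible path, and compare the calls counter with and without the pruning line to see whether a condition is earning its keep.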
Jun 20, 2021 at 11:18pm
This sounds like a poorly designed assignment. Problems with time limits generally follow a pattern of posing a problem and setting a time limit such that pretty much any implementation faster than some complexity will meet it for a large enough n. This ensures that factors that are difficult to predict like different compiler configurations, specific optimizations, or stuff like CPU cache don't have much of an effect on the final outcome.

You don't just ask the student to do something dumb like sort a million elements in 100 ms and let the compiler (or even the machine load) figure out whether the execution passes or not.
Last edited on Jun 20, 2021 at 11:18pm