Question 4 about optimization


Consider the two code snippets below:

int side, s, p;
cout << "Enter square's side value -> " << endl; cin >> side;
s = side * side;
p = side * 4;
cout << "The Area = " << s << endl;
cout << "The circumference = " << p << endl;

int side;
cout << "Enter square's side value -> " << endl; cin >> side;
cout << "The surface = " << side * side << endl;
cout << "The periferic = " << 4 * side << endl;

I think the second one is better since I do not have to declare two extra variables, so the program will be lighter. Am I wrong?

salem c (3685)

> Am I wrong?
Probably.

Where you only calculate, store and use in a single place, most optimisers would eliminate the store.


g++ -S foo.cpp
cat foo.s

You can examine the generated assembly for any of your snippets, with whatever optimisation flags you care to use.
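To make the comparison easy to read in the assembly, it can help to strip out the I/O and put each style in its own function. This is only a sketch: the function names are made up for illustration, and the file name foo.cpp is just the one used in the command above.

```cpp
// foo.cpp: the arithmetic from the two snippets, without cin/cout.
// All function names here are hypothetical, chosen for this example.

// Snippet 1 style: the result is stored in a named variable first.
int area_with_variable(int side) {
    int s = side * side;
    return s;
}

int perimeter_with_variable(int side) {
    int p = side * 4;
    return p;
}

// Snippet 2 style: the expression is used directly.
int area_inline(int side) {
    return side * side;
}

int perimeter_inline(int side) {
    return 4 * side;
}
```

Compile it with g++ -S at the optimisation level you care about and compare the bodies of the paired functions in foo.s. With optimisation enabled I would expect each pair to look the same, but check on your own compiler rather than taking that on trust.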

AbstractionAnon (6954)

This is really a question about style, not performance.

I prefer the first snippet because you can set breakpoints on lines 3 and 4.
With a good optimizing compiler there should be no difference between snippet 1 and 2.

I said this was a question of style, not performance. This is because you're doing cin and cout which execute hundreds of instructions and wait on the user. This is many times more than the arithmetic.

ninja01 (157)

> I said this was a question of style, not performance.

I was not asking about performance; I was asking about size.

I think the first program will be 8 bytes larger than the second program, but I am not sure.

JLBorges (13770)

> I think the first program will have 8 bytes more than the second program, but i am not sure

Check it out (on the implementation of interest).
https://godbolt.org/z/oM9r3WbeP

ninja01 (157)

Thanks @JLBorges, but this link https://godbolt.org/z/oM9r3WbeP seems to be for another question I asked. I apologize for asking so many questions )))))


Peter87 (11178)

https://godbolt.org/z/brsEPGcvq

There is a slight difference in code size. I haven't looked closely enough to see if there is a difference in stack memory usage or runtime performance.

You might want to try with different compiler settings (e.g. -O3 instead of -O2) or a different compiler (e.g. Clang instead of GCC) but if you run this code I'm convinced you won't be able to tell the difference.


jonnin (11333)

do you know about cpu registers, and the program stack?
I ask because when they said above it can get rid of the variables, what really happens is the smart compiler can see that you only do a couple of things, and that everything here can be done in a register without a memory location or stack frame variable.

if you did 20 variables instead of 3 or 4, it would run out of registers, and then it would see that it needs to use the stack frame.

if you kept the values around for later use, instead of a little snip that discards the values after a print, the compiler may have to store them somewhere, so there are extra steps.

simple types like int, where a variable and a register are interchangeable, don't apply to code where your variable is an object with a dozen variables of its own, though, and creation/destruction may cost you something as well, and so on.
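To illustrate the point about objects, here is a small sketch. The Square type and the global counters are invented for this example. Because the constructor and destructor have visible side effects (they update counters), the compiler has to keep that work even if the object itself is a throwaway.

```cpp
#include <string>
#include <utility>

// Hypothetical counters, used only to make construction/destruction observable.
int g_constructions = 0;
int g_live_objects = 0;

// Hypothetical class: a variable of this type is more than "just a register".
struct Square {
    std::string label; // member with its own construction and destruction
    int side;

    Square(std::string l, int s) : label(std::move(l)), side(s) {
        ++g_constructions;
        ++g_live_objects;
    }

    ~Square() { --g_live_objects; }

    int area() const { return side * side; }
};

// The object is created, used once, and destroyed when the function returns.
int area_via_object(int side) {
    Square sq("temporary square", side);
    return sq.area();
}

// The plain-int version has no construction or destruction to perform.
int area_via_int(int side) {
    return side * side;
}
```

Both functions return the same number, but the first one also has to run the constructor and destructor, which is the kind of extra cost being described above.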

the best way to understand this stuff is to learn enough assembly language to see what steps you would have to code up to do the same thing. Assembly isn't exactly 1 to 1 with cpu instructions, but it's much closer to that than c++, where 1 line can trigger pages of cpu instructions. Then you can export the assembly to see what the compiler did, or mentally visualize what would be needed (tricky: compiler-generated assembly uses every trick in the book, which takes a pro to visualize). It's best to learn a bit, then generate the asm listing and take a look for yourself at everything it has to do for your snippets.


ninja01 (157)

@Peter you just taught me two flags yesterday, so what are these -O2 and -O3? In C:B, there is no description.

> do you know about cpu registers, and the program stack?

I know about CPU registers. As for the program stack, I am not sure; all I can think of is that it is the total of a program's variables that reside in stack memory.

> if you did 20 variables instead of 3 or 4, it would run out of registers, and then it would see that it needs to use the stack frame.

But what about the .exe? I do not know how the .exe is built, but I think the two ints s and p (from the code above) will be part of that .exe. Or do you mean that when the program is compiled, those variables will not be part of the .exe and the compiler will do what it sees fit?

Peter87 (11178)

In addition to what Jonnin said, I just want to point out that compilers do "data-flow analysis": they look at how the values flow through the code, so the actual variables in the code are often less relevant, at least for primitive types like int.

https://en.wikipedia.org/wiki/Data-flow_analysis
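As a rough illustration of what "the values flow the same way" means, here are three spellings of the perimeter calculation. The function names are hypothetical. The variables differ, but the value returned depends on side in exactly the same way in all three, and the unused value in the last one feeds into nothing.

```cpp
// Direct: one named result.
int perimeter_a(int side) {
    int p = side * 4;
    return p;
}

// Same result through an extra intermediate variable.
int perimeter_b(int side) {
    int twice = side + side;
    int p = twice + twice;
    return p;
}

// Same result with a copy and a value that is never used.
int perimeter_c(int side) {
    int unused = side * 100; // flows nowhere, so an optimizer can drop it
    (void)unused;            // silences the unused-variable warning
    int p = 4 * side;
    int copy = p;
    return copy;
}
```

With optimizations enabled, I would expect a compiler to generate the same or nearly the same code for all three, but that is something to verify in the assembly output rather than assume.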

ninja01 (157)

wow, too much to learn

jonnin (11333)

O2 and O3 are optimization levels, i.e. preset bundles of flags. O2 gives good optimization with a fast compile; O3 may compile a good bit slower on big programs, but may run a little faster. You can also hand-specify individual flags instead of using these presets, or at least most of them, but that is really involved and the presets are good for 99.9999% of programs.

----
yes, the compiler will do its best to not be slow and stupid.
so if you ask for it to do something simple with a throwaway local integer variable, it will just use a register for that 'variable'. No stack entry, no ram entry, just a register. This is much faster than working with memory, far fewer instructions and no waiting on the data transfer to and from the stack/ram/etc. You have to understand that the CPU works on REGISTERS. If you add 2 integers from ram, it loads them from ram into 2 registers, adds them and the result lands in a register, and then it moves it from the register back to memory. If the compiler can determine that you don't need all that crap, it just uses the registers and avoids the memory part.

but the cpu only has a small number of registers (16 general-purpose ones on x86-64, for example). If it runs out, it has to swap to and from memory, if it can't use and discard them one by one, that is. And the compiler knows this, and will do the correct thing for the code it was asked to handle (the compiler is making the code here, and the cpu just does what it was told, which would be to not waste time going to / from memory without reason).

also, for your code, if you print to console it may have to use variables after all, as the call to the routine that prints to console probably requires the values to be pushed onto the stack, even if the computation etc were all in register. See what a pain in the backside this level of detail is? Not only is it a lot of details, the details are coupled with a giant number of 'what ifs'.
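One way to see the register versus memory difference for yourself is to force the memory traffic with volatile and compare the assembly. This is a sketch with made-up function names. volatile tells the compiler that every read and write of the variable must really happen, so the intermediate value has to be written out and read back instead of staying in a register.

```cpp
// Ordinary local: the compiler is free to keep t in a register
// or to fold it into the final expression entirely.
int sum_in_registers(int a, int b) {
    int t = a + b;
    return t * 2;
}

// volatile local: the store to t and the later load from t must both
// be performed, so t needs a real memory location.
int sum_through_memory(int a, int b) {
    volatile int t = a + b;
    return t * 2;
}
```

Both functions return the same value. With optimizations on, the second one should show an extra store and load that the first one does not need.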


Peter87 (11178)

ninja01 wrote:
what is this -O2 and O3, In C:B, there is no description

O stands for optimization.

GCC uses -O0 by default (your IDE might use another default). It means no optimizations.
-O1 enables some simple optimizations.
-O2 enables most optimizations that do not lead to an increase in code size. This is often recommended as a good default for release builds.
-O3 enables even more optimizations but can lead to increased code size, which can be bad for performance. It often leads to increased performance in microbenchmarks but when considering the whole program or the whole system (operating system and all programs running at the same time) it's not necessarily a clear winner.

The compilation time is generally longer the higher optimization level you use. If you use a higher number than 3 (say -O5) that will be equivalent to -O3.

There is also -Os which tries to optimize for smaller size and -Og which you can use if you want to have some optimizations enabled without ruining debugging experience (e.g. when using a debugger such as gdb).

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
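If you are not sure which optimization level your IDE is actually passing, one quick check is to look at the predefined macros. GCC documents __OPTIMIZE__ (defined when optimizing) and __OPTIMIZE_SIZE__ (defined for -Os); as far as I know Clang follows GCC here, and I would not expect Microsoft's compiler to define them. The function name below is made up for this sketch.

```cpp
// Reports the optimization mode this translation unit was compiled with,
// based on GCC-style predefined macros. On compilers that do not define
// these macros the answer will always be "not optimizing".
const char* optimization_mode() {
#if defined(__OPTIMIZE_SIZE__)
    return "optimizing for size";
#elif defined(__OPTIMIZE__)
    return "optimizing";
#else
    return "not optimizing";
#endif
}
```

Print the result from main and build once with -O0 and once with -O2 to see it change.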

Clang also has these flags, although I think it makes slightly different tradeoffs.

Microsoft's compiler has similar flags, but you need to use / instead of - (e.g. /O2). The set is not identical: for example, there is no /O3.


seeplus (6457)

Don't try to out-think/out-optimise a modern C++ compiler. Given the appropriate options for fast code (some also have options for small code, e.g. VS), they are very good at optimisation.

jonnin (11333)

^^
This stuff mattered, a lot, in 1990. The compilers were dumber, the CPUs had 1 core and maybe not even floating point support, and they ran at 25-33 MHz instead of today's 4000 MHz (4 GHz-ish is current). You had to squeeze every cycle out of the chip. Now the chip is likely to spend 85% of its time taking a nap while it waits on something to come along to do. I used to spend a lot of time in these areas, but around the time the dual core cpu was invented, this crap fell to the wayside for most people, and there was great rejoicing.

I fall into the old school thinking that knowledge is power, and that having at least a surface level understanding here is useful (and an in-depth one in a few fields, like compiler writing or embedded device development), but be careful not to get lost in the weeds here.


seeplus (6457)

Yeah - but that was then and this is now, as they say. I used to do in-line asm in C code for best optimisation of critical code. But nowadays the compiler will probably do at least as good a job (or perhaps a better one).
