A single thread will use a single core. It may switch between cores depending on the OS, but it won't use more than one at a time.
If you want to utilize more than one core, you'll want to research parallel computing or functional programming. The idea is to use multiple threads to compute independent data simultaneously.
If you identify a bottleneck in your program which can be split into several independent parts, you can run those parts on separate threads and have them work simultaneously to reduce the time spent computing.
Protip: if you want to have lots of fun, you could use GPUs instead of CPUs to accelerate calculations. GPUs (graphics cards) contain thousands of tiny processors. None are very powerful on their own, but if you can find thousands of simultaneous calculations (such as multiplying each element in an array by 3) then you can really accelerate your application by using a GPU.
First things first: did you profile your code to find the bottlenecks? Is there a better algorithm that you can use? A modern CPU is really REALLY fast.
So remember: if you have something blocking while other things could be done in the meantime (without interfering with another thread's resources), threading "can" speed things up. Used improperly, it can slow things down.
Thanks for the answers! Normal threading with <thread> isn't counted as using multiple cores, right? I will take a look at TBB! Thanks :)
Stew: about your protip: can you give me a good link on how to do that!? ;D It sounds fun xD
dhayden: how can I profile my code to do that?! What exactly do you mean by bottleneck, RAM? Because my code used RAM heavily (I wrote it dirty :S ), so it slowed the process down! But I only want to learn this for knowledge now, not because I need it :P
std::thread will use multiple cores; the OS handles it. It's just a newer standard, whereas POSIX threads, TBB and so on have been around for years. The C++11 threading is a lot simpler to integrate.
Give this a try: make 2 functions, put while(1); in each, and run them on separate threads. Then try with just one. Observe your CPU usage. Make sure to join the threads at the end of the main execution context, so the program won't end while the child threads are in use.
By the way, it won't help your RAM/memory usage; in fact, if you run out of memory and the system is doing its usual cleanup, it'll likely slow things right down to a crawl.
On my quad-core laptop, these are the combined core usage results by thread count. It's got hyper-threading on, hence the spread values.

1 thread: 13% (half a core)
2 threads: 25% (1 full core)
3 threads: 38% (1 and a half cores)
4 threads: 50% (2 full cores)
5 threads: 63%
... and so on
The GCC compiler has an option to add counters to the code. Each function call and its duration get counted when you run the program normally, and the statistics are stored into a file. Then another tool will show the stats in various ways.
Some (other) compilers might even have an option where, if the statistics file is available on recompilation, the compiler attempts some higher-level optimizations (profile-guided optimization).
There is also (auto)vectorization. Processors have MMX/SSE/AVX instructions that can do the same operation on multiple values simultaneously. This requires that the input data is arranged so that consecutive values can be loaded into the special registers those instructions use. Essentially, there is parallelism within a single CPU core that the compiler might be able to enable in your program.
A whole other level is MPI-based parallelization: separate processes on separate computers communicate in order to perform a single computation.
Then you can go all the way: vectorized instructions in each thread, the program spread over hundreds of computers, and part of the code making use of the GPGPU too. That is furthest from "simple".
thanks all :)
Krisando, that looks awesome. That's enough for most of my tasks, but if I want a single for loop to be done by 4 cores I will use TBB. Thanks a lot, all :)
Btw Stewbond, that link is forbidden for me ;( And also, my GPU is AMD, if that makes any difference (worst decision I made :( )