I have written a C++ program to do some calculations on a huge text file. It took about 10 days to do the calculations on a high-speed laptop.
Are there any online C++ servers or mainframes that I could use to do the calculations? Or anything else I could use to speed them up? I have searched but couldn't find anything.
Did you make sure you were making a release build? Was your code properly multi-threaded? Are you sure you don't have any superfluous code? Are you using the fastest algorithms you could find?
There are *all sorts* of ways you could improve performance. ;)
But in response to your question about online C++ servers, I would be somewhat surprised if there were anything outside of supercomputers that you could buy time on to compile and run your C++ code. It's still worth looking into, though. :)
Did you make sure you were making a release build? Was your code properly multi-threaded? Are you sure you don't have any superfluous code? Are you using the fastest algorithms you could find?...
I'm using an Intel Core 2 Duo (2.13 GHz). I think I'm using the fastest serial algorithm that I can write...
I did not use a release build; will this have a huge effect on the speed?
My C++ code is very simple. However, my dataset is huge... The dataset contains names of people, and my code counts the number of times each name occurs...
Yes, this is a custom algorithm... My dataset contains nearly 83,000,000 lines... Each line has a name...
The algorithm does nearly 50,000,000 passes through the dataset, once for each name.
No, I didn't sort the dataset. Would sorting 83,000,000 lines in a text file take a reasonably short time on an Intel Core 2 Duo PC at 2.13 GHz?
As Naraku says, a release build will probably make a big difference. Don't just trust your IDE to do it all for you either. Your compiler probably has many optimisation options that you can turn on.
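For example (a rough sketch assuming a GCC/MinGW or MSVC command-line build; the source file name is just a placeholder), the difference between a default debug build and an optimised release build is only a couple of flags:

    g++ -O2 -DNDEBUG count_names.cpp -o count_names      # GCC/MinGW: optimise and disable asserts
    cl /O2 /DNDEBUG /EHsc count_names.cpp                 # MSVC: the release-style equivalent

In Visual Studio itself, the same thing is usually just switching the configuration from Debug to Release.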
Dig out a decent profiler and let it watch your code run for a while; it will point out to you the true bottlenecks.
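If you're building with GCC/MinGW, for instance, gprof is one quick way to get that picture (a sketch; the file names are assumptions, not from the original post):

    g++ -O2 -pg count_names.cpp -o count_names    # build with profiling instrumentation
    ./count_names                                  # run normally; this writes gmon.out
    gprof count_names gmon.out > profile.txt       # flat profile plus call graph

On Windows, the profiler built into Visual Studio does the same job.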
Reading from disk is expensive. Really expensive. It takes forever. If at the moment you're simply reading through the data from disk over and over, you can make some serious time savings there.
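As a rough illustration (the file name and sizes are assumptions, not the poster's code), reading the file from disk once and keeping the lines in memory means every later pass is a memory scan rather than another 83,000,000 disk reads, provided it actually fits in RAM:

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    int main()
    {
        std::ifstream in("names.txt");              // path is an assumption
        if (!in) { std::cerr << "cannot open file\n"; return 1; }

        std::vector<std::string> names;
        names.reserve(83000000);                    // avoid repeated reallocation

        std::string line;
        while (std::getline(in, line))              // one sequential read of the whole file
            names.push_back(line);

        // ...all further passes now loop over `names` in memory...
    }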
The list of ways to optimise C++ code is long: loop unrolling, references over pointers, speed-for-safety exchanges of C++ style for C style, dropping to assembler, and lots more.
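To illustrate just the first item (a generic sketch, not the original program), manual unrolling trades a few extra lines for fewer loop-condition checks and more independent work per iteration, though an optimising compiler at -O2/-O3 will often do this by itself:

    #include <cstddef>

    // Straightforward loop.
    long long sum_simple(const int* a, std::size_t n)
    {
        long long s = 0;
        for (std::size_t i = 0; i < n; ++i)
            s += a[i];
        return s;
    }

    // The same loop unrolled by four, with independent partial sums.
    long long sum_unrolled(const int* a, std::size_t n)
    {
        long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; ++i)                          // leftover elements
            s0 += a[i];
        return s0 + s1 + s2 + s3;
    }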
My dataset contains nearly 83,000,000 lines... Each line has a name...
The algorithm does nearly 50,000,000 passes through the dataset, once for each name.
Do you read the file for each pass, or do you read it into memory once?
My dataset contains nearly 83,000,000 lines... Each line has a name...
The algorithm does nearly 50,000,000 passes through the dataset, once for each name
If the dataset fits fully in memory, a single pass is enough (see the sketch below).
If the dataset doesn't fit fully in memory, three passes should be enough.
If you need more, you are doing it wrong.
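To make the single-pass version concrete (a minimal sketch, assuming one name per line and that the set of distinct names fits in memory; the file name and output format are mine, not the original code):

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    int main()
    {
        std::ifstream in("names.txt");                     // path is an assumption
        std::unordered_map<std::string, long long> counts;

        std::string name;
        while (std::getline(in, name))                     // one pass over all 83,000,000 lines
            ++counts[name];                                // hash lookup instead of re-scanning the file

        for (const auto& kv : counts)
            std::cout << kv.first << ' ' << kv.second << '\n';
    }

If the counts don't fit in memory, the usual trick is to split the names into buckets by hash (or sort the file externally) and count each bucket separately.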
BTW: a Core 2 Duo is not a high-speed laptop. I'd say it is a common laptop.
BTW2: Using a release build probably won't help much, because compilers are not intelligent enough to fix broken algorithms.
A release build has optimisations that could speed up processing your work by 300%.
That was true 10 years ago. Now VS and GCC are nearly as good. A faster compiler still doesn't solve the problem: a suboptimal algorithm is a suboptimal algorithm, regardless of low-level optimisations.
I wrote a program a few days ago to search for dipoles in a magnetic field map, which is a very complicated problem that needs processor power. You'd be surprised to learn that VS with its /O2 is 200-250% faster than MinGW's -O3, and the Intel compiler (Composer 2011) with its optimisation flag (-fast) is 300-350% faster than MinGW.
There's no point in emphasising optimisations if you don't know where the program is spending its time. If 5% of the time is spent doing processing and 95% of the time is spent doing I/O, compiling with some kind of optimisation or blaming the CPU isn't really going to help.
The guy didn't share any code, and that's why the only suggestion we can provide is optimisations. You can't judge that the guy has written bad algorithms just because his program takes a long time to run. In my master's thesis, I had to run a program that took 2 days to finish (it involved discretising continuum objects onto a 3D grid of a 1 mm^3 sample at nm resolution). Does this make me a bad programmer or a bad algorithm writer? It's a very subjective thing, and you have to give advice with what you have. The guy asked whether he could improve the execution time of the program, and the answer is: with optimisations, not by accusing him of being a bad algorithm writer.