Hi,
I coded a problem in python. Later i thought i should write it in cpp to make it faster. It is basically a decision tree learning problem. But to my wonder, cpp code is running slower than python code. Cpp takes 1 hour to produce the same results that python produces in 6 minutes! I am making extensive use of vectors, sets and strings. Can it be due to that? Please note I have coded the exact same logic in both languages! I am not able to conclude why is this happening! Please help!
First of all, have you compiled your C++ program with optimizations turned on? But I would be surprised if that makes all the difference. In C++ you often need to have a better understanding of what you are doing to avoid costly things like making unnecessary copies and such.
If you want to find out what makes your program run slow I would recommend to use a profiler, like callgrind (part of the valgrind suit).
i didnt enable any optimizations but i dont expect the speed of cpp to depend on that when compared with python
cant share the codes right now as it is part of my research project..i can only say that i am doing a decision tree learning and mining some temporal properties from verilog traces...
The thing is, most major compilers + IDEs attach a whole bunch of extra data to executables built in "Debug Mode" for debugging purposes (so you can step through your code as it is happening line by line). This usually results in a larger executable size, and slower speed.
Try building your program in "Release Mode".
_________________________________________
But, as stated above, this also seems like you're just writing bad code.
As kbw suspected, you are passing the containers by value. That means each time you call the function a copy of the arguments has to be made and that can be very slow if the container is large.
Pass by reference to avoid creating unnecessary copies, and use const to protect the function from modifying the arguments.
1 2 3
void get_intersection(const set<int>& a, const set<int>& b, set <int> &res)
void get_union(const set<int>& a, const set<int>& b, set <int> &res)
int finddigs(const string& x)
yes that helped...the execution time is now reduced to around 15 minutes instead of 1 hour....but still the python code runs faster(6 minutes). The program structure is exactly same in both languages. But i dnt know why python runs faster for the same inputs. I am running the C code in cygwin(32 bit) and python code in pycharm IDE.
Please help.
The program structure is exactly same in both languages.
No, it's not.
In Python, a set is generally implemented as some sort of hashtable. In C++, a set is usually implemented as a tree. You're comparing apples to oranges.
you may achieve better results by inlining, and passing the highest level optomization flags to GCC. The difference will be noticable if the functions are called repeatedly.
This might sound terrible, but: try learning how to write C++ correctly. Mabey you're jumping the gun here, and you might want to consider taking some time to study the language and learn how it works, and how to use it, before you make an attempt at using it for research. From what I can tell, just by looking at your code, you don't seem to know what you're doing.
I'm not judging your skill level based on your code, I'm just pointing out what your code says about you.
Also: Python was written in C, so if your code in C++ is slower than your code in Python you are doing somthing wrong, or you havn't optimized enough.
Also: Python was written in C, so if your code in C++ is slower than your code in Python you are doing somthing wrong, or you havn't optimized enough.
If he's running it on some Python VM employing JIT, e.g. pypy, then the language Python was implemented in is unrelated to performance. It could be implemented even in Basic and still could exceed performance of Basic by an order of magnitude.
But as others already stated the performance difference is very likely caused by suboptimal choice of data structures in C++. Actually C++ does not encourage using efficient structures. E.g. standard map/set in C++ have worse complexity and are slower than map/sets in most other languages. You need to use unordered_map (much less known for being not standard for so long).
In the standards for these languages, the semantics of certain operations are undefined, so an implementation can assume that such operations never occur in program code, since the implementation will be correct whatever it does in such cases analogously to don't-care terms in digital logic. This assumption can make various program transformations valid or simplify their proof of correctness giving flexibility to the implementation. It is the responsibility of the programmer to write code that never invokes undefined behaviour, but an implementation is allowed to print diagnostics when it happens.
In otherwords: it doesn't care what you do, just that what you do doesn't defy the language (aka: like using a variable as a function is impossible..., etc...).
That's besides the point, though.
In any case, I think we can all agree that the OP is doing somthing wrong if his C++ code is running so slow when compared with the python code.
As mentioned, in C++, set, and map, are implemented with trees and have logarithmic complexity. You might see a substantial performance increase if you change to unordered_set and unordered_map.
Besides that, you should try to minimize unnecessary copying. And compile with full optimizations; that can make a huge difference sometimes.
It does by bad naming and by not having the fast containers for very long time where every other language had them. map/set/unordered_map/unordered_set are quite bad names, with no information on the underlying structure/algorithms. By using the short name "map" or "set", it tells the programmer this is the default, general implementation that should be used in most cases. While in reality, in 90 out of 100 cases a hash map is better than a tree map, because you don't need ordering. This is completely contradictory to "don't pay for what you don't use" policy.
Also, by the time unordered containers were added, almost all books/courses used map/sets in the examples and hords of programmers got used to them. I can see, it is quite hard for C++ guys to unlearn them.