So I want to count the occurrences of every word in a text file. It scans the file and for each word passes it to my insert function. Here is the code for my insert function:
It works....for small text files. I want to use this for a large text file (750,000 words) but it doesn't complete in a reasonable time. Is there a way I can improve this? Thanks.
Your function takes a map by value (forcing a copy of every single string unless you used std::move) and returns it by value, forcing another copy (and RVO doesn't apply because a function argument cannot be RVO'd.
Pass by reference.
Also, the body of your function is full of redundancies: it's exactly equivalent to simply this:
You should pass a reference of the map to the function instead. When you pass a map to parameter one of 'insert', it creates a duplicate of the map for the function to use. This isn't a problem with small amounts, but when you get towards 750,000 entries... You can see how creating a duplicate of a container all 750,000 times you call the function could drastically reduce performance
unordered_map<string, int> theOriginalMap; //the already declared map
string word; //filled in with the next word from the text
insert(theOriginalMap, word);
By passing it a reference, the function edits the already existing map instead of creating a new one.
Try that and see if it improves your performance. (i'm almost certain it will.)
EDIT:
@vlad from moscow
unordered_maps are faster than maps to access members by their key, which is what is needed to be done here.
for a large number of small strings, unordered_map is indeed the better choice. E.g. on my computer, reading a file with 250k words into a unordered_map<string, int> using that function (in the variant I posted) takes 0.15 seconds, while reading into a map<string, int> takes 0.22 seconds.
(but of course, there are specialized data structures for strings that should perform better, if insertion remains a bottleneck)
How can I return a map using pass by reference? I need to return a map that has put the values in it. When I try to use a return function it gives an error.
That's the beauty of a reference, you don't have to.
When you pass the map by reference, you're actually altering what i called 'theOriginalMap' in the function 'insert'.
Try it out.
EDIT:
@Cubbi
Off-topic, but I wonder what affect having a solid-state drive would have on that measure. *proceeds to test it out.*
what's the quickest way to put 250k words in a text file? :)
@Thumper Thanks a lot! We learned about different ways to pass parameters but we haven't really applied to so I guess it caught me off guard. I got it working.