I am reading a large file (39.7 MB) and searching for a substring in each line. If the program finds one, it stores that line in a vector. The way I am doing it is too slow. Could anyone please suggest a faster way of doing it? Below is the code that I am currently working with.
It looks like you are actually searching for multiple substrings (vect1) in each line. Is that intended?
You are also doing multiple memory allocations per line, even if you don’t actually execute “vect2.push_back(line)”. I would move creation of substrings outside of the “while” loop (you are repeating the same operation for each line).
Additionally, I would consider changing the container type for vect2. The name suggests that it is a vector, and vectors can be very slow if you try to add elements to a vector that already has a lot of elements. Unfortunately, none of the standard containers is perfect in this situation, but list may be considerably faster.
"I would move creation of substrings outside of the “while” loop."
I did it that way as well, but the process was still very slow.
"list may be considerably faster."
I am new to STL and have only learnt about using vectors so far. Would you be able to give me a small working example of doing the same with list? I'd be grateful.
It depends on which methods of std::vector you use. However, STL containers are designed to have a very similar interface: "push_back", "begin", and "end" exist for std::list as well. So maybe all you need to do is change the declaration of vect2.
I was thinking that it may be faster to turn it around a bit: get the substring from the line and see if it is in vect1 (or in a list of substrings generated from it). But I think it would depend on the number of elements in vect1, how many repeats of the substring there are, and so on.
I have done it this way as well, but the process is still very slow. Is it because there are two loops, one the while loop and the other iterating over a vector?
What if I want to retrieve multiple values? Do I need to use a multiset? Will it affect the speed again?
I'm not sure I understand the question.
In this scenario, you would pull out the substring from each of the elements in vect1 and store it in str_set. If an attempt is made to insert an identical substring, it is ignored. This is fine because you are not trying to match the line you read in with any particular element in vect1, just seeing if, say, "abc" is somewhere in vect1.
The main speed increase of the above comes from moving the for loop outside of the while loop: you go through the for loop once instead of once for every line in the file.