The most efficient way to find a string in a text file

Jan 15, 2016 at 12:22pm
Hi All

I just created a little program for myself to do some tedious work: finding a particular string in a text file.

The program is done. Basically, it reads the text file line by line, and after reading each line an if statement checks the line for the string. However, I find it quite slow. If I open the text file in Notepad and use 'Find', the search is done in a second, but my program can take up to a minute.

So I would like to know of any alternatives and find the most efficient way to do this job, please.
Jan 15, 2016 at 12:27pm
Can you show us the code ?
Jan 15, 2016 at 1:46pm
I don't have the source code at the moment, so I can't copy and paste, but the core code is:

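// Returns true if the line contains the "F3.10 Fail" marker.
// The other Check_* functions follow the same pattern.
bool Check_F310_Fail(const std::string& line)
{
    const std::string string_to_find = "F3.10 Fail";
    return line.find(string_to_find) != std::string::npos;
}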


string textline;
int F310pass = 0, F310fail = 0, F313pass = 0, F313fail = 0, F316pass = 0, F316fail = 0;

if (inFile.is_open())
{
    // Read one line at a time and run all six checks on it
    while (getline(inFile, textline))
    {
        if (Check_F310_Pass(textline))
            F310pass++;
        if (Check_F310_Fail(textline))
            F310fail++;
        if (Check_F313_Pass(textline))
            F313pass++;
        if (Check_F313_Fail(textline))
            F313fail++;
        if (Check_F316_Pass(textline))
            F316pass++;
        if (Check_F316_Fail(textline))
            F316fail++;
    }
}
Jan 15, 2016 at 5:47pm
Like Notepad, you could try to load the file in one go and do the search afterwards.
Jan 18, 2016 at 9:17am
How do I load it in one go, please?
Jan 18, 2016 at 10:05am
How do I load it in one go, please?


1. Get the file size
2. Allocate a buffer
3. Read the file into the buffer
4. Process the buffer
5. Delete the buffer

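#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <Windows.h>

using namespace std;

// Returns the file size in bytes, or 0 if the file is missing, empty,
// or too large (4 GB or more) to fit in a DWORD.
DWORD FileSize(LPCSTR filename);

int main ()
{
  char szFilename[] = "C:\\Temp\\Lorem.txt";
  DWORD len  = FileSize(szFilename);
  if (len == 0)
  {
    cerr << "File is missing, empty or too large.\n\n";
    exit(EXIT_FAILURE);
  }
  ifstream src(szFilename, ios::binary);
  if (!src)
  {
    cerr << "Cannot open file.\n\n";
    exit(EXIT_FAILURE);
  }
  char *buffer = new char[len + 1];
  src.read(buffer, len);
  buffer[src.gcount()] = 0; // terminate after the bytes actually read
  // search the buffer here
  //cout << buffer;
  delete [] buffer;

  system("pause");
}

DWORD FileSize(LPCSTR filename)
{
  WIN32_FIND_DATAA fd;

  // The ANSI variant matches the LPCSTR parameter
  HANDLE h = FindFirstFileA(filename, &fd);
  if (h == INVALID_HANDLE_VALUE)
    return 0;
  FindClose(h); // release the search handle

  if (fd.nFileSizeHigh != 0)
    return 0; // 4 GB or more does not fit in a DWORD

  return fd.nFileSizeLow;
}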
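
The "process the buffer" step could look something like the sketch below. It is a portable variant that does not need the Windows size query: it loads the file into a std::string and scans it directly with find, with no getline involved. The names load_file and count_occurrences are made up for illustration. Note that it counts occurrences, while the original loop counted matching lines, so the totals differ if a marker appears twice on one line.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Read an entire file into one std::string.
// Returns an empty string if the file cannot be opened.
std::string load_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Count non-overlapping occurrences of needle anywhere in text.
std::size_t count_occurrences(const std::string& text, const std::string& needle)
{
    assert(!needle.empty()); // an empty needle would match everywhere
    std::size_t count = 0;
    for (std::size_t pos = text.find(needle);
         pos != std::string::npos;
         pos = text.find(needle, pos + needle.size()))
    {
        ++count;
    }
    return count;
}
```

Usage would be along the lines of count_occurrences(load_file("C:\\Temp\\Lorem.txt"), "F3.10 Fail"), called once per marker. Whether this beats the line-by-line version depends on the file and the runtime library, so it is worth timing both.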
Jan 18, 2016 at 10:37am
However, when I do my string search with those Check_xyz functions, I still have to go through the buffer line by line with getline, don't I? Does that mean loading the buffer is just extra work?
Jan 18, 2016 at 10:41am
The standard tool used for this is grep. It's very fast. If you knew how grep did it, you could take some ideas from there.

http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
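
One idea commonly associated with grep's speed is the Boyer-Moore family of search algorithms, which can skip over parts of the text instead of comparing every byte. As a rough sketch, assuming a C++17 compiler, the standard library exposes this through std::boyer_moore_searcher. The function name count_with_boyer_moore is hypothetical, and this is not a claim about how grep itself is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Count non-overlapping occurrences of needle in text using the
// C++17 Boyer-Moore searcher from <functional>.
std::size_t count_with_boyer_moore(const std::string& text, const std::string& needle)
{
    assert(!needle.empty());
    // The searcher preprocesses the needle once and is reused for every search.
    const std::boyer_moore_searcher<std::string::const_iterator> searcher(
        needle.begin(), needle.end());

    std::size_t count = 0;
    auto it = text.begin();
    while (true)
    {
        it = std::search(it, text.end(), searcher);
        if (it == text.end())
            break;
        ++count;
        // Continue just past the match that was found
        it += static_cast<std::string::difference_type>(needle.size());
    }
    return count;
}
```

The preprocessing only pays off on larger inputs and longer search strings, so for short markers plain find may be just as quick.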
Jan 18, 2016 at 11:48am
Another option is to use regular expressions.
http://www.cplusplus.com/reference/regex/
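
A minimal sketch of the regex idea: a single pattern can pick up all six markers in one pass instead of six separate checks. Only "F3.10 Fail" appears in the code posted above, so the "Pass" wording and the exact spelling of the F3.13 and F3.16 markers are assumptions here, and tally_results is a made-up name. std::regex is not guaranteed to be faster than find, so measure before switching.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Tally every "F3.<id> Pass" / "F3.<id> Fail" marker found in text.
// Keys are the matched markers themselves, e.g. "F3.10 Fail".
// Assumption: markers are spelled exactly like the pattern below.
std::map<std::string, std::size_t> tally_results(const std::string& text)
{
    static const std::regex pattern(R"(F3\.(10|13|16) (Pass|Fail))");

    std::map<std::string, std::size_t> counts;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it)
    {
        ++counts[it->str()]; // it->str() is the whole matched marker
    }
    return counts;
}
```

The text argument can be a single line or the whole file contents; markers that never occur are simply absent from the map.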
Jan 18, 2016 at 12:12pm
This may have more to do with file buffering than with search algorithms. Try changing the code to simply read the file and see how fast it is. If this is too slow then you need a larger buffer.

I have to run, but I think there's a way to give the streambuf more space. Offhand, I suggest 64 KB.

If you can't do it with a streambuf, then try switching to C-style I/O. I know that you can specify the buffer that way.
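
For the streambuf route, a sketch along these lines might work. It asks the file buffer for 64 KB through pubsetbuf before the file is opened. Whether the request is honoured is implementation-defined, so treat it as an experiment rather than a guaranteed speed-up. The function name count_matching_lines is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Count the lines of a file that contain needle, reading through a
// caller-supplied 64 KB stream buffer.
std::size_t count_matching_lines(const std::string& path, const std::string& needle)
{
    // Declared before the stream so it outlives the stream that uses it
    std::vector<char> buffer(64 * 1024);

    std::ifstream in;
    // Request the larger buffer before open(); the effect is
    // implementation-defined and may be ignored by some libraries.
    in.rdbuf()->pubsetbuf(buffer.data(),
                          static_cast<std::streamsize>(buffer.size()));
    in.open(path);

    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line))
    {
        if (line.find(needle) != std::string::npos)
            ++count;
    }
    return count; // 0 if the file could not be opened
}
```

With C-style I/O, the equivalent request is setvbuf on the FILE* right after fopen and before the first read.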