Fastest way to import a text file

Hi, I'm writing a program that takes .txt files as input and looks for strings in them.
Until now I have used getline:
[code]
while (getline(lsass, lineFile))
{
    content += lineFile;
}
[/code]
reading one line at a time and appending it to a string, but this method has proven to be very slow (the text files are around 500 MB). Do you know a way to import the whole text file at once into a string, quickly enough that I can then filter it with find?
I read on the web that a read function exists for importing "blocks" of a file, but I haven't understood how it works...
Thanks

PS. I'm new to the world of C++ and to programming in general, so I don't know many functions yet and I'm eager to learn them.
#include <iostream>
#include <fstream>
#include <string>

// read the entire file into a string via stream-buffer iterators
std::string slurp( std::string const& filename )
{
    using BufIt = std::istreambuf_iterator<char>;
    std::ifstream in( filename );
    return std::string( BufIt( in.rdbuf() ), BufIt() );
}

int main()
{
    auto text = slurp( "slurp.cpp" );
    std::cout << text << '\n';
}

Why are you reading the whole file into a string at all? Unless the search string can span lines, you could search for the data line by line.
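A minimal sketch of that line-by-line approach (the file name is just a placeholder, and the search string is taken from later in this thread):

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream in("input.txt");            // placeholder file name
    std::string line;
    bool found = false;
    while (std::getline(in, line))
    {
        if (line.find("RYwTiizs2trQ") != std::string::npos)   // example search string
        {
            found = true;
            break;                             // stop at the first match
        }
    }
    std::cout << (found ? "found\n" : "not found\n");
}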
First of all thanks for the quick answers :)

dutch, I tried your code, but the execution speed seems the same as the getline method...

dhayden, I also tried searching for the string directly in each line returned by getline:
[code]
while (getline(lsass, lineaFile))
{
    if (lineaFile.find("RYwTiizs2trQ", 0) != string::npos)
        trovato = true;
}
[/code]
but even with this method the speed does not improve.

I can't understand how other programs can load text files into memory in a very short time...
How large is your text file?
How long does it take?

I can't understand how other programs can load text files into memory in a very short time...

Name one. If a file is large it takes time to read it from the disk, no matter what you do.
This isn't the best approach, but it demonstrates read and write. There are faster ways than these, but they are still pretty quick for small files like yours. I get the read in 0.2 sec on my cheap laptop. Yes, I said small file: a good rule of thumb is that a "large" file is one that won't fit into your RAM.

The program writes about a gigabyte to a file and then reads it back, reporting how long the read took.


Don't run this in the online shell; I don't know how it handles writing files.

#include <chrono>
#include <fstream>
#include <iostream>
using namespace std;
using namespace chrono;

// simple wall-clock timer
struct hrtimer
{
 high_resolution_clock::time_point s;
 duration<double> time_span;
 void start(){s = high_resolution_clock::now();}
 double stop()
 {
   time_span = duration_cast<duration<double>>( high_resolution_clock::now() - s);
   return time_span.count();
 }
};

int main()
{
   char * cp = new char[1000000000ull];   // ~1 GB buffer (contents don't matter for timing)
   ofstream ofs("bff.txt");               // make a huge file to test with
   ofs.write(cp,1000000000ull);
   ofs.close();

   hrtimer h;
   h.start();
   ifstream ifs("bff.txt");
   ifs.read(cp,1000000000ull);            // read the whole file back in one call
   ifs.close();
   cout << "read took: "<< h.stop() << endl;
   delete[] cp;
}


Depending on what you are doing, once you have the whole thing in memory, finding what you want in the middle of it may also take some time. Pulling it all into one big buffer may or may not be the right choice for your application.

This forum has covered the "how best to deal with files" topic a few times; there are some great resources in old threads if you want to take a look.
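For example, assuming the whole file already sits in a std::string, the search itself is a single find call that scans the buffer once (the literal below is only a stand-in for the real contents and pattern):

#include <iostream>
#include <string>

int main()
{
    // stand-in for a buffer that already holds the whole file
    std::string text = "...lots of file contents...RYwTiizs2trQ...";
    std::size_t pos = text.find("RYwTiizs2trQ");   // example pattern
    if (pos != std::string::npos)
        std::cout << "found at byte " << pos << '\n';
    else
        std::cout << "not found\n";
}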
Here's an example using read. Note that since the file must be opened in binary mode to determine its size reliably, Windows line endings are left as '\r' '\n'.
#include <iostream>
#include <fstream>
#include <string>

int main()
{
    std::ifstream in( "slurp.cpp", std::ios::binary );
    in.seekg(0, std::ios::end);            // seek to the end to find the file size
    std::string text(in.tellg(), '\0');    // allocate a string of exactly that size
    in.seekg(0);                           // back to the beginning
    in.read(&text[0], text.size());        // read the whole file in one call

    std::cout << text;
}
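If the leftover '\r' characters get in the way of the search, one option (just a sketch; the string literal is a stand-in for the slurped file contents) is to strip them with the erase/remove idiom:

#include <algorithm>
#include <iostream>
#include <string>

int main()
{
    std::string text = "line one\r\nline two\r\n";   // stand-in for the file contents read in binary mode
    text.erase(std::remove(text.begin(), text.end(), '\r'), text.end());
    std::cout << text;                               // now plain '\n' line endings
}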

Thank you for answering :)

In the end I used a binary-read method very similar to the one dutch posted, which turned out to be very fast: in less than 1 second it manages to import and filter a 500 MB text file.
Thanks again
in less than 1 second it manages to import and filter a 500 MB text file.
That's suspiciously fast. Does your disk system really have 4GB/s bandwidth? Your memory system?

I think it's more likely that the program didn't run properly. Check for error codes.
If the file's cached (for example if the program was run before), it'll likely bypass the disk.
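A minimal sketch of that kind of check, roughly following the binary-read example above (the file name is a placeholder):

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream in("input.txt", std::ios::binary);   // placeholder file name
    if (!in)
    {
        std::cerr << "could not open file\n";
        return 1;
    }
    in.seekg(0, std::ios::end);
    std::string text(in.tellg(), '\0');
    in.seekg(0);
    in.read(&text[0], text.size());
    if (!in || in.gcount() != static_cast<std::streamsize>(text.size()))
    {
        std::cerr << "read failed or was short\n";
        return 1;
    }
    std::cout << "read " << text.size() << " bytes\n";
}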
Hardware is getting to be amazing.
I redid mine. If you run it as posted, it's 0.2 seconds or less to read the file, and apparently cached (from the write?), as noted. But forced to not be cached, it was just over 0.5 seconds, still sub-second though more than twice as slow.

And I don't even have an SSD; my next machine will, but the current one does not.

That is kind of annoying to test; the only way I know to ensure nothing is cached is to reboot with a hard power cycle.
Topic archived. No new replies allowed.