Hi, I'm new to the forum, but I've been referencing this website's forum for help with my program. I have a large data file with 2 million lines, each containing a read/write operation and a hex address, formatted like this: r abcdef123456. To improve performance, I tried to read blocks of the file into a buffer and then read and process each line from the buffer. However, I haven't had much luck, since the buffer is usually a character pointer and I would like to parse each line to store 2 variable values at the same time. My first approach used fread + fscanf, but I was concerned that processing the data while reading in the file would slow down performance. My second uses ifstream (posted below), but getline never seems to enter the while-loop. Here's the code I have (sorry if the formatting is incorrect, I don't know how to use the code brackets option in the forum):
Is there a way to read lines of text from a buffer and parse each one into 2 variables without hurting performance? Sorry if this is so long, but I've been looking for a solution for weeks now.
Please edit your post and make sure your code is [code]between code tags[/code] so that it has line numbers and syntax highlighting, as well as proper indentation.
I'm not sure why you think using a buffer will improve performance - the bottleneck will always be the speed of the user's hard drive. The only time the buffer would be useful is if you read the entire file into memory, which isn't a good idea for large files.
Is there an alternative method for reading in a large file with 2 million lines in seconds rather than minutes without the buffer technique? Whenever I searched for suggestions on resolving this issue, fread for block reading or ifstream into buffered memory were frequently mentioned.
The file has plaintext in human-readable format, so you should treat it as such. You can't really buffer it. An alternative method to buffering is: not buffering.
With all due respect, your reply is not really offering any suggestions for resolving this issue, just flat statements about what not to do rather than an alternative programming approach. I need to read a file with millions of lines quickly for processing (i.e. in seconds). So if buffering is not an advisable method, then what other technique should I implement to handle this massive file? If your response is simply to reiterate avoiding buffers, and nothing else, please do not reply. I'm reaching out for ideas on how to tackle this issue effectively, not to be lambasted for employing a method one personally feels is inefficient.
There is no way to read a file faster than the hard drive is capable of. You could switch to a solid state drive, but I don't think this is the answer you want.
Once the data has been captured, how does one retrieve the contents for processing? The processing benchmark time is under 2 minutes (so processing 2 million operations on 2 million addresses in under 2 minutes). Ignoring the machine that's being used to execute this program, what is the fastest way to read in a large file?
To read formatted data, use formatted input.
(Avoid reading each line with std::getline() and then parsing it through a string stream; that is most likely what is taking a lot of time.)
std::clock() measures processor time; to time I/O-bound operations, use wall clock time.
Flush the system cache buffer before taking actual measurements (the contents of the file may already be in memory). Bomb proof: reboot, wait for the system to reach a quiescent state, measure.
@JLBorges: That actually worked out perfectly!!! I was so afraid that just using the ifstream without a '.read' call was going to slow down my program's performance. But I tested it on the largest file with the 2 million lines, and it read and captured both variables in 15 secs!!! Thank you so much for your help, I REALLY appreciate it. For some reason, std::clock() worked, but I don't know why chrono isn't being detected. I'm using Eclipse CDT 3.8 (which I'm new to, having used VS2013 before) to program in C++11. So now I just need to resolve that and I can initialize start/end along with startp/endp. :-D
It says "Symbol 'chrono' could not be resolved" for std::chrono. I added the #include files that you posted, but that didn't seem to resolve it. I checked through the 'std' options to see if 'chrono' was one of the options, but I couldn't find it.