I am new to this so please go easy on me.
I have to merge sorted files in a directory. The files contain two strings per line, and I have to merge on the first string.
At present I am able to open the directory and read the files one by one. Can anybody please help me with how to do the merge?
Use a priority queue. Open all the files and read the first line of each, using the first string as the key for the priority queue. You will need to store the
output line and the file handle as the data, so maybe group these in a struct or something.
Take the top (smallest) item from the queue and output it, read another record from the file it came from and put that in the queue, and repeat until there is no more data.
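The steps above can be sketched roughly as follows in Python, using the standard `heapq` module as the priority queue. This is only a minimal sketch under some assumptions: the function name `merge_sorted_files` is made up for illustration, each input is already sorted on its first string, and a blank line is treated the same as end of file.

```python
import heapq

def merge_sorted_files(handles, out):
    """Merge already-sorted line streams into `out`, ordered by the first string.

    `handles` is a list of open file-like objects (hypothetical name).
    Assumption: a blank line or end of file means that input is finished.
    """
    heap = []
    for idx, fh in enumerate(handles):
        line = fh.readline()
        if line.strip():
            # Entry is (key, file index, full line). The index breaks ties
            # between equal keys and tells us which file to read from next.
            heapq.heappush(heap, (line.split()[0], idx, line))

    while heap:
        key, idx, line = heapq.heappop(heap)
        # Make sure every output record ends with a newline.
        out.write(line if line.endswith("\n") else line + "\n")
        nxt = handles[idx].readline()
        if nxt.strip():
            heapq.heappush(heap, (nxt.split()[0], idx, nxt))
```

With real files you would open each path in the directory, pass the list of handles in, and pass something like `sys.stdout` or an output file as `out`.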
This may be overkill in that you probably don't have that many files and the process will be IO bound anyway.
So another option is to just use an array holding the current top item from each file. Repeatedly linear search the array to find the top (smallest) string,
output this line and replace that array record with the next item from the same file. This may be just as fast as the priority queue in this case.
The files will end at different times so you will have to mark the array records of files that have run out.
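A rough sketch of the array version, again in Python and again only an illustration: `merge_linear_scan` and `read_next` are hypothetical names, `None` is used as the marker for a file that has run out, and a blank line is assumed to mean the same as end of file.

```python
def merge_linear_scan(handles, out):
    """Merge sorted line streams by scanning an array of pending lines."""

    def read_next(fh):
        # Return the next record, or None when this file has run out.
        line = fh.readline()
        return line if line.strip() else None

    # current[i] is the pending line from file i, or None if exhausted.
    current = [read_next(fh) for fh in handles]

    while True:
        best = None
        for i, line in enumerate(current):
            if line is None:
                continue  # skip files that have already ended
            if best is None or line.split()[0] < current[best].split()[0]:
                best = i
        if best is None:
            break  # every file is exhausted
        line = current[best]
        out.write(line if line.endswith("\n") else line + "\n")
        # Replace the record we just wrote with the next one from that file.
        current[best] = read_next(handles[best])
```

Each output line costs one pass over the array, which is cheap when there are only a handful of files.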
test1.txt
apple pear
blackberry strawberry
cherry banana
test2.txt
aardvark pangolin
bear panda
coyote wolf
output
aardvark pangolin
apple pear
bear panda
blackberry strawberry
cherry banana
coyote wolf
The main advantage of file merging is that when your files are huge (gigabytes) you can still merge them, because only one line per file has to be in memory at a time.
If your files are not so big then you can read them into RAM and merge there.
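For the in-RAM case, a minimal Python sketch could simply read every line and sort on the first string. The name `merge_in_memory` is hypothetical, and this assumes everything fits comfortably in memory. Python's sort is stable, so lines with equal keys keep the order in which they were read.

```python
def merge_in_memory(handles):
    """Read all lines from all inputs and return them sorted by first string."""
    # Collect every non-blank line from every file.
    lines = [line for fh in handles for line in fh if line.strip()]
    # Sort on the first whitespace-separated string of each line.
    lines.sort(key=lambda line: line.split()[0])
    return lines
```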