I've implemented Huffman's algorithm. As I built it, I tested every small step along the way, inspecting the binary output and so on. Everything has worked well so far: I can compress a text file to binary output, read that back in, and decompress it to a text file that I write out. The two text files look identical when opened side by side in Notepad. The problem is that the reconstructed text file is a few bytes larger than the original.
I downloaded ExamDiff to compare the two text files. Although they look identical side by side, ExamDiff shows that the newer file, the one output by my Huffman program, has extra lines. I cannot see these lines, so I'm guessing that either something extra is being added or there is some ASCII character that my program is not handling properly. To summarize: the input looks identical to the output, but according to ExamDiff the input has 301 lines while the output has 601 lines.
Any help or direction to explore in fixing this is greatly appreciated!
Use a hex editor to compare the two files.
You probably forgot to open one of the files in binary mode and now there's some confusion with newlines (\r\n vs. \n).
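To illustrate the effect, here is a minimal sketch in Python (the thread does not show the original code, so the language, file names, and helper names here are assumptions). Passing `newline="\r\n"` imitates Windows text-mode translation on any platform: every `\n` written gets a `\r` put in front of it, so data that already contains `\r\n` comes out as `\r\r\n`.

```python
import os
import tempfile

# Bytes as they would sit on disk in a Windows text file (CRLF line endings).
original = b"line one\r\nline two\r\n"


def write_text_mode(path, data):
    # newline="\r\n" mimics Windows text mode: each "\n" is written as "\r\n".
    # latin-1 maps every byte to one character, so nothing else is altered.
    with open(path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))


def write_binary_mode(path, data):
    # Binary mode writes the bytes exactly as given, with no translation.
    with open(path, "wb") as f:
        f.write(data)


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text_mode.txt")      # hypothetical names
    binary_path = os.path.join(tmp, "binary_mode.txt")
    write_text_mode(text_path, original)
    write_binary_mode(binary_path, original)
    text_out = read_bytes(text_path)
    binary_out = read_bytes(binary_path)

print(len(original), len(text_out), len(binary_out))
print(text_out)
```

The text-mode copy gains one byte per line, which matches a file that is only slightly bigger. The near doubling of the line count would be consistent with the diff tool treating each stray `\r` as a line break of its own, though that depends on the tool.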
Thank you so much!
I followed your suggestion and saw that there were indeed extra carriage returns (0x0D): one was being added after each carriage return in the original input.
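The same check can be scripted rather than read off a hex editor. This is only a sketch in Python with hypothetical helper names (`newline_stats`, `first_difference`) and made-up sample bytes, not the program discussed in this thread:

```python
def newline_stats(data: bytes) -> dict:
    # Count the raw newline-related bytes and the doubled "\r\r\n" pattern.
    return {
        "cr": data.count(b"\r"),
        "lf": data.count(b"\n"),
        "crlf": data.count(b"\r\n"),
        "doubled_cr": data.count(b"\r\r\n"),
    }


def first_difference(a: bytes, b: bytes):
    # Return the offset of the first differing byte, or None if identical.
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Made-up sample data standing in for the original and reconstructed files.
original = b"ab\r\ncd\r\n"
rebuilt = b"ab\r\r\ncd\r\r\n"

print(newline_stats(original))
print(newline_stats(rebuilt))
print(first_difference(original, rebuilt))
```

For real files, the two byte strings would come from reading each file with `open(path, "rb")`, so that no newline translation hides the difference.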
Even more helpful was your suggestion about binary modes. I changed the output of the final text file to binary (from std) and used: