Okay, so I've written a program that accurately splits a file into however many bytes (1 character is 1 byte) the user wants. But here's the thing: if you say 4096 bytes (4 KB), all of the files will be under 4096 characters, yet the few that come extremely close (roughly 4071 characters or more) are still seen as greater than 4 KB on the hard drive (4.02 is the largest I've seen) and actually take up 8 KB of disk space. I really don't want to have to tell users, "oh yeah, don't put the exact bytes in, because you'll be screwed in a few instances."
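For reference, here's a minimal sketch of the kind of splitter I mean. This is illustrative only, not my exact code; the split_file name and the part-file naming scheme are made up:

```c
#include <stdio.h>

/* Illustrative sketch (not the real program): split `path` into part
   files of at most `chunk_size` bytes each.  Binary mode ("rb"/"wb")
   keeps the 1-character-is-1-byte assumption intact. */
static int split_file(const char *path, long chunk_size)
{
    FILE *in = fopen(path, "rb");
    char name[64];
    int part = 0;
    int c;

    if (in == NULL)
        return -1;

    while ((c = fgetc(in)) != EOF) {
        long written = 0;
        FILE *out;

        sprintf(name, "part%04d.bin", part++);
        out = fopen(name, "wb");
        if (out == NULL) {
            fclose(in);
            return -1;
        }

        /* Copy up to chunk_size bytes into this part file. */
        do {
            fputc(c, out);
            written++;
        } while (written < chunk_size && (c = fgetc(in)) != EOF);

        fclose(out);
    }

    fclose(in);
    return 0;
}
```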
At the link you'll see the data log, written in plain English (I hope), of what happens while parsing the data. All of the character totals are below 4096 characters, but in iterations 46 and 59 the file is still seen as greater than 4 KB.
First of all, why does this happen? And second, is there any direct conversion (say, subtracting 10 characters) that will specifically guarantee this doesn't happen?
I'm using Windows Vista, and from my tests there's a similar effect on 7 and XP. Thank you for any help you can give.
tl;dr: A file is less than 4096 characters (less than 4 KB) but is still seen as greater than 4 KB (4.01 or 4.02) on the hard drive. Why?
I believe that all kinds of file systems, besides storing the contents of the file itself, append some additional bookkeeping information, perhaps for easier file retrieval later on by the file system. This additional info is what gets added to the total file size seen on disk.
Please correct me if I'm wrong :P
Edit: This would also explain why, if a file is saved under the FAT file system and you attempt to open it under some other file system, the operation may fail.
I believe Windows has APIs that let you determine a file's actual size and other information. On Linux/Unix we can use the stat or fstat functions to get that information.
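For instance, a minimal sketch using stat on Linux/Unix might look like the following. Note the assumption that st_blocks is counted in 512-byte units, which is what Linux uses; POSIX technically leaves the unit implementation-defined:

```c
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat st;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }

    /* st_size is the logical length of the file in bytes. */
    printf("logical size: %lld bytes\n", (long long)st.st_size);

    /* st_blocks is the number of blocks actually allocated on disk
       (assumed here to be 512-byte units, as on Linux), so this is
       closer to the "size on disk" you're seeing. */
    printf("size on disk: %lld bytes\n", (long long)st.st_blocks * 512LL);

    return 0;
}
```

On the Windows side, if I remember right, GetFileSizeEx gives you the logical size, and rounding that up to the cluster size reported by GetDiskFreeSpace should approximate the "size on disk" figure that Explorer shows.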