I am running C++ on Linux with a NAND flash file system, which writes in pages. Any write that directly calls flush, sync(), fsync(), fdatasync(), or uses O_SYNC during the write process forces the data out to the file system immediately and causes file system fragmentation. Otherwise, the OS does a good job of keeping the I/O in cache and flushing it to the file system at a later stage.
I am currently using std::ofstream operator<< to write files, and it causes heavy filesystem fragmentation, apparently because the stream performs some kind of sync or flush along the way.
I would appreciate any insights on which C/C++ I/O write functions do not call flush, so the OS can handle the I/O on its own.
operator<< flushes on std::endl if these are text files.
You can use "\n" instead and it won't.
But I don't know about preventing flushes 'when it feels like it'.
For smaller files, you can keep the whole thing in memory and write it all at once, and that may work for you. If not, then we need to dig in deeper.
At the very least, keeping larger chunks in memory (say, in a string) and writing that out once in a while will greatly reduce fragmentation. A string can hold multiple line endings, etc.
So instead of file << data, you say string += data; string += "\n"; ... and from time to time (tens of MB?) you write the string out and reset it to "".
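A rough sketch of that idea (assuming a plain std::ofstream; the file name and the 16 MB chunk threshold are just placeholders to tune):

#include <cstddef>
#include <fstream>
#include <string>

int main() {
    constexpr std::size_t ChunkSize = 16 * 1024 * 1024;  // arbitrary threshold
    std::ofstream file("output.txt");                    // placeholder file name
    std::string pending;
    pending.reserve(ChunkSize);

    for (int i = 0; i < 1000000; ++i) {
        pending += "some data";
        pending += '\n';                                 // '\n' instead of std::endl, so no flush
        if (pending.size() >= ChunkSize) {
            file << pending;                             // one large write instead of many small ones
            pending.clear();
        }
    }
    file << pending;                                     // write out whatever is left
}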
IIRC an SSD does not care about being fragmented. It's direct access (?) ... the slowness from fragmentation comes from platter rotation and head movement, neither of which applies to an SSD.
I'd use memory-mapped files for the I/O, but there are caveats. Output through the mapping will not expand the file automatically, since one can only map a file's existing extent.
However, each OS has functions that can expand a file without writing anything more than some metadata in the directory (not actually writing data to the file), after which the mapping can happen.
This won't work well for your constraints if you can't map the entire final output size (or something in excess of it), because every expansion of the mapping can lead to an implied flush without some machinations to avoid it.
Linux and UNIX (and thus macOS) are better at this than Windows, so if portability is a concern there are a few more caveats (like output to FAT filesystems).
I've not studied this for FAT variants on Linux.
I have used Boost's interprocess library for this (they have an I/O library that does it too, with different interfaces), but I did have to resort to implementing my own functions for expanding the file (the library's versions wrote to the file, which I wanted to avoid).
In the bargain, you get much faster I/O. Experimenting with this approach I found that common file stream operations on reasonably modern hardware were limited to about 150 MB/s, no matter how fast the underlying device was (unless it was slower). Some of the old C interfaces for writing files hit 600 MB/s. Memory-mapped files clocked in at 1.1 GB/s on the same hardware. Results on Windows were similar, though Linux's performance on explicit flushes or under memory exhaustion (working with 10 GB+ output) was vastly superior to Windows.
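For concreteness, here is a minimal Linux-only sketch of that pattern, assuming the final size is known up front (the file name and size are placeholders): ftruncate() extends the file by updating metadata only, and the kernel writes the dirty pages back on its own schedule (or at munmap()/msync()).

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>

int main() {
    const std::size_t size = 1 << 20;                  // assumed final output size (1 MB placeholder)
    int fd = open("output.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;

    if (ftruncate(fd, size) != 0) return 1;            // extend the file without writing any data

    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    std::memset(p, 'x', size);                         // "write" by touching memory; no explicit flush

    munmap(p, size);                                   // dirty pages are written back when the kernel chooses
    close(fd);
}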
@Duthomhas, I also wonder why fragmentation is much of a concern on a flash drive, but depending on the filesystem (and even then, on most of them) the directory structure can become complicated.
I have no sense of the OP's exact requirement, but I do recognize that if, as an example, one opens several output files at once and writes to them (extending them) at various times, the expansions of those files can end up fragmented in storage. The files are effectively interleaved because the I/O is interleaved over time.
I can't say if that's the OP's problem, but then a simple(r) solution would be to store the output in RAM, if it fits, and send it out after the work is done. Memory mapping can be used, and more generally I've found that to have multiple benefits (so I use an "in-house" library to support it, making it trivial), but there still remains the issue of available RAM. If fragmentation is indeed the problem, this could limit it, but only within the bounds of the RAM available to mitigate it.
Buffered output works by writing into a memory buffer, and flushing the buffer when it's full (overflow). If you want to defer flushing, increase the buffer size.
If you aren't directly or indirectly requesting the flush yourself then you just need to use a larger buffer in your streambuf. Probably something like:
#include <fstream>
#include <memory>

constexpr std::streamsize BufSize = 100000;
std::unique_ptr<char[]> buf(new char[BufSize]);
std::ofstream myFile(...);
myFile.rdbuf()->pubsetbuf(buf.get(), BufSize); // MUST be called before any I/O operations
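If you end up going through C stdio instead, the analogous knob is setvbuf(), which must likewise be set before any I/O on the stream. A minimal sketch, with an arbitrary buffer size and placeholder file name:

#include <cstdio>
#include <vector>

int main() {
    std::FILE* f = std::fopen("output.txt", "w");        // placeholder file name
    if (!f) return 1;

    std::vector<char> buf(1 << 20);                       // 1 MB stdio buffer, arbitrary size
    std::setvbuf(f, buf.data(), _IOFBF, buf.size());      // fully buffered; call before any I/O on f

    for (int i = 0; i < 100000; ++i)
        std::fprintf(f, "line %d\n", i);                  // stays in the buffer until it fills

    std::fclose(f);                                        // one final flush of whatever remains
}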