Reading xander333's XOR Encryption code, I noticed I/O is performed byte by byte.
So I wonder: isn't there a performance penalty for doing it this way, as opposed to using buffers?
I can imagine that as SSD technology gains ground, the simple way might become the better way, since seek times would be much lower... perhaps to the point where working at the byte level no longer wastes time?
Or am I naive, thinking that xander333's code actually does as executable what it reads as source?
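For reference, the byte-by-byte pattern I'm talking about looks roughly like this (a minimal sketch, not xander333's actual source; the file names and key are made up):

#include <cstddef>
#include <fstream>
#include <string>

int main()
{
    const std::string key = "secret";        // placeholder key
    std::ifstream in("input.bin", std::ios::binary);
    std::ofstream out("output.bin", std::ios::binary);

    char c;
    std::size_t i = 0;
    while (in.get(c))                                            // one library call per byte read...
        out.put(static_cast<char>(c ^ key[i++ % key.size()]));   // ...and one per byte written
}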
"Or am I naive, thinking that xander333's code actually does as executable what it reads as source?"
I don't understand that line.
Also, I have another version which reads blocks of data. I haven't tested the speed differences yet, but I could do that tomorrow if you'd like. I decided to keep it as simple as possible for the source code section, hence the byte-by-byte approach. By "buffer", do you mean reading in blocks, or extracting with the >> operator?
What Disch said... assuming the I/O isn't already buffered anyway by the OS or after optimization.
Please do test for speed, then post the block-reading code... and the test code.
Besides OS buffering, the fstream object maintains a buffer too (technically, it holds a pointer to a filebuf, which maintains the buffer).
Even though the program inputs individual bytes with ifstream::get() and outputs individual bytes with ofstream::operator<<, the data is passed between the program and the OS in chunks of 8 KB, 4 KB, or whatever your C++ library authors decided is best. You can observe this on Linux with the strace utility.
Compared to disk I/O, function call overhead is usually negligible, but if profiling shows that it is a problem, I would at least use streambuf iterators to avoid the complexity of formatted stream I/O, although block I/O would certainly get rid of even more function calls. Beyond that, one could step outside standard C++ and use memory-mapped file I/O (available in Boost).
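For instance, a rough sketch of the streambuf-iterator approach could look like the following (file names and the key are placeholders, error handling omitted):

#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

int main()
{
    const std::string key = "secret";        // placeholder key
    std::ifstream in("input.bin", std::ios::binary);
    std::ofstream out("output.bin", std::ios::binary);

    std::size_t i = 0;
    // Copy straight from the input streambuf to the output streambuf,
    // XORing each byte and bypassing the formatted operator>>/operator<< layer.
    std::transform(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>(),
                   std::ostreambuf_iterator<char>(out),
                   [&i, &key](char c) { return static_cast<char>(c ^ key[i++ % key.size()]); });
}

Each byte still costs an iterator operation, but on typical implementations that is mostly an inlined access to the filebuf's buffer rather than a full get() call with sentry construction.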
PS: @xander333, std::ifstream out? Shouldn't that be std::ofstream?
The overhead introduced by reading and writing each character separately is fairly high.
With that approach, I get a speed of about 19 MB/s. With the previous block-based approach (buffer size reduced to 1 MB), I get ~262 MB/s. And with the following modification, which allows the compiler to XOR 16 bytes at a time using SSE2 instead of operating on single bytes, I get 658 MB/s:
// Expand the key into a fixed-size 64-byte buffer (the key repeated) so the
// inner loop has a compile-time trip count the compiler can vectorize.
const int keyBufSize = 64;
char keyBuf[keyBufSize];
for (int i = 0; i < keyBufSize; i++)
    keyBuf[i] = key[i % key.size()];
[...]
// XOR the block in chunks of keyBufSize bytes...
int j = 0;
for (; j < blocksize - blocksize % keyBufSize; j += keyBufSize)
{
    for (int k = 0; k < keyBufSize; k++)
        buffer[j + k] ^= keyBuf[k];
}
// ...then handle any remaining tail bytes one at a time.
for (; j < blocksize; j++)
    buffer[j] ^= key[j % key.size()];
Edit: it should be mentioned that the buffer-based approach requires choosing a block size that is a multiple of the key length. Either that, or avoid resetting the key index to 0 inside the loop.
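To illustrate the second option, the read/XOR/write loop can simply carry a running key offset across blocks instead of restarting it at 0 for each block. A minimal sketch with placeholder names (this is not the code that was actually benchmarked):

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

int main()
{
    const std::string key = "secret";        // placeholder key
    std::vector<char> buffer(1 << 20);       // 1 MB block; any size works now

    std::ifstream in("input.bin", std::ios::binary);
    std::ofstream out("output.bin", std::ios::binary);

    std::size_t keyPos = 0;                  // carried from one block to the next
    while (in)
    {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize n = in.gcount();   // may be a partial block at EOF
        if (n <= 0)
            break;
        for (std::streamsize j = 0; j < n; ++j)
        {
            buffer[j] ^= key[keyPos];
            keyPos = (keyPos + 1) % key.size();  // never reset inside the loop
        }
        out.write(buffer.data(), n);
    }
}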
The output files were verified in each case to match the output of the original version exactly. Program execution was measured with the bash builtin "time", averaged over three runs.
(So I was wrong: in this case, file I/O does not actually dwarf the extra function call overhead.)