I think you have confused a binary file with the textual representation of numbers (a confusion over what is meant by the word "binary").
In computers, as in the universe, numbers don't have any radix. That is something we humans impose upon a number to make it conveniently usable. Hence the old joke that computer programmers can't tell the difference between Halloween and Christmas: because 31 Oct == 25 Dec, or, as C++ people would write it, `031 == 25`.
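Using that number as our continuing example, here it is in various radix representations, along with the way it would be written in a C or C++ program:

| Representation | Radix       | Written in C/C++ |
|----------------|-------------|------------------|
| 19             | hexadecimal | `0x19`           |
| 25             | decimal     | `25`             |
| 31             | octal       | `031`            |
| 11001          | binary      | `0b11001` (C++14 and C23 onward; no representation in older C or C++) |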
The important thing to remember, though, is that it is the same number no matter how you represent it.
That leads us to representations. In a program, numbers are stored directly:
```cpp
int counter;
unsigned age;
float pi;
char digit;
```
But we humans cannot read them directly. We like to look at them through textual representations; in other words, as strings. Think about it: all the "numbers" you've read in this post today were actually just collections of characters that represent those numbers, not the numbers themselves.
A binary file, on the other hand, is one whose contents are not text. Many files are meant to be read by humans, like HTML pages. But there are files that are not meant to be read by humans; instead, they are laid out for the convenience of the computer itself, like .exe and .dll files.
Other binary files include compressed data files, like .zip, .tar.gz, .7z, etc.
One of the major differences between storing a number as "binary" (directly) or as text is that the textual representation of a number is usually larger than its binary (direct) representation. For example, a single byte (`unsigned char`) can store numbers from 0 to 255. Representing that number as text takes at minimum one byte (`'0'`) and at maximum three (`'2', '5', '5'`), or eight if you use a radix of 2, and that is before counting any separator needed to tell where one number ends and the next begins. Also, computers are designed to work with numbers directly (to work with the binary data directly), whereas textual data must first be translated into that form before the computer can do anything with it.
Binary File I/O
So, that leads us to what is meant by binary I/O: reading and writing data in its binary (direct) representation instead of converting it to and from strings.
Here is an example to get things going. Give this program a run and examine the two files it produces. They both contain the same "number", but they store that number differently. You may want to use a hex editor to view the second file.
```cpp
#include <fstream>
#include <iostream>
using namespace std;

int main()
{
    unsigned year = 974;

    // Save it as text
    ofstream outtxt( "textual.txt" );
    outtxt << year << flush;
    outtxt.close();

    // Save it as binary
    ofstream outbin( "binary.bin", ios::binary );
    outbin.write( reinterpret_cast <const char*> (&year), sizeof( year ) );
    outbin.close();

    return 0;
}
```
Oh, whatever, here's something to look at the files with:
```cpp
#include <cctype>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string filename;
    cout << "Please enter the name of the file to open> " << flush;
    getline( cin, filename );

    ifstream bin( filename.c_str(), ios::binary );
    if (!bin)
    {
        cout << "I could not open \"" << filename << "\".\n";
        return 1;
    }

    unsigned index = 0;
    char byte;
    while (bin.get( byte ))
    {
        // Show each byte's value; echo the character only if it is printable,
        // so control bytes (like 0 or 3) don't garble the terminal
        int b = (unsigned char)byte;
        cout << index << ": " << setw( 3 ) << right << b;
        if (isprint( b )) cout << " == " << byte;
        cout << endl;
        index++;
    }

    return 0;
}
```
One final note: when performing binary I/O, you must account for endianness when dealing with numbers larger than a byte. The example above did not; it just dumped the number to the file in your computer's native byte order. If you are on a PC (x86), that is little-endian, meaning that you will see 206, 3, 0, 0 in the output (assuming a 4-byte `unsigned`, since 974 = 3 * 256 + 206). If you are on a Sun SPARC, your output will be 0, 0, 3, 206, since it is big-endian.
If you are doing compression/decompression algorithms, you should be reading everything by single bytes anyway, so you shouldn't need to worry about endianness.
Hope this helps.