Binary files

Hello,

I'm having some trouble understanding binary files. Any explanation would help.

Thanks.

Files just contain a sequence of bytes.

In the special case where the file holds lines of text, the bytes are typically printable characters, with each line terminated by an end-of-line sequence. In the Microsoft world this is called a text file; in the IBM mainframe world it is called a flat file.

Files with other content are called binary files, and how their content should be interpreted depends on the context. For example, a program is held in a file that conforms to some format the program loader recognises, a JPEG file conforms to the JPEG standard, and so on.
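
As a rough illustration of a format being recognised by its content, here is a minimal C++ sketch that checks whether a file starts with the JPEG signature. JPEG files begin with the bytes FF D8 (the Start Of Image marker), followed by another marker that begins with FF. `looks_like_jpeg` is a hypothetical helper name, and a real decoder checks far more than the first three bytes.

```cpp
#include <array>
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file at `path` starts with the
// JPEG signature FF D8 FF. It only inspects the first three bytes; it does
// not validate the rest of the file.
bool looks_like_jpeg(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // raw bytes, no translation
    std::array<unsigned char, 3> header{};
    if (!in.read(reinterpret_cast<char*>(header.data()), header.size())) {
        return false;  // file missing, unreadable, or shorter than 3 bytes
    }
    return header[0] == 0xFF && header[1] == 0xD8 && header[2] == 0xFF;
}
```

The same idea (compare a few leading "magic" bytes) is how many tools guess a file's type regardless of its extension.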

You can use random access on a binary file, but a text file generally has to be read sequentially: to reach a given line you must read every line before it, unless the lines are fixed length.
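
A minimal sketch of random access, assuming a hypothetical file made of fixed-size records where each record is one 32-bit integer. Because every record has the same size, record i starts at byte offset i * 4, so `seekg` can jump straight to it. The names `write_records` and `read_record` are made up for this example, and the integers are stored in the machine's native byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical layout: each record is one std::int32_t in native byte order.
void write_records(const std::string& path, const std::vector<std::int32_t>& values) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(std::int32_t)));
}

// Reads record `index` directly by seeking to its byte offset, without
// reading the records before it. Returns false if the record does not exist
// or the file cannot be read.
bool read_record(const std::string& path, std::size_t index, std::int32_t& value) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(index * sizeof(std::int32_t)), std::ios::beg);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&value), sizeof(value)));
}
```

This only works because the records are fixed size; with variable-length lines there is no way to compute the offset of line N without scanning.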

Opening a file in text mode tells the file handler to process end of line sequences. Opening a file in binary mode bypasses any such preprocessing.
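
A small C++ sketch of the two modes. `write_bytes` and `read_file` are hypothetical helpers; `std::ios::binary` is the standard flag that turns off end-of-line translation. On Windows, reading in text mode turns "\r\n" into "\n"; on Linux and other POSIX systems the two modes behave identically.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Writes `data` exactly as given; std::ios::binary disables newline translation.
void write_bytes(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::binary);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}

// Reads the whole file. With `binary` set, every byte comes back unchanged.
// Without it, the runtime may translate end-of-line sequences.
std::string read_file(const std::string& path, bool binary) {
    std::ios::openmode mode = std::ios::in;
    if (binary) mode |= std::ios::binary;
    std::ifstream in(path, mode);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Reading "hello\r\nthere" back with `binary` set yields all 12 bytes, including the '\r'.
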
A file is a series of bytes.
Each byte is an 8-bit integer.

Treating files as binary files simply means you are reading/writing those bytes directly, rather than having them abstracted through a textual interface. But in a sense, all files (even text files) are binary files and can be treated as such.

For example, let's say you have the following text file:

hello
there


If this file was saved on Windows (where each line ends with a carriage return plus a line feed) and you examined it in a hex editor (i.e., a program that reads and displays files as raw data), you might see this:

68 65 6C 6C 6F 0D 0A 74 68 65 72 65


The first 5 bytes (68 65 6C 6C 6F) are the ASCII values for the string "hello".
0D is the value for '\r' (carriage return).
0A is the value for '\n' (line feed).
The last 5 bytes (74 68 65 72 65) are the ASCII values for "there".
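
To reproduce a dump like the one above without a hex editor, a sketch along these lines should work. `hex_dump` is a hypothetical helper that reads the file in binary mode, so the '\r' is not translated away, and prints each byte as two uppercase hex digits.

```cpp
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns the file's bytes as space-separated hex,
// roughly the way a hex editor displays them.
std::string hex_dump(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    char c;
    bool first = true;
    while (in.get(c)) {
        if (!first) out << ' ';
        // Go through unsigned char so bytes >= 0x80 are not printed as negative values.
        out << std::setw(2) << static_cast<int>(static_cast<unsigned char>(c));
        first = false;
    }
    return out.str();
}
```

For a file containing "hello\r\nthere", this should return the same line as the dump shown above.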


Since bytes are nothing but numbers, binary files often store numerical information as raw bytes rather than as text. For example, if you save the number 15000 to a text file, its bytes would look something like this:

31 35 30 30 30 - the ASCII values for '1', '5', '0', '0', '0'
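
A sketch of the text representation: the number is formatted with `operator<<`, just as it would be written to a text file, and the resulting characters are shown as hex. `text_bytes_hex` is a hypothetical helper.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats `value` as text and returns those characters'
// byte values as space-separated hex.
std::string text_bytes_hex(int value) {
    std::ostringstream text;
    text << value;  // 15000 becomes the five characters "15000"
    const std::string digits = text.str();

    std::ostringstream hex;
    hex << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < digits.size(); ++i) {
        if (i != 0) hex << ' ';
        hex << std::setw(2) << static_cast<int>(static_cast<unsigned char>(digits[i]));
    }
    return hex.str();
}
```

Note that the size of the text form grows with the number of digits, whereas the binary form below is always 4 bytes.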

Whereas a binary file (little endian, int = 4 bytes) might store it like this:

98 3A 00 00


This is because 15000 in hex is 0x3A98 (0x00003A98 as a 4-byte int). On little-endian machines, the least significant byte (98) is stored first, followed by progressively more significant bytes (3A, then 00, 00).
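
A sketch of how that 4-byte little-endian layout can be produced and read back, assuming a 32-bit unsigned integer. Building the bytes with shifts gives the same little-endian result on any machine; copying an int's memory directly (e.g. with `std::memcpy`) would instead give the host's native order, which happens to be little endian on x86. The function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: splits a 32-bit value into bytes, least significant first.
std::array<std::uint8_t, 4> to_little_endian(std::uint32_t value) {
    return {
        static_cast<std::uint8_t>(value & 0xFF),
        static_cast<std::uint8_t>((value >> 8) & 0xFF),
        static_cast<std::uint8_t>((value >> 16) & 0xFF),
        static_cast<std::uint8_t>((value >> 24) & 0xFF),
    };
}

// Hypothetical helper: reassembles the value from little-endian bytes.
std::uint32_t from_little_endian(const std::array<std::uint8_t, 4>& b) {
    return static_cast<std::uint32_t>(b[0]) |
           (static_cast<std::uint32_t>(b[1]) << 8) |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}
```

For 15000, `to_little_endian` yields 98 3A 00 00, matching the bytes above.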