I know nothing about VS. It just seemed like madness to me. :-)
it would seem that if you don't include the extension it seemed to simply ignore the file and not open it
The extension is just part of the filename and is not special. You need to supply the entire text of the filename for it to match the filename text stored in that directory. It's that prosaic. Any specialness of the extension is simply a matter of the programs that work on files with that extension reading and writing them in certain formats. Also you can generally register programs with the OS shell to open a given file type when clicked. E.g., playing an mp4 by clicking on it causes the OS shell to look up the associated program and run that program with the mp4 as its input file.
As for file types in general, they exist on a couple of different levels.
The lowest level is chosen by how you open the file, either in "text" or "binary" mode. In binary mode, you see the raw byte stream of the file. In text mode, there may (or may not!) be a level of "translation" of the input into the stream of characters that you actually see, and, on output, a level of translation back into the text format that the system uses.
In *nix, there is no difference between text and binary modes, so you always see the raw byte stream (but you should use the correct opening mode for portability). On Windows, text-mode output writes '\n' as '\r' followed by '\n', while text-mode input reads '\r' followed by '\n' as '\n'. This presents a problem when reading Windows text files on *nix, since they seem to contain an extraneous '\r' character before every '\n'. (We write them to a temporary file and sell them back to MS.)
Anyway, there is more to be said -- Apple's old '\r', DOS ctrl-z, weird mainframe formats -- but that's about it for text vs binary mode. It's more annoying than useful.
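To make the '\r' problem concrete, here is a minimal sketch of cleaning up Windows line endings after reading a file's raw bytes on *nix. The function name normalize_newlines is made up for illustration, and it only handles the "\r\n" case, not the other formats mentioned above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert Windows-style "\r\n" line endings to "\n".
// A lone '\r' that is not followed by '\n' is left untouched.
std::string normalize_newlines(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n')
            continue;  // drop the '\r'; the '\n' is copied on the next pass
        out += raw[i];
    }
    return out;
}
```

On Windows, opening the file in text mode would normally do this translation for you, so something like this is only needed when the bytes arrive untranslated.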
The second level of file types, which may best be called "file formats", exists on top of the text/binary distinction. The operating system generally knows nothing about these formats. It's up to you to impose them on the file while writing and interpret them correctly while reading, in either the lower-level text or binary mode.
A text file format may be very loose. It could simply be "a sequence of characters". Or a "sequence of tokens/words/numbers". Or a "sequence of lines".
On top of the line-based format we could have another level of structure where each token in the line represents the value of a successive field in a struct. Clearly the order of values is important here, and therefore part of the file format. We could describe this overall format as a line-based text file with one struct per line with field order: a, b, c, etc. But do we ignore blank lines or are they an error? Are comments allowed, say after a # character? That's part of the file format, too, since the programs that process it need to deal with that stuff.
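As a rough sketch of such a format, here is one way a reader might look. The Record type, its field types, and the choices made (blank lines skipped, # starts a comment, malformed lines silently ignored) are all assumptions for the sake of the example; a real format would have to pin these down.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the field order a, b, c is part of the file format.
struct Record {
    int a;
    double b;
    std::string c;
};

// Reads one Record per line. In this sketch blank lines are skipped and
// everything from '#' to the end of the line is treated as a comment.
// Lines that do not hold all three fields are silently skipped as well.
std::vector<Record> read_records(std::istream& in)
{
    std::vector<Record> records;
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type hash = line.find('#');
        if (hash != std::string::npos)
            line.erase(hash);                 // strip the comment
        std::istringstream fields(line);
        Record r;
        if (fields >> r.a >> r.b >> r.c)      // fails on blank or malformed lines
            records.push_back(r);
    }
    return records;
}
```

Treating a malformed line as an error instead of skipping it would be an equally valid choice; the point is that the reader and the writer have to agree on it.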
Anyway, I've already gone on too long, but two final points about file-type information inside the file itself.
Firstly, there's the interesting case on *nix where a text file (with its "execute" bit set, a technical detail) can be run like a program based purely on information inside the file (not on a program being "registered" in a system database). The OS looks at the first two bytes and if they are #! then the rest of that line (details vary) is used as the command to run in order to process this text file. If the text file contains text in a scripting language and the program name after the #! is its interpreter, then "executing" the text file is the same as executing the interpreter program with the text file as an input file. And since most *nix scripting languages use # as the comment character, the first "magic" line is ignored by the interpreter program.
It's also interesting to note that binary files sometimes have a special first few bytes that identify the file type. Here's a list of a few. The hex represents the bytes; the characters are what they look like in ASCII (with . meaning non-printable).
bmp: 42 4D  BM
ELF: 7F 45 4C 46  .ELF
flac: 66 4C 61 43  fLaC
gif: 47 49 46 38 39 61  GIF89a
mp3: FF FB
mp3 with ID3v2 container: 49 44 33  ID3
MZ executable: 4D 5A  MZ
pdf: 25 50 44 46 2D  %PDF-
png: 89 50 4E 47 0D 0A 1A 0A  .PNG....
UTF-8 encoded Unicode byte order mark: EF BB BF
wav: 52 49 46 46 ?? ?? ?? ?? 57 41 56 45  RIFF....WAVE
zip: 50 4B ?? ??  PK..
The png signature is particularly well-designed: its initial byte has the high bit set, and it contains two different line endings (0D 0A and 0A) and even ctrl-z (1A):
http://www.libpng.org/pub/png/spec/1.2/PNG-Rationale.html#R.PNG-file-signature
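For what it's worth, here is a minimal sketch of checking a file's leading bytes against the signatures listed above. The names read_head and identify are made up, the table covers only the entries from the list, and a match on a couple of bytes (like BM or MZ) is only a hint, not proof of the file type.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: read up to `count` leading bytes in binary mode,
// so no newline translation is applied. Returns fewer bytes (possibly none)
// if the file is short or cannot be opened.
std::string read_head(const std::string& path, std::size_t count = 12)
{
    std::ifstream in(path, std::ios::binary);
    std::string head(count, '\0');
    in.read(&head[0], static_cast<std::streamsize>(count));
    head.resize(static_cast<std::size_t>(in.gcount()));
    return head;
}

// Hypothetical helper: guess a file type from its leading bytes.
// Returns "unknown" if no signature matches.
std::string identify(const std::string& head)
{
    struct Signature { const char* bytes; std::size_t length; const char* name; };
    static const Signature signatures[] = {
        { "\x89PNG\r\n\x1A\n", 8, "png" },
        { "GIF89a",            6, "gif" },
        { "%PDF-",             5, "pdf" },
        { "\x7F" "ELF",        4, "elf" },   // split so 'E' is not read as a hex digit
        { "fLaC",              4, "flac" },
        { "\xEF\xBB\xBF",      3, "utf-8 bom" },
        { "ID3",               3, "mp3 (id3v2)" },
        { "\xFF\xFB",          2, "mp3" },
        { "BM",                2, "bmp" },
        { "MZ",                2, "mz executable" },
        { "PK",                2, "zip" },
    };

    // wav: "RIFF", four size bytes we do not inspect, then "WAVE".
    if (head.size() >= 12 && head.compare(0, 4, "RIFF") == 0
                          && head.compare(8, 4, "WAVE") == 0)
        return "wav";

    for (const Signature& s : signatures) {
        if (head.size() >= s.length
            && head.compare(0, s.length, s.bytes, s.length) == 0)
            return s.name;
    }
    return "unknown";
}
```

Something like identify(read_head("picture.png")) would then give "png" for a real png file, whatever its extension says.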