PDFs, MS Word documents, and LibreOffice documents are three different formats with different specifications, so you're going to have to handle each of them separately.
.docx documents, for instance, are thinly veiled zip files. See:
https://en.wikipedia.org/wiki/Office_Open_XML
They may contain multiple files and folders, including multimedia. I have not used it, but libarchive can apparently handle the compression and decompression of these zip files. I am sure there are other libraries that can do the same:
http://www.libarchive.org/
Instead of directly reading the files yourself, I would suggest finding a library that can already do it, as I'm sure some exist. But nevertheless, if you want to know how a particular file format is structured, you have to look up its specification.
For zip files, the header information can be seen here:
https://en.wikipedia.org/wiki/Zip_(file_format)#Local_file_header
In a PDF document, the header is
%PDF-1.7
(or 1.6, etc.) and simply defines that the document is using the PDF 1.7 format.
https://lotabout.me/orgwiki/pdf.html
It looks like LibreOffice documents are also zipped, similar to MS Word documents.
https://help.libreoffice.org/Common/XML_File_Formats
To read binary files in general, make sure you open the file in binary mode. Here's an article that talks about reading a binary file:
http://www.cplusplus.com/articles/DzywvCM9/
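For reference, opening a file in binary mode and grabbing its first few bytes can look roughly like this. It is a minimal sketch, and read_prefix is a name I picked for the example; the important part is the std::ios::binary flag, which stops the runtime from translating line endings on platforms that do that in text mode.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (name is illustrative). Reads up to `count` bytes
// from the start of the file at `path`. Returns an empty vector if the
// file cannot be opened; returns fewer bytes if the file is shorter.
std::vector<unsigned char> read_prefix(const std::string& path,
                                       std::size_t count) {
    std::vector<unsigned char> bytes;
    std::ifstream in(path, std::ios::binary);  // binary mode: no newline translation
    if (!in) {
        return bytes;
    }
    std::vector<char> buf(count);
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    // gcount() reports how many bytes were actually read.
    bytes.assign(buf.begin(), buf.begin() + in.gcount());
    return bytes;
}
```

The returned bytes can then be compared against whatever signature the format's specification defines.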
Try opening files in hex editors or programs like Notepad++ if you're curious about what's inside them.