.doc, .rtf, and .txt files all use different formats, and the details can vary depending on which program saved them. I'm writing a program that can handle unformatted versions of all of these, but when a file is formatted in (for example, but not limited to) Microsoft Word or WordPad, a lot of formatting data is included in the saved file.
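Because the extension can be misleading (an RTF file renamed to .doc, for instance), it may help to detect the format from the file's leading bytes before choosing a parser. Here is a minimal Python sketch. The signatures themselves are standard: an OLE2 compound-file header for legacy .doc, a ZIP header for .docx, and `{\rtf` for RTF. The function names and the "everything else is plain text" fallback are assumptions for illustration.

```python
# Leading bytes ("magic numbers") of the formats discussed in the question.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .doc (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .docx is a ZIP package
RTF_MAGIC = b"{\\rtf"                            # RTF files open with the {\rtf group


def sniff_format(data: bytes) -> str:
    """Guess the document format from its first bytes, ignoring the extension."""
    if data.startswith(OLE2_MAGIC):
        # note: .xls and .ppt use the same OLE2 container, so this is only a guess
        return "doc"
    if data.startswith(ZIP_MAGIC):
        # assumption: any ZIP is treated as .docx; a stricter check would
        # look inside the archive for word/document.xml
        return "docx"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    return "txt"  # hypothetical fallback: treat everything else as plain text


def sniff_file(path: str) -> str:
    """Read just enough of the file (8 bytes) to check the signatures above."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```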
Is there a common method for parsing this data? Or better yet (in the case of a .doc), is there a way to have the console display the text as it appears in Word or WordPad? I'm fine with using an API, and my program is specifically for Windows, so lack of portability won't kill me.
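For RTF at least, which is what WordPad saves by default, the formatting data is itself plain-text markup, so a rough conversion to plain text is possible by tokenizing control words and braces and dropping anything that is only formatting. This is a minimal sketch, not a full RTF reader. It assumes a Windows-1252 document (`\ansicpg1252`, common in Word and WordPad output), skips a hypothetical and incomplete list of metadata groups (`SKIP_DESTINATIONS`), and does not decode `\uN` Unicode escapes: their fallback character (often `?`) is emitted instead. The name `rtf_to_text` is illustrative.

```python
import re

# One token per match: a control word with optional numeric parameter and one
# optional delimiting space, a \'hh hex escape, a control symbol, a brace,
# a raw line break (ignored in RTF), or any other literal character.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?|\\'([0-9a-fA-F]{2})|\\(.)|([{}])|[\r\n]|(.)",
    re.DOTALL,
)

# Hypothetical, incomplete list of groups whose contents are metadata, not body text.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict", "header", "footer"}


def rtf_to_text(rtf: str) -> str:
    out = []
    stack = []     # skip flag of each enclosing group, restored on "}"
    skip = False   # True while inside a group whose text is discarded
    for m in TOKEN.finditer(rtf):
        word, _param, hexcode, symbol, brace, char = m.groups()
        if brace == "{":
            stack.append(skip)
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                skip = True
            elif skip:
                continue
            elif word in ("par", "line"):
                out.append("\n")
            elif word == "tab":
                out.append("\t")
            # every other control word (\b, \i, \fs24, \f0, ...) is formatting: drop it
        elif skip:
            continue
        elif hexcode is not None:
            # assumption: the document code page is Windows-1252
            out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif symbol is not None:
            if symbol == "*":
                skip = True  # \* marks an optional destination: ignore the whole group
            elif symbol in "\\{}":
                out.append(symbol)  # escaped literal backslash or brace
            elif symbol == "~":
                out.append("\u00a0")  # non-breaking space
            elif symbol in "\r\n":
                out.append("\n")  # backslash + line break is equivalent to \par
        elif char is not None:
            out.append(char)
    return "".join(out)
```

Note that a control word swallows one following space as its delimiter, so `\b0 world` yields `world` with no leading space. That is how RTF defines it, not a bug in the sketch.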
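Microsoft's document formats are hard to parse by hand. The legacy binary .doc format was undocumented for a long time, and although Microsoft has since published its specification ([MS-DOC]), it is large and complex. .docx is Office Open XML, a ZIP package of XML files, which is easier to get at but still carries a lot of markup. You are better off using Word to export your .doc/.docx file to something you can read more easily, such as plain text.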
The program I'm working on is a text parser. I was hoping I could make it compatible with .doc and then .docx, but I'm guessing not, since the formatting is extremely hard to parse accurately.
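For .docx specifically, getting the plain text out may be more tractable than parsing the formatting. A .docx file is a ZIP archive whose main body is conventionally stored in `word/document.xml`, with paragraphs in `<w:p>`, runs in `<w:r>`, and literal text in `<w:t>`. Here is a minimal sketch using only the Python standard library. The name `docx_to_text` is illustrative. It ignores headers, footers, footnotes, and comments, which live in other parts of the package, and it may repeat text from text boxes nested inside a paragraph.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by document.xml in a .docx package.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path: str) -> str:
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):              # <w:p> = paragraph
        parts = []
        # Only look inside runs (<w:r>), so tab-stop definitions in the
        # paragraph properties (<w:pPr><w:tabs><w:tab .../>) are not
        # mistaken for actual tab characters.
        for r in p.iter(W + "r"):
            for child in r:
                if child.tag == W + "t":      # literal text
                    parts.append(child.text or "")
                elif child.tag == W + "tab":  # tab character inside a run
                    parts.append("\t")
                elif child.tag in (W + "br", W + "cr"):  # manual line break
                    parts.append("\n")
        paragraphs.append("".join(parts))
    return "\n".join(paragraphs)
```

Deleted text from tracked changes is stored in `<w:delText>` rather than `<w:t>`, so this sketch naturally leaves it out.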