parsing formatting data

closed account (4Gb4jE8b)
.doc, .rtf, and .txt all have different formats, especially if saved from different programs. I'm writing a program that can handle plain, unformatted versions of all of the above, but when a file is saved from (for example, but not limited to) Microsoft Word or WordPad, a lot of formatting data is included in the saved file.

Is there a common method for parsing this data? Or better yet (in the case of a .doc), is there a way to let a console program read the text as it was written in Word or WordPad? I'm alright with using an API, and my program is specifically for Windows, so lack of portability won't kill me.

Thanks for any possible help!
It depends entirely on the originating program.

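Microsoft's legacy binary .doc format is effectively "closed": a specification has been published, but the format is complex and hard to parse by hand. A .docx file is a ZIP archive of XML parts (Office Open XML), which is more approachable but still a lot of work. Either way, you are better off using Word to export your .doc/.docx file to something you can more easily read.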

RTF is a plain-text format built from control words and braces. Search for the RTF specification for information on that.
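
For the RTF case, a minimal sketch of pulling the visible text out of a simple file might look like the following. It assumes the whole file has already been read into a std::string. The function name rtfToText and the list of skipped groups are my own choices for illustration, and it ignores \u Unicode escapes, code pages, and embedded objects, so treat it as a starting point rather than a full RTF reader.

```cpp
#include <cctype>
#include <cstdlib>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: extract the visible text from a simple RTF string.
std::string rtfToText(const std::string& rtf) {
    // Groups whose contents are metadata, not document text (partial list).
    static const std::set<std::string> skipped = {
        "fonttbl", "colortbl", "stylesheet", "info", "pict"};
    std::string out;
    std::vector<bool> saved;  // skip state saved at each open brace
    bool skip = false;
    std::size_t i = 0;
    while (i < rtf.size()) {
        const char c = rtf[i];
        if (c == '{') {                      // group start: remember state
            saved.push_back(skip);
            ++i;
        } else if (c == '}') {               // group end: restore state
            if (!saved.empty()) { skip = saved.back(); saved.pop_back(); }
            ++i;
        } else if (c == '\\') {
            if (++i >= rtf.size()) break;
            const char n = rtf[i];
            if (std::isalpha(static_cast<unsigned char>(n))) {
                // Control word: letters, optional number, optional one space.
                std::string word;
                while (i < rtf.size() &&
                       std::isalpha(static_cast<unsigned char>(rtf[i])))
                    word += rtf[i++];
                if (i + 1 < rtf.size() && rtf[i] == '-' &&
                    std::isdigit(static_cast<unsigned char>(rtf[i + 1])))
                    ++i;
                while (i < rtf.size() &&
                       std::isdigit(static_cast<unsigned char>(rtf[i])))
                    ++i;
                if (i < rtf.size() && rtf[i] == ' ') ++i;
                if (skipped.count(word)) skip = true;
                else if (!skip && (word == "par" || word == "line")) out += '\n';
                else if (!skip && word == "tab") out += '\t';
            } else if (n == '\'') {
                // \'hh : one byte written as two hex digits.
                if (i + 2 >= rtf.size()) break;
                const std::string hex = rtf.substr(i + 1, 2);
                if (!skip)
                    out += static_cast<char>(std::strtol(hex.c_str(), nullptr, 16));
                i += 3;
            } else if (n == '*') {           // \* marks an ignorable group
                skip = true;
                ++i;
            } else {                         // escaped literal: \\ \{ \}
                if (!skip && (n == '\\' || n == '{' || n == '}')) out += n;
                ++i;
            }
        } else if (c == '\r' || c == '\n') { // raw line breaks carry no meaning
            ++i;
        } else {
            if (!skip) out += c;
            ++i;
        }
    }
    return out;
}
```

To use it, read the .rtf file into a string (for example with std::ifstream opened in binary mode) and pass that string to rtfToText. Bold, italic, font, and similar control words are simply dropped, which is usually what a text parser wants.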

Good luck!
closed account (4Gb4jE8b)
Oh joy!

The program I'm working on is a text parser... I was hoping I could make it compatible with .doc and then .docx... well, I'm guessing not, as the formatting is extremely hard to parse accurately.

Thanks again, Duoas.