Are you writing your program on Linux or Windows? If Linux, there is a dictionary file you can reference (typically /usr/share/dict/words) that contains a long list of English words. I'm not sure if Windows has an equivalent file, but I suspect it doesn't. You could generate a word, then check whether it's in the list before displaying it, or just generate a random number and use the word at that index in the file.
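A minimal sketch of the random-index approach, assuming the word list lives at the usual /usr/share/dict/words location (adjust the path for your distribution):

#include <fstream>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main()
{
    // Load the system word list; this path is the typical Linux location.
    std::ifstream dict("/usr/share/dict/words");
    if (!dict)
    {
        std::cerr << "Could not open the dictionary file\n";
        return 1;
    }

    std::vector<std::string> words;
    std::string line;
    while (std::getline(dict, line))
        if (!line.empty())
            words.push_back(line);

    // Pick a uniformly random index and print the word at that position.
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pick(0, words.size() - 1);
    std::cout << words[pick(rng)] << '\n';
}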
I guess I don't understand what you're actually trying to do here... What is your end goal? To randomly generate real English words? Or is the goal to parse web pages looking for specific words? Those two tasks are very, very different. You wouldn't want to parse a webpage to verify that a word is real, and it would make no sense to parse a webpage looking for a word you generated randomly: the string (which may not even be a real word) might or might not appear on the page, so I'm not sure what information could be gathered from that exercise. If, on the other hand, you have a predefined list of specific words you want to search for, that's another story.
OK... I think I'm still at a loss as to what you are trying to do here. I assume this has changed from trying to generate real words to an experiment in using some web libraries? Checking whether randomly generated strings of characters exist on a webpage doesn't seem to have any informational value, so I can only assume you are doing it to learn how to query web pages from a C++ program? If that is the case, I would recommend a language better suited for that task: Python, Perl, or even Java might be a better choice for web parsing. Your original request appeared to be about randomly generating real words, and if that is still your ultimate goal, reading a webpage to verify your words is the wrong way to do it.
Again, I would recommend using a different language that is better suited for this type of task (Python, Perl, Java...). Reading/parsing a webpage is not a trivial task in C++, but this post might give you some ideas on how to do it: http://www.cplusplus.com/forum/windows/36638/
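If you do want to stay in C++, libcurl is the usual way to pull the page down. Here's a rough sketch (link with -lcurl); the URL is just a placeholder and the error handling is kept to a minimum:

#include <curl/curl.h>
#include <iostream>
#include <string>

// Append each chunk libcurl hands us onto a std::string.
static size_t write_chunk(char* data, size_t size, size_t nmemb, void* userdata)
{
    std::string* out = static_cast<std::string*>(userdata);
    out->append(data, size * nmemb);
    return size * nmemb;
}

int main()
{
    CURL* curl = curl_easy_init();
    if (!curl)
        return 1;

    std::string html;
    curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com/");  // placeholder URL
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_chunk);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &html);

    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);

    if (res != CURLE_OK)
    {
        std::cerr << "curl error: " << curl_easy_strerror(res) << '\n';
        return 1;
    }

    std::cout << "Fetched " << html.size() << " bytes of HTML\n";
}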
As for eliminating all of the HTML, scripts, and styles so that only the text is returned, it will take a bit of work to parse the stream of data coming back.
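Just to give an idea of how much work: a very naive tag stripper might look like the sketch below. It drops anything between '<' and '>' and skips <script> and <style> blocks, but a real HTML parser has to deal with far more (comments, CDATA, attributes containing '>', malformed markup, etc.):

#include <iostream>
#include <string>

// Naive: removes tags and the contents of <script>/<style> blocks.
std::string strip_html(const std::string& html)
{
    std::string text;
    std::size_t i = 0;
    while (i < html.size())
    {
        if (html[i] != '<')
        {
            text += html[i++];
            continue;
        }

        std::size_t end = html.find('>', i);
        if (end == std::string::npos)
            break;

        std::string tag = html.substr(i + 1, end - i - 1);
        if (tag.compare(0, 6, "script") == 0)
        {
            std::size_t close = html.find("</script>", end);
            i = (close == std::string::npos) ? html.size() : close + 9;
        }
        else if (tag.compare(0, 5, "style") == 0)
        {
            std::size_t close = html.find("</style>", end);
            i = (close == std::string::npos) ? html.size() : close + 8;
        }
        else
        {
            i = end + 1;   // ordinary tag: just skip it
        }
    }
    return text;
}

int main()
{
    std::string sample = "<p>Hello <b>world</b></p><script>var x = 1;</script>";
    std::cout << strip_html(sample) << '\n';   // prints "Hello world"
}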
It looks like your best bet is to use curl to get the HTML, and an XML parser to get the content of the body tag. Once you have the content, all you are left with is a string that you have to split into words.
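Splitting the remaining text into words is the easy part; a stream extraction loop handles the whitespace for you, and you can trim punctuation off the ends of each token. Something like:

#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split whitespace-separated tokens and strip non-letter characters
// from the front and back of each one.
std::vector<std::string> split_words(const std::string& text)
{
    std::vector<std::string> words;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token)
    {
        while (!token.empty() && !std::isalpha(static_cast<unsigned char>(token.front())))
            token.erase(token.begin());
        while (!token.empty() && !std::isalpha(static_cast<unsigned char>(token.back())))
            token.pop_back();
        if (!token.empty())
            words.push_back(token);
    }
    return words;
}

int main()
{
    for (const std::string& w : split_words("Hello, world! This is a test."))
        std::cout << w << '\n';
}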