Are you writing your program on Linux or Windows? If Linux, there is a dictionary file you can reference (typically /usr/share/dict/words) that contains a long list of English words. I'm not sure if Windows has an equivalent file, but I suspect it doesn't. You could generate a word, then check whether it's in the list before displaying it, or just generate a random number and use the word at that index in the file.
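A minimal sketch of the random-index approach, assuming the word list lives at the usual /usr/share/dict/words location (adjust the path for your distribution):

#include <fstream>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main()
{
    // Load the system word list; this path is the typical Linux location.
    std::ifstream dict("/usr/share/dict/words");
    if (!dict)
    {
        std::cerr << "Could not open the dictionary file\n";
        return 1;
    }

    std::vector<std::string> words;
    std::string line;
    while (std::getline(dict, line))
        if (!line.empty())
            words.push_back(line);

    // Pick a uniformly random index and print the word at that position.
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pick(0, words.size() - 1);
    std::cout << words[pick(rng)] << '\n';
}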
I guess I don't understand what you're actually trying to do here... What is your end goal? To randomly generate real English words? Or is the goal to parse web pages looking for specific words? Those two tasks are very, very different. You wouldn't want to parse a webpage to verify that a word is real, and it would make no sense to parse a webpage looking for a word you generated randomly: the string (which may not even be a real word) might or might not appear on the page, so I'm not sure what information could be gathered from that exercise. If, on the other hand, you have a predefined list of specific words you want to search for, that's another story.
OK... I think I'm still at a loss as to what you are trying to do here. I assume this has changed from trying to generate real words to an experiment in using some web libraries? Checking whether randomly generated strings of characters exist on a webpage doesn't seem to have any informational value, so I can only assume you are doing it to learn how to query web pages from a C++ program? If that is the case, I would recommend a language better suited for that task: Python, Perl, or even Java might be a better choice for web parsing. Your original request appeared to be about randomly generating real words, and if that is still your ultimate goal, reading a webpage to verify your words is the wrong way to do it.
Again, I would recommend using a different language that is better suited for this type of task (Python, Perl, Java...). Reading/parsing a webpage is not a trivial task in C++, but this post might give you some ideas on how to do it: http://www.cplusplus.com/forum/windows/36638/
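If you do want to stay in C++, libcurl is the usual way to pull the page down. Here's a rough sketch (link with -lcurl); the URL is just a placeholder and the error handling is kept to a minimum:

#include <curl/curl.h>
#include <iostream>
#include <string>

// Append each chunk libcurl hands us onto a std::string.
static size_t write_chunk(char* data, size_t size, size_t nmemb, void* userdata)
{
    std::string* out = static_cast<std::string*>(userdata);
    out->append(data, size * nmemb);
    return size * nmemb;
}

int main()
{
    CURL* curl = curl_easy_init();
    if (!curl)
        return 1;

    std::string html;
    curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com/");  // placeholder URL
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_chunk);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &html);

    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);

    if (res != CURLE_OK)
    {
        std::cerr << "curl error: " << curl_easy_strerror(res) << '\n';
        return 1;
    }

    std::cout << "Fetched " << html.size() << " bytes of HTML\n";
}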
As for eliminating all of the HTML, scripts, and styles so that only the text is returned, it will take a bit of work to parse the stream of data coming back.
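Just to give an idea of how much work: a very naive tag stripper might look like the sketch below. It drops anything between '<' and '>' and skips <script> and <style> blocks, but a real HTML parser has to deal with far more (comments, CDATA, attributes containing '>', malformed markup, etc.):

#include <iostream>
#include <string>

// Naive: removes tags and the contents of <script>/<style> blocks.
std::string strip_html(const std::string& html)
{
    std::string text;
    std::size_t i = 0;
    while (i < html.size())
    {
        if (html[i] != '<')
        {
            text += html[i++];
            continue;
        }

        std::size_t end = html.find('>', i);
        if (end == std::string::npos)
            break;

        std::string tag = html.substr(i + 1, end - i - 1);
        if (tag.compare(0, 6, "script") == 0)
        {
            std::size_t close = html.find("</script>", end);
            i = (close == std::string::npos) ? html.size() : close + 9;
        }
        else if (tag.compare(0, 5, "style") == 0)
        {
            std::size_t close = html.find("</style>", end);
            i = (close == std::string::npos) ? html.size() : close + 8;
        }
        else
        {
            i = end + 1;   // ordinary tag: just skip it
        }
    }
    return text;
}

int main()
{
    std::string sample = "<p>Hello <b>world</b></p><script>var x = 1;</script>";
    std::cout << strip_html(sample) << '\n';   // prints "Hello world"
}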
It looks like your best bet is to use curl to get the HTML, and an XML parser to get the content of the body tag. Once you have the content, all you are left with is a string that you have to split into words.
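Splitting the remaining text into words is the easy part; a stream extraction loop handles the whitespace for you, and you can trim punctuation off the ends of each token. Something like:

#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split whitespace-separated tokens and strip non-letter characters
// from the front and back of each one.
std::vector<std::string> split_words(const std::string& text)
{
    std::vector<std::string> words;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token)
    {
        while (!token.empty() && !std::isalpha(static_cast<unsigned char>(token.front())))
            token.erase(token.begin());
        while (!token.empty() && !std::isalpha(static_cast<unsigned char>(token.back())))
            token.pop_back();
        if (!token.empty())
            words.push_back(token);
    }
    return words;
}

int main()
{
    for (const std::string& w : split_words("Hello, world! This is a test."))
        std::cout << w << '\n';
}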