Help with using Curl to get text of webpage

I'm trying to make a program that will use Curl to retrieve just the text from a website and store it as a string. I'm trying to look at the curl manual to figure out what I need to use, but I'm confused since I'm not familiar with HTML terms. Could someone help me with the lib functions I would need to make this program?
This would be the general idea (untested, adapted from http://curl.haxx.se/libcurl/c/getinmemory.html):

#include <iostream>
#include <string>
#include <curl/curl.h>
#include <memory>

extern "C" std::size_t append_to_string( void* contents, std::size_t size, std::size_t nmemb, void* pstr )
{
    const std::size_t sz = size * nmemb ; // number of bytes in this chunk
    const char* cstr = static_cast<const char*>(contents) ;
    std::string& str = *static_cast< std::string* >(pstr) ;
    str.append( cstr, sz ) ; // append the whole chunk (it is not null-terminated)
    return sz ; // tell libcurl that every byte was consumed
}

int main()
{
    curl_global_init(CURL_GLOBAL_ALL); // wrap in an RAII shim
    CURL* curl_handle = curl_easy_init() ; // use std::unique_ptr with a custom deleter
    std::string page ;

    curl_easy_setopt( curl_handle, CURLOPT_URL, "http://www.cplusplus.com/forum/general/151685/"); // url
    curl_easy_setopt( curl_handle, CURLOPT_WRITEFUNCTION, append_to_string ); // call 'append_to_string' with data

    // pass the address of string 'page' to the callback 'append_to_string'
    curl_easy_setopt( curl_handle, CURLOPT_WRITEDATA, std::addressof(page) );
    curl_easy_setopt( curl_handle, CURLOPT_USERAGENT, "libcurl-agent/1.0"); // user-agent (optional)

    const auto result = curl_easy_perform(curl_handle) ; // get the page
    if( result == CURLE_OK ) std::cout << page << '\n' ;
    else std::cerr << "**** error: " << curl_easy_strerror(result) << '\n' ;

    curl_easy_cleanup(curl_handle);
    curl_global_cleanup();
}
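One thing worth keeping in mind about the callback: libcurl may call it several times during a single transfer, handing over one chunk of the body each time, so it has to append to the string rather than assign to it. Here is a rough sketch of that behaviour with no network access at all, calling a callback of the same shape by hand. The names `append_chunk` and `simulate_two_chunks` and the sample chunks are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Same parameter shape as 'append_to_string' in the post above.
// 'append_chunk' is a hypothetical name used only in this sketch.
extern "C" std::size_t append_chunk(void* contents, std::size_t size,
                                    std::size_t nmemb, void* userdata)
{
    const std::size_t bytes = size * nmemb;  // size of this chunk in bytes
    std::string& out = *static_cast<std::string*>(userdata);
    out.append(static_cast<const char*>(contents), bytes);
    return bytes;  // returning a different count makes libcurl abort the transfer
}

// Stand-in for libcurl: delivers one body in two separate chunks.
std::string simulate_two_chunks()
{
    std::string page;
    char first[]  = "<html>Hello, ";
    char second[] = "world</html>";
    append_chunk(first,  1, sizeof first  - 1, &page);  // -1 drops the '\0'
    append_chunk(second, 1, sizeof second - 1, &page);
    return page;
}
```

Calling `simulate_two_chunks()` should give back the two pieces joined into one string, which is what ends up in `page` after a real transfer.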
Thanks for the reply.

I got it to work with a few adjustments. Here's the code:
#include "StdAfx.h"
#include <iostream>
#include <string>
#include <curl/curl.h>
#include <memory>
#include <cstdlib> // for system()

extern "C" std::size_t append_to_string( void* contents, std::size_t size, std::size_t nmemb, void* pstr )
{
    const std::size_t sz = size * nmemb ; // number of bytes in this chunk
    const char* cstr = static_cast<const char*>(contents) ;
    std::string& str = *static_cast< std::string* >(pstr) ;
    str.append( cstr, sz ) ; // append the whole chunk (it is not null-terminated)
    return sz ; // tell libcurl that every byte was consumed
}

int main()
{
    curl_global_init(CURL_GLOBAL_ALL); // wrap in an RAII shim
    CURL* curl_handle = curl_easy_init(); // use std::unique_ptr with a custom deleter
    std::string page ;
    curl_easy_setopt( curl_handle, CURLOPT_URL, "http://www.cplusplus.com/forum/general/151685/"); // url
    curl_easy_setopt( curl_handle, CURLOPT_WRITEFUNCTION, append_to_string ); // call 'append_to_string' with data

    // pass the address of string 'page' to the callback 'append_to_string'
    curl_easy_setopt( curl_handle, CURLOPT_WRITEDATA, std::addressof(page) );
    curl_easy_setopt( curl_handle, CURLOPT_USERAGENT, "libcurl-agent/1.0"); // user-agent (optional)

    const auto result = curl_easy_perform(curl_handle) ; // get the page
    if( result == CURLE_OK ) std::cout << page << '\n' ;
    else std::cerr << "**** error: " << curl_easy_strerror(result) << '\n' ; // report the code returned by curl_easy_perform

    curl_easy_cleanup(curl_handle);
    curl_global_cleanup();


    system("pause"); // keep the console window open (Windows only)
}

It doesn't seem to work for what I need it for, though. I'm trying to make a program that will download the first image from a Google image search. To do this, I was planning to use the Google search API. Something like this:
https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=dog
After getting the text of this page, I was going to split it on commas into an array of strings and then download the image using the 8th string in the array, which holds the URL of the image. However, whenever I put a Google search API link into the code posted above, I get "**** error: No Error". Is this because curl doesn't support this URL format?
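The comma-splitting step described above could look roughly like the sketch below. It only covers the string handling, not the download, and the helper names `split_on_commas` and `nth_field` are made up for illustration. Keep in mind that the response is structured text, so splitting on commas is fragile (a comma inside a quoted value would shift every later field); a real JSON parser would probably be more robust.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split 'text' at every comma; empty fields are kept.
// Hypothetical helper, not part of libcurl.
std::vector<std::string> split_on_commas(const std::string& text)
{
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (;;) {
        const std::size_t comma = text.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(text.substr(start));  // last field, no comma after it
            break;
        }
        fields.push_back(text.substr(start, comma - start));
        start = comma + 1;  // continue just past the comma
    }
    return fields;
}

// Copy field number 'n' (zero-based) into 'out'.
// Returns false instead of reading out of range when there are too few fields.
bool nth_field(const std::vector<std::string>& fields, std::size_t n,
               std::string& out)
{
    if (n >= fields.size()) return false;
    out = fields[n];
    return true;
}
```

With these, the "8th string" would be `nth_field(split_on_commas(page), 7, url)`, since indices start at zero. Checking the return value matters here because an error response may contain fewer than eight fields.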

UPDATE:
I got it to work. The solution was simple. I just used this URL format instead:
http://www.ajax.googleapis.com/ajax/services/search/images?v=1.0&q=dog