Curl Passing data to a simple function


Below is all the code to date. I need to make a lot of modifications to
it so it will run with my program, but this much works, thank God.
The only problem is that I need to know how to pass an int to the
write_data function so I can load the data into the proper array
element, e.g. so that Url_Data_Array[i].Memory gets set to the memory I
have been trying to load.

There are only two lines of code you need to concern yourself with, and
they both have ~~~~~~~~~ in front of them. I need
curl_easy_setopt(eh, CURLOPT_WRITEFUNCTION, write_data); to pass an int
to static size_t write_data(char *ptr, size_t size, size_t nmemb,
void *stream);

I know what char *ptr, size_t size, and size_t nmemb do, as I wrote the
function to load the data into memory. However, I do not know what
void *stream is for, as I have not used it at all. Is this where I could
pass a structure to the function with additional information? I have
been at this for at least 3 months, and this is the same stumbling block
I come to over and over again, whether I use pthreads or any other form
of multiple file pulling.

If I can just communicate with this dumb function, it would bring me to
my next step. cURL tells me that static size_t write_data(char *ptr,
size_t size, size_t nmemb, void *stream); has to be written with those
parameters. God, I am lost; any help would be great. I can change the
name of the function, however. Could I just add another void * or
something?

If you can get me through this it would be very helpful.

Thank you,

Donald

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifndef WIN32
#  include <unistd.h>
#endif
#include <curl/multi.h>

static const char *urls[] = {
  "http://www.microsoft.com",
  "http://www.opensource.org",
  "http://www.google.com",
  "http://www.yahoo.com",
  "http://www.ibm.com",
  "http://www.mysql.com",
  "http://www.oracle.com",
  "http://www.ripe.net",
  "http://www.iana.org",
  "http://www.amazon.com",
  "http://www.netcraft.com",
  "http://www.heise.de",
  "http://www.chip.de",
  "http://www.ca.com",
  "http://www.cnet.com",
  "http://www.news.com",
  "http://www.cnn.com",
  "http://www.wikipedia.org",
  "http://www.dell.com",
  "http://www.hp.com",
  "http://www.cert.org",
  "http://www.mit.edu",
  "http://www.nist.gov",
  "http://www.ebay.com",
  "http://www.playstation.com",
  "http://www.uefa.com",
  "http://www.ieee.org",
  "http://www.apple.com",
  "http://www.sony.com",
  "http://www.symantec.com",
  "http://www.zdnet.com",
  "http://www.fujitsu.com",
  "http://www.supermicro.com",
  "http://www.hotmail.com",
  "http://www.ecma.com",
  "http://www.bbc.co.uk",
  "http://news.google.com",
  "http://www.foxnews.com",
  "http://www.msn.com",
  "http://www.wired.com",
  "http://www.sky.com",
  "http://www.usatoday.com",
  "http://www.cbs.com",
  "http://www.nbc.com",
  "http://slashdot.org",
  "http://www.bloglines.com",
  "http://www.techweb.com",
  "http://www.newslink.org",
  "http://www.un.org",
};

#define MAX 10 /* number of simultaneous transfers */
#define CNT (sizeof(urls)/sizeof(char*)) /* total number of transfers to do */

// declare char* for loading in the next url to crawl
char *UrlAddress[50];
// declare the thread data that the thread will pass to the function to
// process
struct Url_Data
{
    int  Url_ID;
    char *UrlAddress;
    char *Memory;
    size_t UrlConnectionHtmlBody_size;
    char *RedirectAddress;
    char *IPAddress;
    long HttpResponse;
};
// declare an array of Url_Data so there is a different variable for
// each thread
struct Url_Data Url_Data_Array[50];

char* memory;
size_t UrlConnectionHtmlBody_size;

/* ~~~~~~~~~ */ static size_t write_data(char *ptr, size_t size,
                                         size_t nmemb, void *stream);

/* ~~~~~~~~~ */ static size_t write_data(char *ptr, size_t size,
                                         size_t nmemb, void *stream)
{
    size_t mem;
    // number of bytes delivered in this call
    mem = size * nmemb;
    // keep track of how long the char* buffer is
    UrlConnectionHtmlBody_size += mem;
    // grow the buffer; realloc(NULL, n) behaves like malloc(n)
    if (mem > 0)
    {
        char *grown = (char*)realloc(memory, UrlConnectionHtmlBody_size);
        if (grown == NULL)
        {
            UrlConnectionHtmlBody_size -= mem;
            return 0; // a short count tells cURL the write failed
        }
        memory = grown;
        // store the data at the end of the buffer
        memcpy(&(memory[UrlConnectionHtmlBody_size-mem]), ptr, mem);
    }
    return mem;
}

static void init(CURLM *cm, int i)
{
  CURL *eh = curl_easy_init();
  CURLcode res;
  /* ~~~~~~~~~ */ curl_easy_setopt(eh, CURLOPT_WRITEFUNCTION, write_data);
  curl_easy_setopt(eh, CURLOPT_HEADER, 0L);
  curl_easy_setopt(eh, CURLOPT_URL, urls[i]);
  curl_easy_setopt(eh, CURLOPT_PRIVATE, urls[i]);
  curl_easy_setopt(eh, CURLOPT_VERBOSE, 0L);
  // pointer Redirect Site
        char *ra;
        char *ip;
        long HttpResponse;
        /* get the CURLINFO_HTTP_CONNECTCODE*/
        res = curl_easy_getinfo(eh, CURLINFO_RESPONSE_CODE,
&Url_Data_Array[i].HttpResponse);
        /* ask for the ReDirectAddress*/
        res = curl_easy_getinfo(eh, CURLINFO_REDIRECT_URL,
&Url_Data_Array[i].RedirectAddress);
        // Get the IP address for the web site
        res = curl_easy_getinfo(eh, CURLINFO_PRIMARY_IP,
&Url_Data_Array[i].IPAddress);

  curl_multi_add_handle(cm, eh);
}

int main(void)
{
  CURLM *cm;
  CURLMsg *msg;
  long L;
  unsigned int C=0;
  int M, Q, U = -1;
  fd_set R, W, E;
  struct timeval T;

  curl_global_init(CURL_GLOBAL_ALL);

  cm = curl_multi_init();

  /* we can optionally limit the total amount of connections this multi
handle
     uses */
  curl_multi_setopt(cm, CURLMOPT_MAXCONNECTS, (long)MAX);

  for (C = 0; C < MAX; ++C) {
    init(cm, C);
  }

  while (U) {
    while (CURLM_CALL_MULTI_PERFORM == curl_multi_perform(cm, &U));

    if (U) {
      FD_ZERO(&R);
      FD_ZERO(&W);
      FD_ZERO(&E);

      if (curl_multi_fdset(cm, &R, &W, &E, &M)) {
        fprintf(stderr, "E: curl_multi_fdset\n");
        return EXIT_FAILURE;
      }

      if (curl_multi_timeout(cm, &L)) {
        fprintf(stderr, "E: curl_multi_timeout\n");
        return EXIT_FAILURE;
      }
      if (L == -1)
        L = 100;

      if (M == -1) {
#ifdef WIN32
        Sleep(L);
#else
        sleep(L / 1000);
#endif
      } else {
        T.tv_sec = L/1000;
        T.tv_usec = (L%1000)*1000;

        if (0 > select(M+1, &R, &W, &E, &T)) {
          fprintf(stderr, "E: select(%i,,,,%li): %i: %s\n",
              M+1, L, errno, strerror(errno));
          return EXIT_FAILURE;
        }
      }
    }

    while ((msg = curl_multi_info_read(cm, &Q))) {
      if (msg->msg == CURLMSG_DONE) {
        char *url;
        CURL *e = msg->easy_handle;
        curl_easy_getinfo(msg->easy_handle, CURLINFO_PRIVATE, &url);
        fprintf(stderr, "R: %d - %s <%s>\n",
                msg->data.result, curl_easy_strerror(msg->data.result),
url);
        curl_multi_remove_handle(cm, e);
        curl_easy_cleanup(e);
      }
      else {
        fprintf(stderr, "E: CURLMsg (%d)\n", msg->msg);
      }
      if (C < CNT) {
        init(cm, C++);
        U++; /* just to prevent it from remaining at 0 if there are more
                URLs to get */
      }
    }
  }

  curl_multi_cleanup(cm);
  curl_global_cleanup();

  return EXIT_SUCCESS;
}

donnyb (20)

These functions are not working either:

  char *ra;
        char *ip;
        long HttpResponse;
        /* get the CURLINFO_HTTP_CONNECTCODE*/
        res = curl_easy_getinfo(eh, CURLINFO_RESPONSE_CODE,
&Url_Data_Array[i].HttpResponse);
        /* ask for the ReDirectAddress*/
        res = curl_easy_getinfo(eh, CURLINFO_REDIRECT_URL,
&Url_Data_Array[i].RedirectAddress);
        // Get the IP address for the web site
        res = curl_easy_getinfo(eh, CURLINFO_PRIMARY_IP,
&Url_Data_Array[i].IPAddress);

All I want to do is pull this information from the net. Does anybody know a good library to work with, or something? This cURL is driving me nuts. I need to pull multiple pages down at once because DNS resolution can slow the program down.

PanGalactic (1658)

curlpp (C++ API) may make the code easier. That's the only API I've used and it worked well for my needs. I've not attempted to load multiple sites like you are though.

Galik (2254)

This is pretty much what I use:

#include <string>
#include <iostream>
#include <sstream>
#include <curl/curl.h>

static size_t http_write(void* buf, size_t size, size_t nmemb, void* userp)
{
	if(userp)
	{
		std::ostringstream* oss = static_cast<std::ostringstream*>(userp);
		std::streamsize len = size * nmemb;
		oss->write(static_cast<char*>(buf), len);
		// cURL expects the number of bytes handled, i.e. size * nmemb
		return size * nmemb;
	}

	return 0;
}

std::string get_html_page(const std::string& url, long timeout = 0)
{
	CURL* curl = curl_easy_init();

	std::ostringstream oss;

	curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &http_write);
	curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 1L);
	curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
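	// CURLOPT_WRITEDATA (older name: CURLOPT_FILE) sets the pointer that
	// http_write receives as userp
	curl_easy_setopt(curl, CURLOPT_WRITEDATA, &oss);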
	curl_easy_setopt(curl, CURLOPT_TIMEOUT, timeout);
	curl_easy_setopt(curl, CURLOPT_URL, url.c_str());

	curl_easy_perform(curl);
	curl_easy_cleanup(curl);

	return oss.str();
}

int main()
{
	std::string html = get_html_page("http://www.google.com");

	std::cout << html << std::endl;

	return 0;
}
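
In the example above the std::ostringstream reaches http_write through the callback's last parameter: libcurl hands back whatever pointer was registered with CURLOPT_WRITEDATA (CURLOPT_FILE is the older name for the same option). For the plain C code in the first post, the same idea might look roughly like the sketch below, with one buffer struct per transfer instead of the global memory pointer. The struct url_buffer and the buffers array mentioned in the comment are hypothetical names made up for this illustration, and the registration calls appear only in a comment so the block compiles without libcurl.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-transfer buffer; the field names are illustrative. */
struct url_buffer {
    int    url_id;   /* index of the slot this transfer belongs to */
    char  *memory;   /* bytes received so far (not NUL-terminated) */
    size_t size;     /* number of bytes stored in memory */
};

/*
 * Write callback with the signature libcurl expects. The last argument is
 * whatever pointer was registered with CURLOPT_WRITEDATA for the handle,
 * for example (assuming an array "buffers" with one element per URL):
 *
 *   curl_easy_setopt(eh, CURLOPT_WRITEFUNCTION, write_data);
 *   curl_easy_setopt(eh, CURLOPT_WRITEDATA, &buffers[i]);
 */
static size_t write_data(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct url_buffer *buf = userdata;
    size_t chunk = size * nmemb;
    char *grown;

    if (chunk == 0)
        return 0;  /* nothing to store; 0 equals the chunk size here */

    /* realloc(NULL, n) behaves like malloc(n), so the first call works too */
    grown = realloc(buf->memory, buf->size + chunk);
    if (grown == NULL)
        return 0;  /* returning less than chunk makes libcurl abort */

    /* append the new bytes after the ones already stored */
    memcpy(grown + buf->size, ptr, chunk);
    buf->memory = grown;
    buf->size += chunk;
    return chunk;
}
```

Each buffer would start out as { i, NULL, 0 } and be released with free() after the transfer. Because every easy handle gets its own pointer, the callback needs no extra int parameter to know which slot it is filling.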

