Simple Web Crawler

Dec 12, 2009 at 12:22am
Hi all,

I'm new to C++ but not to programming. I'm trying to write a console application that takes the URL of a specific stock/financial message board, monitors that board for mentions of stocks, and keeps a tally of how often each stock is mentioned. My question is how to open a connection and retrieve the page's HTML so it can be parsed. I have read up as best I could, and it seems the best option is the Windows API, but I have no idea where to start. I should mention this is not a school project.
Dec 12, 2009 at 12:47am
To open a connection look into WinSock or libraries such as curl. I'm not very good with either (yet) so I can't give you much help, but those are what you need to use to make connections.
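If you go the raw-socket route, the socket only gives you a byte stream, so you have to write the HTTP request text yourself. Here is a rough sketch of that part only, assuming plain http:// URLs (no HTTPS). The names UrlParts, split_http_url and build_get_request are made up for this example and are not part of WinSock or curl.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper type: the pieces of an http:// URL a socket client needs.
struct UrlParts {
    std::string host;  // e.g. "example.com"
    std::string port;  // "80" unless the URL names another port
    std::string path;  // e.g. "/board?id=1"; "/" when the URL has no path
};

// Split a plain "http://host[:port][/path]" URL.
// Anything else (including https://) is rejected in this sketch.
UrlParts split_http_url(const std::string& url) {
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0) {
        throw std::invalid_argument("only http:// URLs are handled here");
    }
    const std::string rest = url.substr(scheme.size());
    const std::size_t slash = rest.find('/');
    const std::string authority = rest.substr(0, slash);

    UrlParts parts;
    parts.path = (slash == std::string::npos) ? "/" : rest.substr(slash);
    const std::size_t colon = authority.find(':');
    parts.host = authority.substr(0, colon);
    parts.port = (colon == std::string::npos) ? "80" : authority.substr(colon + 1);
    if (parts.host.empty()) {
        throw std::invalid_argument("URL has no host");
    }
    return parts;
}

// Build the text to send once the socket is connected to host:port.
std::string build_get_request(const UrlParts& parts) {
    // The Host header carries the port only when it is not the default.
    const std::string host_header =
        (parts.port == "80") ? parts.host : parts.host + ":" + parts.port;
    return "GET " + parts.path + " HTTP/1.1\r\n"
           "Host: " + host_header + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}
```

You would resolve host and port, connect, send the string from build_get_request, and then read until the server closes the connection. Keep in mind that an HTTP/1.1 server may send the body chunked, which is one reason a library like curl is less work.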
Dec 12, 2009 at 2:00am
Why are you making this?
Dec 14, 2009 at 11:28pm
Going into Winsock might be a little tricky. I'd do what gsingh2011 suggests and use curl with either PHP or Perl. Perl would be my preference because of the regular expressions you might need to cut and chop your results.
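
If you would rather stay in C++, the standard library's <regex> can handle the cutting and chopping too. This is only a sketch of the tally step, assuming the page text is already in a string and that tickers appear as whole words of 1 to 5 capital letters. The function name tally_mentions and the watch list are made up for this example.

```cpp
#include <map>
#include <regex>
#include <set>
#include <string>

// Count how often each watched ticker appears as a whole word in `text`.
// `watched` is a hypothetical watch list supplied by the caller; filtering
// against it keeps stray capitals (HTML tags, acronyms) out of the tally.
std::map<std::string, int> tally_mentions(const std::string& text,
                                          const std::set<std::string>& watched) {
    // Candidate tickers: runs of 1 to 5 capital letters bounded by word breaks.
    static const std::regex candidate(R"(\b[A-Z]{1,5}\b)");

    std::map<std::string, int> counts;
    for (std::sregex_iterator it(text.begin(), text.end(), candidate), end;
         it != end; ++it) {
        const std::string word = it->str();
        if (watched.count(word) != 0) {
            ++counts[word];
        }
    }
    return counts;
}
```

Calling tally_mentions(page, {"MSFT", "AAPL"}) returns a map with an entry only for the tickers that were actually seen, so you can add the results of each poll to a running total.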

You can try http://codediaries.blogspot.com/2009/12/c-winsock-example-using-client-server.html for a Winsock tutorial.
Dec 18, 2009 at 10:05am
You should start out simple and learn to write web services using C++. There is a lot of information out there.
Dec 19, 2009 at 12:03am
I built a C++ class that makes it easy to download a file or a page over HTTP...
There is just one problem: the instructions are in Italian ;p

Anyway, just reading the code could be useful for you... or maybe you can just try to use the class (it's under construction, but it will work for what you have to do ;) )

http://mamo139.altervista.org/code_viewer.php?id=pastebin/http_download_v0.01.04.h&lang=c++
http://mamo139.altervista.org/code_viewer.php?id=pastebin/http_download_v0.01.04.cpp&lang=c++

Example of how to use it in the simplest way (this example does not handle errors that may occur during the download; if you need, I'll show you how to do that):
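#include <cstdio>      // printf, getchar
#include <windows.h>   // Sleep
#include "http_download.h"

int main() {
    http_download a;

    // Arguments as in the original post: source URL, output file name, and 0.
    a.initialize("http://neacm.fe.up.pt/pub/videolan/vlc/1.0.3/win32/vlc-1.0.3-win32.exe", "vlc.exe", 0);
    a.start();

    // Poll once per second until the download reports completion.
    // Errors are not handled, so a failed download would loop forever here.
    while (a.get_status() != HD_STATUS_DOWNLOAD_COMPLETED) {
        Sleep(1000);
        printf("status: %d\n", a.get_status());
    }

    printf("main(): download completed!\n\n");
    getchar();  // wait for a key press so the console window stays open
    return 0;
}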


bye
Last edited on Dec 19, 2009 at 12:04am
Dec 19, 2009 at 12:30am
Unfortunately, that code doesn't compile. It's missing "matrici.h".
Dec 19, 2009 at 1:15pm