I would like to create my own web crawler project but I am unsure where to start. I won't have any problem coding the socket portion, but I am stumped on how to proceed once I've made a socket connection. To start, I just want to read in the HTML, find all the links in it, and follow them; I'll determine my stopping criteria later. The problem is that I'm not sure whether there is a library function for reading an HTML file once I establish a connection. Do I have to send an HTTP 'GET' request from within my program and read the web page back, or is there another way to read the HTML? Once I read it in I can search for and follow the links; I'm just not sure how to get an HTML page read into the program. Thanks in advance for any advice.
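To make the question concrete, here is roughly what I imagine the "speak HTTP by hand over a socket" step looking like, sketched in Python (the helper names `build_request` and `fetch` are my own, not from any library):

```python
import socket

def build_request(host, path="/"):
    # A minimal HTTP/1.0 request: the request line, a Host header
    # (needed by virtually all servers today), and a blank line.
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def fetch(host, path="/", port=80):
    """Send the request over a plain TCP socket and read until EOF.

    With HTTP/1.0 and 'Connection: close', the server closes the
    socket after sending the body, so reading until recv() returns
    empty bytes gets the whole response without parsing Content-Length.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    response = b"".join(chunks)
    # Headers and body are separated by a blank line (CRLF CRLF).
    headers, _, body = response.partition(b"\r\n\r\n")
    return headers.decode("iso-8859-1"), body
```

Is this the right general shape, or is there a higher-level call I'm missing?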
Dirk, if you have an example I would appreciate it. Unless, of course, you are saying it is as easy as encoding and sending 'GET /index.html HTTP/1.0' through my socket connection to the web server and reading back the page for parsing.
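For the link-finding step once I have the page, I was planning something like this sketch using Python's standard-library `html.parser` (the `LinkExtractor` class name is just my own):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag seen in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

The idea would be to feed each fetched page through `extract_links` and queue up the results as the next URLs to crawl.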