I would like to create my own web crawler project but I am unsure where to start. I won't have any problem coding the socket portion, but I am stumped on how to proceed once I've made a socket connection. To start, I just want to read in the HTML, find all the links in it, and follow them; I'll determine my stopping criteria later. The problem is that I'm not sure whether there is a library function for reading an HTML file once I establish a connection. Do I have to send an HTTP 'GET' request from within my program and read the web page back, or is there another way to read the HTML? Once I read it in I can search for and follow the links; I'm just not sure how to get an HTML page read into the program. Thanks in advance for any advice.
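To make the question concrete, here is roughly what I imagine the "speak HTTP by hand over a socket" step looking like, sketched in Python (the helper names `build_request` and `fetch` are my own, not from any library):

```python
import socket

def build_request(host, path="/"):
    # A minimal HTTP/1.0 request: the request line, a Host header
    # (needed by virtually all servers today), and a blank line.
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def fetch(host, path="/", port=80):
    """Send the request over a plain TCP socket and read until EOF.

    With HTTP/1.0 and 'Connection: close', the server closes the
    socket after sending the body, so reading until recv() returns
    empty bytes gets the whole response without parsing Content-Length.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    response = b"".join(chunks)
    # Headers and body are separated by a blank line (CRLF CRLF).
    headers, _, body = response.partition(b"\r\n\r\n")
    return headers.decode("iso-8859-1"), body
```

Is this the right general shape, or is there a higher-level call I'm missing?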
Dirk, if you have an example I would appreciate it. Unless, of course, you are saying it is as easy as encoding and sending 'GET /index.html HTTP/1.0' through my socket connection to the web server and reading back the page for parsing.
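For the link-finding step once I have the page, I was planning something like this sketch using Python's standard-library `html.parser` (the `LinkExtractor` class name is just my own):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag seen in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

The idea would be to feed each fetched page through `extract_links` and queue up the results as the next URLs to crawl.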