Download A Web Page

Sep 2, 2013 at 10:08pm
I would like to download a web page from the internet and save it as an HTML file.
Sep 2, 2013 at 10:44pm
You could use Winsock (the Windows Sockets library). It is a native component of the Windows SDK, so it has no dependencies on third-party DLLs (dynamic-link libraries) or other third-party objects, which makes it a good choice. However, WinINet (the Windows Internet library) is also a native component of the Windows SDK, and it gets the job done more quickly than Winsock with half the trouble.

There are a lot of articles on how to get the source code of a website, but the code below should do the job. I will leave the easy part to you: saving the buffer to a file, using either the Windows API or the standard library's file I/O.

Code:
Be aware that I am using Visual C++ in Visual Studio, so I can use #pragma comment to link the library from within the source file. With other compilers, such as GCC, you may need to link it manually.
#pragma comment(lib,"wininet.lib") //remove if not using VC++.
#include<iostream>
#include<Windows.h>
#include<wininet.h>
#include<cstring>
using namespace std;
int main(){
   HINTERNET connect = InternetOpen("MyBrowser", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);

   if(!connect){
      cout<<"Connection failed or syntax error";
      return 1;
   }

   HINTERNET OpenAddress = InternetOpenUrl(connect, "http://www.google.com", NULL, 0, INTERNET_FLAG_PRAGMA_NOCACHE|INTERNET_FLAG_KEEP_CONNECTION, 0);

   if(!OpenAddress){
      DWORD ErrorNum = GetLastError();
      cout<<"Failed to open URL \nError No: "<<ErrorNum;
      InternetCloseHandle(connect);
      return 1;
   }

   char DataReceived[4096];
   DWORD NumberOfBytesRead = 0;
   // A successful read of zero bytes means the end of the page.
   while(InternetReadFile(OpenAddress, DataReceived, sizeof(DataReceived), &NumberOfBytesRead) && NumberOfBytesRead){
      // The buffer is not null-terminated, so print only the bytes received.
      cout.write(DataReceived, NumberOfBytesRead);
   }

   InternetCloseHandle(OpenAddress);
   InternetCloseHandle(connect);

   cin.get();
   return 0;
}



You can write the data to the file either chunk by chunk as it arrives, or by collecting the complete response in a buffer and writing it out in one go. I recommend the second approach: the file is only created once the whole download has succeeded, so you are much less likely to leave a half-written file behind, and the error checking is simpler and more reliable.
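A minimal sketch of the second approach, assuming the chunks are collected in a std::string. The helper names appendChunk and saveToFile are made up for illustration, and the WinINet calls are left out so that the snippet compiles on any platform.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: append one received chunk to the in-memory page.
// Only the first `count` bytes of `chunk` are valid, so the chunk must
// never be treated as a null-terminated string.
void appendChunk(std::string& page, const char* chunk, std::size_t count) {
    page.append(chunk, count);
}

// Hypothetical helper: write the whole page to `path` in one go.
// Returns true only if the file was opened and every byte was written.
bool saveToFile(const std::string& path, const std::string& page) {
    // Binary mode stops the runtime from translating line endings.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    out.write(page.data(), static_cast<std::streamsize>(page.size()));
    out.flush();
    return static_cast<bool>(out);
}
```

In the download loop you would call appendChunk(page, DataReceived, NumberOfBytesRead) for every successful read, and call saveToFile("page.html", page) once after the loop has finished. The file name is only an example.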
Last edited on Sep 2, 2013 at 10:46pm
Sep 2, 2013 at 11:09pm
Modern websites are partly rendered by JavaScript. Just fetching the HTML doesn't render active parts of the page.
Sep 2, 2013 at 11:11pm
Which version of VC++ do you have?
Sep 2, 2013 at 11:16pm
I tried to run your provided code but I got these errors:

1>c:\users\r's\documents\visual studio 2010\projects\html download\html download\html download.cpp(8): error C2664: 'InternetOpenW' : cannot convert parameter 1 from 'const char [10]' to 'LPCWSTR'
1>          Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
1>c:\users\r's\documents\visual studio 2010\projects\html download\html download\html download.cpp(15): error C2664: 'InternetOpenUrlW' : cannot convert parameter 2 from 'const char [22]' to 'LPCWSTR'
1>          Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
Last edited on Sep 2, 2013 at 11:18pm
Sep 2, 2013 at 11:35pm
I found an article that explained it to me. Thank you for your help!
Sep 3, 2013 at 10:46am
Those errors could be because you do not have the Windows SDK installed or because you forgot to link wininet.lib.
Sep 3, 2013 at 3:38pm
You're trying to pass ANSI strings to the Unicode entry points.

The safest thing to do is alter your code to use the ANSI entry points directly. That is, use

InternetOpenA rather than InternetOpen
InternetOpenUrlA rather than InternetOpenUrl

(this rule applies to (almost) all WinAPI functions which take one or more string parameters.)

Andy

PS WinAPI "functions" which take strings are almost always actually macros that evaluate to the ANSI or Unicode (or Wide, hence the W) version of the function. Whether or not it evaluates to the -A or -W version of the function is controlled by the define UNICODE, which is set in Visual Studio via a project's "Character Set" property.

If you just plan to use ANSI chars the whole time, the easiest thing is to just use the -A forms of the functions directly and be done with it.

Or you need to read up about TCHAR, etc. (more macros and typedefs which swap between ANSI and Unicode).
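As a rough illustration of the macro mechanism, here is a sketch using made-up functions. DescribeA, DescribeW and Describe are hypothetical names, not part of the Windows SDK; they only mimic the naming pattern of real pairs such as InternetOpenA and InternetOpenW.

```cpp
#include <string>

// Hypothetical "ANSI" version: takes narrow (char) strings.
std::string DescribeA(const char* text) {
    return std::string("ANSI: ") + text;
}

// Hypothetical "Wide" version: takes wchar_t strings.
std::wstring DescribeW(const wchar_t* text) {
    return std::wstring(L"Wide: ") + text;
}

// The unsuffixed name is not a function at all, only a macro that
// expands to one of the two versions depending on whether UNICODE
// is defined.
#ifdef UNICODE
#define Describe DescribeW
#else
#define Describe DescribeA
#endif
```

With UNICODE defined, Describe("x") expands to DescribeW("x") and fails to compile because a char string is not a wchar_t string, which is the same kind of error as the C2664 above. Calling DescribeA("x") directly compiles either way.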
Last edited on Sep 3, 2013 at 3:39pm
Sep 3, 2013 at 10:35pm
Regarding "Zin Byte": how do I link the library? I am really new to C++ and that would help in the future.
Sep 4, 2013 at 11:51am
Hi,

Are you using VS (Visual Studio)? If so, just copy the above code and it should work.

Or else you need to link the libraries manually: https://www.youtube.com/watch?v=blZNVKYxyIY

(Dev C++)

The same rule applies to other GCC and MinGW compilers. With g++, that means adding -lwininet to the link command.
Last edited on Sep 4, 2013 at 11:52am
Topic archived. No new replies allowed.