You haven't said what platform or operating system you're working with. Anyway...
The best-known library for this kind of thing is cURL (libcurl):
http://curl.haxx.se/
You should find that its "easy" interface (their quotes!) is enough for your needs.
http://curl.haxx.se/libcurl/c/libcurl-easy.html
If you download the appropriate development package for your platform/operating system, you will find that it comes with docs and samples, including the one I customized to download this thread (see below).
http://curl.haxx.se/download.html
I have libcurl-7.19.3-win32-ssl-msvc, which is the latest available pre-built development package listed under Visual C++. If you need a newer version for Windows, you will probably have to build it yourself.
If you're using Linux, I would expect libcurl to be available via your system's package manager.
If you are using Windows, another possibility is WinINet. See this thread from last year for details:
Get data from a Internet file into a char
http://m.cplusplus.com/forum/windows/62128/
Note that the program discussed in that thread reads the downloaded content into a set of char buffers. If you're writing the data straight back to a file, you can modify the code to reuse a single buffer rather than creating a new one on each iteration of the loop.
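The single-buffer idea looks roughly like this. WinINet's InternetReadFile isn't portable, so in this sketch a std::istream stands in for the internet handle. That substitution is purely for illustration, and copy_with_single_buffer is a hypothetical name. The point is that the buffer is allocated once, outside the loop, and each chunk is written out before the next read overwrites it.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream, handy for trying it without a network
#include <vector>

// Copies everything from `in` to `out` through ONE buffer allocated up front.
// Returns the total number of bytes copied.
std::size_t copy_with_single_buffer(std::istream& in, std::ostream& out,
                                    std::size_t buffer_size = 4096) {
    if (buffer_size == 0) buffer_size = 1;      // a zero-sized buffer would never make progress
    std::vector<char> buffer(buffer_size);      // allocated once, reused on every pass
    std::size_t total = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        std::streamsize got = in.gcount();      // may be less than buffer_size on the last pass
        if (got <= 0) break;
        out.write(buffer.data(), got);          // write this chunk before the next read reuses the buffer
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```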
Andy
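```cpp
#include <iostream>
#include <cstdio>
#include <curl/curl.h>  // curl.h already includes easy.h; curl/types.h no longer exists in newer libcurl releases

// Write callback: libcurl calls this with each chunk of received data.
// `stream` is the FILE* passed via CURLOPT_WRITEDATA. The return value must be
// the number of bytes handled; any other value makes libcurl abort the transfer.
size_t write_stream(char* ptr, size_t size, size_t nmemb, void* stream) {
    return std::fwrite(ptr, 1, size * nmemb, static_cast<FILE*>(stream));
}

int main() {
    const char* url = "http://www.cplusplus.com/forum/beginner/105882/";
    const char* file_name = "thread-beginner-105882.html";

    std::cout << "download : " << url << std::endl;
    std::cout << "to file : " << file_name << std::endl;

    // Initialize libcurl once per program, before any other libcurl call.
    curl_global_init(CURL_GLOBAL_DEFAULT);

    CURLcode res = CURLE_FAILED_INIT;
    CURL* curl = curl_easy_init();
    if (curl) {
        FILE* fp = std::fopen(file_name, "wb");
        if (fp) {
            curl_easy_setopt(curl, CURLOPT_URL, url);
            curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);  // follow HTTP redirects
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_stream);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);

            res = curl_easy_perform(curl);
            if (CURLE_OK == res) {
                std::cout << "curl_easy_perform succeeded" << std::endl;
            } else {
                // curl_easy_strerror() turns the error code into readable text
                std::cout << "curl_easy_perform failed: "
                          << curl_easy_strerror(res) << std::endl;
            }
            std::fclose(fp);
        } else {
            std::cout << "could not open " << file_name << " for writing" << std::endl;
        }
        curl_easy_cleanup(curl);
    }

    curl_global_cleanup();
    return (CURLE_OK == res) ? 0 : 1;
}
```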
Contents of file thread-beginner-105882.html (abridged):
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Taking information from a webpage - C++ Forum</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico">
<link rel="stylesheet" type="text/css" href="/v315/main.css">
<script src="/v315/main.js" type="text/javascript"></script>
</head>
<body>
...
</div><div id="I_content"><h3><div class="C_ico question" title="question"> </div>
Taking information from a webpage</h3><span id="CH_edttl"></span>
<span class="rootdatPost" title="105882,root,0,-1,2,0"></span><div id="CH_PostList">
<div class="C_forPost" id="msg572250"><span title="572250,64449,1023,106,1"></span>
<div class="box">
<div class="boxtop">
<div class="dwhen"><a href="#msg572250" title="Link to this post">
<img src="/img/link.png" width="16" height="8"></a> ...
<div class="dwho"><a href="/user/Jonas_Wingren/"><b>Jonas Wingren</b> (106)</a></div>
</div>
<div class="dwhat" colspan="2" id="CH_i572250">
Hi!<br>
I want to be able to read a webpage as an ordinary text file. (probably using some kind of stream)<br>
<br>
...
<div class="C_forPost" id="msg572309"><span title="572309,62677,1023,2514,0"></span>
<div class="box">
<div class="boxtop">
<div class="dwhen"><a href="#msg572309" title="Link to this post">
<img src="/img/link.png" width="16" height="8"></a> ...
<div class="dwho"><a href="/user/andywestken/"><b>andywestken</b> (2514)</a></div>
</div>
<div class="dwhat" colspan="2" id="CH_i572309">
You haven't said what platform or operating system you're working with? Anyway...<br>
<br>
The best known library for this kind of thing is cURL<br>
<a href="http://curl.haxx.se/">http://curl.haxx.se/</a><br>
<br>
...