I am trying to parse the following URL using the cURL library:
www.ncbi.nlm.nih.gov/nucleotide/? term = Anthoxanthum[organism] AND 2003/7/25:2005/12/27[Publication Date]&format=text
but cURL returns XML (the default), not the text I asked for.
I'm using this code line: curl_easy_setopt(curl, CURLOPT_URL, URL.c_str());
Here's the weird thing: when I replace the "URL.c_str()" above with the actual text of the web search I want to do, it works fine. Also, if I paste in the URL from fout<<URL, that works fine in a browser.
Seems to me it's a c_str() problem, maybe the "&" or "="? but I can't figure it out, so I turn to the cplusplus forum for their usual wisdom.
The URL works fine (pasted into browser), also works fine if I output the string to a text file and paste that.
The returned cURL data is in the desired text form when I explicitly define the URL: curl_easy_setopt(curl, CURLOPT_URL, "www.ncbi.nlm.nih.gov/nucleotide/? term = Anthoxanthum[organism] AND 2003/7/25:2005/12/27[Publication Date]&format=text")
but if I say:
URL = "www.ncbi.nlm.nih.gov/nucleotide/? term = Anthoxanthum[organism] AND 2003/7/25:2005/12/27[Publication Date]&format=text";
curl_easy_setopt(curl, CURLOPT_URL, URL.c_str());
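One thing that may be worth ruling out: the search term in that URL contains raw spaces and square brackets, while CURLOPT_URL expects an encoded URL. Below is a minimal sketch of percent-encoding only the term value and then assembling the URL. The helper names `percent_encode` and `build_search_url` are hypothetical, the `http://` scheme is an assumption, and nothing in this thread establishes that encoding is what changes the response format.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: percent-encodes every byte except the RFC 3986
// "unreserved" characters (letters, digits, '-', '_', '.', '~').
std::string percent_encode(const std::string& value)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : value) {
        // std::isalnum is evaluated in the default "C" locale here,
        // so only ASCII letters and digits pass.
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];      // high nibble
            out += hex[c & 0x0F];    // low nibble
        }
    }
    return out;
}

// Hypothetical helper: encodes only the search term, so the separators
// that give the URL its structure ('?', '=', '&') are left alone.
std::string build_search_url(const std::string& term)
{
    return "http://www.ncbi.nlm.nih.gov/nucleotide/?term="
           + percent_encode(term) + "&format=text";
}

// Small usage check: brackets and spaces in the term get encoded.
void demo_percent_encode()
{
    assert(percent_encode("a b[c]") == "a%20b%5Bc%5D");
}
```

The point of encoding the term separately is that encoding the finished URL in one go would also escape the slashes and the query separators.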
Is the string URL still in scope when you call curl_easy_perform (or whatever)??
(A string literal is stored in the const segment of your exe, so it will never be deallocated. But a string will be destroyed as soon as it goes out of scope, invalidating the (const) char* returned by c_str().)
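To illustrate the lifetime point, here is a small sketch that leaves libcurl out entirely. The helper names `make_fasta_url` and `pointer_matches_owner` are made up for the example, and the `http://` scheme is an assumption. As far as I know, libcurl from 7.17.0 onwards copies the strings passed to curl_easy_setopt, so this mostly matters with older versions or with other APIs that keep the raw pointer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not from the thread: builds the request URL by value.
std::string make_fasta_url(const std::string& id)
{
    return "http://www.ncbi.nlm.nih.gov/nuccore/" + id + "?report=fasta&format=text";
}

// Returns true if a pointer taken from a named, still-living string
// reads back the same text as the string that owns it.
bool pointer_matches_owner(const std::string& id)
{
    // Risky pattern (left commented out): the temporary returned by
    // make_fasta_url() is destroyed at the end of the statement,
    // so p would dangle as soon as it is used.
    //   const char* p = make_fasta_url(id).c_str();

    // Safer pattern: keep a named string alive while the pointer is in use.
    const std::string url = make_fasta_url(id);
    const char* p = url.c_str();   // valid until url is modified or destroyed
    return url == p;
}

// Small usage check.
void demo_lifetime()
{
    assert(pointer_matches_owner("FJ817486"));
}
```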
Andy
PS Not directly related to the c_str()/char* problem, but the documentation for CURLOPT_URL does say you should specify the scheme (e.g. http://, ftp://, ldap://, ...) as part of the URL.
CURLOPT_URL
Pass in a pointer to the actual URL to deal with. The parameter should be a char * to a zero terminated string which must be URL-encoded in the following format:
for (int j=0; j<number_of_ids; ++j)
{
    working = "www.ncbi.nlm.nih.gov/nuccore/" + genbank_id[j] +"?report=fasta&format=text";
    URL.push_back(working);
    fout<<URL[j]<<endl;
}
cout<<"URL vector populated"<<endl;
//obtain url as FASTA
for (int j = 0; j<(int)URL.size(); ++j)
{
    CURL *curl;
    CURLcode res;
    string readBuffer;
    curl = curl_easy_init();
    if(curl)
    {
        // NOTE: curl_easy_escape() returns a newly allocated string and does not
        // modify its input. The result is discarded here, so URL[j] is unchanged
        // and the returned buffer is never released with curl_free().
        curl_easy_escape(curl, URL[j].c_str(),0);
        fout<<URL[j].c_str()<<endl;
        curl_easy_setopt(curl, CURLOPT_URL, URL[j].c_str());
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); //follow redirection
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
        // Perform the request, res will get the return code
        res = curl_easy_perform(curl);
        // Check for errors
        if(res != CURLE_OK)
        {
            fprintf(stderr, "curl_easy_perform() failed: %s\n",
                    curl_easy_strerror(res));
        }
        curl_easy_cleanup(curl);
        //curl_free (curl);
    }
    //Populate FASTA vector from readBuffer
    FASTA.push_back(readBuffer);
    DNA.push_back(readBuffer);
}
cout<<"FASTA strings populated into FASTA and DNA vector"<<endl;
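The code above uses a WriteCallback that is never shown in the thread. For readers following along, a common shape for such a callback is sketched below, assuming it simply appends each received chunk to the std::string passed via CURLOPT_WRITEDATA. The body is a guess at what the original does, not the poster's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed shape of WriteCallback (not shown in the thread). libcurl hands a
// write callback a chunk of size * nmemb bytes plus the pointer that was
// registered with CURLOPT_WRITEDATA, here a std::string*.
static std::size_t WriteCallback(void* contents, std::size_t size,
                                 std::size_t nmemb, void* userp)
{
    const std::size_t total = size * nmemb;
    // The chunk is not null-terminated, so append by explicit length.
    static_cast<std::string*>(userp)->append(
        static_cast<const char*>(contents), total);
    // Report every byte as consumed; a different count is treated
    // by libcurl as a write error.
    return total;
}

// Small usage check: two chunks accumulate in the same buffer.
void demo_write_callback()
{
    std::string buffer;
    char chunk[] = "ACGT";
    assert(WriteCallback(chunk, 1, 4, &buffer) == 4);
    assert(WriteCallback(chunk, 1, 4, &buffer) == 4);
    assert(buffer == "ACGTACGT");
}
```

Because the callback can be invoked several times per transfer, it appends to the buffer instead of overwriting it.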
I know the URL is good because when I paste this into a browser, I get my data: (where FJ817486 is the first genbank ID) http://www.ncbi.nlm.nih.gov/nuccore/FJ817486?report=fasta&format=text
The webpage I'm trying to access is just straight-up old-fashioned text. I don't think it's a JavaScript issue, because I got the first request to work, but this one won't work even if I explicitly encode the URL.
Do you know how I can check the URL "sent" by libcurl?
Have you inspected the response that you got back? I used your code and got XML back, but that sequence data(?) is not included. I could be wrong, but the page that you want is probably a dynamic web page generated by JavaScript and thus not possible to retrieve with libcurl.
The weblink you post is the sequence I'm trying for, and you're also right in that the xml doesn't seem to include that sequence, or I would just carve it up and get what I wanted.
if I "Inspect Element" on the page you link, I see this:
<script type="text/javascript" src="/portal/js/portal.js?v3%2E5%2E1%2Er392364%3A+Mon%2C+Mar+25+2013+15%3A07%3A09"></script>
which supports your JavaScript idea.
So this is just a case of "you can't get there from here"?