Gumbo Parser - Get innerText of an element
Hi!
I am a beginner with the Gumbo Parser library. I have looked at its examples, but I still have trouble getting the text of nodes.
How can I do that?
Thanks.
My HTML code is like this:
<dl class="dl-horizontal">
<dt>Email Address</dt>
<dd>HelloWorld@Gmail.Com</dd>
</dl>
I need to get the text "HelloWorld@Gmail.Com".
I wrote this code:
#include "gumbo.h"
#include <stdio.h>
#include <string>
#include <fstream>
using namespace std;

static void find_email(GumboNode* node)
{
    if (node->type == GUMBO_NODE_ELEMENT)
    {
        if (node->v.element.tag == GUMBO_TAG_DT)
        {
            GumboVector* dt_children = &node->v.element.children;
            GumboNode* dt_first_child = static_cast<GumboNode*>(dt_children->data[0]);
            if (dt_first_child->type == GUMBO_NODE_TEXT && string(dt_first_child->v.text.text).compare("Email Address") == 0)
            {
                printf("FOUND EMAIL ADDRESS");
            }
        }
        GumboVector* children = &node->v.element.children;
        for (unsigned int i = 0; i < children->length; ++i)
        {
            find_email(static_cast<GumboNode*>(children->data[i]));
        }
    }
}

string read_file(const char* filename)
{
    ifstream in(filename, ios::in | ios::binary);
    if (!in) {
        printf("File '%s' not found.", filename);
        exit(EXIT_FAILURE);
    }
    string contents;
    in.seekg(0, ios::end);
    contents.resize(in.tellg());
    in.seekg(0, ios::beg);
    in.read(&contents[0], contents.size());
    in.close();
    return contents;
}

int main(int argc, const char *argv[])
{
    if (argc != 2)
    {
        printf("Usage: LetsParse <filename>\n");
        exit(EXIT_FAILURE);
    }
    const char* filename = argv[1];
    string contents = read_file(filename);
    GumboOutput* output = gumbo_parse(contents.c_str());
    find_email(output->root);
    gumbo_destroy_output(&kGumboDefaultOptions, output);
    return 0;
}
But I don't know how to get the next element.
if (node->v.element.tag == GUMBO_TAG_DT)
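<dd> is a sibling of <dt>, not a child. You need to go up to the parent <dl> node and take the <dd> child that follows the <dt>. Note that the whitespace between tags can show up as separate text nodes among the children, so the <dd> is not necessarily the literal second child.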
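The sibling lookup described above can be sketched as follows. This is only a sketch under stated assumptions: to keep it compilable without the library, it uses a hypothetical stand-in `Node` struct and hypothetical helper names (`append`, `next_element_sibling`, `first_text`, `find_value_for_label`), none of which are part of Gumbo. The stand-in mirrors the `GumboNode` fields the idea relies on: `parent`, `index_within_parent`, the element's `v.element.children` vector, and the distinction between `GUMBO_NODE_ELEMENT`, `GUMBO_NODE_TEXT` and `GUMBO_NODE_WHITESPACE`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for GumboNode, reduced to what this sketch needs.
// With real Gumbo you would read node->parent, node->index_within_parent,
// node->v.element.tag / node->v.element.children and node->v.text.text.
struct Node {
    enum Kind { Element, Text, Whitespace };
    Kind kind;
    std::string value;                 // tag name for elements, text otherwise
    Node* parent;
    std::size_t index_within_parent;
    std::vector<Node*> children;

    Node(Kind k, const std::string& v)
        : kind(k), value(v), parent(nullptr), index_within_parent(0) {}
};

// Attach child to parent and record its position among the siblings.
Node* append(Node* parent, Node* child) {
    child->parent = parent;
    child->index_within_parent = parent->children.size();
    parent->children.push_back(child);
    return child;
}

// Next sibling that is an element, skipping text and whitespace nodes.
// Returns nullptr if there is none.
const Node* next_element_sibling(const Node* node) {
    if (node == nullptr || node->parent == nullptr) return nullptr;
    const std::vector<Node*>& siblings = node->parent->children;
    for (std::size_t i = node->index_within_parent + 1; i < siblings.size(); ++i) {
        if (siblings[i]->kind == Node::Element) return siblings[i];
    }
    return nullptr;
}

// Text of the first child if it is a text node, otherwise an empty string.
std::string first_text(const Node* element) {
    if (element == nullptr || element->children.empty()) return "";
    const Node* first = element->children[0];
    return first->kind == Node::Text ? first->value : "";
}

// Depth-first search for <dt>label</dt>; returns the text of the <dd>
// element that follows it, or an empty string if nothing matches.
std::string find_value_for_label(const Node* node, const std::string& label) {
    if (node == nullptr || node->kind != Node::Element) return "";
    if (node->value == "dt" && first_text(node) == label) {
        const Node* dd = next_element_sibling(node);
        if (dd != nullptr && dd->value == "dd") return first_text(dd);
    }
    for (std::size_t i = 0; i < node->children.size(); ++i) {
        std::string found = find_value_for_label(node->children[i], label);
        if (!found.empty()) return found;
    }
    return "";
}
```

In the original find_email, the same idea would mean that once the matching <dt> is found, you walk node->parent's children starting at node->index_within_parent + 1 until you reach an element whose tag is GUMBO_TAG_DD, and then read the text of its first child. Checking that the children vector is not empty before reading data[0] avoids reading past the end for an empty <dt></dt>.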
Thank you very much!