Questions about Boost::Regex

Hi everyone!

My goal is to read specific parts of a larger text,
e.g. to get all the links in an HTML page.

In a CLR Windows Forms application (.NET), it would look something like this:

using namespace System::Text::RegularExpressions;
...
Regex^ re = gcnew Regex("<a href=\"([^\"]*)\"");
MatchCollection^ matches = re->Matches(fulls);  // fulls contains the HTML page source
for each (Match^ match in matches) {
  MessageBox::Show(match->Groups[1]->Value);
}

How can I do the same thing in MFC using Boost::regex?

I think I can use regex_token_iterator for this, but how do I get the results into a CString with regex_token_iterator?

Any help would be greatly appreciated.
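
A minimal sketch of the regex_token_iterator idea, for reference. It uses std::regex from the C++11 standard library as a stand-in so that it compiles without Boost or MFC; the function name extract_links is hypothetical. The Boost.Regex iterator of the same name takes, as far as I know, the same kind of submatch argument, and with boost::tregex the sub-match points into TCHAR data, so building a CString from first and length() should carry over.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects the href value of every <a href="..."> found in `html`.
// extract_links is a hypothetical helper name used only for this sketch.
std::vector<std::string> extract_links(const std::string& html) {
    static const std::regex re(R"re(<a href="([^"]*)")re");
    std::vector<std::string> links;
    // The final argument (1) selects capture group 1 instead of the whole match.
    std::sregex_token_iterator it(html.begin(), html.end(), re, 1), end;
    for (; it != end; ++it) {
        assert(it->matched);  // group 1 always participates when this pattern matches
        links.push_back(it->str());
    }
    return links;
}
```

Each element the iterator yields is a sub-match for group 1 only, so there is no need to index into a match object inside the loop.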

msn92 (13)

I found a solution (though I don't think it's the best one):

CString fulls = html;  // the full page source
boost::tregex re(L"<a href=\"([^\"]*)\"");
boost::tregex_iterator begin(fulls.GetString(), fulls.GetString() + fulls.GetLength(), re), end;
CString sub;
for (; begin != end; ++begin) {
	boost::tmatch const &what = *begin;
	sub = CString(what[0].first, what[0].length());
	if (sub != "")  // to avoid the empty results
		AfxMessageBox(sub);
}

I'm getting empty results for each character where the regex doesn't match. That's why I'm using if (sub != "") on the 8th line.

Does anyone have a better solution?

PanGalactic (1658)

Check what[0].matched before assigning the string.
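
A small sketch of where the matched member tends to be useful: on an optional sub-expression, which may or may not take part in an otherwise successful match. It uses std::regex as a stand-in for Boost.Regex so that it compiles without Boost; the Link struct, the find_links name, and the optional title attribute are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One link found in the page; `title` is meaningful only if has_title is true.
struct Link {
    std::string href;
    bool has_title;
    std::string title;
};

// Finds <a href="..."> tags that may be followed directly by title="...".
std::vector<Link> find_links(const std::string& html) {
    static const std::regex re(R"re(<a href="([^"]*)"(?: title="([^"]*)")?)re");
    std::vector<Link> links;
    for (std::sregex_iterator it(html.begin(), html.end(), re), end; it != end; ++it) {
        const std::smatch& what = *it;
        assert(what[0].matched);  // the iterator only yields successful matches
        // what[2].matched is false when the optional title part was skipped;
        // str() on an unmatched group returns an empty string.
        Link link{what[1].str(), what[2].matched, what[2].str()};
        links.push_back(link);
    }
    return links;
}
```

Here has_title distinguishes a missing title from one that is present but empty, which comparing against an empty string cannot do.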

msn92 (13)

Thanks, PanGalactic.

I tried what[0].matched instead of if (sub != ""). Everything works fine, except that I get an empty result at the end.

From the Boost.Regex documentation:

m[0].matched --
true if a full match was found, and false if it was a partial match (found as a result of the match_partial flag being set).

Since I'm not using the match_partial flag, I don't think there's any need to check what[0].matched.

I figured it out: I just need to use the match_not_null flag (a match can't be empty), and then I don't need to check for empty results.
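
A minimal sketch of the effect of match_not_null, using std::regex as a stand-in so that it compiles without Boost (the standard library spells the flag std::regex_constants::match_not_null; Boost.Regex has boost::match_not_null). The helper name all_matches and the sample pattern are assumptions for illustration, not the pattern from the posts above.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns the text of every match of `re` in `text`, searched with `flags`.
// all_matches is a hypothetical helper name used only for this sketch.
std::vector<std::string> all_matches(
        const std::string& text, const std::regex& re,
        std::regex_constants::match_flag_type flags = std::regex_constants::match_default) {
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re, flags), end; it != end; ++it) {
        assert((*it)[0].matched);  // whole-match entry is always set for iterator results
        out.push_back(it->str());
    }
    return out;
}
```

With a pattern that can match nothing, such as [0-9]* on the input a12b, the default flags produce empty matches at positions without digits, while passing std::regex_constants::match_not_null leaves only 12.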

msn92 (13)

One more question: I need to read everything (including nested tags) between the start and end of a tag, and I need a regex pattern to do that.

For example, I have this HTML page source:

...
<div id="ReadEverythingInThisTag">
<b>blah-blah-blah</b>
<i>blah-blah-blah</i>
<div>adsasd</div>
</div>
...

I want to read everything marked in bold.

This is a pattern I got:

It works fine with the source above, but it fails when the content contains line breaks (<br>) or horizontal rules (<hr>):

<div id="ReadEverythingInThisTag">
<b>blah-blah-blah</b><br>
<i>blah-blah-blah</i><hr>
<div>adsasd</div>
</div>

Any ideas?
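
One alternative to a single pattern, sketched here under stated assumptions: use a small regex only to find <div> and </div> tags and keep a depth counter to locate the closing tag that matches the opening one. This is a different technique from the pattern approach in this thread. It uses std::regex as a stand-in for Boost.Regex, the function name inner_div is hypothetical, and it assumes the opening tag is written exactly as <div id="...">; comments, scripts, and other attribute layouts are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Returns the text between <div id="ID"> and its matching </div>,
// or an empty string if the opening tag or its matching close is not found.
std::string inner_div(const std::string& html, const std::string& id) {
    const std::string open_tag = "<div id=\"" + id + "\">";
    std::size_t start = html.find(open_tag);
    if (start == std::string::npos) return "";
    start += open_tag.size();

    // Matches any opening or closing div tag; group 1 is "/" for a closing tag.
    static const std::regex div_tag(R"(<(/?)div\b[^>]*>)");
    int depth = 1;  // we are inside the outer div already
    std::sregex_iterator it(html.begin() + static_cast<std::ptrdiff_t>(start),
                            html.end(), div_tag), end;
    for (; it != end; ++it) {
        assert(depth > 0);  // invariant: we return as soon as depth reaches 0
        const std::smatch& tag = *it;
        depth += (tag[1].length() == 0) ? 1 : -1;  // "<div" opens, "</div" closes
        if (depth == 0) {
            const std::size_t stop = static_cast<std::size_t>(tag[0].first - html.begin());
            return html.substr(start, stop - start);
        }
    }
    return "";  // unbalanced markup
}
```

Because only div tags affect the counter, unpaired tags such as <br> and <hr> inside the block pass through untouched.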

msn92 (13)

Ok, I figured it out:

CString altag = L"((<img[^>]*>)*?(<br[^>]*?>)*?(<hr[^>]*?>)*?)*?";
CString Pattern = L"<div>(?<id>" + altag + L".*?(<[^>/]*>" + altag + L".*(</[^>]*>)*)*" +
                  altag + L".*?)</div>";

Now I have another question :) Could someone please help me with this?

I want to give names to the groups in a regex.
In .NET it would look like this:

"(?<letters>[a-zA-Z]*)(?<numbers>[0-9]*)"

I could access those groups like this:

String^ letters = match->Groups["letters"]->Value;
String^ numbers = match->Groups["numbers"]->Value;

How can I do that with the Boost.Regex library?
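
For what it's worth, the Boost.Regex documentation for its Perl syntax describes named sub-expressions written as (?<name>...), with lookup on the match object by name, e.g. what["letters"]; that route is not tested here. The sketch below instead uses std::regex as a stand-in so that it compiles without Boost. std::regex has no named groups, so the sketch emulates them with a name-to-index table. NamedRegex and make_letters_numbers are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: pairs a pattern with a table of group name -> group index.
struct NamedRegex {
    std::regex re;
    std::map<std::string, std::size_t> index;

    // Returns the text captured by the named group ("" if it did not participate).
    std::string group(const std::smatch& m, const std::string& name) const {
        auto it = index.find(name);
        assert(it != index.end());  // precondition: the name was registered
        return m[it->second].str();
    }
};

// Same shape as the .NET pattern above, with the names moved into the table.
NamedRegex make_letters_numbers() {
    return NamedRegex{std::regex("([a-zA-Z]*)([0-9]*)"),
                      {{"letters", 1}, {"numbers", 2}}};
}
```

After a successful std::regex_match on a std::string into a std::smatch m, group(m, "letters") and group(m, "numbers") play the role of the Groups["..."] lookups in the .NET snippet.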
