Questions about Boost::Regex
Jul 2, 2009 at 1:31am UTC
Hi everyone!
My goal is to extract a specific part of a larger text.
For example, to get all the links in an HTML page.
In a CLR Windows Forms Application (.NET) it would look something like this:
using namespace System::Text::RegularExpressions;
...
Regex^ re( "<a href=\"([^\"]*)\"" );
MatchCollection^ matches = re->Matches(fulls); //fulls contains html page source
for each (Match^ match in matches){
MessageBox::Show(match->Groups[1]->Value);
}
How can I do the same thing in MFC using Boost::regex?
I think I can use regex_token_iterator for this, but how do I get the results as CString values from a regex_token_iterator?
Any help would be greatly appreciated.
Last edited on Jul 2, 2009 at 6:18pm UTC
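A minimal sketch of the regex_token_iterator approach. It is written against the C++11 standard library's <regex>, whose iterator types mirror Boost.Regex's (Boost provides boost::wregex and boost::wsregex_token_iterator with the same names), so it compiles without Boost or MFC. std::wstring stands in for CString, and the helper name extract_links is hypothetical. Passing 1 as the submatch argument makes the iterator yield only the text captured by the first group, i.e. the href value, like match->Groups[1] in the .NET version.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the contents of every href="..." attribute.
// std::wstring stands in for CString; in MFC each result could be turned
// into a CString with CString(s.c_str()).
std::vector<std::wstring> extract_links(const std::wstring& html)
{
    // Kept as a named object: the token iterator must not outlive its regex.
    static const std::wregex re(L"<a href=\"([^\"]*)\"");
    std::vector<std::wstring> links;
    // Submatch index 1: yield only the first capture group, not the whole match.
    std::wsregex_token_iterator it(html.begin(), html.end(), re, 1), end;
    for (; it != end; ++it)
        links.push_back(it->str());
    return links;
}
```

Passing -1 instead of 1 would yield the text between matches rather than the captured group.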
Jul 2, 2009 at 11:19pm UTC
I found a solution (but I don't think it's the best one):
CString fulls = html; // the full page source
boost::tregex re(_T("<a href=\"([^\"]*)\""));
boost::tregex_iterator begin(fulls.GetString(), fulls.GetString() + fulls.GetLength(), re), end;
CString sub;
for (; begin != end; ++begin) {
    boost::tmatch const &what = *begin;
    sub = CString(what[1].first, static_cast<int>(what[1].length())); // group 1 = the href value
    if (sub != _T("")) // to avoid the empty results
        AfxMessageBox(sub);
}
I'm getting empty results for each character where the regex doesn't match. That's why I'm using if (sub != "") on the 8th line.
Does anyone have a better solution?
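One way to tighten the loop above, sketched with std::regex rather than Boost (Boost's boost::wcregex_iterator and boost::wcmatch have the same shape): iterate over the raw const wchar_t* range that CString::GetString() returns and read capture group 1 with str(1). The function name links_from_buffer is hypothetical, and std::wstring stands in for CString.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Sketch mirroring the tregex_iterator loop, but over a raw const wchar_t*
// buffer and reading capture group 1 instead of the whole match what[0].
std::vector<std::wstring> links_from_buffer(const wchar_t* text, std::size_t len)
{
    static const std::wregex re(L"<a href=\"([^\"]*)\"");
    std::vector<std::wstring> out;
    for (std::wcregex_iterator it(text, text + len, re), end; it != end; ++it) {
        const std::wcmatch& what = *it;   // one successful match per step
        out.push_back(what.str(1));       // group 1 = the URL between the quotes
    }
    return out;
}
```

The iterator only ever stops on successful matches, so with this pattern an empty string in the result means an empty href="" attribute, not a failed match.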
Jul 3, 2009 at 12:35pm UTC
Check what[0].matched before assigning the string.
Jul 3, 2009 at 6:10pm UTC
Thanks, PanGalactic.
I tried checking what[0].matched instead of if (sub != ""). Everything works, except that I still get one empty result at the end.
From Boost::regex documentation:
m[0].matched: true if a full match was found, and false if it was a partial match (found as a result of the match_partial flag being set).
Since I'm not using the match_partial flag, I don't think there's any need to check what[0].matched.
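The quoted documentation is about m[0]. For numbered sub-expressions, matched also reports whether that group took part in the match at all, which is where checking it tends to be useful. A minimal std::regex sketch (Boost's sub_match has the same matched member); the has_title helper and the title attribute are illustrative assumptions, not from the thread.

```cpp
#include <regex>
#include <string>

// Illustrative check: does an <a href="..."> tag carry an optional title="..."?
// Group 1 is optional, so m[1].matched is false when it did not participate,
// even though the overall match m[0] succeeded.
bool has_title(const std::string& tag)
{
    static const std::regex re("<a href=\"[^\"]*\"( title=\"[^\"]*\")?");
    std::smatch m;
    if (!std::regex_search(tag, m, re))
        return false;        // no <a href="..."> at all
    return m[1].matched;     // true only if the optional group matched
}
```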
I figured it out: I just need to pass the match_not_null flag (which rejects empty matches), and then I don't need to check for empty results.
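To see what match_not_null changes, here is a small sketch with std::regex, which has the same flag as std::regex_constants::match_not_null (Boost spells it boost::match_not_null). A pattern such as a* can match zero characters, so without the flag the iterator reports empty matches; with it, only non-empty matches are returned. The all_matches helper is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every match of re in text. With not_null = true the search runs
// with std::regex_constants::match_not_null, which rejects zero-length matches.
std::vector<std::string> all_matches(const std::string& text, const std::regex& re, bool not_null)
{
    auto flags = not_null ? std::regex_constants::match_not_null
                          : std::regex_constants::match_default;
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re, flags), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

For example, all_matches("baaab", std::regex("a*"), true) yields only "aaa", while the same call with false also returns empty strings.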
Jul 6, 2009 at 2:05am UTC
One more question: I need to read everything (including nested tags) between an element's opening and closing tags, and I need a regex pattern to do that.
For example, suppose I have this HTML page source:
...
<div id="ReadEverythingInThisTag">
<b>blah-blah-blah</b>
<i>blah-blah-blah</i>
<div>adsasd</div>
</div>
...
I want to read everything inside <div id="ReadEverythingInThisTag">, i.e. everything between it and its matching </div>.
This is the pattern I came up with:
<div id=\"ReadEverythingInThisTag\">(.*(<[^>/]*>.*(</[^>]*>)*).*)</div>
It works with the source above, but it fails when the content also contains line breaks (<br>) or horizontal rules (<hr>):
<div id="ReadEverythingInThisTag">
<b>blah-blah-blah</b><br>
<i>blah-blah-blah</i><hr>
<div>adsasd</div>
</div>
Any ideas?
Last edited on Jul 6, 2009 at 2:07am UTC
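A single regular expression cannot, in general, match arbitrarily nested tags, because pairing each </div> with its opening tag requires counting. One workaround, swapped in here rather than taken from the thread, is to use a regex only to find <div ...> and </div> tokens and track the nesting depth in code. A sketch with std::regex; the function name inner_html and its behaviour of returning an empty string on failure are assumptions.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Returns the text between <div id="ID"> and its matching </div>, counting
// nested <div> tags so an inner </div> does not end the capture early.
// Returns an empty string if the start tag or its matching end tag is missing.
std::string inner_html(const std::string& html, const std::string& id)
{
    const std::string open = "<div id=\"" + id + "\">";
    const std::size_t start = html.find(open);
    if (start == std::string::npos)
        return "";
    const std::size_t body = start + open.size();

    // Matches an opening <div ...> (group 1 empty) or a closing </div> (group 1 = "/").
    static const std::regex tag("<(/?)div\\b[^>]*>");
    int depth = 1;
    const auto from = html.begin() + static_cast<std::ptrdiff_t>(body);
    for (std::sregex_iterator it(from, html.end(), tag), end; it != end; ++it) {
        const bool closing = (*it)[1].length() == 1;
        depth += closing ? -1 : 1;
        if (depth == 0)  // position() is relative to 'from', i.e. to body
            return html.substr(body, static_cast<std::size_t>(it->position()));
    }
    return "";  // unbalanced markup
}
```

Tags like <br> and <hr> are simply ignored by the token regex, so they no longer affect the result.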
Jul 6, 2009 at 6:12am UTC
Ok, I figured it out:
CString altag = L"((<img[^>]*>)*?(<br[^>]*?>)*?(<hr[^>]*?>)*?)*?";
CString Pattern = L"<div>(?<id>" + altag + L".*?(<[^>/]*>" + altag + L".*(</[^>]*>)*)*" +
    altag + L".*?)</div>";
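The *? in the pattern above is a lazy (non-greedy) quantifier: it consumes as little as possible, whereas * consumes as much as possible and then backtracks. A tiny illustration with std::regex's ECMAScript grammar, which also supports *?; first_capture is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Returns capture group 1 of the first match of pattern in text, or "" if none.
std::string first_capture(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m.str(1);
    return "";
}
```

On "<b>one</b> and <b>two</b>", the greedy "<b>(.*)</b>" captures "one</b> and <b>two" (it runs to the last </b>), while the lazy "<b>(.*?)</b>" captures just "one".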
Now I have another question :) Could someone please help me with this:
I want to give a name to a group in a regex.
In .NET it would look like this:
"(?<letters>[a-zA-Z]*)(?<numbers>[0-9]*)"
I can then access those groups like this:
String^ letters = match->Groups["letters"]->Value;
String^ numbers = match->Groups["numbers"]->Value;
How can I do that using Boost::Regex libraries?
Last edited on Jul 6, 2009 at 6:12am UTC
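Later Boost.Regex releases document Perl-style named sub-expressions, (?<name>...), looked up with what["name"] on the match_results; whether a given Boost version supports this should be checked against its documentation. Where named groups are not available (std::regex has none), a simple workaround is to keep your own table mapping names to group numbers. A sketch under that assumption; NamedPattern and group_by_name are hypothetical names.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical workaround for engines without named groups: record which
// numbered capture group each name refers to, then look groups up by name.
struct NamedPattern {
    std::regex re;
    std::map<std::string, int> groups;  // name -> capture group number
};

// Returns the text captured by the named group, or "" on no match or unknown name.
std::string group_by_name(const std::string& text, const NamedPattern& p, const std::string& name)
{
    const auto found = p.groups.find(name);
    std::smatch m;
    if (found == p.groups.end() || !std::regex_search(text, m, p.re))
        return "";
    return m.str(found->second);
}
```

With NamedPattern p{std::regex("([a-zA-Z]*)([0-9]*)"), {{"letters", 1}, {"numbers", 2}}}, group_by_name("abc123", p, "letters") plays the role of Groups["letters"] in the .NET code.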