Hello all, I have a bit of a problem getting my regular expressions to work. I know that the expressions I'm using are correct because I've tested them in various tools and I get the desired results.
However, every time I use these expressions in C++, I'm not getting the desired result. The expressions always return false. I'm not sure if I'm escaping them properly, so I'm posting these expressions here so that a fresh set of eyes can look at them and maybe help me with this dilemma.
// this is an end-of-line comment
/* this is an inline comment */
// some pseudo-code
int main()
{
cout << "hello";
return 0;
}
================================================
THIS_IS_A_REPLACEMENT
THIS_IS_A_REPLACEMENT
THIS_IS_A_REPLACEMENT
int main()
{
cout << "hello";
return 0;
}
I'm not as concerned with getting the first expression to work as I am with the second expression. The first expression is to resolve an ambiguity that might exist in nested comments.
The expression is good; however, it fails when we have something like this:
someString = "An example comment: /* example */";
zxckjzxlck
// The comment around this code has been commented out.
// /*
some_code();
// */
I think it's almost impossible to write an expression that will be correct 100% of the time. The approach I'm using now is to find all end-of-line comments first and then find any remaining comments, which at this point should only be inline comments, that the first expression might have missed (using the expression Andy gave me).
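For anyone following along, here is a minimal sketch of that two-pass idea using std::regex. The two expressions and the function name replace_comments_two_pass are my own assumptions for illustration, not the ones posted earlier in the thread. The sketch makes no attempt to handle comment markers inside string literals, so it still fails on the someString example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): replaces comments in two passes.
std::string replace_comments_two_pass(const std::string& source,
                                      const std::string& replacement)
{
    // Pass 1: end-of-line comments, from "//" up to (not including) the newline.
    static const std::regex eol_comment(R"(//[^\n]*)");
    // Pass 2: inline comments. This form avoids the lazy operator and can
    // span several lines because the character classes also match newlines.
    static const std::regex inline_comment(R"(/\*[^*]*\*+(?:[^*/][^*]*\*+)*/)");

    // Running the end-of-line pass first removes lines such as "// /*",
    // so the second pass never sees a commented-out comment opener.
    const std::string without_eol =
        std::regex_replace(source, eol_comment, replacement);
    return std::regex_replace(without_eol, inline_comment, replacement);
}
```

Calling it on the commented-out block from the earlier example (the three lines "// /*", "some_code();", "// */") leaves some_code(); in place and replaces only the two comment lines.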
The code can be made a bit more efficient. regex_replace is currently doing a second search from the beginning of the string; it's meant for doing search and replace at the same time. And when you loop, you are searching from the beginning again.
If you walk the first string and build the new string as a separate variable you can avoid both of these restarts.
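Roughly what I mean, as a sketch rather than a drop-in fix: walk the input once with std::sregex_iterator and append to a separate output string. The combined expression and the name replace_comments_single_pass are assumptions of mine, so substitute your own expressions.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): one pass over the input,
// building the result in a separate string.
std::string replace_comments_single_pass(const std::string& source,
                                         const std::string& replacement)
{
    // Both comment styles in one alternation, so whichever starts first wins
    // and the text is never searched again from the beginning.
    static const std::regex comment(
        R"(//[^\n]*|/\*[^*]*\*+(?:[^*/][^*]*\*+)*/)");

    std::string result;
    result.reserve(source.size());

    auto copied_up_to = source.cbegin();
    const std::sregex_iterator end;
    for (std::sregex_iterator it(source.cbegin(), source.cend(), comment);
         it != end; ++it) {
        const std::smatch& match = *it;
        result.append(copied_up_to, match[0].first);  // text before the comment
        result += replacement;                        // stand-in for the comment
        copied_up_to = match[0].second;               // resume after the comment
    }
    result.append(copied_up_to, source.cend());       // trailing text
    return result;
}
```

The iterator resumes each search where the previous match ended, which is what removes the restarts.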
My understanding is that a general solution for the comment problem is impossible. But it hasn't stopped people from trying!
When I needed a comment stripper, I wrote it in Python.
You are definitely right. I've seen significant improvement after modifying the structure of the code: it's about three times as fast on some of the source files I'm parsing.
I'm not sure how familiar you are with PHP, but there is a nice function, token_get_all() [1], that tokenizes the code. Then you can just do a check like if ( $token[0] === T_COMMENT ) and find all the comments or strings.
I'm going to have to browse the PHP code to find out how they do this.
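I haven't read the PHP source, so this is not how token_get_all() works internally. It's only a rough C++ sketch of the same idea: scan character by character and track whether we are inside a string literal, so that comment markers in strings are left alone. The name replace_comments_scanner is made up, and the sketch ignores raw string literals, digit separators such as 1'000, and backslash line continuations.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): a tiny hand-written scanner.
std::string replace_comments_scanner(const std::string& source,
                                     const std::string& replacement)
{
    std::string result;
    result.reserve(source.size());
    const std::size_t n = source.size();
    std::size_t i = 0;

    while (i < n) {
        const char c = source[i];
        if (c == '"' || c == '\'') {
            // String or character literal: copy verbatim up to the closing
            // quote, keeping backslash escapes together with the next char.
            const char quote = c;
            result += source[i++];
            while (i < n && source[i] != quote) {
                if (source[i] == '\\' && i + 1 < n) {
                    result += source[i++];
                }
                result += source[i++];
            }
            if (i < n) {
                result += source[i++];  // closing quote
            }
        } else if (c == '/' && i + 1 < n && source[i + 1] == '/') {
            // End-of-line comment: skip to the newline, which is kept.
            while (i < n && source[i] != '\n') {
                ++i;
            }
            result += replacement;
        } else if (c == '/' && i + 1 < n && source[i + 1] == '*') {
            // Inline comment: skip past "*/", or to the end if unterminated.
            const std::size_t close = source.find("*/", i + 2);
            i = (close == std::string::npos) ? n : close + 2;
            result += replacement;
        } else {
            result += source[i++];
        }
    }
    return result;
}
```

With this, the someString = "An example comment: /* example */"; line from earlier in the thread comes through unchanged.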
Sorry, I don't get it. It seems that it goes to all that trouble just because its package does not support the lazy operator.
Aren't the two expressions equivalent?
ne555 - I've just tried your expression and it works for my simple test case.
Sorry about my earlier comment, but you said your expression was "not properly tested", so I chose to go with the other, longer version as people seemed to trust it. At the time I was looking at the overall problem (the repeated searches, etc.) and wanted to minimize unknowns.
Now that I know the code is OK, I can try untested expressions without having to worry whether it's the expression or the code that's wrong.