RegEx problem

Dec 11, 2020 at 7:31am
Hey All,

I'm just starting into regular expressions in C++ and I've noticed a couple of problems. I spent 20+ years using them in Perl, so there are certain things I "just expect", and I'm not finding them.

Specifically, I expect a regex to have a "multiline" mode where it will match beyond the first newline included in the target string. I can't find one.

Similarly, I have found that no search will proceed past a null character in the buffer either (possible with std::strings).

As a more minor complaint (because there is a workaround) I don't find a case-insensitive mode either. Am I missing something?

BTW the reason these things matter is I'm about to write a file searching utility that wants to be able to use regexes. If I can't solve the problem I will have to break the string into pieces at newlines or nulls and search the pieces individually. Ugh.

TIA,

Lars
Dec 11, 2020 at 9:44am
icase: Look at 'ninth' in the example here: http://www.cplusplus.com/reference/regex/basic_regex/basic_regex/

What is "perl multiline"?
#include <iostream>
#include <string>
#include <regex>

int main ()
{
  using namespace std::regex_constants;
  std::regex ninth ("lo\\nw", ECMAScript | icase );

  std::string subject = "Hello\nWorld";
  std::cout << subject << std::endl;
  std::string replacement = "yup";

  std::cout << std::regex_replace (subject, ninth, replacement);
  std::cout << std::endl;

  return 0;
}
Dec 11, 2020 at 4:58pm
C++17 added a multiline constant (std::regex::multiline, which applies to the ECMAScript grammar) to basic_regex.
https://en.cppreference.com/w/cpp/regex/basic_regex
Dec 11, 2020 at 6:40pm
Thanks!
Dec 11, 2020 at 7:58pm
Not exactly the perl approach, but it'll work. In perl you do it with modifiers as you apply the regex, e.g. if ($x =~ /foo/i) { dosomething(); } would get you case-insensitive.
Dec 11, 2020 at 9:16pm
I would say that it is quite close.
/foo/i
std::regex bar("foo", icase);

A difference here is that Perl uses an unnamed literal constant, whereas C++ prefers to create a named object. Both carry the regular expression and its modifiers (flags).
The literal constant is used directly in the match expression, while the named regex can be reused in multiple expressions.

Use of an unnamed temporary is possible too:
if ( std::regex_match( x, std::regex("foo", icase) ))
   { dosomething(); }

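So the real difference is that Perl has the binary operator =~ while C++ uses a function call. Note that regex_match() only succeeds if the whole string matches; regex_search() is the closer equivalent of =~.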
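
A minimal sketch of the two C++ calls side by side, since Perl's =~ looks for the pattern anywhere in the string. The helper names contains_foo and is_exactly_foo are made up for illustration and are not part of any library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rough equivalent of Perl's  $x =~ /foo/i
// regex_search succeeds if the pattern occurs anywhere in x.
bool contains_foo(const std::string& x)
{
    static const std::regex re("foo", std::regex::icase);
    return std::regex_search(x, re);
}

// Hypothetical helper: rough equivalent of Perl's  $x =~ /\Afoo\z/i
// regex_match succeeds only if the pattern covers all of x.
bool is_exactly_foo(const std::string& x)
{
    static const std::regex re("foo", std::regex::icase);
    return std::regex_match(x, re);
}
```

Keeping the regex in a static const object means it is compiled once and reused, much like storing a qr// pattern in a Perl variable.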
Dec 12, 2020 at 2:54am
Furry Guy: haven't been able to get multiline to work. Here's my test code:
<code>
#include <iostream>
#include <string>
#include <regex>

int main()
{
    std::string str("\nabcd");
    std::regex r("^abc", std::regex::multiline);

    std::smatch m;
    std::regex_search(str, m, r);

    for (auto v : m)
        std::cout << v << std::endl;
}
</code>

I get: E0135 class "std::basic_regex<char, std::regex_traits<char>>" has no member "multiline" RegexTest C:\projects\RegexTest\RegexTest.cpp 8

And yes, I did switch it to C++17. Am I doing something wrong?

TIA,

Lars
Dec 12, 2020 at 5:30am
Keskiverto: yes, I don't find the C++ syntax problematic, except where it doesn't seem to work (see above). The perl syntax is nicely terse but without storing a regex in a var (which works fine) you can't otherwise reuse one. E.g. if ($x =~ /$regex/) { dosomething(); }
Dec 12, 2020 at 11:09am
Which compiler version do you have, and how complete is its C++17 support?

Could there be something like: https://developercommunity.visualstudio.com/content/problem/268592/multiline-c.html
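
If the flag turns out to be unavailable on a given toolchain, one possible workaround is to spell the line-start anchor out by hand, so the pattern does not rely on how ^ treats newlines. A rough sketch only: find_at_line_start is a made-up helper name, and it treats only \n as a line break.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text matched by "abc" when it occurs at
// the start of the string or right after a '\n', else an empty string.
std::string find_at_line_start(const std::string& text)
{
    // (?:^|\n) = start of input OR a literal newline (non-capturing group).
    // Group 1 holds the interesting part without the leading newline.
    static const std::regex re("(?:^|\\n)(abc)");
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m[1].str();
    return std::string();
}
```

With the test string from above, find_at_line_start("\nabcd") should give back "abc" whether or not the library treats ^ as a per-line anchor.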
Dec 12, 2020 at 5:35pm
Keskiverto: I'm using VS2019 16.8.1.
Last edited on Dec 12, 2020 at 5:35pm
Dec 12, 2020 at 5:41pm
Keskiverto: it does seem to be multiline by default, but it still won't read through a zero.
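
One thing worth ruling out, purely as a guess since the failing code isn't shown: a std::string built from a C string literal stops at the first null, so the regex never sees the rest of the buffer. A small sketch with the length given explicitly; matches_past_null is a made-up name, and behaviour on a particular standard library may still differ.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds a 7-character buffer "abc\0def" and reports
// whether a search finds "def", which lies after the embedded null.
bool matches_past_null()
{
    // The (pointer, length) constructor keeps the null and what follows;
    // std::string("abc\0def") would stop after "abc".
    const std::string buffer("abc\0def", 7);
    static const std::regex re("def");
    // The std::string overload searches the whole [begin, end) range.
    return std::regex_search(buffer, re);
}
```

The same caveat applies to the pattern: if the pattern itself contains a null, it needs to be passed as a std::string or with an explicit length rather than as a plain C string.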
Dec 13, 2020 at 1:38am
Reported to Microsoft 12/12/2020.