C++ regex clarification

Jul 14, 2020 at 6:54pm
Can you please clarify if this is the correct approach to retrieve values of "x" and "y" from a given string?

#include <iostream>
#include <string>
#include <regex>

int main()
{
    int x;
    int y;
    //This regex may not be accurate yet...
    const std::regex re("[x|y]=[0-9]");
    std::string in;
    in = "<Point x="0" y="0" />";
    
    std::smatch sm;
    if (std::regex_match(in, sm, re))
    {
        //My question is, is this the correct approach to retrieve x and y values?
        x = std::stod(sm[0]);
        y = std::stod(sm[1]);
    }
    
}
Jul 14, 2020 at 7:44pm
#include <iostream>
#include <string>
#include <regex>

int main()
{
    // Initialized so the final print is well-defined even if a variable is never matched.
    int x = 0, y = 0;

    // You need to use capture groups (parens) to capture the subexpressions you want
    // You need to use + to allow "one or more" digits.
    const std::regex re("([xy])=\"([0-9]+)\"");

    // double quotes within double quotes need to be escaped with a backslash.
    std::string text = "<Point x=\"0\" y=\"12\" />";

    // Use regex_search and an iterator to search for all matches.
    std::smatch sm;
    for (auto it = text.cbegin();
         std::regex_search(it, text.cend(), sm, re);
         it = sm[0].second)
    {
        // The first capture group ([1]) is our variable name, the second ([2]) is its value
        if (sm[1] == "x")
            x = std::stoi(sm[2]);
        else if (sm[1] == "y")
            y = std::stoi(sm[2]);
        else
            std::cout << "unknown variable: " << sm[1] << '\n';
    }

    std::cout << x << ' ' << y << '\n';
}

Jul 14, 2020 at 7:56pm
Raw string literals can help avoid serious cases of leaning toothpick syndrome:
https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome

const std::regex re(R"eos(([xy])="([0-9]+)")eos"); // internal quotes not escaped

The raw string has dubious value here, but sometimes it offers big improvements.
Last edited on Jul 14, 2020 at 7:57pm
Jul 14, 2020 at 8:25pm
Raw string literals were probably added largely with regexes in mind, since heavily escaped patterns quickly become unreadable. For this particular regex, though, a normal string is simpler. The input text benefits a little more from a raw string.

The "unknown variable" condition can never occur in the code above since only x and y are ever matched. To match general variable names you might do something like this:

#include <iostream>
#include <string>
#include <regex>
#include <limits>

const int Unset = std::numeric_limits<int>::min();

int main()
{
    int x = Unset, y = Unset;
    // Variable names start with a letter or underscore.
    // After the first character they can contain letters, underscores, and digits.
    const std::regex re("([A-Za-z_][A-Za-z0-9_]*)=\"([0-9]+)\"");
    std::string text = R"(<Point x="1" k16="2" y="34" />)";

    std::smatch sm;
    for (auto it = text.cbegin();
         std::regex_search(it, text.cend(), sm, re);
         it = sm[0].second)
    {
        if (sm[1] == "x")
            x = std::stoi(sm[2]);
        else if (sm[1] == "y")
            y = std::stoi(sm[2]);
        else
            std::cout << "unknown variable: " << sm[1] << '\n';
    }

    if (x != Unset)
        std::cout << "x=" << x << '\n';
    else
        std::cout << "x not found\n";

    if (y != Unset)
        std::cout << "y=" << y << '\n';
    else
        std::cout << "y not found\n";
}
