parsing string with sscanf

Dec 16, 2012 at 7:51pm
Hi everyone,
I'm trying to parse a document. This had been working perfectly for me until now :p, because the documents I want to parse are getting more complex. For example:

A line of the document could be something like this:

"f 1/2/3 2/3/4"

and then this would work:

sscanf(line.c_str(),"%*s %d/%d/%d %d/%d/%d", &i1, &i2, &i3, &i4, &i5, &i6);

But now I have this line:

"f 1//2 2/3/"

You can see that some values aren't filled in. The previous code no longer works, because sscanf stops as soon as it expects an integer (%d) and finds a "/" instead. Does anyone know how I need to do this?

Greetings
genzm
Last edited on Dec 16, 2012 at 7:52pm
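For anyone who wants to stay with sscanf: one possible workaround is to split the line on whitespace first and then try several formats on each token, using sscanf's return value (the number of fields it assigned) to see which one fits. This is only a sketch; the helper name parse_face_token and its parameters are made up for illustration and do not come from the thread.

```cpp
#include <cstdio>

// Hypothetical helper: parse one face token such as "1/2/3", "1//2",
// "2/3/", "2/3" or "7". Fields that are absent are set to -1.
// Returns false if the token does not start with an integer.
bool parse_face_token(const char* tok, int& v, int& vt, int& vn) {
    int a = 0, b = 0, c = 0;
    v = vt = vn = -1;
    // sscanf returns how many fields were assigned, so each format is
    // accepted only if all of its %d conversions succeeded.
    if (std::sscanf(tok, "%d/%d/%d", &a, &b, &c) == 3) { v = a; vt = b; vn = c; return true; }
    if (std::sscanf(tok, "%d//%d", &a, &c) == 2)       { v = a; vn = c;         return true; }
    if (std::sscanf(tok, "%d/%d", &a, &b) == 2)        { v = a; vt = b;         return true; }
    if (std::sscanf(tok, "%d", &a) == 1)               { v = a;                 return true; }
    return false;
}
```

Each token could come from reading the line word by word (for example with a std::istringstream) and passing word.c_str(). Note that this sketch does not detect trailing garbage after the last number.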
Dec 16, 2012 at 7:55pm
Does it have to be done in C, or can you use C++ with its streams, tokenizers, regular expressions, or even complete parsers (such as boost.spirit)?
Last edited on Dec 16, 2012 at 7:56pm
Dec 16, 2012 at 7:57pm
It is in C++.
Could you maybe give me an example of how this would work? Or a link to a clear explanation?
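As one illustration of the regular-expression route, here is a minimal sketch using std::regex from C++11. It assumes a standard library with a working <regex> implementation, and the function name parse_with_regex is hypothetical. The pattern only accepts non-negative indices.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true and fills v/vt/vn (missing fields
// become -1) if `tok` has the form "v", "v/vt", "v/vt/", "v//vn" or "v/vt/vn".
bool parse_with_regex(const std::string& tok, int& v, int& vt, int& vn) {
    // Group 1: vertex index (required). Groups 2 and 3: optional, may be empty.
    static const std::regex pattern(R"((\d+)(?:/(\d*)(?:/(\d*))?)?)");
    std::smatch m;
    if (!std::regex_match(tok, m, pattern))
        return false;
    v  = std::stoi(m[1].str());
    // A group that did not participate, or matched an empty string, means "absent".
    vt = (m[2].matched && m[2].length() > 0) ? std::stoi(m[2].str()) : -1;
    vn = (m[3].matched && m[3].length() > 0) ? std::stoi(m[3].str()) : -1;
    return true;
}
```

Because regex_match must consume the whole token, something like "1/2/3x" is rejected rather than silently truncated.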
Dec 16, 2012 at 8:06pm
The strtok function seems to do exactly the trick I want :p
Thanks for the tip.
Dec 16, 2012 at 9:05pm
strtok is a pretty bad approach. The appropriate solution depends on the expected result of the parse: are you constructing an object? Populating a struct? Populating a vector<int>? What happens when you've read 4 out of 6 numbers: are you just building a vector of four ints, or are you indicating that something wasn't provided? In short, that's not enough information.
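One concrete reason strtok is awkward for this input: it treats a run of delimiters as a single separator, so the empty field in "1//2" disappears and you can no longer tell which value was missing. The sketch below contrasts it with std::getline, which ends a field at every '/'. Both function names are made up for illustration.

```cpp
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Split with strtok: consecutive '/' characters count as ONE separator,
// so empty fields vanish. Takes the string by value because strtok
// modifies the buffer it scans.
std::vector<std::string> split_strtok(std::string s) {
    std::vector<std::string> out;
    for (char* tok = std::strtok(&s[0], "/"); tok != nullptr;
         tok = std::strtok(nullptr, "/"))
        out.push_back(tok);
    return out;
}

// Split with std::getline: every '/' ends a field, so an empty field
// between two slashes is kept as an empty string.
std::vector<std::string> split_getline(const std::string& s) {
    std::vector<std::string> out;
    std::istringstream buf(s);
    std::string field;
    while (std::getline(buf, field, '/'))
        out.push_back(field);
    return out;
}
```

For "1//2", split_strtok yields two tokens ("1" and "2") while split_getline yields three ("1", "" and "2"). A trailing empty field, as in "2/3/", is not reported by either version, so the caller still has to treat a missing third field as absent.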
Dec 17, 2012 at 9:32am
Indeed, I've run into a problem with strtok. So I'll give you some more information on what I'm trying to do:
I'm working on a parser that reads in data from an .obj file (3D model).
The structure of this file looks like this:
f a/b/c d/e/f g/h/i        // for triangles
f a/b/c d/e/f g/h/i j/k/l  // for quads


The letters refer to the indices of vertices, texture coordinates and normals. The parsing of the vertices, textures and normals works perfectly. The problem with the faces is that not all data is always included.
For example, not all files contain normal data. A line would then look like this:
f a/b/ d/e/ g/h/      // for triangles
f a/b/ d/e/ g/h/ j/k/ // for quads


or not all faces have texture coordinates:
f a//c d//f g//i      // for triangles
f a//c d//f g//i j//l // for quads


or the file might contain neither texture coordinates nor normals:
f a d g   // for triangles
f a d g j // for quads


Or the file might contain any of the previous lines combined. All this variation makes it difficult to parse it properly.

I have a struct called "face" which looks like this:
typedef struct {
    int i1,i2,i3;
    int n1,n2,n3;
    int m1,m2,m3;
    int t1,t2,t3;
} face;


And of course I'm loading in all the data from the file into this struct. What I want to achieve is that all values which aren't included in the file (like texture coordinates or normals) should be -1.

Any suggestions on how I can do this?
Dec 17, 2012 at 10:57am
Dec 17, 2012 at 6:01pm
Any suggestions on how I can do this?

A manual parse, without using any libraries, would be long and boring and look something like this:

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <tuple>

struct face {
    int i1,i2,i3;
    int n1,n2,n3;
    int m1,m2,m3;
    int t1,t2,t3;
};

std::tuple<int, int, int> parse_three_ints(const std::string& s)
{
    std::istringstream buf(s);
    int r1 = -1, r2 = -1, r3 = -1;
    std::string token;
    if(getline(buf, token, '/') && !token.empty())
        r1 = stoi(token);
    if(getline(buf, token, '/') && !token.empty())
        r2 = stoi(token);
    if(getline(buf, token, '/') && !token.empty())
        r3 = stoi(token);
    return std::make_tuple(r1, r2, r3);
}

int main()
{
    std::istringstream input("f 1/2/3 4/5/6 7/8/9\n"
                             "f 10/11/12 13/14/15 16/17/18 19/20/21\n"
                             "f 22/23/ 24/25/ 26/27/\n"
                             "f 28/29/ 30/31/ 32/33/ 34/35/\n"
                             "f 36//37 38//39 40//41\n"
                             "f 42//43 44//45 46//47 48//49\n"
                             "f 50 51 52\n"
                             "f 53 54 55 56\n");
    std::vector<face> result;
    std::string line;
    while(getline(input, line)) // process line by line
    {
        std::istringstream buf(line);
        std::string word;
        buf >> word;
        if(word != "f")
        {
            std::cout << "Parse error, line begins with " << word << '\n';
            break;
        }
        // prepare the new face
        face f;
        buf >> word;
        std::tie(f.i1, f.i2, f.i3) = parse_three_ints(word);
        buf >> word;
        std::tie(f.n1, f.n2, f.n3) = parse_three_ints(word);
        buf >> word;
        std::tie(f.m1, f.m2, f.m3) = parse_three_ints(word);
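        // A triangle has no fourth vertex. A failed extraction leaves `word`
        // unchanged, so check it instead of re-parsing the third vertex.
        if(buf >> word)
            std::tie(f.t1, f.t2, f.t3) = parse_three_ints(word);
        else
            f.t1 = f.t2 = f.t3 = -1;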
        result.push_back(f);
    }

    // output
    for(face& f: result)
        std::cout << "{ " << f.i1 << ',' << f.i2 << ',' << f.i3 << '\n'
                  << "  " << f.n1 << ',' << f.n2 << ',' << f.n3 << '\n'
                  << "  " << f.m1 << ',' << f.m2 << ',' << f.m3 << '\n'
                  << "  " << f.t1 << ',' << f.t2 << ',' << f.t3 << " }\n";
}

online demo: http://liveworkspace.org/code/3975xd
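The manual parse above uses four fixed slots per face. A possible variation, sketched here under the assumption that you are free to change the data layout, stores one small struct per face corner in a vector, so triangles and quads go through the same code and absent fields default to -1. The names corner, parse_corner and parse_face_line are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical layout: one entry per face corner instead of twelve flat ints.
struct corner {
    int v  = -1;  // vertex index
    int vt = -1;  // texture-coordinate index, -1 if absent
    int vn = -1;  // normal index, -1 if absent
};

// Parse "v", "v/vt", "v/vt/", "v//vn" or "v/vt/vn"; missing fields stay -1.
corner parse_corner(const std::string& s) {
    corner c;
    int* slots[3] = { &c.v, &c.vt, &c.vn };
    std::istringstream buf(s);
    std::string field;
    for (int i = 0; i < 3 && std::getline(buf, field, '/'); ++i)
        if (!field.empty())
            *slots[i] = std::stoi(field);  // throws std::invalid_argument on non-numeric text
    return c;
}

// Parse one "f ..." line into however many corners it has.
// Returns an empty vector if the line is not a face line.
std::vector<corner> parse_face_line(const std::string& line) {
    std::vector<corner> corners;
    std::istringstream buf(line);
    std::string word;
    if (!(buf >> word) || word != "f")
        return corners;
    while (buf >> word)
        corners.push_back(parse_corner(word));
    return corners;
}
```

With this layout the number of corners is simply the size of the returned vector, so no sentinel values are needed to tell a triangle from a quad.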

But we have boost.spirit for this kinda thing (which is much faster, too)

It could be made prettier (that BNF is worth structuring), but here's my first attempt, which works for this test:

#include <iostream>
#include <string>
#include <vector>

#define FUSION_MAX_VECTOR_SIZE 12
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_int.hpp>
#include <boost/spirit/include/qi_no_skip.hpp>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

struct face {
    int i1,i2,i3;
    int n1,n2,n3;
    int m1,m2,m3;
    int t1,t2,t3;
};

BOOST_FUSION_ADAPT_STRUCT(
    face,
    (int, i1) (int, i2) (int, i3)
    (int, n1) (int, n2) (int, n3)
    (int, m1) (int, m2) (int, m3)
    (int, t1) (int, t2) (int, t3)
)

// have to use this macro instead of the regular auto for named micro-parsers, until Spirit V3
#define BOOST_SPIRIT_AUTO(domain_, name, expr)                                  \
    typedef BOOST_TYPEOF(expr) name##expr_type;                                 \
    BOOST_SPIRIT_ASSERT_MATCH(boost::spirit::domain_::domain, name##expr_type); \
    BOOST_AUTO(name, boost::proto::deep_copy(expr));                            \

int main()
{
    std::string input("f 1/2/3 4/5/6 7/8/9\n"
                      "f 10/11/12 13/14/15 16/17/18 19/20/21\n"
                      "f 22/23/ 24/25/ 26/27/\n"
                      "f 28/29/ 30/31/ 32/33/ 34/35/\n"
                      "f 36//37 38//39 40//41\n"
                      "f 42//43 44//45 46//47 48//49\n"
                      "f 50 51 52\n"
                      "f 53 54 55 56\n");
    std::vector<face> result;

    BOOST_SPIRIT_AUTO(qi, optint, qi::no_skip[qi::int_] | qi::attr(-1));
    BOOST_SPIRIT_AUTO(qi, triple, qi::int_ >> ( ('/' >> optint >> '/') | qi::attr(-1) ) >> optint);

    qi::phrase_parse(input.begin(), input.end(),
                     *(   ('f' >> triple >> triple >> triple >> triple )
                        | ('f' >> triple >> triple >> triple >> qi::attr(-1) >> qi::attr(-1) >> qi::attr(-1) )
                      ),
                     ascii::space, result );

    for(face& f: result)
        std::cout << "{ " << f.i1 << ',' << f.i2 << ',' << f.i3 << '\n'
                  << "  " << f.n1 << ',' << f.n2 << ',' << f.n3 << '\n'
                  << "  " << f.m1 << ',' << f.m2 << ',' << f.m3 << '\n'
                  << "  " << f.t1 << ',' << f.t2 << ',' << f.t3 << " }\n";
}

Last edited on Dec 18, 2012 at 11:26am