Best way to parse lines in standard C++?

Mar 25, 2012 at 11:45am
I'm working as an assistant in a research project at the moment. I'm supposed to write a simple testing system that takes a list of actions and checks whether that sequence is possible (and how likely it is, but that's beside the point).

Basically, due to my employers using a variety of systems and compilers, I am restricted to C++03 and no third party libraries (yeah, no boost either).

Since I'm stuck with the standard library, what is the best way to parse lines in standard C++? So far it seems to me that just using std::string (and stringstream for datatype conversions) will get me the farthest - it seems to have most of what I need (not all I want, but oh well...), but I might be wrong.
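For reference, the stringstream-based conversion mentioned above could look roughly like this in C++03. This is only a sketch: `from_string` is a hypothetical helper name, not something from the standard library or from this thread.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: converts text to T using operator>>.
// Returns false if the conversion fails or if non-whitespace
// characters are left over after the value.
template <typename T>
bool from_string(const std::string& text, T& out)
{
    std::istringstream in(text);
    T value;
    if (!(in >> value))
        return false;      // not a valid T at all

    char extra;
    if (in >> extra)
        return false;      // trailing garbage such as "24abc"

    out = value;
    return true;
}
```

With this, `from_string("24", n)` with an `int n` would succeed and store 24, while `from_string("24abc", n)` would be rejected and leave `n` untouched.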
Mar 25, 2012 at 12:06pm
See my example in this thread:
http://cplusplus.com/forum/beginner/65178/

Basically, getline does a great job of tokenizing a stream (you can specify a delimiter). A stringstream can be created from a string to enable this approach. That is the most straightforward way in C++03 that I have seen.
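A minimal sketch of the getline approach, assuming a single-character delimiter. The function name `split` is just an illustrative choice.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a line on one delimiter character by wrapping it in a
// stringstream and calling std::getline repeatedly.
// Empty fields in the middle are kept; a trailing empty field
// (as in "a,b,") is not reported, because getline finds nothing
// left to extract after the last delimiter.
std::vector<std::string> split(const std::string& line, char delimiter)
{
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, delimiter))
        fields.push_back(field);
    return fields;
}
```

For example, `split("a,b,,c", ',')` yields four fields, the third of which is empty.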

Things can get more difficult if you have to watch out for several delimiter characters at once, or if a delimiter is longer than one character. In that case, I think std::string's find_* methods can do the trick.
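For the case with several possible delimiter characters, a sketch using find_first_of and find_first_not_of might look like this. `split_any` is a hypothetical name; runs of delimiters are treated as a single separator, which is an assumption about the desired behaviour.

```cpp
#include <string>
#include <vector>

// Splits text on any character contained in `delimiters`.
// Consecutive delimiters are collapsed, so no empty tokens are produced.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delimiters)
{
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        // End of the current token, or npos if it runs to the end.
        std::string::size_type end = text.find_first_of(delimiters, start);
        if (end == std::string::npos)
            tokens.push_back(text.substr(start));
        else
            tokens.push_back(text.substr(start, end - start));
        // Start of the next token, or npos if only delimiters remain.
        start = text.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

Calling `split_any("1234: (hello) [24]", " :")` gives the three tokens `1234`, `(hello)` and `[24]`. A delimiter that is longer than one character would need `std::string::find` instead.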
Mar 25, 2012 at 12:57pm
I have to deal with optional parameters though, so getline isn't really an option. Thanks anyway, I'll keep that in mind. (I'm hoping to make the optional parameters non-optional later on, because there isn't really any reason for them to be optional, other than that my employers wanted to see a prototype really quickly because they weren't sure I could do it.)
Mar 25, 2012 at 1:12pm
Mar 25, 2012 at 1:16pm
closed account (S6k9GNh0)
I find Flex/Bison amazingly simple once you get the hang of it. I also found it easier to work with than most other lexers/parsers.

Flex/Bison don't use third-party libraries either. You use the tools to generate native C or C++ (depending on mode).
Mar 25, 2012 at 1:41pm
Yeah, but that's just way overkill (I think) - there's really just one possible type of line with slight variations for now. It's not really that hard to write, I was just wondering because I don't know how much more they want (and they apparently don't either, at least not at the moment).

I think I'll go with what I have for now, and come back to it later should the need arise.
Mar 25, 2012 at 7:23pm
I also suggest looking into regular expressions. They can be a bit difficult to learn, but they can really speed up your coding, especially when strings are involved.

I.E. cout << "this is \"some\" string";

It can be more complicated but just as an example.
Mar 25, 2012 at 8:40pm
"I also suggest looking into regular expressions."

I need at least TR1 to use regular expressions. Or boost. Or something in that direction. None of that is really an option for me.
Mar 25, 2012 at 9:53pm
I would like to see examples of what you want to parse. I don't understand why getline is unacceptable. I'm willing to take the challenge. :P
Mar 25, 2012 at 11:10pm
I don't think it's unacceptable, just impractical.


1234: (hello)
(hello) [24]
1234: (hello) [24]
1234: (hello) [24] (anotherhello) [24] (yetanotherhello) [24]


are all valid. I just don't see how I would effectively account for optional values with getline.
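One way to handle the optional parts without any delimiter-based splitting is to scan the line by hand. The sketch below assumes a grammar guessed from the four sample lines only: an optional `NUMBER:` prefix, followed by one or more `(name)` items, each optionally followed by `[NUMBER]`. The names `Action`, `Line` and `parse_line` are hypothetical, and numeric overflow is not checked.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

struct Action {
    std::string name;
    bool has_value;   // true if a "[NUMBER]" followed the name
    long value;
};

struct Line {
    bool has_id;      // true if the line started with "NUMBER:"
    long id;
    std::vector<Action> actions;
};

namespace {

void skip_spaces(const std::string& s, std::string::size_type& pos)
{
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
        ++pos;
}

// Reads one or more decimal digits at pos; advances pos on success.
bool read_number(const std::string& s, std::string::size_type& pos, long& out)
{
    std::string::size_type end = pos;
    while (end < s.size() && std::isdigit(static_cast<unsigned char>(s[end])))
        ++end;
    if (end == pos)
        return false;
    out = std::strtol(s.substr(pos, end - pos).c_str(), 0, 10);
    pos = end;
    return true;
}

} // namespace

// Returns true and fills `result` if the line matches the assumed grammar.
bool parse_line(const std::string& s, Line& result)
{
    Line line;
    line.has_id = false;
    line.id = 0;
    std::string::size_type pos = 0;
    skip_spaces(s, pos);

    // Optional "NUMBER:" prefix.
    long number = 0;
    if (read_number(s, pos, number)) {
        if (pos >= s.size() || s[pos] != ':')
            return false;
        ++pos;
        line.has_id = true;
        line.id = number;
    }

    // One or more "(name)" items, each optionally followed by "[NUMBER]".
    for (;;) {
        skip_spaces(s, pos);
        if (pos >= s.size())
            break;
        if (s[pos] != '(')
            return false;
        const std::string::size_type close = s.find(')', pos);
        if (close == std::string::npos)
            return false;

        Action action;
        action.name = s.substr(pos + 1, close - pos - 1);
        action.has_value = false;
        action.value = 0;
        pos = close + 1;

        skip_spaces(s, pos);
        if (pos < s.size() && s[pos] == '[') {
            ++pos;
            if (!read_number(s, pos, action.value))
                return false;
            if (pos >= s.size() || s[pos] != ']')
                return false;
            ++pos;
            action.has_value = true;
        }
        line.actions.push_back(action);
    }

    if (line.actions.empty())
        return false;
    result = line;
    return true;
}
```

Under these assumptions all four sample lines are accepted, while something like `1234 (hello)` (missing colon) is rejected. Each optional element is simply a check on the next character, so no delimiter has to be guaranteed present.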
Mar 26, 2012 at 12:20am
By reading an entire line and then using a split function (which can use getline to do its job).
Mar 26, 2012 at 2:47am
closed account (S6k9GNh0)
hanst99, that's part of parsing... If something matches a pattern, categorize it; otherwise feed in more input. If nothing matches any pattern, the input is invalid. Then, using those categorized strings, you can define a grammar. A grammar can express optional input quite easily, and doing so is very common.
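The categorize-then-check-grammar idea can also be done by hand in plain C++03. The following is only a sketch under assumptions: tokens are separated by whitespace, names contain no spaces, and the grammar is `[ID] (NAME [VALUE])+`, guessed from the sample lines. All names (`TokenKind`, `classify`, `tokenize`, `matches_grammar`) are made up for illustration.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

enum TokenKind { TOKEN_ID, TOKEN_NAME, TOKEN_VALUE, TOKEN_INVALID };

namespace {

bool all_digits(const std::string& s)
{
    if (s.empty())
        return false;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i])))
            return false;
    return true;
}

} // namespace

// Lexer step: categorizes one whitespace-separated word.
TokenKind classify(const std::string& word)
{
    const std::string::size_type n = word.size();
    if (n >= 2 && word[n - 1] == ':' && all_digits(word.substr(0, n - 1)))
        return TOKEN_ID;                       // e.g. "1234:"
    if (n >= 3 && word[0] == '(' && word[n - 1] == ')')
        return TOKEN_NAME;                     // e.g. "(hello)"
    if (n >= 3 && word[0] == '[' && word[n - 1] == ']'
        && all_digits(word.substr(1, n - 2)))
        return TOKEN_VALUE;                    // e.g. "[24]"
    return TOKEN_INVALID;
}

std::vector<TokenKind> tokenize(const std::string& line)
{
    std::vector<TokenKind> kinds;
    std::istringstream in(line);
    std::string word;
    while (in >> word)
        kinds.push_back(classify(word));
    return kinds;
}

// Grammar step: optional ID, then one or more NAMEs,
// each optionally followed by a VALUE.
bool matches_grammar(const std::vector<TokenKind>& kinds)
{
    std::vector<TokenKind>::size_type i = 0;
    if (i < kinds.size() && kinds[i] == TOKEN_ID)
        ++i;
    if (i == kinds.size())
        return false;                          // at least one NAME is required
    while (i < kinds.size()) {
        if (kinds[i] != TOKEN_NAME)
            return false;
        ++i;
        if (i < kinds.size() && kinds[i] == TOKEN_VALUE)
            ++i;
    }
    return true;
}
```

So `matches_grammar(tokenize("1234: (hello) [24]"))` holds, while a line that starts with a value, such as `[24] (hello)`, does not. Flex and Bison generate roughly this kind of two-stage code from a specification instead of having it written by hand.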

I still think Flex/Bison would be a good bet here. I see people use it for general things like parsing the command line.
Last edited on Mar 26, 2012 at 2:50am
Mar 26, 2012 at 3:52am
Both the getline() and find_first_of() split algorithms I linked to you will do what you want easily.
Mar 26, 2012 at 2:48pm
I am using find_first_of (and find_first_not_of).

Of course I'm using getline to get the lines, but I don't see how I would use getline to extract data - all I can do is pass a delimiter that may or may not be there, which doesn't seem that terribly useful to me (am I missing something?).

I am going to look into flex, and maybe into bison too. I just didn't want to waste time learning a new tool (two in this case) before starting work, especially since this is a really simple grammar that doesn't warrant such effort. I just wanted to know if there's something I missed (such as useful functions besides getline and the string methods).

On the subject of flex... I am trying to find some material on it. This (http://flex.sourceforge.net/manual/Simple-Examples.html#Simple-Examples) is the first I found, and it seems to use what people used to think C looks like. Is that something flex-specific, or are you supposed to write a C program and that text is just really, really old?
Last edited on Mar 26, 2012 at 2:52pm
Mar 26, 2012 at 3:43pm
Why hasn't anyone mentioned std::regex? Even if g++ doesn't yet support it, g++ isn't the only compiler out there. :/

Whoops.

-Albatross
Last edited on Mar 26, 2012 at 4:03pm
Mar 26, 2012 at 3:49pm
"Why hasn't anyone mentioned std::regex?"

"Basically, due to my employers using a variety of systems and compilers, I am restricted to C++03 and no third party libraries (yeah, no boost either)."


I hope that's why.
Mar 26, 2012 at 4:16pm
...whoops. I didn't see that. :/

In that case, yes, I'd suggest flexcpp, which as the name implies generates C++ code. You may also want bison if you think you'll need it, but flex(cpp) should be enough. :)

"it seems to use what people used to think C looks like. Is that something flex-specific, or are you supposed to write a C program and that text is just really, really old?"


Eh, well. That example seems to have a few errors in it. Flex expects three sections, separated by lines containing only %%: one containing the definitions and any code to be prepended to the generated C/C++ file, one containing the regular expressions and the rules explaining what to do with them, and one containing user code that is copied into the generated C/C++ file.

Flexcpp documentation (a lot of which can also be applied to the C-generating flex): http://flexcpp.org/documentation/manual/flexc++02.html

-Albatross
Last edited on Mar 26, 2012 at 4:18pm
Mar 26, 2012 at 5:51pm
He also said no third party libraries, Alby :P
Mar 26, 2012 at 6:45pm
Flex(cpp) isn't a library, rather it's a code generator, xandy. :P

EDIT: Typoes.

-Albatross
Last edited on Mar 26, 2012 at 6:45pm
Mar 27, 2012 at 1:17pm
BisonC++'s generated code seems to be under GPL - I assume this applies to FlexC++ as well. I remember there being a discussion about this rather recently, but I don't remember the details anymore - how are you allowed to use GPL code? I'm pretty sure you can't statically link to programs that aren't "GPL compatible" (whatever that means), but what about dynamic linking?

I personally am not planning on creating proprietary software anytime soon, but I'm not sure how the rest of the project I'm working on is licensed - and I certainly don't want to be responsible for importing unnecessary or even harmful license restrictions.