String Parsing using known delimiters? [C++]

Jul 8, 2009 at 5:28am
The age-old issue of string parsing comes up again...
I have a text file that contains lines that are SUPPOSED to follow a set format, specifically:
string, string, long string int string double int

The delimiters are therefore:
Comma (,) for the first two fields
Spaces for all other fields

Strings like this would be valid:
Jon, Jack, 100 CPN 5 KTE 1.00 10
Jon, Jack 100 CPN 5 KTE 1.00 10 // notice the extra spaces

Whereas something like these would be considered invalid:
Jon Jack 100 CPN 5 KTE 1.00 10 // missing the commas
Jon, Jack, 100 CPN 5 KTE 1.00 // missing the last field "10"
Jon, Jack, 100CPN 5 KTE 1.00 10 // missing space between "100" and "CPN"

The goal is to EXTRACT each field and store it, and if possible to determine when a line is INVALID (does not follow the format).
I have a class with the following data members:
#include <string>
using std::string;

class Record
{
private:
	// Record fields, in the order they appear on a line
	string A;
	string B;
	long C;
	string D;
	string E;
	string F;
	double G;
	int H;

public:
	Record(string sLine);	// constructor
};

Record::Record(string sLine)
{
	// somehow parse the string here and determine if it is valid //
}



So, how can I parse the string (sLine) and extract each piece into its component (A, B, C, D, E, F, G, H)?
I was thinking of using the old method of doing substring searches, but I find it long and very error prone. Is there a better way to accomplish this?

Anything anyone would recommend?
Any help would be much appreciated...
Thanks,
Jul 8, 2009 at 6:07am
closed account (S6k9GNh0)
Use get() and set the delimiter to whatever you want.
Last edited on Jul 8, 2009 at 6:08am
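One possible reading of this suggestion is std::getline() with an explicit delimiter for the two comma-terminated fields, followed by whitespace extraction for the rest. The sketch below is only an illustration under assumptions: ParsedLine, trim, convert and parseWithGetline are hypothetical names, field E is treated as an int (as in the format line of the question, not the class), and names are assumed not to contain commas.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type mirroring the fields in the question.
struct ParsedLine {
    std::string a, b, d, f;
    long c = 0;
    int e = 0;
    double g = 0.0;
    int h = 0;
};

// Strip leading and trailing spaces/tabs from a field.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Convert a token to a number; succeeds only if the whole token is consumed,
// so "100CPN" is rejected instead of being read as 100.
template <typename T>
static bool convert(const std::string& token, T& value) {
    std::istringstream conv(token);
    conv >> value;
    return !conv.fail() && conv.peek() == std::char_traits<char>::eof();
}

// Returns true only if the whole line matches the expected layout.
// On failure, 'out' may be partially filled and should be ignored.
bool parseWithGetline(const std::string& line, ParsedLine& out) {
    std::istringstream in(line);
    // The first two fields end at a comma. If getline hits end of input
    // before finding one, the comma is missing.
    if (!std::getline(in, out.a, ',') || in.eof()) return false;
    if (!std::getline(in, out.b, ',') || in.eof()) return false;
    out.a = trim(out.a);
    out.b = trim(out.b);
    if (out.a.empty() || out.b.empty()) return false;

    // The remaining six fields are whitespace separated; operator>> skips
    // runs of spaces, so extra spaces between fields are harmless.
    std::string c, e, g, h, extra;
    if (!(in >> c >> out.d >> e >> out.f >> g >> h)) return false;
    if (in >> extra) return false;  // trailing junk means too many fields

    return convert(c, out.c) && convert(e, out.e) &&
           convert(g, out.g) && convert(h, out.h);
}
```

Calling parseWithGetline("Jon, Jack, 100 CPN 5 KTE 1.00 10", rec) should fill rec and return true, while the three invalid lines from the question should return false.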
Jul 8, 2009 at 11:28am
Hi...

Use the strtok function!

#include <string.h>
#include <stdio.h>

char string[] = "A string\tof ,,tokens\nand some  more tokens";
char seps[]   = " ,\t\n";   /* separators: space, comma, tab, newline */
char *token;

int main( void )
{
   printf( "%s\n\nTokens:\n", string );
   /* Establish string and get the first token: */
   token = strtok( string, seps );
   while( token != NULL )
   {
      /* While there are tokens in "string" */
      printf( " %s\n", token );
      /* Get next token: */
      token = strtok( NULL, seps );
   }
   return 0;
}
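Applied to the lines in the question, the same idea might look like the sketch below. splitWithStrtok is a hypothetical helper, not part of any library. Because strtok modifies its input, it works on a private copy of the line. One caveat: with both the space and the comma in the separator set, the token count can reveal a missing field, but it cannot tell whether the commas were actually present.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split a line on spaces, commas, tabs and newlines using strtok.
// Runs of adjacent separators are treated as a single separator.
std::vector<std::string> splitWithStrtok(const std::string& line) {
    // strtok writes '\0' bytes into the buffer, so copy the line first.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    const char* seps = " ,\t\n";
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), seps); tok != nullptr;
         tok = std::strtok(nullptr, seps)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

A valid line from the question yields eight tokens, so a size check such as splitWithStrtok(line).size() == 8 catches the "missing last field" case. The "missing the commas" line also yields eight tokens, so it would need a separate check. Note also that strtok keeps internal state and is not safe to use from several threads at once.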
Jul 8, 2009 at 12:03pm
For this I would use a regular expression (boost::regex). It will do what you want without having to write any parsing code, and the regular expression to capture your format is trivial.
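As a rough illustration of the regular-expression approach, here is a sketch that uses std::regex from C++11 in place of boost::regex (the two have a similar interface, but this is a substitution, not the library named above). The pattern itself is an assumption about the format: names contain no commas or whitespace, field E is an integer as in the format line of the question, and the double is written in plain decimal notation. RegexRecord and parseWithRegex are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical record type mirroring the fields in the question.
struct RegexRecord {
    std::string a, b, d, f;
    long c = 0;
    int e = 0;
    double g = 0.0;
    int h = 0;
};

// Returns true and fills 'out' only if the whole line matches the pattern.
bool parseWithRegex(const std::string& line, RegexRecord& out) {
    // Two comma-terminated names, then six whitespace-separated fields:
    // long, string, int, string, double, int.
    static const std::regex pattern(
        R"(\s*([^,\s]+)\s*,\s*([^,\s]+)\s*,\s*(-?\d+)\s+(\S+)\s+(-?\d+)\s+(\S+)\s+(-?\d+(?:\.\d+)?)\s+(-?\d+)\s*)");

    std::smatch m;
    // regex_match requires the entire line to match, not just a part of it.
    if (!std::regex_match(line, m, pattern)) return false;

    out.a = m[1].str();
    out.b = m[2].str();
    out.c = std::stol(m[3].str());  // throws std::out_of_range on overflow
    out.d = m[4].str();
    out.e = std::stoi(m[5].str());
    out.f = m[6].str();
    out.g = std::stod(m[7].str());
    out.h = std::stoi(m[8].str());
    return true;
}
```

With this pattern the validity check and the extraction happen in one step: "100CPN" fails because the pattern demands whitespace after the digits, and a line without commas fails at the first name.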
