parse a text file

General C++ Programming

Jul 31, 2014 at 7:52pm

Hello guys, I'm trying to parse a text file that contains information and I can't seem to get started. Does anyone have any good ideas?

For example, the text file looks like this:

idx=929392, pl= 12, name= someperson
age=19
state=ma
status=n/a

idx=929393, pl= 12, name= someperson2
age=20
state=ma
status=n/a

idx=929394, pl= 12, name= someperson3
age=21
state=ma
status=n/a

I want to extract the name and age into another text file in this format:

someperson 19
someperson2 20
someperson3 21

Would it also be possible to include other attributes next to the age, like idx?
Thanks for the help in advance!

Aug 1, 2014 at 8:43am

kbw (9488)

Let's look at a regular expression to match the name. The value consists of a space, lowercase letters, and digits. That is:

[ a-z0-9]

Age is similar.

If the names are in a file, x.txt, we can do:

cat x.txt | egrep -o "name=[ a-z0-9]*|age=[ a-z0-9]*"

That generates:

name= someperson
age=19
name= someperson2
age=20
name= someperson3
age=21

We separate the field name from the value by splitting on = with cut:

cut -d = -f 2

So applying that we get:

cat x.txt | egrep -o "name=[ a-z0-9]*|age=[ a-z0-9]*" | cut -d = -f 2

That gives us:

 someperson
19
 someperson2
20
 someperson3
21

We need to combine each name/age pair into a single line. Enter AWK:

{
    if (i%2 == 1)
        printf("%s %s\n", name, $1);

    if (i%2 == 0)
        name = $1;

    ++i;
}

Putting it all together:

cat x.txt | egrep -o "name=[ a-z0-9]*|age=[ a-z0-9]*" | cut -d = -f 2 | awk '{ if (i%2 == 1) printf("%s %s\n", name, $1); if (i%2 == 0) name=$1; ++i }'

Output:

someperson 19
someperson2 20
someperson3 21
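
Since this is a C++ forum, here is a rough sketch of the same idea with std::regex instead of a shell pipeline. The function name name_age_lines is made up for illustration, and the patterns assume the values look like the sample (lowercase letters and digits, optional spaces after the =).

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns one "name age" string per record found in `in`.
std::vector<std::string> name_age_lines(std::istream& in) {
    // Same idea as the egrep pattern: the key, optional spaces, then the value.
    static const std::regex name_re("name= *([a-z0-9]+)");
    static const std::regex age_re("age= *([0-9]+)");

    std::vector<std::string> out;
    std::string line, name;
    std::smatch m;
    while (std::getline(in, line)) {
        if (std::regex_search(line, m, name_re)) {
            name = m[1].str();  // remember the name until its age line shows up
        } else if (std::regex_search(line, m, age_re)) {
            out.push_back(name + " " + m[1].str());
        }
    }
    return out;
}
```

To use it, open x.txt with a std::ifstream, pass it to name_age_lines, and write each returned string on its own line to a std::ofstream. Like the AWK step, it assumes every age line follows the name line it belongs to.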

Last edited on Aug 1, 2014 at 8:54am

Aug 1, 2014 at 10:31am

q139 (22)

http://anaturb.net/C/string_exapm.htm
This page has good examples.

Aug 1, 2014 at 11:04am

keskiverto (10425)

Call the cat home and raise:

sed -n '/name=/{ s/\(idx=[0-9]\+\).*\(name= [a-z0-9]\+\)/\2 \1/; H }; /age=/H; ${g; s/^\n//; s/name= //g; s/idx=\([0-9]\+\)\nage=\([0-9]\+\)/\2 \1/g; p}' x.txt

someperson 19 929392
someperson2 20 929393
someperson3 21 929394

Luckily, that program can be saved as a file:

# x.sed
/name=/{
  s/\(idx=[0-9]\+\).*\(name= [a-z0-9]\+\)/\2 \1/
  H
}

/age=/H

${
  g
  s/^\n//
  s/name= //g
  s/idx=\([0-9]\+\)\nage=\([0-9]\+\)/\2 \1/g
  p
}

# sed -n -f x.sed x.txt 
someperson 19 929392
someperson2 20 929393
someperson3 21 929394

The functionality obviously depends on the input having both a line with idx and name and a line with age for each "record".
