Jul 31, 2014 at 7:52pm UTC
Hello guys, I'm trying to parse a text file that contains information and I can't seem to get started. Does anyone have any good ideas?
For example, the text file looks like this:
idx=929392, pl= 12, name= someperson
age=19
state=ma
status=n/a
idx=929393, pl= 12, name= someperson2
age=20
state=ma
status=n/a
idx=929394, pl= 12, name= someperson3
age=21
state=ma
status=n/a
I want to parse the name and age into another text file in this format:
someperson 19
someperson 20
someperson 21
Possibly also include other attributes next to the age, like idx?
Thanks for the help in advance!
Aug 1, 2014 at 8:43am UTC
Let's look at a regular expression to match the name. After name= it must accept spaces, lower case letters and numbers. That is:
name=[ a-z0-9]*
Age is similar:
age=[ a-z0-9]*
If the names are in a file, x.txt, we can do:
cat x.txt | egrep -o "name=[ a-z0-9]*|age=[ a-z0-9]*"
That generates:
name= someperson
age=19
name= someperson2
age=20
name= someperson3
age=21
We separate the field name from the value by splitting on = with cut:
cut -d = -f 2
So applying that we get:
cat x.txt | egrep -o "name=[ a-z0-9]*|age=[ a-z0-9]*" | cut -d = -f 2
That gives us:
someperson
19
someperson2
20
someperson3
21
We need to build one line from each name/age pair. Enter AWK:
{
    # i counts input lines from 0: even lines hold a name, odd lines an age
    if (i%2 == 1)
        printf("%s %s\n", name, $1);
    if (i%2 == 0)
        name = $1;
    ++i;
}
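To see the pairing step on its own, here is a small sketch that feeds the six values from the previous step straight into the same AWK logic. The function name pair_lines is made up for this illustration, and the leading spaces on the names are assumed to be what cut leaves behind after the = sign.

```shell
#!/bin/sh
# pair_lines is a hypothetical helper name, not part of any tool.
# It joins every two input lines (name, then age) into one output line.
pair_lines() {
    awk '{
        if (i % 2 == 1)          # odd count: this line is the age
            printf("%s %s\n", name, $1)
        if (i % 2 == 0)          # even count: this line is the name
            name = $1
        ++i
    }'
}

# Feed the name/age values in by hand, one per line.
printf '%s\n' ' someperson' 19 ' someperson2' 20 ' someperson3' 21 | pair_lines
```

Because AWK splits each line on whitespace, $1 drops the leading space in front of each name, so no extra trimming step is needed here.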
Putting it all together:
cat x.txt | egrep -o "name=[ a-z0-9]*|age=[ a-z0-9]*" | cut -d = -f 2 | awk '{ if (i%2 == 1) printf("%s %s\n", name, $1); if (i%2 == 0) name=$1; ++i }'
Output:
someperson 19
someperson2 20
someperson3 21
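Since the question also asked about putting idx next to the age, here is one possible sketch that does the whole job in a single AWK pass instead of the pipeline. It is only an illustration: the function name extract_people and the file name sample.txt are made up, and it assumes every record starts with an "idx=..., pl=..., name=..." line (name last) followed by an "age=..." line, as in the sample.

```shell
#!/bin/sh
# extract_people is a hypothetical helper name used only for this sketch.
# Assumption: each record is an "idx=..., pl=..., name=..." line with name
# as the last field, followed later by an "age=..." line.
extract_people() {
    awk -F'[=,]' '
        /^idx=/ {
            idx = $2                   # value right after "idx="
            name = $NF                 # last field: value after "name="
            gsub(/^ +| +$/, "", name)  # trim spaces around the name
        }
        /^age=/ {
            print name, $2, idx        # name, age, idx on one line
        }
    ' "$1"
}

# Small sample input (sample.txt is an assumed file name).
cat > sample.txt <<'EOF'
idx=929392, pl= 12, name= someperson
age=19
state=ma
status=n/a
idx=929393, pl= 12, name= someperson2
age=20
state=ma
status=n/a
EOF

extract_people sample.txt
```

To write the result into another text file, redirect the output, for example: extract_people x.txt > out.txt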
Last edited on Aug 1, 2014 at 8:54am UTC