Parsing an XML file

I have an XML file like this:
<order>  
<object>  
<name >John Doe</name>
<cost>5000</cost>    
</object>  
<object>  
<name>Tom Hall</name>  
<cost>5000</cost>
</object>
</order>

I need to extract the names. Here is what I have so far:
size_t found = str.find("<name");
string aux = str.substr(found);
found = aux.find(">");
aux = aux.substr(found + 1);
size_t end_found = aux.find("</name");
string find_str = aux.substr(0, end_found);

But this way I get only one name. How would I fix it to get all of them?
Just spare yourself the trouble and use a 3rd party library.
https://stackoverflow.com/questions/9387610/what-xml-parser-should-i-use-in-c

If for some reason you want to write your own parser, the mistake you make is that you only parse the 1st occurrence of the "name" tag. You need to loop over the contents of the file.
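
As a minimal sketch of that loop, assuming the whole file is already in a std::string: the function name findNames is made up for illustration, and it only does plain substring search (no nesting, comments, or CDATA), so treat it as a starting point rather than a real parser.

```cpp
#include <string>
#include <vector>

// Collects the text of every <name ...>...</name> element in str.
// Hypothetical helper: plain substring search, not a real XML parser.
std::vector<std::string> findNames(const std::string& str)
{
    std::vector<std::string> names;
    std::size_t pos = 0;
    while (true)
    {
        std::size_t open = str.find("<name", pos);     // next opening tag
        if (open == std::string::npos) break;
        std::size_t start = str.find('>', open);       // end of the opening tag
        if (start == std::string::npos) break;
        ++start;                                       // first character of the content
        std::size_t end = str.find("</name", start);   // closing tag
        if (end == std::string::npos) break;
        names.push_back(str.substr(start, end - start));
        pos = end;                                     // continue searching after this element
    }
    return names;
}
```

The only real change from the original snippet is that each find starts from the position where the previous match ended, instead of always searching from the beginning.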
CMarkup is another option: http://www.firstobject.com/
If you need a full-bore parser, the above are almost required; XML is tricky.

However, a dumb "find the tag" parser works great if you just want one or two things specifically. I have one in C; it looks roughly like:

read entire file into char array
do
    pointer = strstr(pointer, tag)
    if pointer not null: extract tag data as string and print / process it
while pointer not null

and took maybe 5 min to write and does multi-GB files faster than I can open the same files in most tools. It's crude, but it works.
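
For anyone who wants something concrete, here is a rough sketch of that idea. It is not the original C tool: it is written as C++ around std::strstr, the name extractAll is invented for this example, and it assumes the tags are literal, non-empty, and attribute-free (so "<name >" with a stray space would not match "<name>").

```cpp
#include <cstring>
#include <string>
#include <vector>

// Returns the text between every openTag/closeTag pair found in buf.
// Hypothetical helper; assumes buf is null-terminated and tags are exact literals.
std::vector<std::string> extractAll(const char* buf, const char* openTag, const char* closeTag)
{
    std::vector<std::string> out;
    const std::size_t openLen  = std::strlen(openTag);
    const std::size_t closeLen = std::strlen(closeTag);
    const char* p = buf;
    while ((p = std::strstr(p, openTag)) != nullptr)
    {
        p += openLen;                                 // skip past the opening tag
        const char* end = std::strstr(p, closeTag);   // find the matching close
        if (end == nullptr) break;                    // unterminated element: stop
        out.emplace_back(p, end);                     // copy the characters in [p, end)
        p = end + closeLen;                           // resume after the closing tag
    }
    return out;
}
```

Calling extractAll(buffer, "<cost>", "</cost>") on the sample file would be expected to give the two cost values; whether it keeps up with multi-GB inputs depends on how the file gets into memory.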

Roughly à la @jonnin.
"and took maybe 5 min to write" - you are joking!


It's far from perfect - you might have to tidy the xml file first. Alternatively, I could learn regex, I suppose.
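
For what the regex route might look like, here is a hedged sketch using std::regex from the standard library. The function name namesByRegex and the pattern are my own assumptions, and the pattern only handles simple elements whose content holds no nested tags.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: extracts the text of simple <name ...>text</name> elements.
std::vector<std::string> namesByRegex(const std::string& text)
{
    // "<name", a word boundary, optional attributes/whitespace, ">",
    // captured content with no '<' in it, then "</name" with optional whitespace and ">".
    static const std::regex re(R"(<name\b[^>]*>([^<]*)</name\s*>)");
    std::vector<std::string> names;
    for (std::sregex_iterator it(text.begin(), text.end(), re), last; it != last; ++it)
        names.push_back((*it)[1].str());   // capture group 1 is the element content
    return names;
}
```

The word boundary keeps the pattern from matching a tag such as <names>, which a plain search for "<name" would also pick up.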

input.xml
<order>  
<object>  
<name >John Doe</name>
<cost>5000</cost>    
</object>  
<object>  
<name>Tom Hall</name>  
<cost>5000</cost>
</object>
</order>


#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
using namespace std;

// Function prototypes
string getFile( string filename );                         // Reads whole file into a string buffer
vector<string> getData( const string &text, string tag );  // Gets collection of items between given tags
void stripTags( string &text );                            // Strips any tags


//======================================================================


int main()
{
   string filename = "input.xml";
   string tag = "name";
// string tag = "object";
   bool stripOtherTags = true;

   string text = getFile( filename );
   vector<string> all = getData( text, tag );
   for ( string &s : all ) 
   {
      if ( stripOtherTags ) stripTags( s );
      cout << s << '\n';
   }
}


//======================================================================


string getFile( string filename )
{
   string buffer;
   char c;

   ifstream in( filename );
   if ( !in ) { cerr << filename << " not found\n";   exit( 1 ); }
   while ( in.get( c ) ) buffer += c;
   in.close();

   return buffer;
}


//======================================================================


vector<string> getData( const string &text, string tag )
{                                                          
   vector<string> collection;
   string::size_type pos = 0, start;                       // size_type, so comparisons with string::npos are reliable

   while ( true )
   {
      start = text.find( "<" + tag, pos );   if ( start == string::npos ) return collection;
      start = text.find( ">" , start );      if ( start == string::npos ) return collection;
      start++;

      pos = text.find( "</" + tag, start );   if ( pos == string::npos ) return collection;
      collection.push_back( text.substr( start, pos - start ) );
   }
}


//======================================================================


void stripTags( string &text )
{
   string::size_type start = 0, pos;                       // size_type, so comparisons with string::npos are reliable

   while ( start < text.size() )
   {
      start = text.find( "<", start );    if ( start == string::npos ) break;
      pos   = text.find( ">", start );    if ( pos   == string::npos ) break;
      text.erase( start, pos - start + 1 );
   }
}


//====================================================================== 


Output:
John Doe
Tom Hall
Not joking, but you just wrote a lot more code. Mine really was just 2 loops that found a tag (and its end tag) and a print statement. My XML was bot-generated, already tidy (or at least consistent).

I appear to have deleted it. It was to debug a specific problem that we solved long ago.
JSON is much cleaner and easier to read than XML, IMO. Also, I feel like XML is verbose. Just my two cents.