Parsing an XML file

Oct 4, 2017 at 6:44am

I have an XML file like this:

<order>  
<object>  
<name >John Doe</name>
<cost>5000</cost>    
</object>  
<object>  
<name>Tom Hall</name>  
<cost>5000</cost>
</object>
</order>

I need to extract the names. My work so far:

size_t found = str.find("<name");
string aux = str.substr(found);
found = aux.find(">");
aux = aux.substr(found + 1);
size_t end_found = aux.find("</name");
string find_str = aux.substr(0, end_found);

But this way I get only one name. How would I fix it to get all of them?

Oct 4, 2017 at 7:23am

Golden Lizard (310)

Just spare yourself the trouble and use a 3rd party library.
https://stackoverflow.com/questions/9387610/what-xml-parser-should-i-use-in-c

If for some reason you want to write your own parser, the mistake you make is that you only parse the 1st occurrence of the "name" tag. You need to loop over the contents of the file.
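
A minimal sketch of that loop, assuming the whole file has already been read into a std::string. It repeats the three find calls from the question, but starts each search where the previous match ended. The name findAllNames is a hypothetical helper, not something from the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collects the text of every
// <name ...>...</name> element found in str.
std::vector<std::string> findAllNames(const std::string& str)
{
    std::vector<std::string> names;
    std::string::size_type pos = 0;

    while (true)
    {
        // Same steps as the snippet in the question, but searching from pos
        std::string::size_type open = str.find("<name", pos);
        if (open == std::string::npos) break;      // no more opening tags

        std::string::size_type textStart = str.find('>', open);
        if (textStart == std::string::npos) break; // malformed opening tag
        ++textStart;                               // first character after '>'

        std::string::size_type textEnd = str.find("</name", textStart);
        if (textEnd == std::string::npos) break;   // missing closing tag

        names.push_back(str.substr(textStart, textEnd - textStart));
        pos = textEnd;                             // resume after this element's text
    }
    return names;
}
```

Like the snippet in the question, this also matches any tag whose name merely starts with "name", so it is only suitable for simple, predictable input.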

Oct 4, 2017 at 7:46am

coder777 (8449)

Take a look at the boost xml parser:

http://www.boost.org/doc/libs/1_65_1/doc/html/property_tree/reference.html#header.boost.property_tree.xml_parser_hpp

Oct 4, 2017 at 9:42am

Thomas1965 (4571)

CMarkup is another option: http://www.firstobject.com/

Oct 4, 2017 at 12:17pm

jonnin (11494)

If you need a full-blown parser, the libraries above are almost required; XML is tricky.

However, a dumb "find the tag" parser works great if you just want one or two specific things. I have one in C that looks roughly like:

read entire file into char array
do
    pointer = strstr(tag)
    extract tag data as string and print / process it
while pointer not null

and took maybe 5 min to write and handles multi-GB files faster than I can open the same files in most tools. It's crude, but it works.
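
The original code was never posted, so the following is only a guess at what such a strstr loop might look like, written in C++ rather than C. It assumes the file is already in a null-terminated buffer and that tags are written exactly as <tag> and </tag>, with no attributes or extra spaces (which fits tidy, machine-generated XML). The name extractAll is hypothetical.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical reconstruction of the idea described above.
// buffer must be a null-terminated copy of the whole file.
std::vector<std::string> extractAll(const char* buffer, const char* tagName)
{
    const std::string openTag  = std::string("<")  + tagName + ">";
    const std::string closeTag = std::string("</") + tagName + ">";
    std::vector<std::string> values;

    const char* p = std::strstr(buffer, openTag.c_str());
    while (p != nullptr)
    {
        p += openTag.size();                               // skip the opening tag
        const char* end = std::strstr(p, closeTag.c_str());
        if (end == nullptr) break;                         // unterminated element
        values.emplace_back(p, end);                       // text between the tags
        p = std::strstr(end + closeTag.size(), openTag.c_str());
    }
    return values;
}
```

Because it matches the opening tag exactly, this version would skip the `<name >` element in the sample file; it trades tolerance for simplicity.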

Oct 4, 2017 at 12:43pm

lastchance (6980)

Roughly à la @jonnin.

and took maybe 5 min to write

- you are joking!

It's far from perfect - you might have to tidy the xml file first. Alternatively, I could learn regex, I suppose.

input.xml

<order>  
<object>  
<name >John Doe</name>
<cost>5000</cost>    
</object>  
<object>  
<name>Tom Hall</name>  
<cost>5000</cost>
</object>
</order>

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
using namespace std;

// Function prototypes
string getFile( string filename );                         // Reads whole file into a string buffer
vector<string> getData( const string &text, string tag );  // Gets collection of items between given tags
void stripTags( string &text );                            // Strips any tags


//======================================================================


int main()
{
   string filename = "input.xml";
   string tag = "name";
// string tag = "object";
   bool stripOtherTags = true;

   string text = getFile( filename );
   vector<string> all = getData( text, tag );
   for ( string &s : all ) 
   {
      if ( stripOtherTags ) stripTags( s );
      cout << s << '\n';
   }
}


//======================================================================


string getFile( string filename )
{
   string buffer;
   char c;

   ifstream in( filename );   if ( !in ) { cerr << filename << " not found\n";   exit( 1 ); }
   while ( in.get( c ) ) buffer += c;
   in.close();

   return buffer;
}


//======================================================================


vector<string> getData( const string &text, string tag )
{                                                          
   vector<string> collection;
   string::size_type pos = 0, start;      // size_type, so comparisons with string::npos also work on 64-bit builds

   while ( true )
   {
      start = text.find( "<" + tag, pos );   if ( start == string::npos ) return collection;
      start = text.find( ">" , start );      if ( start == string::npos ) return collection;   // unterminated opening tag
      start++;

      pos = text.find( "</" + tag, start );   if ( pos == string::npos ) return collection;
      collection.push_back( text.substr( start, pos - start ) );
   }
}


//======================================================================


void stripTags( string &text )
{
   string::size_type start = 0, pos;      // size_type, so comparisons with string::npos also work on 64-bit builds

   while ( start < text.size() )
   {
      start = text.find( "<", start );    if ( start == string::npos ) break;
      pos   = text.find( ">", start );    if ( pos   == string::npos ) break;
      text.erase( start, pos - start + 1 );
   }
}


//======================================================================

Output:

John Doe
Tom Hall
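
For comparison, the regex route mentioned above could look roughly like this. It is only a sketch: getDataRegex is a hypothetical counterpart to getData, it assumes the tag name contains no regex metacharacters and that elements of the same name are not nested, and std::regex can be slow or run out of stack on very large inputs.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical regex-based counterpart to getData (not from the thread).
std::vector<std::string> getDataRegex(const std::string& text, const std::string& tag)
{
    // "<tag", optional whitespace plus attributes, ">", the shortest possible
    // content (captured), then "</tag", optional whitespace, ">"
    const std::regex pattern("<" + tag + R"((?:\s[^>]*)?>([\s\S]*?)</)" + tag + R"(\s*>)");

    std::vector<std::string> collection;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it)
        collection.push_back((*it)[1].str());   // capture group 1 is the content
    return collection;
}
```

Unlike the plain find version, this pattern requires whitespace or '>' right after the tag name, so searching for "name" would not pick up a tag such as "names".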

Last edited on Oct 4, 2017 at 3:12pm

Oct 4, 2017 at 12:57pm

jonnin (11494)

Not joking, but you just wrote a lot more code. Mine really was just 2 loops that found a tag (and its end tag) and a print statement. My XML was bot-generated and already tidy (or at least consistent).

I appear to have deleted it. It was to debug a specific problem that we solved long ago.

Last edited on Oct 4, 2017 at 1:23pm

Oct 4, 2017 at 3:03pm

Golden Lizard (310)

JSON is much cleaner and easier to read than XML, IMO. I also find XML verbose. Just my two cents.
