Oct 4, 2017 at 12:17pm UTC
If you need a full-bore parser, the above are almost required; XML is tricky.
However, a dumb "find the tag" parser works great if you just want one or two specific things. I have one in C that looks roughly like:
read entire file into char array
do
pointer = strstr(tag)
extract tag data as string and print / process it
while pointer not null
and took maybe 5 min to write and does multi-GB files faster than I can open the same files in most tools. It's crude, but it works.
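For anyone who wants something to start from, here is a minimal C++ sketch of the loop described above. It is not the original C program (that is not shown here); findTag is a hypothetical name, and the sketch assumes tidy input: tags written exactly as <tag> and </tag>, no attributes, no nesting of the same tag, and no embedded NUL bytes in the buffer (std::strstr stops at the first one).

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns the text between
// each <tag> ... </tag> pair found in a NUL-terminated buffer.
// Assumes tidy input: exact "<tag>" / "</tag>" spelling, no attributes.
std::vector<std::string> findTag(const char* buffer, const std::string& tag)
{
    const std::string open  = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::vector<std::string> found;

    const char* p = buffer;
    while ((p = std::strstr(p, open.c_str())) != nullptr)
    {
        p += open.size();                               // skip past the opening tag
        const char* end = std::strstr(p, close.c_str());
        if (end == nullptr) break;                      // unterminated tag: stop
        found.emplace_back(p, end);                     // copy the characters in [p, end)
        p = end + close.size();                         // resume after the closing tag
    }
    return found;
}
```

To use it, read the whole file into a std::string (or a char array) first and pass text.c_str() along with the tag name, then print or process each returned string.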
Oct 4, 2017 at 12:43pm UTC
Roughly à la @jonnin.
"and took maybe 5 min to write" - you are joking!
It's far from perfect - you might have to tidy the xml file first. Alternatively, I could learn regex, I suppose.
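As a rough idea of what the regex route might look like, here is a sketch using std::regex. The name getDataRegex is hypothetical, and the sketch assumes the tag name contains no regex metacharacters and that the same tag is not nested inside itself. It is only meant for small files: std::regex can be slow, and some implementations may run out of stack on very long matches.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the text inside every <tag ...> ... </tag> pair.
// Assumes 'tag' has no regex metacharacters and is not nested inside itself.
std::vector<std::string> getDataRegex(const std::string& text, const std::string& tag)
{
    // "<tag", then optional whitespace/attributes up to ">", then (lazily)
    // everything up to the first "</tag>". [\s\S] is used instead of '.'
    // so that the captured content may span several lines.
    const std::regex pattern("<" + tag + "(?:\\s[^>]*)?>([\\s\\S]*?)</" + tag + "\\s*>");

    std::vector<std::string> collection;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it)
        collection.push_back((*it)[1].str());           // capture group 1 = tag content
    return collection;
}
```

Because the pattern requires whitespace or ">" right after the tag name, it accepts "<name >" but does not mistake "<names>" for "<name>".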
input.xml
<order>
<object>
<name >John Doe</name>
<cost>5000</cost>
</object>
<object>
<name>Tom Hall</name>
<cost>5000</cost>
</object>
</order>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
using namespace std;

// Function prototypes
string getFile( string filename );                          // Reads whole file into a string buffer
vector<string> getData( const string &text, string tag );   // Gets collection of items between given tags
void stripTags( string &text );                             // Strips any tags

//======================================================================

int main()
{
   string filename = "input.xml";
   string tag = "name";
// string tag = "object";
   bool stripOtherTags = true;

   string text = getFile( filename );
   vector<string> all = getData( text, tag );
   for ( string &s : all )
   {
      if ( stripOtherTags ) stripTags( s );
      cout << s << '\n';
   }
}

//======================================================================

string getFile( string filename )
{
   string buffer;
   char c;

   ifstream in( filename );   if ( !in ) { cerr << filename << " not found\n";   exit( 1 ); }
   while ( in.get( c ) ) buffer += c;
   in.close();

   return buffer;
}

//======================================================================

vector<string> getData( const string &text, string tag )
{
   vector<string> collection;
   // size_type rather than unsigned int: on 64-bit systems an unsigned int
   // never compares equal to string::npos, so the loop would not terminate
   string::size_type pos = 0, start;

   while ( true )
   {
      start = text.find( "<" + tag, pos );    if ( start == string::npos ) return collection;
      start = text.find( ">" , start );       if ( start == string::npos ) return collection;   // unclosed opening tag
      start++;

      pos = text.find( "</" + tag, start );   if ( pos == string::npos ) return collection;
      collection.push_back( text.substr( start, pos - start ) );
   }
}

//======================================================================

void stripTags( string &text )
{
   string::size_type start = 0, pos;          // size_type so the npos tests below work

   while ( start < text.size() )
   {
      start = text.find( "<" , start );    if ( start == string::npos ) break;
      pos   = text.find( ">" , start );    if ( pos   == string::npos ) break;
      text.erase( start, pos - start + 1 );
   }
}

//======================================================================
Output:
John Doe
Tom Hall
Last edited on Oct 4, 2017 at 3:12pm UTC
Oct 4, 2017 at 12:57pm UTC
Not joking, but you just wrote a lot more code. Mine really was just 2 loops that found a tag (and its end tag) and a print statement. My XML was bot-generated and already tidy (or at least consistent).
I appear to have deleted it. It was to debug a specific problem that we solved long ago.
Last edited on Oct 4, 2017 at 1:23pm UTC
Oct 4, 2017 at 3:03pm UTC
JSON is much cleaner and easier to read than XML, IMO. XML also feels verbose to me. Just my two cents.