XML parser

Hello! I have been given the task of implementing an XML parser for an .xml file using only object-oriented programming in C++. The problem is that this is the first real project we have been given; until now we only did tasks with clear instructions, so I am mostly looking for how to break the project into smaller pieces. For now I have a class that represents the file itself and has one member, Element root. Element is another class that represents the elements of an XML file. The members of Element are string tag_name, string id, string text, map<string, string> attributes and map<string, Element*> (the children). All elements must have an id. I don't know how to approach building the whole structure from the file. Thank you in advance!

jonnin (11497)

Well, let's work on the requirements.
What is the purpose of the parser? I have a dozen crude ones that do little more than find pairs of tags and zap or modify what is between them. A full XML parser that understands nesting (recursion) and repeated tag groups is significantly more work. Parsing is not a generic task that exists in a vacuum; it is a tool for accomplishing something. That is the first thing you need to know (or decide, or invent for yourself).
It could be as simple as breaking the file into a pretty-printed format.

gabrinka (15)

Well, first I have to build objects that represent the file. Then I have to check whether every element has an id and whether it is unique. If not, I must generate a new unique id, which is used by a method that accesses a certain element by its id. Using this id I must be able to set attribute values, read them, and add a new child to a certain element. All in all, I must implement functions that let me access everything, even the n-th child of a certain parent.

jonnin (11497)

So there you have the high-level needs.
You need a way to break the file into elements and store them in a usable fashion.

You need to be able to insert into (and possibly delete from) and modify the above. That suggests a list-type data structure, probably an N-ary tree if you want it organized by parents and children.
E.g., if a parent had 4 children, it would look like this:

      parent
    /  /  \  \
   c  c    c  c     where c = child
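A minimal C++ sketch of such an N-ary tree node, assuming the member names from the original question (tag_name, id, text, attributes). One deliberate change: the children are kept in a std::vector of std::unique_ptr rather than a map<string, Element*>, because XML siblings can share a tag name and their order matters, and a map keyed by tag would lose both. The helper names addChild, nthChild and setAttribute are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical N-ary tree node: any number of children, kept in document order.
struct Element {
    std::string tag_name;
    std::string id;
    std::string text;
    std::map<std::string, std::string> attributes;
    std::vector<std::unique_ptr<Element>> children;  // owned by this node
    Element* parent = nullptr;                        // non-owning back link

    explicit Element(std::string tag) : tag_name(std::move(tag)) {}

    // Append a new child and return a non-owning pointer to it.
    Element* addChild(const std::string& tag) {
        children.push_back(std::make_unique<Element>(tag));
        children.back()->parent = this;
        return children.back().get();
    }

    // Return the n-th child (0-based), or nullptr if there is no such child.
    Element* nthChild(std::size_t n) const {
        return n < children.size() ? children[n].get() : nullptr;
    }

    // Create or overwrite an attribute.
    void setAttribute(const std::string& name, const std::string& value) {
        attributes[name] = value;
    }
};
```

Owning children through unique_ptr means destroying the root frees the whole tree. The raw Element* values handed out are non-owning views; they stay valid when the vector grows, because only the unique_ptr objects move, not the elements they point to.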

You need a way to check the IDs and fix them.

And, though unstated, you need a way to turn your container back into a legal XML file.
XML often comes with an XSD (schema) file. Do you need to handle that?
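Turning the tree back into XML is a recursive walk. A sketch, assuming a stripped-down node type (repeated here so the example stands alone) and the hypothetical function names escapeXml and writeElement. It indents two spaces per nesting level and escapes the five XML special characters in text and attribute values.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stripped-down hypothetical node type.
struct Element {
    std::string tag_name;
    std::string text;
    std::map<std::string, std::string> attributes;
    std::vector<std::unique_ptr<Element>> children;
};

// Replace the five characters that are special in XML text and attribute values.
std::string escapeXml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Append an element and its whole subtree to `out`, indenting two spaces per level.
void writeElement(const Element& e, std::string& out, int depth = 0) {
    std::string indent(depth * 2, ' ');
    out += indent + "<" + e.tag_name;
    for (const auto& [name, value] : e.attributes)
        out += " " + name + "=\"" + escapeXml(value) + "\"";
    if (e.children.empty()) {
        // Leaf: its text (possibly empty) goes on the same line as its tags.
        out += ">" + escapeXml(e.text) + "</" + e.tag_name + ">\n";
        return;
    }
    out += ">\n";
    if (!e.text.empty())
        out += indent + "  " + escapeXml(e.text) + "\n";
    for (const auto& child : e.children)
        writeElement(*child, out, depth + 1);
    out += indent + "</" + e.tag_name + ">\n";
}
```

Escaping is what keeps the output legal: text such as `C++ & XML` has to be written as `C++ &amp; XML`.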

gabrinka (15)

No, I don't need to do that. I am thinking about passing the file as a string; at least that is what my assistant said to do before I start using files. So I pass the whole file as one string to the constructor in order to build the tree. But I don't know about the ID: where should I store it so that I can check whether it is a unique value?
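Following the assistant's advice, the parser can work purely on a std::string, and only the entry point touches the file system. A small sketch, assuming a hypothetical readFile helper and an XmlDocument class whose constructor receives the whole text; the actual tree building is left as a comment.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: read an entire file into one string.
std::string readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the whole stream into the buffer
    return buffer.str();
}

// Hypothetical document class: the constructor takes the whole XML text.
class XmlDocument {
public:
    explicit XmlDocument(std::string source) : source_(std::move(source)) {
        // Build the element tree from source_ here.
    }
    const std::string& source() const { return source_; }

private:
    std::string source_;
};
```

Because the constructor only sees a string, you can test it with XML literals such as XmlDocument doc("<a/>"); and switch to XmlDocument doc(readFile("input.xml")); once files come into play.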

jonnin (11497)

Where? Just store it in a local variable when you check it.
Iterate over the data, look for the IDs, and store each one you find.
After you have stored them all, check for duplicates.
If the IDs are simple, like 0 to N-1 where N is relatively small, a modified bucket sort (really just a count per ID) makes this simple to handle:
vector<unsigned char> bucket(N, 0); // if one ID may repeat more than 255 times, use int instead of unsigned char
for (each ID in the file)
{
    bucket[ID]++;
    if (bucket[ID] > 1)
    {
        // Duplicate found. You can handle it here or later; handling it right on
        // detection, while it is still "in hand" in the file data, makes sense.
        // If you change it here, set bucket[ID] back to 1 and set bucket[newID] to 1.
        // (In a single pass, newID may still turn up later in the file.)
    }
}
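A runnable version of the bucket idea, assuming the IDs are non-negative integers that have already been extracted from the file into a vector. It uses two passes (count everything first, then fix), which avoids the single-pass pitfall of handing out an ID that appears later in the file. The function name fixDuplicateIds is hypothetical, and an unsigned counter is used so there is no 255 limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumes IDs are non-negative integers. Returns how many IDs were changed.
std::size_t fixDuplicateIds(std::vector<int>& ids) {
    std::size_t n = 0;
    for (int id : ids)
        n = std::max(n, static_cast<std::size_t>(id) + 1);

    std::vector<unsigned> bucket(n, 0);  // pass 1: how often each ID occurs
    for (int id : ids)
        ++bucket[id];

    std::vector<bool> seen(n, false);
    std::size_t nextFree = 0;            // scan position for unused IDs
    std::size_t changed = 0;
    for (int& id : ids) {                // pass 2: fix repeats after the first
        if (!seen[id]) {
            seen[id] = true;             // first occurrence keeps its ID
            continue;
        }
        while (nextFree < bucket.size() && bucket[nextFree] != 0)
            ++nextFree;                  // find an ID nobody uses
        if (nextFree == bucket.size()) { // none left in range: grow by one
            bucket.push_back(0);
            seen.push_back(false);
        }
        id = static_cast<int>(nextFree);
        bucket[nextFree] = 1;            // the new ID is now taken
        seen[nextFree] = true;
        ++changed;
    }
    return changed;
}
```

For example, the IDs {0, 2, 2, 0, 5} become {0, 2, 1, 3, 5}: the first occurrence of each ID keeps it, and later copies get the lowest unused IDs.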

If N is unknown or the IDs have an unwieldy format, you can just shove all the IDs into a vector, ideally along with their position in the string so you can modify them, sort it by ID, iterate looking for two or more identical IDs in a row, and fix them that way.
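A sketch of the sort-based approach for arbitrary string IDs, assuming each ID has been stored together with its offset in the source string. findDuplicates and IdEntry are hypothetical names. Sorting pairs orders them by ID and then by position, so within a group of equal IDs the earliest occurrence comes first and keeps its ID.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Each entry: (ID text, offset where it was found in the source string).
using IdEntry = std::pair<std::string, std::size_t>;

// Return every entry whose ID equals the one before it after sorting.
// These are the occurrences that need a new ID.
std::vector<IdEntry> findDuplicates(std::vector<IdEntry> ids) {
    std::sort(ids.begin(), ids.end());  // by ID first, then by offset
    std::vector<IdEntry> dups;
    for (std::size_t i = 1; i < ids.size(); ++i)
        if (ids[i].first == ids[i - 1].first)
            dups.push_back(ids[i]);
    return dups;
}
```

When you then rewrite the duplicates in the source string, process them in descending offset order: a replacement of a different length shifts every offset after it, but never the ones before it.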

Once everything is fixed, you can destroy your list of IDs etc.; you don't need them anymore. The program flow for this idea is:
Read the file.
Fix the IDs.
Build the tree, possibly using the ID as a key to get "sort of" search-tree logic (you can still use less-left, greater-right style searching; you just have to iterate over the list of children at each level instead of choosing between left and right). Or you can be snarky and keep a side vector (or map, or whatever) that maps each ID to a pointer to its record, so you can jump directly to it in the tree. This costs extra space but is really nice to have!
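A sketch of the "side map" idea, assuming string IDs and a hypothetical IdIndex class. It first collects every ID already present, then walks the tree, keeping the first element with each ID and giving elements with a missing or repeated ID a generated one. The gen1, gen2, ... naming is an arbitrary choice; what matters is that a generated ID never clashes with any ID in the document. The map holds non-owning pointers; the tree still owns the elements.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Stripped-down hypothetical node: only what the index needs.
struct Element {
    std::string tag_name;
    std::string id;
    std::vector<std::unique_ptr<Element>> children;
};

// Side index from ID to element. Non-owning: the tree still owns every node.
class IdIndex {
public:
    // Make all IDs unique, then index every element by its final ID.
    void build(Element& root) {
        byId_.clear();
        taken_.clear();
        counter_ = 0;
        collect(root);  // pass 1: remember every ID already in the document
        assign(root);   // pass 2: keep first occurrences, replace missing/repeated IDs
    }

    // Jump straight to an element by ID; nullptr if no element has it.
    Element* find(const std::string& id) const {
        auto it = byId_.find(id);
        return it == byId_.end() ? nullptr : it->second;
    }

private:
    void collect(const Element& e) {
        if (!e.id.empty())
            taken_.insert(e.id);
        for (const auto& child : e.children)
            collect(*child);
    }

    void assign(Element& e) {
        if (e.id.empty() || byId_.count(e.id) != 0)
            e.id = freshId();
        byId_[e.id] = &e;
        for (auto& child : e.children)
            assign(*child);
    }

    // Hypothetical naming scheme: "gen1", "gen2", ... skipping any ID in use.
    std::string freshId() {
        std::string candidate;
        do {
            candidate = "gen" + std::to_string(++counter_);
        } while (taken_.count(candidate) != 0);
        taken_.insert(candidate);
        return candidate;
    }

    std::unordered_map<std::string, Element*> byId_;
    std::unordered_set<std::string> taken_;
    unsigned counter_ = 0;
};
```

If elements are later added or removed, the index has to be updated as well, or it can end up holding dangling pointers; that bookkeeping is part of the price of the shortcut.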

doug4 (1538)

I think you should FOR THE TIME BEING ignore the id field. To get started, simply ignore all attributes. Make sure you can parse the document (tags and values) into a tree like @jonnin suggested.

You should have a class named Element or Node or something that contains the tag and the value. And the value should be a list of some sort (vector maybe?) of child elements. And you need to account for text as a value. As things get more complicated, you will need to account for text between child tags, but you can get things working before you add that.
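A minimal recursive-descent sketch of that first step, assuming the input is already in a std::string. It builds a tree of tags and text only: attributes are skipped, and comments, CDATA sections, entity references and the <?xml ...?> declaration are not handled. Text directly inside an element is trimmed and concatenated, so text between child tags is not modeled separately, in line with the advice to postpone it. Parser and Element are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: tag name, its direct text, and its children in order.
struct Element {
    std::string tag_name;
    std::string text;  // trimmed text runs directly inside this element, concatenated
    std::vector<std::unique_ptr<Element>> children;
};

// Minimal recursive-descent parser for tags and text only (attributes skipped).
class Parser {
public:
    explicit Parser(std::string src) : s_(std::move(src)) {}

    std::unique_ptr<Element> parseDocument() {
        skipSpace();
        auto root = parseElement();
        skipSpace();
        if (pos_ != s_.size())
            throw std::runtime_error("trailing content after root element");
        return root;
    }

private:
    // Parse one element starting at '<', recursing into each child element.
    std::unique_ptr<Element> parseElement() {
        expect('<');
        auto e = std::make_unique<Element>();
        e->tag_name = readName();
        // Skip attributes: jump to the next '>' (naive: breaks on '>' inside a quoted value).
        std::size_t close = s_.find('>', pos_);
        if (close == std::string::npos)
            throw std::runtime_error("unterminated start tag");
        bool selfClosing = close > pos_ && s_[close - 1] == '/';
        pos_ = close + 1;
        if (selfClosing)
            return e;

        while (pos_ < s_.size()) {
            if (s_.compare(pos_, 2, "</") == 0) {  // end tag must match the start tag
                pos_ += 2;
                if (readName() != e->tag_name)
                    throw std::runtime_error("mismatched end tag for <" + e->tag_name + ">");
                skipSpace();
                expect('>');
                return e;
            }
            if (s_[pos_] == '<') {                 // child element
                e->children.push_back(parseElement());
            } else {                               // text up to the next '<'
                std::size_t next = s_.find('<', pos_);
                if (next == std::string::npos) next = s_.size();
                e->text += trim(s_.substr(pos_, next - pos_));
                pos_ = next;
            }
        }
        throw std::runtime_error("missing </" + e->tag_name + ">");
    }

    std::string readName() {
        std::size_t start = pos_;
        while (pos_ < s_.size() && isNameChar(s_[pos_]))
            ++pos_;
        if (pos_ == start)
            throw std::runtime_error("expected a tag name");
        return s_.substr(start, pos_ - start);
    }

    static bool isNameChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '-' || c == '.' || c == ':';
    }

    void skipSpace() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_])))
            ++pos_;
    }

    void expect(char c) {
        if (pos_ >= s_.size() || s_[pos_] != c)
            throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos_;
    }

    static std::string trim(const std::string& t) {
        std::size_t b = t.find_first_not_of(" \t\r\n");
        if (b == std::string::npos) return "";
        std::size_t e = t.find_last_not_of(" \t\r\n");
        return t.substr(b, e - b + 1);
    }

    std::string s_;
    std::size_t pos_ = 0;
};
```

Each call to parseElement handles exactly one element and calls itself once per child, which is what makes the nesting come out as a tree.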

After you are able to read in and write out an XML document with all of the tags/values (and properly indented to show child levels), then start playing with attributes. That is where id will live. Simply make id a member of the Element class, and figure out how to populate it.
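When you get to attributes, one way to populate id is to parse the attribute part of the start tag into the attributes map and then copy the id entry into its own member. A sketch, assuming the text between the tag name and the closing > has already been cut out; parseAttributes and applyAttributes are hypothetical names, both single- and double-quoted values are accepted, and entity references in values are left undecoded.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Parse text like:  id="b1" lang='en'  into a name -> value map.
std::map<std::string, std::string> parseAttributes(const std::string& s) {
    std::map<std::string, std::string> attrs;
    std::size_t i = 0;
    auto skipSpace = [&] {
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    };
    while (true) {
        skipSpace();
        if (i >= s.size() || s[i] == '/' || s[i] == '>') break;  // end of the tag
        std::size_t start = i;
        while (i < s.size() && s[i] != '=' && !std::isspace(static_cast<unsigned char>(s[i]))) ++i;
        std::string name = s.substr(start, i - start);
        if (name.empty()) throw std::runtime_error("expected an attribute name");
        skipSpace();
        if (i >= s.size() || s[i] != '=') throw std::runtime_error("expected '=' after " + name);
        ++i;
        skipSpace();
        if (i >= s.size() || (s[i] != '"' && s[i] != '\''))
            throw std::runtime_error("expected a quoted value for " + name);
        char quote = s[i++];
        std::size_t end = s.find(quote, i);  // value runs to the matching quote
        if (end == std::string::npos) throw std::runtime_error("unterminated value for " + name);
        attrs[name] = s.substr(i, end - i);
        i = end + 1;
    }
    return attrs;
}

// Stripped-down hypothetical node with id as its own member.
struct Element {
    std::string tag_name;
    std::string id;  // copied from the "id" attribute when present
    std::map<std::string, std::string> attributes;
};

// Fill an element's attributes and lift "id" into its own member.
void applyAttributes(Element& e, const std::string& attrText) {
    e.attributes = parseAttributes(attrText);
    auto it = e.attributes.find("id");
    if (it != e.attributes.end()) e.id = it->second;
}
```

Keeping id both in the map and in its own member is a convenience; if you do that, any later change to an element's ID has to update both so they stay in sync.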

However, do id later. Get the guts of the XML parser and writer working FIRST. Do things one step at a time and get things working before moving on to the next step.
