...

How will you tell XML from HTML?
Or is this a poorly phrased assignment where you just destroy <everything inside these>?
If it's just a <> tag stripper, copy from one string to another under conditions. You probably need a counter for nested tags, but not much else. What assumptions can you make? Are nested tags handled? Are spaces allowed inside tags? And what about C++ code, e.g. cout << derp; cin >> derp; what would the stripper do to that?
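
To make the "increment for nested tags" idea concrete, here is a minimal sketch. It assumes a "tag" is simply anything between a '<' and its matching '>', and the function name stripNested is made up for illustration. It is not a real HTML parser.

```cpp
#include <string>

// Hypothetical helper: drops everything between '<' and its matching '>'.
// A depth counter lets nested brackets such as a<b<c>d>e be removed whole.
std::string stripNested(const std::string& text)
{
    std::string result;
    int depth = 0;                      // number of '<' currently open
    for (char c : text)
    {
        if (c == '<')
            ++depth;                    // entering a (possibly nested) tag
        else if (c == '>' && depth > 0)
            --depth;                    // leaving one level of tag
        else if (depth == 0)
            result += c;                // outside all tags: keep the character
    }
    return result;
}
```

Under that assumption the C++ question answers itself: given cout << derp; cin >> derp; this sketch returns cout  derp; because everything between the << and the >> counts as a nested tag.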



----------------
I was tasked with removing all HTML tags from a string value. If I input
<html><p>Test</html></p>, my result is <p>Test</p>. I am stuck on how to remove the rest of the tags. This is what I have so far.

#include <iostream>
#include <string>

using namespace std;

int main()
{
    string name;
    cout << "Enter a textual value: "; // Enter your html tags here
    getline(cin, name);

    int a = 0, b = 0;

    for (int a = b; a < name.length(); a++)
    {
        if (name[a] == '<')
        {
            for (int b = a; b < name.length(); b++)
            {
                if (name[b] == '>')
                {
                    name.erase(a, (b - a + 1));
                    break;
                }
            }
        }
    }

    cout << "Updated String Value: " << name << endl; // Will output what your new text will be

    return 0;
}




These are the instructions I have for it.

Example: I enter: <html><p> Hello </p> There </ html>
Output to screen: Hello There
There should not be any html tags in your output. The user is free to enter anything he/she wants, so it is not limited to the above example. So, your program needs to be able to remove any number of html tags entered.
Functions: I am expecting to see 3 functions in your program:
Get the string value from the user via the keyboard
Process the string value where you remove all html tags
Output the updated string to the screen
----------------
Oh, OK, that isn't so bad.
Just say:
for all the letters,
copy to a new string if not inside tags,
and keep track of when you are inside tags somehow: if letter == '<', inside = true; if letter == '>', inside = false. This will break if you need to handle nested tags, though, or if you need to allow angle bracket characters that are not part of a tag. Is that possible?

These kinds of problems can be either easy or exceedingly hard. If you need to get fancy, you end up with a stack-and-token parser, which is a fairly advanced idea, or a library that parses HTML or XML and tokenizes it for you, or regex code. When the simple approaches don't work, stop and study the problem in detail before proceeding.
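
For the regex route, a short sketch with the standard <regex> header might look like the following. The pattern and the name stripWithRegex are illustrative assumptions. It treats any '<' followed later by a '>' as a tag, so it is no substitute for a real HTML parser.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes every "<...>" run that contains no '>' inside it.
std::string stripWithRegex(const std::string& text)
{
    static const std::regex tag("<[^>]*>");   // '<', any non-'>' characters, '>'
    return std::regex_replace(text, tag, "");
}
```

A lone '<' with no '>' after it is left alone, but text such as 1 < 2 and 3 > 2 still loses its middle. That is the "angle brackets that are not tags" problem again.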
----------------
The loop should actually be pretty simple:

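string result;
size_t i = 0;                 // position in name, the input string
while (i < name.length())
{
  while (i < name.length() && name[i] != '<') result += name[i++]; // copy text outside tags
  while (i < name.length() && name[i] != '>') i++;                 // ignore text inside a tag
  if (i < name.length()) i++;                                      // step past the closing '>'
}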
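
The erase-in-place approach from the original post can also be made to work. A likely reason it leaves <p>Test</p> behind is that erase shifts the rest of the string left, so advancing the index afterwards skips the character that now sits where the tag began. The sketch below, with the made-up name eraseTags, searches again from the same position instead.

```cpp
#include <string>

// Hypothetical helper: erases each "<...>" run in place.
// The string is taken by value, so the caller's copy is untouched.
std::string eraseTags(std::string text)
{
    std::size_t open = 0;
    while ((open = text.find('<', open)) != std::string::npos)
    {
        std::size_t close = text.find('>', open);
        if (close == std::string::npos)
            break;                          // unmatched '<': leave the rest as is
        text.erase(open, close - open + 1); // remove '<' through '>'
        // open is not advanced: after erase it already indexes the next unread character
    }
    return text;
}
```

With this change the input <html><p>Test</html></p> comes back as Test.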
And bang, the OP has disappeared in a puff of smoke...