I am trying to read in an HTML file and strip the tags: the '<' and '>' characters and everything in between them. Below is some code I've been working on for a while, and I just can't seem to get it right. I believe it's both a logic and a syntax issue. Here's a better view of what I'm trying to do.
Example: Input from file: <html> Hello World! </html>
Output to screen: Hello World!
As you can see, the tags were stripped, but I just can't get the code below to do it!
You're not far off, but a few things are holding you back:
Currently, you are appending c to name every time, even when c is an invalid (unwanted) character. Instead, only append c to name if it is a valid character (i.e. use an else statement after your if (name[i] == '<')).
If you do the above, you will never need to query the contents of name, so you won't need the i variable anymore. Instead, just query the value of c and, in your while loop, keep pulling chars off inFile until the condition (now c != '>' rather than name[i] != '>') fails.
Other than some bomb-proofing in the while loop (to make sure the file doesn't have a missing >), that should just about do it.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    int i = 0;
    ifstream inFile;
    string name;
    string temp;
    inFile.open("input.txt");
    int counter = 0;
    char c;
    while (!inFile.eof())
    {
        inFile.get(c);
        if (c == '<')
        {
            while (c != '>')
            {
                temp = temp + c; // using a string named temp to hold the garbage symbols that we dont want
            }
        }
        else if (c != '<')
        {
            name = name + c;
            cout << name;
        }
    }
    system("pause");
    return 0;
}