Open and Analyze an HTML Document

Hi, I am having a really difficult time with an assignment. We are supposed to use loops to open an HTML document whose name the user inputs, and then the program displays the text (code) of that file. Finally, we must analyze the document and output the number of tags, links, etc. in the HTML document. Since this is my first computer programming course, I am really lost on this assignment, and any help would be great!

Here's what the output program should look like:

======================================================================
HTML File Analyzer
======================================================================

Enter a valid filename (no blanks!): fred
Re-enter a valid filename: sue
Re-enter a valid filename: test.html

======================================================================
Text of the file
======================================================================
<html>
<TITLE>Course Web Page</TITLE>
This course is about programming
in C++.

<li><a href="http://www.cs.fsu.edu/somefile.html">Click here</a>

<!-- almost done! -->
</html>

======================================================================
End of the text
======================================================================

Analysis of file
----------------

Number of lines: 9
Number of tags: 8
Number of comments: 1
Number of links: 1
Number of chars in file: 176
Number of chars in tags: 87
Percentage of characters in tags: 49.43%


======================================================================
Thanks for using the HTML File Analyzer
======================================================================

Press any key to continue . . .
Do you already know how to use std::fstream? Loading the file line by line and displaying it should be easy by using std::getline.
http://www.cplusplus.com/reference/string/string/getline/

Basically:
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::string fileName = "file.blah.blah.txt";
    std::ifstream inFile(fileName.c_str());

    //---- edit: added from OP post below
    while (!inFile)
    {
        inFile.clear();
        std::cout << "Re-enter a valid filename: ";
        if (!(std::cin >> fileName))
            return 1; //give up if the input ends
        inFile.open(fileName.c_str());
    }
    //---- end edit

    std::string line;
    //getline is the loop condition, so the loop ends as soon as a read fails
    while (std::getline(inFile, line))
    {
        //do stuff with line
        //1. print the line
        std::cout << line << std::endl;
        //2. analyze the html (character by character)
        for (std::size_t i = 0; i < line.size(); i++)
        {
            //do something with each character
            //(hint: check for patterns <, >, <!, etc.)
        }
    }
    return 0;
}


The "Percentage of characters in tags" requirement is why I think you need to go character by character.
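
For the percentage itself, here is a rough sketch, assuming you have already counted the characters inside tags and the total characters in the file. The name formatTagPercentage is made up for illustration; it is not part of the assignment.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats tagChars / totalChars as a percentage
// with two decimal places.
std::string formatTagPercentage(int tagChars, int totalChars)
{
    double percent = 0.0;
    if (totalChars > 0) // avoid dividing by zero for an empty file
    {
        // Multiply by 100.0 first so the division is done in floating
        // point rather than as integer division.
        percent = 100.0 * tagChars / totalChars;
    }

    std::ostringstream out;
    out << std::fixed << std::setprecision(2) << percent << "%";
    return out.str();
}
```

With the numbers from the sample output, formatTagPercentage(87, 176) gives "49.43%".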
We're screwed
If you start your program and show some effort, people are willing to help you on this forum.
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main ()	//no semicolon after main ()
{
    string fileName;
    ifstream inFile;	//the file stream has to be declared before it is used

    cout << "========================================" << endl;
    cout << "HTML File Analyzer" << endl;
    cout << "========================================" << endl << endl;

    cout << "Enter a valid filename (no blanks!): ";
    cin >> fileName;

    inFile.open (fileName.c_str());

    while (!inFile)
    {
        inFile.clear();
        cout << "Re-enter a valid filename: ";
        cin >> fileName;
        inFile.open (fileName.c_str());
    }

    cout << "========================================" << endl;
    cout << "Text of the File" << endl;
    cout << "========================================" << endl;

    //still to do: print the contents of the file and analyze it
    return 0;
}
Sorry, I've been trying to figure this all out. And we have learned some about getline, but I don't think we learned about std::getline. I'll definitely check out that reference page though. But so far I have the beginning of the program done and now I just have to print the contents of the file (which does need to be character by character, I learned) and then analyze the file...
You can probably print it any way you see fit, unless your assignment dictates otherwise. The character-by-character requirement is most likely just for analyzing the file. You can use the while loop I put in my other post to print the file line by line.
Okay, well I'm not quite sure how to do the part where it says the Number of Lines. Can you increment that or no?
That should be as simple as incrementing an integer variable each time the loop runs to read each line. Using the code kevinkjt2000 posted earlier, it would just be something like this:

int num=0;

while(std::getline(inFile, line)) //the loop ends when there are no more lines to read
{
    //do stuff with line
    //1. print the line
    std::cout << line << std::endl;
    num++; //add 1 to num after each line is read
    //2. analyze the html (character by character)
    for(std::size_t i = 0; i < line.size(); i++)
    {
        //do something with each character
        //(hint: check for patterns <, >, <!, etc.)
    }
}

std::cout << "Number of lines: " << num << std::endl;


Also, if you put "using namespace std;" at the top of your program, you won't have to put std:: in front of getline, if that confuses you.
It may help to write the steps down on a piece of paper and then separate those into tiny pieces that you know how to program. Then just fit the pieces together. Most programmers tackle problems much better when the pieces are smaller and easier to understand. So dealing with the number of lines should be as simple as counting how many times a line is read, as unsensible said.
Alright, so there are probably a lot of things that I did wrong in this program. Also, it says that there are many errors and almost all of them are on the couts?? And I haven't tried to add in the parts where you calculate the lines yet...


#include <iostream>
#include <fstream>
#include <string>

using namespace std;

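int main ()	//no ';' here: with "int main ();" the braces below are not a function body and the compiler rejects the block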
{
	const char TAG = '<',		//represents the beginning of a tag
			   LINK = 'a',		//represents the beginning of a link
			   COMMENT = '!';	//represents the beginning of a comment
	char fileChar;				//individual characters from the file
	int totalChars = 0,			//total characters in the file
		totalTags = 0,			//total tags in the file
		totalLinks = 0,			//total links in the file
		totalComments = 0;		//total comments in the file
	string fileName;			//the filename that is input from the user
	ifstream inFile;			

	cout << "========================================" << endl;		//print an overall title
	cout << "HTML File Analyzer" << endl;
	cout << "========================================" << endl << endl;

	cout << "Enter a valid filename (no blanks!): ";				//ask the user to input filename
	cin >> fileName;												//user inputs the filename

	inFile.open (fileName.c_str());									//the file is opened
	
	while (!inFile)													//if filename is not valid
	{
		inFile.clear();												//then clear the file
		cout << endl << "Re-enter a valid filename: " << endl;		//and ask the user to reenter a valid filename
		cin >> fileName;											//user inputs new filename
		inFile.open (fileName.c_str());								//open file if filename is valid
	}

	cout << "========================================" << endl;		//print header for the text of the file
	cout << "Text of the File" << endl;
	cout << "========================================" << endl;
	
	while (inFile.get(fileChar))			//read one character at a time until the end of the file
	{
		cout << fileChar;					//display the individual character on the screen
		totalChars++;						//increment the total characters

		if (fileChar == TAG)				//if the character starts a tag
		{
			totalTags++;					//increment the total tags
			if (inFile.get(fileChar))		//look at the character right after the '<'
			{
				cout << fileChar;
				totalChars++;
				if (fileChar == LINK)			//'a' right after '<' starts a link
					totalLinks++;
				else if (fileChar == COMMENT)	//'!' right after '<' starts a comment
					totalComments++;
			}
		}
	}

	cout << "========================================" << endl;
	cout << "End of the Text" << endl;
	cout << "========================================" << endl << endl;

	cout << "Analysis of File" << endl;
	cout << "----------------" << endl << endl;
	cout << "Number of Lines: " << endl;							//line count not calculated yet
	cout << "Number of Tags: " << totalTags << endl;
	cout << "Number of Comments: " << totalComments << endl;
	cout << "Number of Links: " << totalLinks << endl;
	cout << "Number of Chars in File: " << totalChars << endl;
	cout << "Number of Chars in Tags: " << endl;					//not calculated yet
	cout << "Percentage of Characters in Tags: " << endl;			//not calculated yet
	
	inFile.close ();
	return (0);
}
The part that reads and prints the file one character at a time with inFile.get is too complicated for what you are trying to do.
Try this instead:
std::string line;
while(std::getline(inFile, line)) //get the next line; the loop stops when there is nothing left to read
{
    std::cout << line << std::endl; //print line to screen
}


That will print out the contents of the file. Your next step would be to analyze it, either by processing each line character by character inside that loop OR by reopening the file and going through it character by character in another loop.
Well, it's printing the correct number of lines during the analysis, but the other parts (tags, comments, links, etc.) still say zero. So I'm guessing that I am doing something wrong right here...

	while(getline(inFile, line))	//read each line from the file until none are left
	{
		cout << line << endl;		//print the line on the screen
		totalLines++;				//count the line
	}
	
	cout << "========================================" << endl;		//print the end of text message
	cout << "End of the Text" << endl;
	cout << "========================================" << endl << endl << endl;

	while(inFile)
	{
		if(fileChar == TAG)
		{
			totalTags++;
			if(fileChar == COMMENT)
				totalComments++;
			else if(fileChar == LINK)
				totalLinks++;
		}

		inFile.get(fileChar);
		cout << fileChar;
		totalChars++;
	}

	cout << "Analysis of File" << endl;
	cout << "----------------" << endl << endl;
	cout << "Number of Lines: " << totalLines << endl;
	cout << "Number of Tags: " << totalTags << endl;
	cout << "Number of Comments: " << totalComments << endl;
	cout << "Number of Links: " << totalLinks << endl;
	cout << "Number of Chars in File: " << totalChars << endl;
	cout << "Number of Chars in Tags: " << endl;
	cout << "Percentage of Characters in Tags: " << endl;
	
	inFile.close ();
	return (0);
}
You will need to reopen the file after the first while loop finishes and before the second one starts. Or call inFile.clear() to reset the stream's error state and then use seekg to move inFile back to the beginning. There is an example of seekg here:
http://www.cplusplus.com/reference/istream/istream/seekg/

std::fstream keeps track of how far into the file it has read, and calling seekg moves that position. In your case you want to move it back to the beginning.

For the second while loop (the one that is supposed to count tags, comments, and links), you are going to need to think it out more carefully. Try drawing a state diagram on paper. Draw a circle for where the while loop starts at the beginning of the file. What does it search for in that circle? The start of an opening tag (<). Since you have to keep track of "Number of Chars in Tags", you might draw another circle for being inside a tag. Then connect an arrow from the first circle to the second and write the open tag character (<) alongside that arrow. Continue drawing circles, each with a description of what is to be done while searching in that state, and arrows labeled with the character that moves you to the next circle. This drawing of circles and arrows is a state machine, which is the idea behind the state design pattern. There is a link here if you would like to do some further reading.
http://www.codeproject.com/Articles/509234/The-State-Design-Pattern-vs-State-Machine
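
The circles and arrows could translate into code roughly like this. It is only a sketch under some assumptions: it needs C++11, the names HtmlStats and analyzeHtml are made up, a link is taken to be any tag whose first character is 'a' or 'A', a comment is any tag that starts with '!', and the characters in a tag are counted without the surrounding < and >.

```cpp
#include <string>

// Hypothetical container for the counts the assignment asks for.
struct HtmlStats
{
    int lines = 0, tags = 0, comments = 0, links = 0;
    int chars = 0, tagChars = 0;
};

HtmlStats analyzeHtml(const std::string& text)
{
    // One enum value per circle in the state diagram.
    enum class State { Outside, TagStart, InTag };
    State state = State::Outside;
    HtmlStats s;

    for (char c : text)
    {
        ++s.chars;
        if (c == '\n')
            ++s.lines;

        switch (state)
        {
        case State::Outside: // searching for the start of a tag
            if (c == '<') { ++s.tags; state = State::TagStart; }
            break;

        case State::TagStart: // first character after '<'
            if (c == '>') { state = State::Outside; break; } // empty "<>"
            ++s.tagChars;
            if (c == '!') ++s.comments;
            else if (c == 'a' || c == 'A') ++s.links;
            state = State::InTag;
            break;

        case State::InTag: // everything up to the closing '>'
            if (c == '>') state = State::Outside;
            else ++s.tagChars;
            break;
        }
    }
    return s;
}
```

If you read the whole file into a string and pass it in, then for the sample file text (assuming it ends with a newline) this gives 9 lines, 8 tags, 1 comment, 1 link, 176 characters, and 87 characters in tags, which matches the expected output. A comment that contains a > character would confuse it, so treat it as a starting point.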