Hello PacificAtlantic,
Some little things to get you started:
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    string text;            // file text
    int lineCount{};        // number of lines
    int tagCount = 0;       // number of tags
    int linkCount = 0;      // number of links
    int commCount = 0;      // number of comments
    int charCount = 0;      // number of characters
    int tag_charCount = 0;  // number of characters in tags
    double ratio = 0;       // ratio of characters in tags vs. out of tags
    string filename{ "Test HTML.html" };  // <--- Used for testing. Comment or remove when finished. That is everything including the {}s.

    const char TAG = '<';
    const char LINK = 'a';
    const char LINK2 = 'A';
    const char COMM = '!';
    constexpr char COMM2{ '-' };
    const char TAG_END = '>';

    cout <<
        '\n' <<
        std::string(23, ' ') << "HTML File Stat Viewer\n" << std::string(70, '-') << '\n' <<
        " Enter a valid filename.\n"
        " (Should have no blanks, and include the extension) > ";
    cout << filename << "\n";  // <--- Used for testing. Comment or remove when finished.
    //cin >> filename;  // <--- Used for testing. Uncomment when finished. Should be changed to use "std::getline()".

    ifstream file(filename);
    //file.open(filename);

    while (!file)  // Error-checking for file name
    {
        file.clear();
        cout <<
            "\n Invalid file name. Please enter a valid file name.\n\n"
            "Enter a valid filename.\n"
            " (Should have no blanks, and include the extension) > ";
        std::getline(std::cin, filename);  // <--- Changed.
        file.open(filename);
    }

    cout << "\n File Text:\n";
    cout << std::string(70,'-') << '\n';

    //while (file)
    while(getline(file, text))
    {
        cout << text << "\n";
        lineCount++;

        for (int i = 0; i < text.size(); i++)
        {
            if (text[i] == TAG)
The comments should explain most of the changes.
Look at the {}s in lines 10, 17 and 23 of the full listing (the definitions of "lineCount", "filename" and "COMM2"). If these give an error when you compile the code, you may need to adjust your IDE/compiler to use at least the 2011 standard, or you may need to upgrade. C++17 is widely supported and is a reasonable standard to compile to.
The {}s, known as uniform (brace) initialization, are available from C++11 on.
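As a small illustration of what the {}s do, here is a sketch that collects the three forms from the listing in one place. The struct name "BraceExamples" and its member names are made up for this example and are not part of the program.

```cpp
#include <string>

// Illustration only: the three brace forms used in the listing, collected
// in a hypothetical struct so they can be inspected together.
struct BraceExamples
{
    int lineCount{};                          // empty braces: set to 0
    std::string filename{ "Test HTML.html" }; // braces holding the initial value
    char comm2{ '-' };                        // same form as COMM2 in the listing
    // int bad{ 3.5 };                        // would not compile: braces reject narrowing
};
```

If this compiles, your compiler is set to C++11 or later. The commented-out line shows one reason some people prefer the braces: a conversion that would lose information is rejected instead of being done silently.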
Although not mandatory, constant variables are usually given names in capital letters. This makes it clear that they are constants that cannot be changed, and it sets them apart from regular variables.
Line 34 of the full listing, "ifstream file(filename);", is the simple way to define a file stream variable and open the file at the same time. Inside the while loop you will still need the ".open()" because the variable is already defined.
The second while loop, as seeplus showed you, is the best way to read a file of unknown length. With your code:
while (file)
{
    getline(file, text);
    cout << text << "\n";
    lineCount++;
With the read inside the while loop, when the read fails at the end of the file and sets the "eof" bit on the stream, you still do a "cout" to the screen of something (what this is may vary, typically an empty line or a repeat of the last line), and then you add 1 to "lineCount" when you do not need to.
When you get to the line
    cout << "Number of lines: " << lineCount - 1 << endl;
the "- 1" should be telling you that there is a problem, because you should not need it.
Done correctly, the while loop will end before you add the extra 1 to "lineCount".
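To see the difference for yourself, here is a minimal sketch of the two loops side by side. The function names "countLines" and "countLinesTestFirst" are made up for this example, and they take any input stream so you can try them on text in memory as well as on a file.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>

// The read itself controls the loop. When getline() fails at the end of
// the input the body is not entered, so no "- 1" correction is needed.
int countLines(std::istream& in)
{
    std::string text;
    int lineCount{};

    while (std::getline(in, text))
        lineCount++;

    return lineCount;
}

// The pattern from the original code, kept for comparison: the stream is
// tested before the read, so the body runs one more time after the last line.
int countLinesTestFirst(std::istream& in)
{
    std::string text;
    int lineCount{};

    while (in)
    {
        std::getline(in, text);  // the final call fails, but is still counted
        lineCount++;
    }

    return lineCount;
}
```

Feeding both functions a std::istringstream that holds three lines gives 3 from the first and 4 from the second, which is where the "- 1" came from.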
Now you have a bigger problem than just counting the characters between "< >". According to your program, anything that starts with "<" is a tag. In the test file that I am using I have these 2 lines of code:
<td class="Gr"> <150 </td
<td class = "None"> < 80 </td>
The "<150" and "< 80" (shown in bold in the original post) are what is printed in the table cell. Neither would be considered a proper tag, but the program counts them as tags when it should not.
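One possible rule, and it is only an assumption on my part about what you want to count, is that a tag starts with "<" followed directly by a letter, or by "/" and a letter. Here is a rough sketch of that idea. The function names "looksLikeTag" and "countTags" are made up for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if text[i] starts something that looks like an
// opening or closing tag, i.e. '<' followed by a letter, or by '/' and a
// letter. "<150", "< 80" and "<!DOCTYPE" are all rejected by this rule.
bool looksLikeTag(const std::string& text, std::size_t i)
{
    if (i >= text.size() || text[i] != '<')
        return false;

    std::size_t next = i + 1;

    if (next < text.size() && text[next] == '/')
        next++;  // step over the '/' of a closing tag

    return next < text.size() &&
           std::isalpha(static_cast<unsigned char>(text[next])) != 0;
}

// Counts the positions in one line of text that pass looksLikeTag().
int countTags(const std::string& text)
{
    int tagCount{};

    for (std::size_t i = 0; i < text.size(); i++)
        if (looksLikeTag(text, i))
            tagCount++;

    return tagCount;
}
```

With this rule each of the two test lines above counts as 2 tags instead of 3. It is still not a real HTML parser. For example, a "<" followed by a letter inside a comment or an attribute value would still be counted.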
The next problem I found is the comments. The first line of code in my test file is <!DOCTYPE html>. This is not a comment, but it is counted as one. This may not be completely accurate, but I would consider it more of a directive that tells the browser what is coming and how to process it.
https://www.w3schools.com/tags/tag_doctype.asp
As I remember it, a true comment starts with 4 characters: <!--. To count as a comment you will need to check all 4 characters for a match. This part I am not sure of yet:
Is it considered a comment or something special?
When the "html" file is opened in MSVS 2017, text surrounded by <!-- --> changes colour to green to show that it is a comment, so this may be considered a true comment.
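Checking all 4 characters could look something like this. It is a sketch only, and the function names "isCommentStart" and "countComments" are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true only if the 4 characters starting at
// position i are exactly "<!--".
bool isCommentStart(const std::string& text, std::size_t i)
{
    // compare() copes with fewer than 4 characters remaining:
    // the shorter substring simply does not match.
    return i < text.size() && text.compare(i, 4, "<!--") == 0;
}

// Counts the comment openings in one line of text.
int countComments(const std::string& text)
{
    int commCount{};

    for (std::size_t i = 0; i < text.size(); i++)
        if (isCommentStart(text, i))
            commCount++;

    return commCount;
}
```

With this check <!DOCTYPE html> is no longer counted as a comment, because only the first 2 of the 4 characters match.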
First you need to figure out what a proper tag is and how you will deal with closing tags like </p>. Will they count as a tag, or should they be counted in a separate "closingTagCount"? Saying that "<" makes a tag is not working.
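If you decide to keep closing tags separate, one way to sketch it is below. The struct "TagCounts" and the function "countOpenClose" are names I made up, and the rule that a tag name must start with a letter is my assumption, not something from your assignment.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: opening and closing tags counted separately.
struct TagCounts
{
    int opening{};
    int closing{};
};

// Counts "<letter" as an opening tag and "</letter" as a closing tag
// in one line of text. Anything else after '<' is ignored.
TagCounts countOpenClose(const std::string& text)
{
    TagCounts counts;

    for (std::size_t i = 0; i + 1 < text.size(); i++)
    {
        if (text[i] != '<')
            continue;

        const unsigned char first = static_cast<unsigned char>(text[i + 1]);

        if (std::isalpha(first))
            counts.opening++;
        else if (first == '/' && i + 2 < text.size() &&
                 std::isalpha(static_cast<unsigned char>(text[i + 2])))
            counts.closing++;
    }

    return counts;
}
```

Adding the two numbers together gives the total if you later decide that closing tags should count as tags after all.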
The if and if/else if statements in the while loop need more work to better determine what is a tag and what is a comment. Checking for a link with the <a> tag is working. My test file has 2 <a> tags and it counted them just fine.
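One thing to watch for, assuming the link check is "<" followed by 'a' or 'A', is a tag whose name only starts with that letter, such as <abbr>. My test file does not have one, so this is a guess at a future problem. A stricter check could be sketched like this, with "isLinkStart" and "countLinks" being made-up names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if position i starts "<a" or "<A" and the
// character after the name is '>' or whitespace, so "<abbr>" is not a link.
bool isLinkStart(const std::string& text, std::size_t i)
{
    if (i + 2 >= text.size())  // need "<a" plus one more character
        return false;

    if (text[i] != '<' || (text[i + 1] != 'a' && text[i + 1] != 'A'))
        return false;

    const unsigned char after = static_cast<unsigned char>(text[i + 2]);
    return after == '>' || std::isspace(after) != 0;
}

// Counts the link openings in one line of text.
int countLinks(const std::string& text)
{
    int linkCount{};

    for (std::size_t i = 0; i < text.size(); i++)
        if (isLinkStart(text, i))
            linkCount++;

    return linkCount;
}
```

A limit of this sketch is that a "<a" sitting at the very end of a line, with its attributes on the next line, is not counted because there is no character after the name to check.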
Until you figure out what a proper tag is, counting the characters between the "< >" is the least of your problems.
Andy