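// Count how often each word appears inside the <BODY> ... </BODY> sections
// of the files named on the command line, then print the 10 most frequent.

#include <algorithm>
#include <cctype>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

const std::size_t maxwrds {500};    // Capacity of the word table

// One distinct word and the number of times it has been seen
struct Words {
    std::size_t cnt {};
    std::string wrd;
};

using WrdsCnt = Words[maxwrds];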
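
// Return a lower-case copy of str with all punctuation removed
std::string tolower(const std::string& str)
{
    std::string low;

    low.reserve(str.size());

    // unsigned char: the <cctype> functions require a non-negative value
    for (const unsigned char ch : str)
        if (!std::ispunct(ch))                              // Ignore punctuation
            low += static_cast<char>(std::tolower(ch));     // Make lower case

    return low;
}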
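
// Count the words in line, adding any new ones to wc.
// The number of distinct words is kept in a static so that it accumulates
// across calls (one call per <BODY> section); the updated total is returned.
std::size_t getwrd(const std::string& line, WrdsCnt& wc)
{
    static std::size_t nowrds {};
    std::istringstream iss(line);

    for (std::string raw; iss >> raw; ) {
        // Normalise before comparing, so "The", "the" and "the," are one word
        const auto wrd {tolower(raw)};

        if (wrd.empty())    // Token was nothing but punctuation
            continue;

        bool got {};

        for (std::size_t w {}; !got && w < nowrds; ++w)
            if (wc[w].wrd == wrd) {
                ++wc[w].cnt;
                got = true;
            }

        // Add a new word only while the fixed-size table has room
        if (!got && nowrds < maxwrds) {
            wc[nowrds].wrd = wrd;
            ++wc[nowrds++].cnt;
        }
    }

    return nowrds;
}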

int main(int argc, char* argv[])
{
    const std::string opent {"<BODY>"};
    const std::string closet {"</BODY>"};
    WrdsCnt wrdcnts;
    std::size_t nowrds {};

    std::cout << "Processing files - ";

    for (int a = 1; a < argc; ++a) {
        std::ifstream ifs(argv[a]);

        if (!ifs) {
            std::cout << "\nCannot open file " << argv[a] << '\n';
            continue;
        }

        std::cout << argv[a] << ' ';

        std::string body;    // Text of the current <BODY> section
        bool gotbod {};      // True while between <BODY> and </BODY>

        for (std::string text; std::getline(ifs, text); )
            // A single line may open and close several bodies, so keep
            // scanning from pos until no further tag is found
            for (std::size_t fnd {}, pos {}; fnd != std::string::npos; ) {
                if (gotbod) {
                    if (fnd = text.find(closet, pos); fnd != std::string::npos) {
                        // End of body: count its words, resume after the tag
                        gotbod = false;
                        body += text.substr(pos, fnd - pos);
                        pos = fnd + closet.size();
                        nowrds = getwrd(body, wrdcnts);
                        body.clear();
                    } else
                        body += text.substr(pos) + ' ';    // Body continues on the next line
                } else if (fnd = text.find(opent, pos); fnd != std::string::npos) {
                    // Start of body: collect text from just after the tag
                    gotbod = true;
                    pos = fnd + opent.size();
                }
            }
    }

    // Most frequent first; stable_sort keeps first-seen order for equal counts
    std::stable_sort(std::begin(wrdcnts), std::begin(wrdcnts) + nowrds,
                     [](const auto& a, const auto& b) { return a.cnt > b.cnt; });

    std::cout << '\n';

    // Print at most 10 words, never the unused table entries past nowrds
    for (std::size_t w {}; w < nowrds && w < 10; ++w)
        std::cout << wrdcnts[w].wrd << ' ' << wrdcnts[w].cnt << '\n';
}