Splitting text for writing to multiple files

Hello dear forum users. I'm just starting to learn C++ and English isn't my native language. I hope for your help.

In the program I need to implement the following algorithm: in interactive (dialogue) mode the user enters N, the number of lines to write to HTML_file.html. If the number of lines exceeds N, a new HTML file must be created. Moreover, sentences must not be broken: if a sentence would go beyond line N, the whole sentence must be written to the new HTML_file.html.

I get the words to write to HTML_file.html from the TEXT.txt file. If a word from TEXT.txt is also in the DICTIONARY.txt file, I write that word to HTML_file.html in bold italics.

This is a class for creating, closing, and writing to HTML_file.html:

#pragma once
#include <string>
#include <fstream>
#include <sstream>
#include <iostream>
using namespace std;
class HTML_file
{
public:

	ofstream create_file(string name_of_file, string s)
	{
		name_of_file = name_of_file + s + ".html";

		ofstream file(name_of_file);
		if (!file)
			cout << "ERROR: file " << name_of_file << " can't be opened.";

		file << "<html> <body>";

		return file; // ofstream is movable, so it can be returned by value
	}

	ofstream open_file(string name_of_file)
	{
		ofstream file(name_of_file);
		if (!file)
			cout << "ERROR: file " << name_of_file << " can't be opened.";

		return file;
	}

	// streams cannot be copied, so they must be passed by reference
	void close_file(ofstream& file)
	{
		file << "</body></html> ";
		file.close();
	}
	
	// a word is a string, not a single char
	void filing_usual_word(ofstream& file, const string& word)
	{
		file << word << " ";
	}
	
	void filing_special_word(ofstream& file, const string& word)
	{
		file << "<b><i>" << word << "</i></b>" << " "; // close tags in reverse order
	}
};


This is a piece of the program:

HTML_file HTML_FILE;
while (text >> words_keeper) // unlike while (!text.eof()), this stops as soon as a read fails
    {
        for (int counter = 0; counter < number_of_words; counter++)
        {
            if (change_the_string(Array_of_words[counter], words_keeper))
//here I compare words from dictionary.txt and text.txt
            {
              HTML_FILE.filing_special_word(HTML, words_keeper);  break;
            }

            if (counter + 1 == number_of_words) HTML_FILE.filing_usual_word(HTML, words_keeper);
//if word from dictionary.txt != word from text.txt

        }

    }

HTML_FILE.close_file(HTML);
    text.close(); // close the input file; reopening text.txt as an ofstream would truncate it
> Moreover, sentences must not be broken. If the sentence already goes beyond the N,
> we must write the whole sentence in a new HTML_file.html.

What is the definition that was given for a sentence?

For instance, is this one sentence?
C.S. Lewis wrote, in 'Letters to Children': “Never use abstract nouns when concrete ones will do. If you mean “More people died” don’t say “Mortality rose.”
It's not a "real" program. Splitting wherever we meet . ! ?! !!! ... will be enough. Thanks, JLBorges, for the response.
1. Read lines from the input file one by one into a vector (or deque) of strings (std::string) till the size of the container is N (or eof has been encountered).

2. Scan the lines in the container backwards, starting with the last line, looking for a character or sequence of characters which denote the end of a sentence. std::string::rfind could be used for this.

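3. Let us say the end of sentence was found on line k at position pos (both counted from 1). Write the first k-1 lines into the html file, and erase those lines from the container. Write the first pos characters of the first line that remains in the container into the html file, and erase the first pos characters from that line. Counting from 1, this includes the end-of-sentence character itself; std::string::rfind returns a 0-based index, so pos is the value returned by rfind plus 1.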

4. Repeat steps 1 to 3 till end of input file is reached.
Just for hints:

HTML_file.h:
#ifndef HTML_FILE_H
#define HTML_FILE_H

#include <fstream>
#include <set>
#include <string>
#include <vector>

class HTML_file
{
public:
   HTML_file() = default;
   HTML_file(const std::string& infname, const std::string& outfname,
             const std::string dict, int linesperfile);
   ~HTML_file();
   void createHtmlFile(int filecount);
   void closeHtmlFile();
   void openInputFile();
   void closeInputFile();
   void openDictfile();
   void closeDictfile();

   void copyWords();

   std::string getHtmlFilename() const;
   void setHtmlFilename(const std::string &value);

   int getLineslimit() const;
   void setLineslimit(int value);

   std::string getInfilename() const;
   void setInfilename(const std::string &value);

   std::string getDictfilename() const;
   void setDictfilename(const std::string &value);

private:
   int lineslimit {0};
   std::ifstream infile, dictfile;
   std::ofstream htmlfile;
   std::string infilename, htmlfilename, dictfilename;
   std::set<std::string> words;

   void htmlHead();
   void constructDictSet();
   std::vector<std::string> splitLine(const std::string& row,
                                      std::vector<std::string> &currentwords);
};

#endif // HTML_FILE_H 


HTML_file.cpp:
#include <sstream>
#include "HTML_file.h"

HTML_file::HTML_file(const std::string& infname, const std::string& outfname,
                     const std::string dict, int linesperfile)
   : lineslimit{linesperfile}, infilename{infname},
     htmlfilename{outfname}, dictfilename{dict}
{}

HTML_file::~HTML_file()
{
   closeInputFile();
   closeDictfile();
   closeHtmlFile();
}

void HTML_file::createHtmlFile(int filecount)
{
   std::string name = htmlfilename + std::to_string(filecount) + ".html";
   htmlfile.open(name);
   htmlHead();
}

void HTML_file::openInputFile()
{ infile.open(infilename); }

void HTML_file::closeHtmlFile()
{
   htmlfile << "</body>\n</html>" << std::endl;
   htmlfile.close();
}

void HTML_file::closeInputFile()
{ infile.close(); }

void HTML_file::copyWords()
{
   if(lineslimit == 0)
      return;

   constructDictSet();
   if(!infile.is_open()) {
      openInputFile();
   }
   infile.clear();
   infile.seekg(0);

   int controlblock{0}, filecount{0};
   createHtmlFile(filecount);
   std::string line;
   while(std::getline(infile, line)) {
      controlblock++;
      std::vector<std::string> currentwords;
      splitLine(line, currentwords);

      if(controlblock > lineslimit) {
         closeHtmlFile();
         createHtmlFile(++filecount);
         controlblock = 1; // the current line is already the first line of the new file
      }

      htmlfile << "\n  <p>";
      for(auto& word : currentwords) {
         if(words.end() != words.find(word))
            word = "<b><i>" + word + "</i></b>";
         htmlfile << word << ' ';
      }
   }
}

std::string HTML_file::getHtmlFilename() const
{ return htmlfilename; }

void HTML_file::setHtmlFilename(const std::string &value)
{ htmlfilename = value; }

int HTML_file::getLineslimit() const
{ return lineslimit; }

void HTML_file::setLineslimit(int value)
{ lineslimit = value; }

std::string HTML_file::getInfilename() const
{ return infilename; }

void HTML_file::setInfilename(const std::string &value)
{ infilename = value; }

std::string HTML_file::getDictfilename() const
{ return dictfilename; }

void HTML_file::setDictfilename(const std::string &value)
{ dictfilename = value; }

void HTML_file::openDictfile()
{ dictfile.open(dictfilename); }

void HTML_file::closeDictfile()
{ dictfile.close(); }

void HTML_file::htmlHead()
{
   htmlfile << "<!DOCTYPE html>\n<html>\n<head>\n"
               "  <meta charset=\"utf-8\">\n</head>"
               "\n<body>" << std::endl;
}

void HTML_file::constructDictSet()
{
   openDictfile();
   std::string line;
   while (std::getline(dictfile, line))
      words.insert(line);
   closeDictfile();
}

std::vector<std::string> HTML_file::splitLine
   (const std::string &row, std::vector<std::string>& currentwords)
{
   currentwords.clear(); // empty() only tests for emptiness; clear() removes the elements
   std::stringstream ss(row);
   std::string tmp;
   while (ss >> tmp)     // stops as soon as extraction fails
      currentwords.push_back(tmp);

   return currentwords;
}


main.cpp:
#include <iostream>
#include <limits>
#include "HTML_file.h"

int main()
{
   HTML_file htmlfile("text.txt", "copy_", "dictionary.txt", 5);
   htmlfile.copyWords();

   std::cout << "\nDone. Press ENTER to close.";
   std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

   return 0;
}


text.txt:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Aenean commodo ligula eget dolor.
Aenean massa.
Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem.
Nulla consequat massa quis enim.
Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.
In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo.
Nullam dictum felis eu pede mollis pretium.
Integer tincidunt.
Cras dapibus.
Vivamus elementum semper nisi.
Aenean vulputate eleifend tellus.
Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim.
Aliquam lorem ante, dapibus in, viverra quis, feugiat a, tellus.
Phasellus viverra nulla ut metus varius laoreet.
Quisque rutrum.
Aenean imperdiet.
Etiam ultricies nisi vel augue.
Curabitur ullamcorper ultricies nisi.
Nam eget dui.
Etiam rhoncus.
Maecenas tempus, tellus eget condimentum rhoncus, sem quam semper libero, sit amet adipiscing sem neque sed ipsum.
Nam quam nunc, blandit vel, luctus pulvinar, hendrerit id, lorem.
Maecenas nec odio et ante tincidunt tempus.
Donec vitae sapien ut libero venenatis faucibus.
Nullam quis ante.
Etiam sit amet orci eget eros faucibus tincidunt.
Duis leo.
Sed fringilla mauris sit amet nibh.
Donec sodales sagittis magna.


dictionary.txt:
ipsum
amet
adipiscing
commodo
natoque
dictum
ullamcorper

Thanks so much, Enoizat and JLBorges.

I'm working on it, and now I have a problem:

How do I split a string into two strings?

I have:
vector<string> line_from_file;
line_from_file.resize(N+1);


When I reach the last string, line_from_file[N-1], I do pos = line_from_file[i].rfind("."); (where i = N-1).

Now I have the position of ".". How do I split the string? I want everything before "." in line_from_file[i] and everything after "." in line_from_file[i+1].


/////////////////////////////////////////////
Here is a piece of the program:

	std::size_t i, pos, last_pos; // rfind returns std::size_t; unsigned short is too small
	vector<string> line_from_file;
	line_from_file.resize(N+1);
	
	
	while (!text.eof())
	{
		ofstream text_container("container.txt", ios_base::trunc); // here I will record the strings

	
		
		for (int i = 0; i < N && !text.eof(); i++)
		{
			getline(text, line_from_file[i]);
			line_from_file[i] += "\n";
		}

			i = N - 1; // the loop's own i is out of scope here; examine the last line
			last_pos = string::npos; // npos means "no end of sentence found yet"

			pos = line_from_file[i].rfind("?");
			if (pos != string::npos && (last_pos == string::npos || pos > last_pos))
				last_pos = pos;
			pos = line_from_file[i].rfind("!");
			if (pos != string::npos && (last_pos == string::npos || pos > last_pos))
				last_pos = pos;
			pos = line_from_file[i].rfind(".");
			if (pos != string::npos && (last_pos == string::npos || pos > last_pos))
				last_pos = pos;

...
Something like this:
#include <cstddef>
#include <string>
#include <vector>

// split the line line_num at position pos. after the split, lines[line_num] holds
// the characters before pos, and a new line inserted at line_num+1 holds the rest.
std::vector<std::string>& split_line( std::vector<std::string>& lines,
                                      std::size_t line_num, std::size_t pos )
{
    // sanity check; pos + 1 < size() avoids the unsigned underflow of size() - 1 on an empty line
    if( line_num < lines.size() && pos > 0 && pos + 1 < lines[line_num].size() )
    {
        const std::string first_part = lines[line_num].substr( 0, pos ) ;
        const std::string second_part = lines[line_num].substr(pos) ;

        lines.insert( lines.begin()+line_num+1, second_part ) ;
        lines[line_num] = first_part ;
    }

    return lines ;
}

http://coliru.stacked-crooked.com/a/48618ffb665e8353

This is flawed:
for (int i = 0; i < N && !text.eof(); i++)
{
    getline(text, line_from_file[i]); // there is no check to see if getline failed
    line_from_file[i] += "\n";
}


Instead,
for( std::size_t i = 0; i < N && getline(text, line_from_file[i]) ; ++i )
{
    line_from_file[i] += "\n";
}
@JLBorges
I think your solution is wonderful. I didn't even know it was possible to write something like for( const auto& str : split_line( vec, 2, 9 ) ). I thought only a container instance could go after the colon.

@Halloweenman
> How to separate string on 2 strings?

But doesn't this problem take us back to JLBorges's previous question?
> For instance, is this one sentence?
> C.S. Lewis wrote, in 'Letters to Children': “Never use abstract nouns when concrete ones will do. If you mean “More people died” don’t say “Mortality rose.”

Since the number of lines (let's call them paragraphs) determines which sentence is written to which file, should the sentence in JLBorges's example be split into
“Never use abstract nouns when concrete ones will do.

and
If you mean “More people died” don’t say “Mortality rose.

or not?