In a nutshell: the code is supposed to open a file, read in its contents, search for a specific word and replace it (John with Replaced, as shown in my code below). The problem I keep running into is that when I write the output file, the original 'paragraph' format isn't kept.
For example, the original file:
Sentence one.
Sentence two.
Sentence three. and another one!
Next sentence possibly without punctuation
There is so much online about removing whitespace, or about how to output with a space between words/strings, but the only suggestion I've seen is getline(). I went back and modified my code, but I cannot figure out where I'm going wrong with the ofstream portion.
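Since the asker's own code is not shown here, the following is only a guess at the usual cause: std::getline() discards the newline that ends each line, so the line breaks disappear unless the output step writes them back. A minimal sketch under that assumption; replace_in_stream is a hypothetical name, and it does a plain substring replacement rather than a whole-word match:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: copy `in` to `out` line by line, replacing every
// occurrence of `from` with `to`. Assumes `from` is not empty.
void replace_in_stream( std::istream& in, std::ostream& out,
                        const std::string& from, const std::string& to )
{
    std::string line ;
    while( std::getline( in, line ) )
    {
        std::size_t pos = 0 ;
        while( ( pos = line.find( from, pos ) ) != std::string::npos )
        {
            line.replace( pos, from.size(), to ) ;
            pos += to.size() ; // continue searching after the inserted text
        }
        out << line << '\n' ; // getline() removed the newline; write it back
    }
}
```

With files, the same function can be called with a std::ifstream and a std::ofstream in place of the string streams. Note that it always ends the output with a newline, even if the last input line had none.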
Read the input file into a std::string with std::istreambuf_iterator<char>.
Then run std::regex_replace over that std::string and send the result to the output file through std::ostreambuf_iterator<char>:
#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <regex>

int main()
{
    // read the whole file, line breaks included, into one string
    std::ifstream inFile{"D:\\test1.txt"};
    std::string fileRead {std::istreambuf_iterator<char>(inFile), (std::istreambuf_iterator<char>()) };
    // optional: check the file size and pre-allocate memory with std::string::reserve()
    // (assumes the file size is not greater than the maximum string size)

    // write the text to the output file, replacing every match on the way
    std::ofstream outFile {"D:\\test2.txt"};
    std::regex_replace(std::ostreambuf_iterator<char>(outFile), fileRead.begin(), fileRead.end(), std::regex{std::string{"John"}}, "**REPLACED**" );
}
Various add-ons, such as case-insensitive searches, are available; see the documentation for each of the functions and objects above.
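As one example of such an option, the std::regex constructor accepts syntax flags, and std::regex::icase asks for matching that ignores case. A minimal sketch; replace_ignoring_case is a hypothetical helper name, not something from the code above:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace every match of `pattern` in `text`,
// ignoring differences in letter case.
std::string replace_ignoring_case( const std::string& text,
                                   const std::string& pattern,
                                   const std::string& replacement )
{
    // ECMAScript is the default grammar; icase turns off case sensitivity
    const std::regex re( pattern, std::regex::ECMAScript | std::regex::icase ) ;
    return std::regex_replace( text, re, replacement ) ;
}
```

Called as replace_ignoring_case( text, "john", "**REPLACED**" ), this should replace "john", "John" and "JOHN" alike.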
To search for a specific word, we need to look for alphanumeric characters that form a single word (i.e. enclosed within word boundaries). For instance:
#include <iostream>
#include <string>
#include <regex>

// replace words in a line of text
// invariant: original and replacement contain only alphanumeric characters
std::string replace_word( std::string line, std::string original, std::string replacement )
{
    // the original as a word (within word boundaries)
    // note: \b is the anchor for a word boundary
    const std::regex word_re( "\\b" + original + "\\b" ) ;
    return std::regex_replace( line, word_re, replacement ) ;
}

int main()
{
    const std::string line = "allocate catalysts for a cat, which is an uncomplicated cat!" ;

    // replace the word 'cat' with 'tiger'
    const std::string original_word = "cat" ;
    const std::string replacement = "tiger" ;
    std::cout << line << '\n' << replace_word( line, original_word, replacement ) << '\n' ;
}
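The invariant in replace_word matters: if the original word contained regex metacharacters such as '.' or '+', they would be interpreted as part of the pattern. A possible workaround, sketched here with the hypothetical helpers escape_regex and replace_literal_word, is to put a backslash before each metacharacter first. It still assumes the word begins and ends with an alphanumeric character (otherwise \b will not anchor as intended) and that the replacement contains no '$':

```cpp
#include <regex>
#include <string>

// Hypothetical helper: put a backslash before every character that has a
// special meaning in the (default) ECMAScript regex grammar.
std::string escape_regex( const std::string& text )
{
    const std::string special = "\\^$.|?*+()[]{}" ;
    std::string result ;
    for( char c : text )
    {
        if( special.find( c ) != std::string::npos ) result += '\\' ;
        result += c ;
    }
    return result ;
}

// Same idea as replace_word, but `original` is treated as literal text.
std::string replace_literal_word( const std::string& line,
                                  const std::string& original,
                                  const std::string& replacement )
{
    const std::regex word_re( "\\b" + escape_regex( original ) + "\\b" ) ;
    return std::regex_replace( line, word_re, replacement ) ;
}
```

Without the escaping step, a search for "a.b" would also match "axb", because '.' matches any character.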
The word replacement can also be done without std::regex, by scanning for runs of alphanumeric characters:
#include <iostream>
#include <cctype>
#include <string>

// std::isalnum requires a value representable as unsigned char
bool is_alnum( char c ) { return std::isalnum( static_cast<unsigned char>(c) ) != 0 ; }

// replace words in a line of text
std::string replace_word( std::string line, std::string original, std::string replacement )
{
    std::string result ;

    for( std::size_t i = 0 ; i < line.size() ; )
    {
        if( !is_alnum( line[i] ) ) result += line[i++] ; // not alphanumeric, add to result
        else // alphanumeric, start of a word
        {
            std::string word ; // form the word consisting of consecutive alphanumeric characters
            while( i < line.size() && is_alnum( line[i] ) ) word += line[i++] ;

            // add the word (or its replacement) to the result
            result += ( word == original ? replacement : word ) ;
        }
    }

    return result ;
}

int main()
{
    const std::string line = "allocate catalysts for a cat, if it is an uncomplicated 'cat'!" ;

    // replace the word 'cat' with 'tiger'
    const std::string original_word = "cat" ;
    const std::string replacement = "tiger" ;
    std::cout << line << '\n' << replace_word( line, original_word, replacement ) << '\n' ;
}
gunnerfunner, your approach is sort of mind-blowing. I haven't been coding long enough to realize that it's entirely possible to have fully functioning code in so few lines.
Thank you everyone for your help. Your explanations were very much appreciated, as your comments turned into notes in the sidelines of my pseudocode notebook.