fastest find and replace for a string

Pages: 123
Since C++17 Boyer-Moore is part of std::search. See https://en.cppreference.com/w/cpp/algorithm/search

A simple truth:
you don't have to fully understand everything you implement. Sometimes you can just use what is already been done and trust it, esp when in a hurry as you seem to be.
I'm not sure Boyer-Moore will help much here. That algorithm will find a single substring within a larger text. Because it jumps by the length of the substring in the default case, it is most efficient when the search substring is a little bit longer. Having a search substring of length 2 ("") would not really make too much use of the algorithm's efficiencies. It will cut the search time (compared to a brute force sequential search) by half at the theoretical maximum. So I guess that is something, but it may not be as big an improvement as you were hoping for.

Looking for single characters to replace won't benefit from Boyer-Moore. And Boyer-Moore won't easily help you look for multiple different search strings at the same time, unless there is an extension to the algorithm that I am not aware of.

I used your original string:

Item exp, 2"" B/M 90deg widget (As Per Quote AABBITU26.056)

And copied it 493 times so that I have a string "str" with 493 double quotes to be replaced.

This would take 10ms (fastest time) to be done via the REGEX code you put originally. I set VS to release mode and it optimized that code heavily and the time became 0.52ms roughly with your original code.

1
2
std::regex double_quote_find {R"<("")<"};
str = std::regex_replace (str, double_quote_find, "''");


I then coded it as shown below, and this way gave 0.26ms roughly, about twice as fast as your original code.

Release mode for this new code improved to about 0.23ms, so this code is very well optimized already. The standard library usually always has very good performance:

1
2
3
4
5
6
        auto i = str.find("\"\"");
        while (i != string::npos)
	{
		str.replace(i, 2, "''");
		i = str.find("\"\"");
	}


However, this code by far the fastest. Doing the same large string in 0.0435ms, going down to 0.0164ms in release mode:

1
2
3
4
5
6
7
8
	for (auto& i : str)
	{
		if (i == '\"' && *((&i) + 1) == '\"')
		{
			i = '\'';
			*((&i) + 1) = '\'';
		}
	}



EDIT:

Also, for whatever reason, this code is slower than the code right above, giving 0.5755ms and 0.0217ms in release mode, which is still double the time of the release mode time of the last code:

1
2
3
4
5
6
7
8
    for(size_t i = 0; i < str.size() -1; i++ )
    {
        if (str[i] == '\"' && str[i+1] == '\"')
        {
            str[i] = '\'';
            str[i+1] = '\'';
        }
    }



This code gets 0.0512ms and 0.022ms in release mode, not bad for a one liner. Though I don't know if you can tweak it to make it only replace two double quotes instead of just all occurrences of a double quote:

std::replace(str.begin(), str.end(), '\"', '\'');

Trying:

std::replace(str.begin(), str.end(), "\"\"", "''");

Gives these errors:

'==': no conversion from 'const char [3]' to 'int'
'=': cannot convert from 'const char [3]' to 'char'


This shows the internals of std::replace - comparing the 3rd argument with an integer (probably the string characters casted to integers) then it tries to set a single character to the 4th argument. So even though the documentation shows that the 3rd and 4th arguments are a template, it seems to only be functional with a single character or integer.

https://en.cppreference.com/w/cpp/string/basic_string/replace#:~:text=(3)-,template%3C%20class%20InputIt%20%3E,%C2%A0%20%C2%A0%20%C2%A0%20%C2%A0%20%C2%A0%20%C2%A0%20%C2%A0%20%C2%A0%20%C2%A0%20%C2%A0%20%C2%A0%20%C2%A0InputIt%20first2%2C%20InputIt%20last2%20)%3B,-(until%20C%2B%2B20
Last edited on
Aren't the 3rd and 4th params for std::replace() of type the underlying type of args 1 & 2 ? As char is the underlying type of str, then 3 and 4 also need to be of type char (or convertable to)?

std:replace() can't change the size. If you need to change the size then use std::string.replace() - but this does one replacement. If you want all occurrences then you need a loop - as per the loop above.

Re timings. I've also found that using pointers/iterators to be quicker than indexed array.
I may have gotten really bored and coded the program 0-0

It took 57 seconds for the program to run. With release mode, it took 5 seconds. No multithreading.

I didn't use any regex nonsense so it's a lot faster, and my code is a lot shorter than the last posted attempt by OP.

I coded it to follow the rules stated earlier as I understood them. I replaced occurrences of "" with '', replaced new lines with \n when present within quotes, removed ',' and '$' if found within quotes where there also exists a number, replaced ',' with ';' and trimmed leading/trailing spaces if there did not exist a number within the quotes.

I created a text file about 500mb big with some test cases I came up with repeating over and over. It reads in, makes the needed changes, then outputs to another text file.



It's completely possible that my run-time is a lot faster due to having an SSD rather than HDD.


Aren't the 3rd and 4th params for std::replace() of type the underlying type of args 1 & 2 ?

That makes sense. Was hoping it could act like string.find and string.insert all in one.


Here's the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <chrono>

using namespace std;

inline void replace(string& str)
{
	for (auto& i : str)
	{
		if (i == '\"' && *((&i) + 1) == '\"')
		{
			i = '\'';
			*((&i) + 1) = '\'';
		}
	}
}

int main()
{
	string str = "", t, g;
	ifstream inFile("test.txt");
	ofstream outFile("out.txt");

	auto t0 = std::chrono::steady_clock::now();

	bool quote = false;
	string out = "";
	while (getline(inFile, str))
	{
		for (auto& i : str)
		{
			if (i == '\"' && i != str.back() && *((&i) + 1) == '\"')
			{
				i = '\'';
				*((&i) + 1) = '\'';
			}

			else if (i == '\"')
			{
				quote = !quote;

				if (quote)
				{
					outFile << str.substr(0, str.find(i)+1);
					str.erase(0, str.find(i)+1);

					out = "";
				}

				if (!quote)
				{
					out += str.substr(0, str.find('\"'));

					str.erase(0, str.find('\"') + 1);

					if (std::any_of(out.begin(),
						out.end(), ::isdigit))
					{

						while (out.find(',') != string::npos)
							out.replace(out.find(','), 1, "");

						while (out.find('$') != string::npos)
							out.replace(out.find('$'), 1, "");
					}

					else
					{
						out.erase(0, out.find_first_not_of(' '));

						while (out.find(',') != string::npos)
							out.replace(out.find(','), 1, ";");
						
						auto e = out.find_last_not_of(' ');
						if(e < out.size()-1)
							out.erase(out.find_last_not_of(' ') + 1);
					}

					out += '\"';
					outFile << out;
					out = "";
				}
			}
		}

		if (quote)
		{
			str += R"(\n)";
			out += str;
		}

		else
			outFile << str << '\n';
	}


	auto t1 = std::chrono::steady_clock::now();
	auto d = std::chrono::duration_cast<std::chrono::duration<double>>(t1 - t0);

	std::cout << str << '\n';
	std::cout << '\n' << d.count() << " Seconds" << std::endl;
}
Last edited on
str, t, g?
Lol, ended up not using t or g, never removed them. And looking at the code again, I didn't use the replace function either.
Last edited on
First of all, I want thank anyone who took the trouble to read this post.

@zapshe, I really appreciated you taking the time to write a working program.
Today is the first day I tested it, it won't work for me, if you run against my test data you'll see why.

Since this is a real problem for me, and I never thought that someone would take the time that you did, I didn't explain some of my problems clearly.

What I needed was a way to wash a comma delimited file of all commas that were not being used as delimiters.
It is very difficult for me to explain how many ways they can send me delimited file that will not play nice with the application I need to feed it to.

I run into different problems depending on how they created their delimited file.
Examples are:
1.) Embedded return characters.
2.) Numbers with commas in them.
3.) Using a double quote to represent inches, within a quoted string.
4.) Descriptions where commas are used in a sentences, and the sentence also contains a numeric value with commas used as a separator.

Microsoft Excel does an excellent job in most instances, but has problems with things that look like numbers, but are in fact something else.
Microsoft Excel cannot handle really large extracts.

Here is a program that will produce test data.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
#include <iostream>
#include <algorithm>
#include <stdexcept>
#include <vector>
#include <fstream>
#include <sstream>
#include <regex>



int main(int argc, const char* argv[]) try { 

    std::vector<std::string> varg(argv + 1 , argv + argc);


    std::fstream ofh; 
    ofh.open("junk_in.csv", std::ios::out); 
    if (!ofh) 
        throw std::runtime_error("File not opened: junk_in.csv"); 

    std::fstream ofh2; 
    ofh2.open("junk_in_small.csv", std::ios::out); 
    if (!ofh2) 
        throw std::runtime_error("File not opened: junk_in_small.csv"); 

    std::string head{R"(field A,field B,field C,field D,field E,field F,field G,field H,field I)"};
    
    std::string rec{R"("
NULL
NULL
","      This has leading space. Then a number $1,234.45. Then trailing space.   ","This files has a first line.
And a second line.","""$12,345.67""","An amount in side of a text field $1,234.45",12345.67,2021-12-23,"12345,45678,789012","They Puchased this 72"" thing for 1,000,000.12")"};

    ofh << head << '\n';

    for (int i =0; i < 1350000; i++)
        ofh << rec << '\n';

    ofh2 << head << '\n';

    for (int i =0; i < 4; i++)
        ofh2 << rec << '\n';


    return 0;
}
catch (std::exception& e) {
    std::cerr << e.what() << '\n';
    return 1;
}
catch (...) {
    std::cerr << "uncaught" << '\n';
    return 1;
}
This is the best I've done.
I takes about 6 minutes to parse junk_in.csv.
All criticism is appreciated, but it is not expected.
Thank you.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
#include <iostream>
#include <algorithm>
#include <stdexcept>
#include <vector>
#include <fstream>
#include <sstream>
#include <regex>


std::string get_number (std::fstream& ifh, bool& while_in_quote, char value,  const std::regex& number);

int main(int argc, const char* argv[]) try { 

    std::vector<std::string> varg(argv + 1 , argv + argc);


    if (varg.size() != 2)
        throw std::runtime_error(R"(Wrong format:
process_comma_delimited_v2 in_file.csv out_file.csv)");

    std::fstream ifh; 
    ifh.open(varg[0], std::ios::in); 
    if (!ifh) 
        throw std::runtime_error("File not opened: " + varg[0]);

    std::fstream ofh; 
    ofh.open(varg[1], std::ios::out); 
    if (!ofh) 
        throw std::runtime_error("File not opened: " + varg[1]); 
    

    //here are my precompiled regex
    //const std::regex regex_number { R"(\$?\d\d?\d?(,\d{3})*(\.\d*)?)" };
    const std::regex regex_number { R"(\$?\d\d?\d?(,\d{3})*(\.\d*)?\.?)" };
    const std::regex leading_space { R"(,\s+|\s+,)" };
    
    char value;
    std::string number_checker;
    bool while_in_quote = false;
    std::stringstream ss;

    for ( char c = static_cast<char>(ifh.peek()); ifh.peek() != -1; c = static_cast<char>(ifh.peek())) {

        ifh.get(value);
        if (value == '\"') {

            while_in_quote = !while_in_quote; 

            char quote_check = static_cast<char>(ifh.peek()); 

            if (quote_check == '\"') { 

                while_in_quote = !while_in_quote; 
                ifh.get(value);

            } else 
                continue;
        } 


        if (while_in_quote) {
            if (value == '\n') {
                ss << R"(\n)";
                continue;
            }

        }

        number_checker = get_number(ifh, while_in_quote, value, regex_number);

        if (!number_checker.empty()) { 

                ss << number_checker;
                number_checker.clear(); 
                continue;

        } 
        
        if (while_in_quote && value == ',')
            value = ';';

        if (!while_in_quote) {
            if(value == '\n') { 

                std::string s = std::regex_replace(ss.str() , leading_space, ","); //remove leading spaces
                ofh << s << '\n';
                ss.clear();
                ss.str("");
            } 
            else 
                ss << value;
        } else ss << value;

    }


    return 0;
}
catch (std::exception& e) {
    std::cerr << e.what() << '\n';
    return 1;
}
catch (...) {
    std::cerr << "uncaught" << '\n';
    return 1;
}


std::string get_number (std::fstream& ifh, bool& while_in_quote, char value,  const std::regex& regex_number) { 

    std::string number_checker;

    if(!while_in_quote) return "";

    if(!std::isdigit(value) && value != '$') { 
        return "";  
    } else 
        number_checker += value;

    char check_char = static_cast<char>(ifh.peek());

    for(;;) {
        if( std::isdigit(check_char) || check_char == '.' || check_char == ',' || check_char == '$') {
                number_checker += check_char;
                ifh.get(value);
                check_char = static_cast<char>(ifh.peek());
        } else 
            break;

    }
    
    std::smatch sm;

    if (std::regex_match(number_checker, sm, regex_number)) { 
        const auto e1 {std::remove(number_checker.begin(), number_checker.end(), ',')};	                 //remove commas in numbers
        number_checker.erase(std::remove(number_checker.begin(), e1, '$'), number_checker.end());		//remove dollar sign
    }

    std::replace(number_checker.begin(), number_checker.end(), ',', ';'); //replacing commas with semi-colons
        

    return number_checker;
}
The slowest thing in your code right now would likely be the regex you're using. The timings I made showed that (AT BEST) it's twice as slow as coding a loop and looking for the string yourself. At worst, its about 30x slower than the best alternative.

If you took some time to code alternatives for your regex usage, your program would likely double in speed at least.
This takes about 4 secs to process junk_in.csv (390Mb) on my laptop. It may not give exactly the same as the output from your program but try it and let me know where it requires further tweaking. Windows only. Merry Christmas!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <cctype>
#include <iomanip>

constexpr size_t defchunk1 {5'000'000};

// fn - file name
// buff - where to read to file to
// returns - true if read succeeded, false otherwise
bool WOreadFile(const std::string& fn, std::string& buff, size_t chunk = defchunk1) {
	bool ret {};	// Problems

	if (const auto fh {CreateFile(fn.c_str(), FILE_READ_DATA, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, NULL)}; fh != INVALID_HANDLE_VALUE) {
		if (const auto sz {GetFileSize(fh, NULL)}; sz != INVALID_FILE_SIZE) {
			constexpr size_t evtsz {50};
			const size_t nochk {sz / chunk + 1};
			const auto thid {GetThreadId(GetCurrentThread())};	// Sort of unique event name

			std::vector<OVERLAPPED> vo(nochk);
			std::vector<HANDLE> vh(nochk);
			char evt[evtsz] {};

			buff.resize(sz);

			for (size_t i = 0, done = 0; i < nochk; ++i, done += chunk) {
				snprintf(evt, evtsz, "oevt%u-%zu", thid, i);
				vo[i].hEvent = vh[i] = CreateEvent(NULL, FALSE, FALSE, evt);
				vo[i].OffsetHigh = 0;
				vo[i].Offset = done;
				ReadFile(fh, buff.data() + done, std::min(chunk, (size_t)(sz - done)), NULL, &vo[i]);
			}

			WaitForMultipleObjects(vh.size(), (HANDLE*)vh.data(), TRUE, INFINITE);

			ret = true;

			for (const auto& e : vo) {
				ret &= ((DWORD)e.Internal == ERROR_SUCCESS);	// Have all reads succeeded
				CloseHandle(e.hEvent);							// Close events
			}
		}

		CloseHandle(fh);
	}

	return ret;
}

// Remove spaces before/after , if not in quotes
// Remove $
// Replace "" with ''
// Remove , if in a number inside quotes
// Replace , with ; if in quotes
// Ignore /n /r inside quotes
std::string process(const std::string& str) {
	std::string outstr(str.size(), 0);

	bool inelem {};
	bool inquote {};

	const char* ch {str.c_str()};
	char* outch {outstr.data()};

	for (; *ch; ++ch) {
		// Remove $
		if (*ch == '$')
			continue;

		// Ignore space if not in an element
		if (*ch == ' ') {
			if (inelem) {
				if (inquote && (*ch == ' '))
					if ((*(ch - 1) == '\"') || (*(ch - 1) == ' '))
						continue;

					*outch++ = *ch;
			}

			continue;
		}

		// Remove previous spaces before a , or \n if not in quote
		if (!inquote && (*ch == ',' || *ch == '\n' || *ch == '\r')) {
			while ((outch > outstr.data()) && *(outch - 1) == ' ')
				--outch;

			*outch++ = *ch;
			inelem = false;
			continue;
		}

		// Ignore \n or \r if in quote
		if (inquote && (*ch == '\n' || *ch == '\r'))
			continue;

		// Replace "" with ''
		if (*ch == '\"' && *(ch + 1) == '\"') {
			*outch++ = '\"';
			++ch;
			continue;
		}

		// Deal with quote
		if (*ch == '"') {
			inquote = !inquote;
			inelem = true;
			continue;
		}

		// Is , in a number in quotes
		if (*ch == ',') {
			// Ignore if part of a number otherwise replace with ;
			if (!std::isdigit(static_cast<unsigned char>(*(ch + 1))))
				*outch++ = ';';

			continue;
		}

		inelem = true;
		*outch++ = *ch;
	}

	outstr.resize(outch - outstr.data());
	return outstr;
}

int main() {
	//std::string str{ R"!(Item exp, 2"" B/M 90deg widget (As Per Quote AABBITU26.056))!" };
	//const std::string str {R"!(   qwe  ,  pou ,  "4,5 qw,po "" (8)", 987)!"};

	std::string buff;

	if (!WOreadFile("junk_in.csv", buff))
		return (std::cout << "Cannot read input file\n"), 1;

	std::ofstream ofs("csvout2.csv", std::ios::binary);

	if (!ofs)
		return (std::cout << "Cannot open files\n"), 2;

	ofs << process(buff);
}

Last edited on
@seeplus Thank you! and Merry Christmas!

I'm having trouble building, here are the first couple errors.
1
2
3
4
5
6
7
8
9
10
11
12
main.cpp(16): error C2065: 'FILE_READ_DATA': undeclared identifier
main.cpp(16): error C2065: 'FILE_SHARE_READ': undeclared identifier
main.cpp(16): error C2065: 'OPEN_EXISTING': undeclared identifier
main.cpp(16): error C2065: 'FILE_ATTRIBUTE_NORMAL': undeclared identifier
main.cpp(16): error C2065: 'FILE_FLAG_OVERLAPPED': undeclared identifier
main.cpp(16): error C3861: 'CreateFile': identifier not found
main.cpp(16): error C2065: 'INVALID_HANDLE_VALUE': undeclared identifier
main.cpp(17): error C3861: 'GetFileSize': identifier not found
main.cpp(17): error C2065: 'INVALID_FILE_SIZE': undeclared identifier
main.cpp(20): error C3861: 'GetCurrentThread': identifier not found
main.cpp(20): error C3861: 'GetThreadId': identifier not found
main.cpp(22): error C2065: 'OVERLAPPED': undeclared identifier


I tried to build it using my standard cmake file:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
cmake_minimum_required(VERSION "3.10.2")

#set source name
set(SOURCE_NAME "seeplus_version")

#set the project name
project(${SOURCE_NAME} VERSION 1.0 LANGUAGES CXX)


#add the executable
add_executable(${SOURCE_NAME}    
    ${CMAKE_CURRENT_SOURCE_DIR}/main.cpp
)

#add libraries
#target_link_libraries()

target_include_directories(${SOURCE_NAME} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})

target_compile_options(${SOURCE_NAME} PRIVATE /W4 /O2)

target_compile_features(${SOURCE_NAME} PRIVATE cxx_std_20)


Maybe this line will be useful:
1
2
cl   /TP  -IC:\Users\wilsonp\Desktop\seeplus /DWIN32 /D_WINDOWS /W3 /GR /EHsc /MD /O2 /Ob2 /DNDEBUG /W4 /O2 -std:c++latest /showIncludes /FoCMakeFiles\seeplus_version.dir\main.cpp.obj /FdCMakeFiles\seeplus_version.dir\ /FS -c C:\Users\wilsonp\Desktop\seeplus\main.cpp
cl : Command line warning D9025 : overriding '/W3' with '/W4'
Last edited on
seeplus' code requires some windows libraries like Kernel32.lib and maybe others.
The internet insists that if I run this bat I don't need to add kernel32 to my CMake:
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars32.bat" x86

I normally run this:
 
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x86


Neither bat helped.

In this location I certainly see kernel32.dll
 
C:\Windows\System32


If someone would like to walk me through setting up my system and running CMake; I would appreciate it.

My typical build is:
1.) run this bat:
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x86

2.) create a build subdirectory and cd into it, after which I run this command:
cmake -G"NMake Makefiles" -D"CMAKE_BUILD_TYPE=Release" -DCMAKE_VERBOSE_MAKEFILE=off ..

Last edited on
No linker setting will fix a compiler error. So first you need to include windows.h or any file that includes a declaration of the relevant symbols.
Sorry. My bad. I didn't copy all the program to paste here! Compiles OK and tested with VS2022. Add this at the beginning:

1
2
3
4
5
#define _CRT_SECURE_NO_WARNINGS
#define WIN32_LEAN_AND_MEAN
#define NOMINMAX
#include <windows.h>

Last edited on
@seeplus Thank you.

It does parse in less than 4 seconds. I do have some question,and changes I would like to make, but I'll need a day or two to look this over.

This is very much learning exercise for me, so before I start tweaking it, I want to make sure I understand. This program will have a practical use, but I also want to learn how to write well.
but I also want to learn how to write well.


Well I knocked out the original process() in about 10 mins plus a few more minutes subsequently to remove leading spaces inside an element. So I make no claims at all about 'well written code' for this!

:)

Here is a search and replace (SAR) by simple string indexing.
I use a helper function tallynum which is self contained and tallies the number of stringlets to replace
I use only a bare string (not from file), it is very fast.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103


#include<string>
#include<iostream>
#include<ctime>

using namespace std;

int tallynum(const string &somestring,const string &partstring)
{
   int i,j,ln,lnp,count;
  ln=somestring.length();
  if (ln==0)  return 0;
  lnp=partstring.length();
  if (lnp==0) return 0;
  count=0;
  i=-1;
  do
  {
      i+=1;
      if (somestring[i] != partstring[0]) continue;
      if (somestring[i] == partstring[0]) 
      {
          for (j=0;j<=lnp-1;j++)
            if (somestring[j+i]!=partstring[j]) continue; 
     }
    count+=1;
    i=i+lnp-1;
}
  while (i<=ln-1);
  return count;
}

string SAR(const string &original,const string &find,const string &replace)
{
 if (find.length() == 0) return original;
 int t=tallynum(original,find);
  if (t==0) return original; 
  int found=0,n=0,staid=0,m;
  int Lf = find.length(),Lr = replace.length(),Lo = original.length();
   t = original.length() - t * Lf + t * Lr;
    string res(t,char(0));
	do
	{
	if (original[n] == find[0])    // got a possible
	{
      for (m =0;m<=Lf-1;m++) 
        if (original[n + m] != find[m])  goto lbl; // no
      found = 1;        //Bingo
    } 
	if (found == 1)
	{	
      for (m = 0;m<=Lr-1;m++)
      {
        res[staid] = replace[m];           // insert the replacerment
        staid += 1;
      } 
      n += Lf;
      found = 0;
      continue;
    }		
    lbl:
res[staid] = original[n];
    staid += 1;
    n += 1;		
}
while(n<=Lo);
	return res;        	
}


string Left(string& str, int pos) // only to print out the first few characters
{
    int i;
    string temp = "";
    for(i = 0; i < pos; i++)
    {
        temp += str[i];
    }
    return temp;
}

int main()
{
	double t;
	string s="abcfffabcabcgggabc ";
	string g;
	for (int i=1;i<18;i++) // enlarge the string
	s+=s;
	
cout<<Left(s,100)<<" . . ."<<endl;	
	t=clock();
g=SAR(s,"abc","12345"); // <<---- here abc replaced by 12345
    t=clock()-t;
cout<<Left(g,100)<<" . . ."<<endl;

cout<<"original string length "<<s.length()<<endl;
cout<<"   new   string length "<<g.length()<<endl;
cout<<"time taken "<<t/ (double) CLOCKS_PER_SEC<<"  seconds"<<endl;
	cout<<"Press return to end . . ."<<endl;
	cin.get();
}
  

Seasons greetings.
Last edited on
Topic archived. No new replies allowed.
Pages: 123