Std::string question regarding concatenations and speed

Don’t know if anyone would be willing to take a look at this and help but I’m trying to implement the following algorithm in C++ and get it to execute as fast as possible….

1)Create a 2MB string of spaces;
2)Change every 7th space to a ‘P’;
3)Replace every ‘P’ with a "PU";
4)Replace every remaining space with an ‘8’
5)Copy this string to another buffer adding a carriage return & line feed every 90 characters;
6)Output the last 4000 characters to a MessageBox().

I have three versions of it worked out so far; two in C++ and another in C++ but with a lot of C ‘isms’ in it that is several hundred times faster than my versions using String Classes. I’ve used the Standard C++ Library’s std::string class and also my own string class that is considerably faster. Anyway, here is my version using the std::string…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include "windows.h"
#include <stdio.h>
#include <string>
#define  NUMBER 2000000  //tick count = 287922
using namespace std;

string& ReplaceAll(string& context, const string& from, const string& to)
{
 size_t lookHere=0;
 size_t foundHere;

 while((foundHere=context.find(from,lookHere)) != string::npos)
 {
  context.replace(foundHere, from.size(), to);
  lookHere=foundHere+to.size();
 }

 return context;
}

int main(void)
{
 unsigned tick;
 int iCount=0;
 string s2;

 tick=GetTickCount();
 string s1(NUMBER,' ');                //Create 2MB String of spaces;
 s2.reserve(2200000);                  //s2 is my exchange/slop buffer; pre-allocate an extra 20000 bytes
 for(int i=0; i<NUMBER; i++)           //for expansion.  Loop through and put ‘P’ every 7th byte; Step 2 
 {                                     //above.
     iCount++;
     if(iCount%7==0)
        s1[iCount-1]='P';
 }
 ReplaceAll(s1,"P","PU");              //Replace “P”s with “PU”; Step 3 above;
 ReplaceAll(s1," ","8");               //Replace “ “ with “8”; Step 4 above;
 s2.erase(0);                          //Put NULL at offset 0 in s2’s string buffer and start
 for(int i=0; i<NUMBER; i=i+90)        //big concatenation adding CrLf onto end of every 90 byte chunk
     s2+=s1.substr(i,90)+"\r\n";       //which is step 5;
 s1=s2.substr(s2.length()-4000,4000);  //Assign last 4000 bytes to s1 and output to MessageBox() which
 tick=GetTickCount()-tick;             //is Step 6.  Done!  Observe tick count and cry!
 printf("tick = %u\n",(unsigned)tick);
 printf("%u\n",s2.size());
 MessageBox(NULL,s1.c_str(),"Here Is Your String John!",MB_OK);
 printf("%u\n",s2.size());
 getchar();

 return 0;
}


The ReplaceAll() function I got from Bruce Eckel’s “Thinking In C++” book, Volume 2. That isn’t the bottleneck however. The bottleneck is this repeated concatenation…

1
2
for(int i=0; i<NUMBER; i=i+90)
     s2+=s1.substr(i,90)+"\r\n";


Does anyone see anything wrong here or other ways to do it to speed it up. As far as I’m concerned, the speed is atrocious!

The version using my string class runs considerable faster and is much smaller too (126000 ticks compared to 288000 ticks), but you won’t be able to run it without my providing the String Class (which I don’t mind at all doing if anyone is interested in looking at this for me). Here is what it looks like…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include <windows.h>
#include <tchar.h>  
#include <stdio.h>
#include "Strings.h"     //My String Class; not std::string
#define NUMBER 2000000   

int main()
{
 DWORD tick;                                       //for timing with GetTickCount()
 String s2;                                        //exchange/slop/extra buffer

 tick = GetTickCount();                            //Get initial tick count
 String s1(_T(' '),NUMBER);                        //make String containing NUMBER of space chars
 for(int i=1; i<=NUMBER; i++)                      //loop through s1 converting every seventh char to 'P'
     if(i%7==0) s1.SetChar(i,_T('P'));             //my String::SetChar() is one based
 s2=s1.Replace((TCHAR*)_T("P"),(TCHAR*)_T("PU"));  //Replace every 'P' with "PU" which expands String
 s1=s2.Replace((TCHAR*)_T(" "),(TCHAR*)_T("8"));   //one char with every insert.  then replace blanks
 s2.SetChar(1,'\0');                               //Reduce String::LenStr() to zero by setting NULL
 for(int i=0; i<NUMBER; i=i+90)                    //with '8'.  Finally, copy 90 byte chunks from s1
     s2=s2+s1.Mid(i+1,90)+_T("\r\n");              //to s2, appending a CrLf to each line.  then
 s1=s2.Right(4000);                                //display the last 4000 characters in a MessageBox()
 tick = GetTickCount() - tick;                     //and display final tick difference in console
 printf("tick = %u\n",(unsigned)tick);
 printf("s2.LenStr() = %u\n",(unsigned)s2.LenStr());
 MessageBox(0,s1.lpStr(),_T("Here's Your String John!"),MB_OK);
 getchar();

 return 0;
}


Again, both versions of this run almost instantaneously until they hit that bottleneck at the big concatenation. Observing this, I decided to see what would happen if I used strncpy and strcat instead of the String classes to do the concatenations…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include "Strings.h"
#define NUMBER 2000000      //Ansi 390 ticks!

int main()
{
 DWORD tick=0;
 String s2;

 tick = GetTickCount();
 String s1(_T(' '),NUMBER);
 for(int i=1; i<=NUMBER; i++)
     if(i%7==0) s1.SetChar(i,_T('P'));
 s2=s1.Replace((TCHAR*)_T("P"),(TCHAR*)_T("PU"));
 s1=s2.Replace((TCHAR*)_T(" "),(TCHAR*)_T("8"));
 s2.Make(_T('\0'),2200000);
 TCHAR* pBuf1=s1.lpStr();
 TCHAR* pBuf2=s2.lpStr();
 for(int i=0; i<NUMBER; i=i+90)
 {
     _tcsncpy(pBuf2,pBuf1,90);
     _tcscat(pBuf2,(TCHAR*)_T("\r\n"));
     pBuf2=pBuf2+92;
     pBuf1=pBuf1+90;
 }
 s1=s2.Right(4000);
 tick = GetTickCount()-tick;
 printf("tick = %u\n",(unsigned)tick);
 MessageBox(NULL,s1.lpStr(),_T("Here's Your String John!"),MB_OK);
 getchar();

 return 0;
}


…and I’m coming up with 390 ticks! I’ll do the math for you. That’s 323 times faster! And you don’t even want to know what PowerBASIC’s tick count on this is. So that’s why I’m asking if anyone would know of a faster way to get this done in C++. Perhaps a better String Class than either mine or the std::string class, or maybe a better algorithm than what I have?
Topic archived. No new replies allowed.