C++ cleaning string problem

I'm having trouble cleaning non-letter characters out of a string correctly. When the string contains, for example, "test&set", the function deletes the "&" and sticks both words together (testset). I'm using getline() with a space as the delimiter. This is my code:

void linklist::cleanword()
{
    string word;

    cout << "File contents:\n" << endl;
    while (!infile.eof())
        {
            getline(infile, word, (' '));
            cout << word << endl;
            for (int i=0; i<word.length(); i++)
            {
                word[i]=tolower(word[i]);
            }

            for(string::size_type i=0; i < word.size(); ++i) /*just trying data type size_type, used for unlimited string size*/
            {
                if(ispunctuation(word[i]))
                {
                    word.erase(i,1);
                    i--;
                }
            }
            newword = new char [word.size()+1];
            strcpy (newword, word.c_str());
            append(newword); //adding word to my linked list.
        }
}


Any ideas? Thx!
strcpy doesn't work with strings, only with character arrays
When the string contains, for example, "test&set", the function deletes the "&" and sticks both words together (testset).
Well, yes. You're telling it to do that.
if(ispunctuation(word[i]))
{
    word.erase(i,1);
    i--;
}


What's the declaration of append()?
Thx for the reply!

strcpy doesn't work with strings, only with character arrays


But that's where c_str() comes in:

http://www.cplusplus.com/reference/string/string/c_str/

So that's not the problem.
I append the word to the linked list:

void linklist::append(char *txt)
{
    node *q, *t;

    if( p == NULL )
    {
        p = new node;
        p->data = txt;
        p->link = NULL;
    }
    else
    {
        q = p;
        while( q->link != NULL )
            q = q->link;

        t = new node;
        t->data = txt;
        t->link = NULL;
        q->link = t;
    }
}


And yeah, you're right: basically I don't know how to split a string into words. I first used the space delimiter to solve it, but that just doesn't cut it for slightly unusual strings...
You might have a problem if the string starts with a 'punctuation' character, like "&Eat Quiche".

There is nothing else wrong with your syntax that I can see.
I presume that append() takes a char* to something that is delete[]ed later. (Otherwise you'd have a memory leak.)

I'm not sure your program is well-organized though... but that is just a gut-wrench reaction to cleanword() doing a bunch of file I/O with externally-parameterized streams.

What exactly is not working for you?

[edit] Hmm, take a second to help one's sick wife and miss the whole conversation ...
Actually "&Eat Quiche" is cleaned just fine, "Eat&Quiche" is where the problems start. The string "word" is not actually a word, but the string it takes from getline() until a space is introduced.

And yes this function is still a bit messy :P The rest is not that bad though.
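The potential problem with "&Eat Quiche" is integer underflow: when the first character is erased, i-- runs while i is 0. With an unsigned index such as string::size_type the value wraps around to the maximum, and the loop's ++i wraps it back to 0, so in practice that shouldn't be a problem.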

I still don't understand what the problems are with "Eat&Quiche". What is your desired output? Do you want two words out of that?
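For what it's worth, the i-- bookkeeping can be avoided altogether. A minimal sketch, assuming std::ispunct is an acceptable stand-in for the poster's own ispunctuation() (whose definition isn't shown in this thread); strip_punctuation is a made-up name. The cast to unsigned char is there because passing a negative char value to the <cctype> functions is undefined. Note that this only removes characters, so "test&set" still comes out as "testset"; it does no splitting.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not from the original program): returns a copy of
// `word` with every character that std::ispunct reports as punctuation removed.
std::string strip_punctuation(std::string word)
{
    // std::remove_if shifts the kept characters to the front and returns the
    // new logical end; erase() then drops the leftover tail in one call, so
    // no index ever has to be decremented.
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               word.end());
    return word;
}
```

Called as strip_punctuation("&Eat Quiche"), this returns "Eat Quiche".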
First look for the first occurrence of a "punctuation" character as you define it, then cut the string using substr() (http://www.cplusplus.com/reference/string/string/substr/). To look for occurrences, you can use find() (http://www.cplusplus.com/reference/string/string/find/), find_first_of() (http://www.cplusplus.com/reference/string/string/find_first_of/), or just iterate through the string using the [] operator.
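A rough sketch of that find_first_of()/substr() approach, assuming the separator characters are passed in as a string. split_on and its parameter names are invented here for illustration; they are not taken from the original program.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at every character found in `delims`
// and returns the non-empty pieces in their original order.
std::vector<std::string> split_on(const std::string& text, const std::string& delims)
{
    std::vector<std::string> pieces;
    std::string::size_type start = 0;

    while (start < text.size())
    {
        // Position of the next separator at or after `start`, if any.
        std::string::size_type end = text.find_first_of(delims, start);
        if (end == std::string::npos)
            end = text.size();          // no separator left: take the rest

        if (end > start)                // skip empty pieces, e.g. "a&&b" or "&a"
            pieces.push_back(text.substr(start, end - start));

        start = end + 1;                // continue just past the separator
    }
    return pieces;
}
```

With this, split_on("test&set", "&") yields the two pieces "test" and "set". Returning std::string pieces also sidesteps the manual char buffer, at the cost of changing what append() would have to accept.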
Yep, that's the problem. The program parses words from a txt file and the output is a csv file. It does that just fine, but two words without a space between them, like "test&set", give "testset". It should be "test" and "set".
Ah, you will need a loop to find each occurrence of punctuation and extract the word before it.
    // So long as there are words left in our string...
    while (!word.empty())
    {
        // Find the first occurrence of any punctuation in the word, if any
        string::size_type n = 0;
        while ((n < word.length()) && !ispunctuation( word[ n ] )) ++n;

        // If we found a non-empty string, add it to our list
        if (n > 0)
        {
            newword = new char[ n + 1 ];  // +1: strcpy() also copies the '\0'
            strcpy( newword, word.substr( 0, n ).c_str() );
            append( newword );
        }

        // Remove what we found (and the punctuation after it) from the word
        word.erase( 0, n + 1 );
    }

Good luck!
Thank you all for the great help. The function works perfectly now with the use of substr(). If anyone wants the whole program, no prob :).
Alas, it was too early to celebrate... After nesting Duoas's code in my own function, for an unknown reason, it sometimes just stops in the middle of the function. Not with all text: all of my own typed test files work, while copied text fails every time... What am I doing wrong?

void linklist::cleanword()
{
    string word;

    cout << "File contents:\n" << endl;
    /*while file is not end of file*/
    while (!infile.eof())
        {
            getline(infile, word, (' '));
            cout << word << endl;
            for (string::size_type i=0; i<word.length(); i++)
            {
                word[i]=tolower(word[i]);
                cout << word << endl;
            }
            /*while there is a string in word*/
            while (!word.empty())
            {
                /*look for punctuation*/
                string::size_type n = 0;
                while ((n < word.length()) && !ispunctuation( word[n] )) ++n;

                /*if there is a word, append*/
                if (n > 0)
                {
                    newword = new char[n];
                    strcpy(newword, word.substr( 0, n ).c_str());
                    append(newword);
                }

                /*erase findings*/
                word.erase(0, n + 1);
            }
        }
}
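
One thing worth checking in the function above: strcpy() copies the terminating '\0' as well, so a substring of n characters needs a buffer of n + 1 chars. With new char[n] the copy writes one byte past the end of the buffer, which is undefined behaviour and could plausibly explain stops that only show up with some inputs. A minimal sketch of the copy with the extra byte; copy_prefix is a hypothetical helper, not part of the original class.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: returns a new[]-allocated, NUL-terminated copy of the
// first `n` characters of `word`. The caller owns the result and must
// delete[] it when done.
char* copy_prefix(const std::string& word, std::string::size_type n)
{
    const std::string piece = word.substr(0, n);   // clamps if n > word.size()
    char* buffer = new char[piece.size() + 1];     // +1 for the terminating '\0'
    std::strcpy(buffer, piece.c_str());
    return buffer;
}
```

For example, copy_prefix("test&set", 4) allocates five bytes and fills them with "test" plus the terminator.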
Apparently it has trouble with getline()'s third parameter, which is a space. If I don't pass that parameter, getline() automatically uses '\n' as the delimiter. So I still get the malfunction, but it stops at a different line...
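
On the reading side, here is a sketch of a loop that doesn't depend on a single delimiter character. operator>> skips any run of whitespace (spaces, tabs, newlines) between tokens, and testing the stream itself ends the loop as soon as a read fails, which while (!infile.eof()) does not do. read_tokens and tokens_from_string are hypothetical names, and the original infile member is assumed to be a std::ifstream, which can be passed as the std::istream&.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads whitespace-separated tokens until the stream
// runs out of input or a read fails.
std::vector<std::string> read_tokens(std::istream& in)
{
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token)            // false once no further token can be read
        tokens.push_back(token);
    return tokens;
}

// Convenience wrapper for trying the loop on an in-memory string.
std::vector<std::string> tokens_from_string(const std::string& text)
{
    std::istringstream in(text);
    return read_tokens(in);
}
```

Each token can still contain punctuation ("test&set" arrives as one token), so the splitting step is still needed afterwards.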