Strings

Hey there,
I'm facing a problem here, guys. Given an array of strings, I want to remove all the punctuation and convert all the letters to lower case. Does anybody have a clue how this could be done?
I'd really appreciate it if you could help.
Many thanks.

closed account (S6k9GNh0)

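char * toLower(char * word)
{
    // Walk the whole string; stop only at the terminating '\0'.
    for (int i = 0; word[i] != '\0'; ++i)
    {
        // ASCII 'A'..'Z' is 65..90; adding 32 gives 'a'..'z'.
        if (word[i] >= 65 && word[i] <= 90)
            word[i] += 32;
    }
    return word;
}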

A little function I made. It takes advantage of ASCII, where every character is represented by a number. I could just as well have compared against the character literals themselves, but I don't like that. Just my opinion.


Hammurabi (399)

Generally you should avoid assuming a particular encoding. toLower is more portable and more readable like this:

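char* toLower( char* str )
{
    // tolower() expects a value representable as unsigned char (or EOF)
    for( char* p = str; *p; p++ )
        *p = tolower( (unsigned char)*p );
    return str;  // return the start of the string, not the end
}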

The function ispunct() will tell you if a char is punctuation.
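
Putting tolower() and ispunct() together for the original question, here is a minimal sketch. It assumes the "array of strings" is a std::vector<std::string>, and the helper names stripPunctAndLower and normalizeAll are made up for illustration. What counts as punctuation follows the current C locale.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of `in` with punctuation removed
// and every remaining character lowercased (per the current C locale).
std::string stripPunctAndLower(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        // The <cctype> functions need a value representable as unsigned char.
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::ispunct(c))
            continue;  // drop punctuation
        out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Hypothetical helper: applies the cleanup to every string in place.
void normalizeAll(std::vector<std::string>& words)
{
    for (std::string& w : words)
        w = stripPunctAndLower(w);
}
```

Building a new string instead of erasing characters in place keeps the loop simple; if the copies matter, the erase/remove_if idiom would be the usual alternative.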


helios (17574)

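//for a C string (needs <algorithm>, <cctype>, <cstring>)
//the lambda avoids overload ambiguity and hands tolower an unsigned char
std::transform(str,str+strlen(str),str,[](unsigned char c){return (char)std::tolower(c);});
//for an std::string
std::transform(str.begin(),str.end(),str.begin(),[](unsigned char c){return (char)std::tolower(c);});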

It's true that no particular encoding should be assumed, but it's also true that a program's correct functioning shouldn't depend on the locale settings of the system (IMO); it should just use UCS (Unicode and UCS share the same character repertoire). Since UCS includes ASCII, computerquip's version is actually more portable, because the program's behavior doesn't depend on settings.
By the way, I personally like this version better:

template<typename T>
void tolower(T *str){
    for (;*str;str++)//to change to toupper():
        if (*str>='A' && *str<='Z') //invert the case of these literals,
            *str|=32; //and change this to &=~32
}
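
In case the |=32 looks like magic: in ASCII, each lowercase letter is its uppercase counterpart with bit 5 (value 32) set, so setting or clearing that one bit switches case. A small sketch of both directions, assuming an ASCII-compatible character set; the names asciiLower and asciiUpper are just for illustration.

```cpp
// Assumes an ASCII-compatible execution character set, where
// 'a' - 'A' == 32 and the two cases differ only in bit 5.
inline char asciiLower(char c)
{
    // Set bit 5 on 'A'..'Z'; leave everything else alone.
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c | 32) : c;
}

inline char asciiUpper(char c)
{
    // Clear bit 5 on 'a'..'z'; leave everything else alone.
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c & ~32) : c;
}
```

The range check matters: without it, characters such as '@' or '[' would be changed into unrelated characters.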


upo (14)

Yes, thank you guys that was very helpful
I really appreciate your help ;)

Hammurabi (399)

Shouldn't correct functioning follow the locale settings? That was my assumption! Ultimately it depends on the problem specification. Only the OP knows for sure (maybe).

helios (17574)

Well, that's why I put the IMO in there. If you really want to make the program portable, don't assume the system is in any particular locale and just use Unicode.
This is a mistake Japanese programmers make on a daily basis. In Japan, OSs are set up to use Shift JIS as the default non-Unicode encoding. These programmers think, "Hell, if everyone uses Shift JIS, what's the point of using Unicode? No one will ever want to use this program outside of Japan!"
Whenever a programmer thinks something that starts with "no one will ever", disasters occur.

Always use Unicode. When using only 8 bits, use ISO-8859-1, which is a small subset of Unicode, or UTF-8, depending on your requirements.
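
One reason the ASCII-only approach coexists well with UTF-8: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a check against 'A'..'Z' can never hit the middle of a non-ASCII character. A rough sketch, assuming an ASCII-compatible compiler and UTF-8 input; lowerAsciiOnly is a made-up name. Note that it leaves non-ASCII letters such as É untouched rather than lowercasing them.

```cpp
#include <string>

// Hypothetical helper: lowercases only the ASCII letters A-Z.
// Bytes >= 0x80, which make up every multi-byte UTF-8 sequence,
// are never in that range and pass through unchanged.
std::string lowerAsciiOnly(std::string s)
{
    for (char& c : s)
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c + ('a' - 'A'));
    return s;
}
```

Proper case conversion for the whole Unicode range needs a library that knows the Unicode case mappings; this only shows that the simple loop does not corrupt UTF-8 text.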

Hammurabi (399)

I still don't understand how assuming only 'A' through 'Z' are "uppercase" is portable. Isn't that assuming a particular locale? We seem to be talking at cross purposes here.

helios (17574)

'A' is a numeric literal translated by the compiler at compile time. In other words, its value depends on the compiler.
With a compiler that uses an ASCII-based character set, my example could be rewritten replacing 'A' and 'Z' with 65 and 90 and it would produce the same results.

closed account (S6k9GNh0)

Look at my example for use of the numerical version.

Hammurabi (399)

'A' is a numeric literal translated by the compiler at compile time.

Well duh! ;) That's not the point at all. The point is that if you use isupper and tolower it will work in all locales, which makes it more portable, IMAO. Like I said, we seem to be talking at cross purposes, about different kinds of portability. It ultimately depends on the problem specs and cannot be dictated beforehand.
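
To make the locale-driven side concrete, here is a sketch using the std::tolower overload from <locale>, with the locale passed in explicitly rather than taken from global settings. The function name lowerIn is hypothetical, and which named locales exist depends on the system.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lowercases `s` according to the rules of `loc`.
// Works one char at a time, so it only suits single-byte encodings;
// multi-byte encodings such as UTF-8 need a different approach.
std::string lowerIn(std::string s, const std::locale& loc)
{
    for (char& c : s)
        c = std::tolower(c, loc);  // the <locale> overload, not the <cctype> one
    return s;
}
```

Calling lowerIn(text, std::locale::classic()) uses the plain "C" rules, while lowerIn(text, std::locale("")) follows the user's environment, which is exactly the behavior the two sides of this thread disagree about.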
