Hey There,
I'm facing a problem here, guys. Given an array of strings, I want to remove all the punctuation and convert all the letters to lower case. Does anybody have a clue how this could be done?
I would really appreciate it if you could help.
Many Thanks.
Little function I made. This takes advantage of ASCII: every ASCII character is represented by a number. I could just as well have compared against the character literals themselves, but I don't like that. Just my opinion.
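//needs <algorithm>, <cctype> and <cstring>; ::tolower picks the C function
//and avoids ambiguity with the std::tolower overload declared in <locale>
//for a C string
std::transform(str,str+strlen(str),str,::tolower);
//for an std::string
std::transform(str.begin(),str.end(),str.begin(),::tolower);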
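To tie this back to the original question (strip punctuation and lowercase every string in an array), here is a minimal sketch. The names clean and clean_all are made up for illustration, and it assumes the strings hold single-byte characters classified by the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: strips punctuation from s and lowercases what remains.
std::string clean(std::string s) {
    // Take each character as unsigned char: passing a negative char value
    // to the <cctype> functions is undefined behavior.
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::ispunct(c) != 0; }),
            s.end());
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Applies clean() to every string in the array.
void clean_all(std::vector<std::string>& words) {
    for (std::string& w : words) w = clean(w);
}
```

With the default "C" locale, clean("Hello, World!") should give "hello world". Whitespace is left alone because std::ispunct does not classify it as punctuation.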
It's true that no particular encoding should be assumed, but it's also true that a program's correct functioning shouldn't depend on the locale settings of the system (IMO), and it should just use UCS (Unicode is a subset of UCS). Since UCS includes ASCII, computerquip's version is actually more portable, because it doesn't make the program's behavior depend on settings.
By the way, I personally like this version better:
template<typename T>
void tolower(T *str){
    for (;*str;str++)                 //to change to toupper():
        if (*str>='A' && *str<='Z')   //invert the case of these literals,
            *str|=32;                 //and change this to &=255^32
}
Shouldn't correct functioning follow the locale settings? That was my assumption! Ultimately it depends on the problem specification. Only the OP knows for sure (maybe).
Well, that's why I put the IMO in there. If you really want to make the program portable, don't assume the system is in any particular locale and just use Unicode.
This is a mistake Japanese programmers make on a daily basis. In Japan, OSs are set up to use Shift JIS as the default non-Unicode encoding. These programmers think, "Hell, if everyone uses Shift JIS, what's the point of using Unicode? No one will ever want to use this program outside of Japan!"
Whenever a programmer thinks something that starts with "no one will ever", disasters occur.
Always use Unicode. When using only 8 bits, use ISO-8859-1, which is a small subset of Unicode, or UTF-8, depending on your requirements.
I still don't understand how assuming only 'A' through 'Z' are "uppercase" is portable. Isn't that assuming a particular locale? We seem to be talking at cross purposes here.
'A' is a numeric literal translated by the compiler at compile time. In other words, its value depends on the compiler.
My example could be rewritten replacing 'A' and 'Z' with 65 and 90 and it would produce the same results.
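A small sketch of that claim, assuming the compiler's execution character set is ASCII-compatible (the static_assert makes the build fail if it is not). The helper name is_ascii_upper is made up for illustration.

```cpp
// Assumption: the execution character set is ASCII-compatible.
static_assert('A' == 65 && 'Z' == 90,
              "execution character set is not ASCII-compatible");

// Hypothetical helper: the same range test, written with numeric values
// instead of character literals.
constexpr bool is_ascii_upper(char c) { return c >= 65 && c <= 90; }
```

On a compiler with a different execution character set (EBCDIC, say), the literal form and the numeric form would no longer agree, which is exactly where the two notions of portability in this thread part ways.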
"'A' is a numeric literal translated by the compiler at compile time."
Well duh! ;) That's not the point at all. The point is that if you use isupper and tolower it will work in all locales, which makes it more portable, IMAO. Like I said, we seem to be talking at cross purposes, about different kinds of portability. It ultimately depends on the problem specs and cannot be dictated beforehand.
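For comparison, a locale-aware version might look like the sketch below. It uses the std::isupper and std::tolower overloads from <locale> that take an explicit std::locale. The name to_lower_in is hypothetical, and this only handles one char at a time, so it does not help with multi-byte encodings such as UTF-8.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lowercases s according to loc
// (defaults to a copy of the current global locale).
std::string to_lower_in(std::string s, const std::locale& loc = std::locale()) {
    for (char& c : s)
        if (std::isupper(c, loc)) c = std::tolower(c, loc);
    return s;
}
```

Passing std::locale::classic() gives the plain "C" behavior, so to_lower_in("ABC def", std::locale::classic()) should yield "abc def". What other locales treat as upper case depends on the system.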