Clean a polluted file. Unexpected results

Apr 28, 2016 at 3:31pm
Hi.

I have a text file that has been polluted by another application with strange characters like Æ’à and control codes like '\0', '\x1' and '\x19'.

I've built an app that is supposed to clean those files but I get unexpected results.

Here is the part of my code that does that:

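#include <string>
using std::string;

// Every character that is allowed to survive the cleaning (printable ASCII).
const string Stringbase = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

// Returns a copy of x containing only the characters listed in Stringbase.
string ReplaceAscii(const string& x)
{
	string tmp;

	// i < x.length() avoids the unsigned underflow of x.length()-1 on an empty line
	for (string::size_type i = 0; i < x.length(); i++)
	{
		char c = x.at(i);

		if (Stringbase.find(c) != string::npos)
		{
			tmp += c;
		}
	}
	return tmp;
}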


That function receives a complete line of the file to be cleaned and is supposed to return the cleaned string.

The problem is that if I read the original file, I get strange results: for example, the ™ character is treated as "= and those characters are copied to the returned string. But if I cut and paste the entire content into a new text file, everything works properly and the ™ character is removed from the string.

I'm working with VS2015, but I also tried Geany, with the same results.
Apr 28, 2016 at 4:19pm
You may want to try working with the characters' numeric values instead of their representation.

Perhaps something like:
for (auto it : x)
   if (static_cast<int>(it) > 31 && static_cast<int>(it) < 127)
      tmp += it;
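
As a self-contained function, the numeric-value idea could look roughly like the sketch below. The name KeepPrintableAscii is made up for illustration and is not from the original code. The sketch iterates as unsigned char so that bytes above 0x7F compare as 128..255 rather than as negative values, and it assumes the line is handled as raw bytes.

```cpp
#include <string>

// Hypothetical helper (not part of the original post): keeps only the
// printable ASCII bytes 0x20 (space) through 0x7E (~) and drops the rest.
std::string KeepPrintableAscii(const std::string& line)
{
    std::string cleaned;
    cleaned.reserve(line.size());

    // unsigned char keeps bytes above 0x7F positive, so the range test
    // behaves the same whether plain char is signed or unsigned.
    for (unsigned char c : line)
    {
        if (c >= 0x20 && c <= 0x7E)
            cleaned += static_cast<char>(c);
    }
    return cleaned;
}
```

Note that this filter works byte by byte, so it can only be as good as the bytes it is given: if the file is stored in a multi-byte encoding, some bytes of a non-ASCII character may themselves fall in the printable range and pass through.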
Apr 29, 2016 at 2:43pm
Nope. Same behavior. This must have something to do with file encoding but I can't pinpoint the problem.
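
One way to narrow down an encoding problem like this is to look at the raw bytes of a line from the original file and from the copy-pasted file and compare them. The sketch below is only a diagnostic aid, and HexDump is a hypothetical name. For reference, ™ (U+2122) is the single byte 99 in Windows-1252, the three bytes E2 84 A2 in UTF-8, and the two bytes 22 21 in UTF-16LE. In that last case both bytes are printable ASCII ('"' and '!'), so a byte-wise filter would keep them. Whether that is what happens here is an assumption that the dump would confirm or rule out.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of a line as two uppercase hex
// digits separated by spaces, so the stored encoding becomes visible.
std::string HexDump(const std::string& line)
{
    std::string out;
    char buf[4];  // two hex digits + space + terminating '\0'

    for (unsigned char c : line)
    {
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(c));
        out += buf;
    }
    if (!out.empty())
        out.pop_back();  // drop the trailing space
    return out;
}
```

For the dump to be meaningful, the file should probably be opened with std::ios::binary so that no byte is translated on the way in. A line full of 00 bytes between the letters would point to UTF-16, while sequences such as E2 84 A2 would point to UTF-8.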