How to convert a string with double-high/double-wide characters to a normal string [VC++6]

My application typically receives a string in the following format:
" Item $5.69 "

Some constants I always expect:
- the length is always 20 characters
- the start index of the text is always [5]
- and, most importantly, the index of the decimal point in the price is always [14]
To identify this string correctly, I validate all the expected constants listed above.
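A minimal sketch of that fixed-layout check, assuming the line is held in a std::string and that "text starts at [5]" means exactly five leading spaces. The function name matchesFixedLayout is hypothetical:

```cpp
#include <string>

// Hypothetical helper: checks the three fixed-layout constants from the post.
bool matchesFixedLayout(const std::string& s)
{
    if (s.length() != 20)                  // length is always 20
        return false;
    if (s.find_first_not_of(' ') != 5)     // text starts at index 5 (assumed: 5 leading spaces)
        return false;
    return s[14] == '.';                   // decimal point is always at index 14
}
```

With the underscores from later in the thread turned back into spaces, "     Item   $5.69   " passes, while the 17-character "     Item  $5.69 " fails on the length test.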

Some of my clients have now started sending the string with double-high/double-wide values (pairs of characters that represent a single readable character), similar to the following:
" Item $\x80\x90.\x81\x91\x82\x92 "

For testing, I simply scan the string character by character, compare char[i] with char[i+1], and replace each matching pair with its corresponding single character (this works fine):

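// Scan the raw input and replace each two-byte pair with its single digit.
// "i + 1 < length" keeps sData[i + 1] inside the string.
for (std::string::size_type i = 0; i + 1 < sData.length(); i++)
{
   char ch  = sData[i];
   char ch2 = sData[i + 1];

   if (ch == '\x80' && ch2 == '\x90')
      sData.replace(i, 2, "0");   // two bytes become one, so the string shrinks
   else if (ch == '\x81' && ch2 == '\x91')
      sData.replace(i, 2, "1");
   else if (ch == '\x82' && ch2 == '\x92')
      sData.replace(i, 2, "2");
   // ...
   // ... (remaining pairs handled the same way)
   // ...
}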


But the result is something like this:
" Item $5.69 "
Notice how this no longer matches my expectation: the length is now 17 (instead of 20) because of the 3 conversions, and the decimal point is now at index 13 (instead of 14) because the "5" in front of it was converted.
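The shrinkage can be reproduced with a small decoder. This is a sketch, not the application's code: the function name decodeDoubleWide is hypothetical, and the rule that digit d arrives as the byte pair (0x80 + d, 0x90 + d) is an assumption extrapolated from the three pairs shown above.

```cpp
#include <string>

// Hypothetical decoder. ASSUMPTION: digit d is sent as the byte pair
// (0x80 + d, 0x90 + d), extrapolated from the '0', '1' and '2' pairs in the post.
std::string decodeDoubleWide(const std::string& in)
{
    std::string out;
    std::string::size_type i = 0;
    while (i < in.length())
    {
        unsigned char a = static_cast<unsigned char>(in[i]);
        if (i + 1 < in.length() && a >= 0x80 && a <= 0x89 &&
            static_cast<unsigned char>(in[i + 1]) == a + 0x10)
        {
            out += static_cast<char>('0' + (a - 0x80));  // two bytes -> one digit
            i += 2;
        }
        else
        {
            out += in[i];  // spaces, letters, '$' and '.' are copied unchanged
            ++i;
        }
    }
    return out;
}
```

For the 20-byte input std::string("     Item  $") + "\x85\x95" + "." + "\x86\x96" + "\x89\x99" + " " (decimal at [14]), the result is "     Item  $5.69 ": 17 characters with the decimal at [13], which is the symptom described above. Every converted digit before the decimal point shifts it left by one.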


Ideally, I would like to convert the string to a normal readable format while keeping the constants (length, index of the text, index of the decimal) in the same place, so the rest of my application can be reused. Any other suggestion is welcome too (I'm pretty much stuck). Is there a STANDARD way of dealing with these types of characters?

Any help would be greatly appreciated; I've been stuck on this for a while now.
Thanks,
There are a few things I don't understand:
1. " Item $5.69 " doesn't match any of the conditions you mention. For one, its length is 12, not 20.
2. Why would you write something so easy to break? If someone so much as forgets a space, your code completely fails to parse the string, even if it is otherwise correct.
3. I really can't see how clients sending data in a different encoding breaks anything.
input=convert_input(string);
parse_input(input);
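One way to read that two-line suggestion: decode the encoding first, then parse by content rather than by fixed index. Only the names convert_input and parse_input come from the post; both bodies are hypothetical. The pair rule (0x80 + d, 0x90 + d) is an assumption extrapolated from the question, and this parser also assumes the item text contains no spaces.

```cpp
#include <string>

// Hypothetical convert_input: turns each pair (0x80 + d, 0x90 + d) into digit d.
std::string convert_input(const std::string& raw)
{
    std::string out;
    for (std::string::size_type i = 0; i < raw.length(); ++i)
    {
        unsigned char a = static_cast<unsigned char>(raw[i]);
        if (i + 1 < raw.length() && a >= 0x80 && a <= 0x89 &&
            static_cast<unsigned char>(raw[i + 1]) == a + 0x10)
        {
            out += static_cast<char>('0' + (a - 0x80));
            ++i;  // skip the second byte of the pair
        }
        else
            out += raw[i];
    }
    return out;
}

// Hypothetical parse_input: finds the item and the price by content, so the
// amount of padding around them does not matter. Returns false if a field is missing.
bool parse_input(const std::string& line, std::string& item, std::string& price)
{
    std::string::size_type start = line.find_first_not_of(' ');
    if (start == std::string::npos) return false;
    std::string::size_type end = line.find(' ', start);
    if (end == std::string::npos) return false;
    item = line.substr(start, end - start);           // assumes no spaces inside the item

    std::string::size_type dollar = line.find('$', end);
    if (dollar == std::string::npos) return false;
    std::string::size_type stop = line.find(' ', dollar);
    price = (stop == std::string::npos) ? line.substr(dollar + 1)
                                        : line.substr(dollar + 1, stop - dollar - 1);
    return price.find('.') != std::string::npos;      // require a decimal point
}
```

The trade-off is the one discussed in the replies: a content-based parser accepts both the old and the new layout, but it is more permissive than the fixed-index checks the application currently relies on.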
1. Odd, when I copy/paste it into the forum, it automatically trims the spaces at the front and back. It should be:
" Item $5.69 "

2. We can receive a LOT of different data, and we need to make sure we recognize only these specific lines, so if the decimal is at a different location we are NOT supposed to use the line. It's not perfect, but it's how the application has always worked, and until now it has done a fairly good job in the field.

3. Not sure where convert_input and parse_input functions are coming from...
1. OK, it's still not posted properly; let me use underscores for spaces:
"_____Item___$5.69___"
so if the decimal is at a different location we are NOT supposed to use it
In that case, the program should reject text in a different encoding. Otherwise, the policy is inconsistent. You're being permissive with the encoding and strict with the formatting. If you're strict with the formatting, be strict with the encoding; if you're permissive with the encoding, be permissive with the formatting.
If the clients are sending wrong input, that's their problem.
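Being strict with the encoding could be as simple as rejecting any line that contains a byte outside 7-bit ASCII. A minimal sketch with a hypothetical isPlainAscii helper, assuming the double-high/double-wide pairs always use bytes of 0x80 or above, as in the examples posted:

```cpp
#include <string>

// Hypothetical strict check: accept only 7-bit ASCII, so any byte >= 0x80
// (such as the double-high/double-wide pairs) causes the line to be rejected.
bool isPlainAscii(const std::string& s)
{
    for (std::string::size_type i = 0; i < s.length(); ++i)
        if (static_cast<unsigned char>(s[i]) >= 0x80)
            return false;
    return true;
}
```

Run before the layout check, this rejects the new lines explicitly instead of letting them fail later on length or decimal position.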
100% agree. Sadly, the customer is always right... They changed it on their end to give the text a "double-high/double-wide" look on the screen, and that is the data I intercept. We've spoken to them about it and were told to figure it out...

It's not like I can tell my manager to lose the business over it... Sadly, I need to find some kind of solution, and I would like it to have minimal impact on the rest of our pre-existing software, so reformatting the data at the source into something that matches what we expect is best.
I really don't see any solution, here. You're telling me the code is supposed to reject bad input, that the client is sending bad input, and that you're supposed to accept that input.
Either you write a version with relaxed controls specific for that client or the client fixes his input (after all, if it's just a presentation feature, they could fix it in the back-end before sending it). I can't see any way out of this.

Wait, why am I doing your job? You should be able to figure this out on your own.
I'm not asking you to do my job; I was looking for some help and advice because I'm having trouble figuring it out. Not sure where you got that impression.

As for your idea, you are misunderstanding something: the new input is not bad, it is in a different encoding. There is a big difference.
If it was just in a different encoding, then converting it shouldn't change the formatting. Either the input came wrong from the client, or you're messing up the conversion:
(I'll use Z for 0x80, A for 0x90, B for 0x91 etc. to keep things aligned.)

What should be coming from the client:
"_____Item___$5.69___" <- Good string
"_____Item___$ZE.ZFZJ___" <- Input (client breaks formatting on purpose)
"_____Item___$5.69___" <- After conversion (formatting is restored after conversion because spaces are left untouched)


What, based on what you said, appears to be really coming:
"_____Item___$5.69___" <- Good string
"_____Item__$ZE.ZFZJ_" <- Input (client tries to keep formatting)
"_____Item__$5.69_" <- After conversion (spaces are left untouched and formatting breaks)


EDIT: I suppose you could insert spaces at various places to fix it.
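A sketch of the insert-spaces idea, assuming the line has already been decoded (one character per digit) and still contains a '$' followed by a '.'. realign_line and the two constants are hypothetical names; the values 20 and 14 come from the question:

```cpp
#include <string>

const std::string::size_type kLineLength   = 20;  // fixed line length from the question
const std::string::size_type kDecimalIndex = 14;  // fixed decimal position from the question

// Hypothetical helper: inserts spaces in front of the '$' so the decimal point
// moves back to kDecimalIndex, then trims surplus trailing spaces or pads with
// spaces to reach kLineLength. Returns false if the line cannot be realigned.
bool realign_line(const std::string& decoded, std::string& out)
{
    std::string::size_type dollar = decoded.find('$');
    if (dollar == std::string::npos) return false;
    std::string::size_type dot = decoded.find('.', dollar);
    if (dot == std::string::npos || dot > kDecimalIndex) return false;

    out = decoded;
    out.insert(dollar, kDecimalIndex - dot, ' ');  // shift "$5.69" to the right

    while (out.length() > kLineLength && out[out.length() - 1] == ' ')
        out.erase(out.length() - 1);               // drop surplus trailing spaces
    if (out.length() > kLineLength) return false;
    out.append(kLineLength - out.length(), ' ');   // pad to the fixed length
    return true;
}
```

For the decoded "     Item  $5.69 " (decimal at [13]) it inserts one space before the '$' and appends two at the end, giving "     Item   $5.69   ", which satisfies all three constants again. It is only as reliable as the client's layout: if the '$' or the decimal point is missing, it returns false rather than guessing.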
Yeah, inserting spaces was the sad option I was considering before posting in the forum; I was looking for a better alternative (because, and I agree with you, it's not a very reliable approach). Not sure what else to do...

Thanks for the help