Finding the 'first byte of the last character from a UTF-8 string'

Hi,
I don't have a clue about programming (I'm a school teacher), it's just a friend bet that I couldn't resolve this algorithm problem with the internet. Let's prove him wrong.
Also, bonus is the following question (he is scandinavian):
"Why is "å" (U+00E5 in unicode) encoded as C3A5 in hex in UTF-8?"

Thanks guys,
Challenge accepted, I guess :)
Does your friend not know about Wikipedia?

http://en.wikipedia.org/wiki/Utf-8#Design


A brief examination of the UTF-8 format will show that every continuation byte of a multi-byte sequence has bit 7 set and bit 6 clear (i.e. it matches 10xxxxxx). Therefore, checking whether a byte is the first in a sequence is very simple:

if( (byte & 0xC0) == 0x80 )
{
  // not the first byte in a sequence
}
else
{
  // first byte
}
Of course, he probably knows the answer, too. He just wanted to challenge me to get an answer on a topic I know nothing about, because he always tells me coders on the internet will never help you.

So that's the answer I should tell him for 'first byte of the last character from a UTF-8 string'?

Thanks!!!
Topic archived. No new replies allowed.