Finding the 'first byte of the last character from a UTF-8 string'

Hi,
I don't have a clue about programming (I'm a school teacher), it's just a friend bet that I couldn't resolve this algorithm problem with the internet. Let's prove him wrong.
Also, bonus is the following question (he is scandinavian):
"Why is "å" (U+00E5 in unicode) encoded as C3A5 in hex in UTF-8?"

Thanks guys,
Challenge accepted, I guess :)
Does your friend not know about Wikipedia?

http://en.wikipedia.org/wiki/Utf-8#Design


A brief examination of the UTF-8 format will show that every continuation byte of a multi-byte sequence has bit 7 set and bit 6 clear (i.e. it matches 10xxxxxx). Therefore, checking whether a byte is the first in a sequence is very simple:

if( (byte & 0xC0) == 0x80 )
{
  // not the first byte in a sequence
}
else
{
  // first byte
}
Of course, he probably knows the answer, too. He just wanted to challenge me to get an answer on a topic I know nothing about, because he always tells me coders on the internet will never help you.

So that's the answer I should tell him for 'first byte of the last character from a UTF-8 string'?

Thanks!!!
Topic archived. No new replies allowed.