In Mongolian Cyrillic, each character takes 2 bytes:
'а' - 2 bytes,
'б' - 2 bytes,
'в' - 2 bytes,
'г' - 2 bytes,
'д' - 2 bytes, and so on.
In traditional Mongolian script, each character takes 3 bytes:
'ᠠ' - 3 bytes,
'ᠳ' - 3 bytes,
'ᠭ' - 3 bytes,
'ᠳ' - 3 bytes, and so on.
English letters, digits, and special characters take 1 byte each.
So the following string is 8 bytes long:
'й' - 2 bytes, 'а' - 2 bytes, 'ᠭ' - 3 bytes, '1' - 1 byte.
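To make the byte counts above concrete, here is a small sketch that prints the raw bytes of a string in hex. It assumes the string is encoded as UTF-8 (the sizes listed above match that encoding); `hex_bytes` is a hypothetical helper name, not a standard function.

```cpp
#include <cstddef>
#include <string>

// Return the bytes of `s` as space-separated uppercase hex pairs.
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

For the UTF-8 bytes of "йаᠭ1" this gives `D0 B9 D0 B0 E1 A0 AD 31`: eight bytes, where `D0` starts each 2-byte letter, `E1` starts the 3-byte letter, and `31` is the single-byte digit.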
I need to split this word letter by letter, like "йаᠭ1" => 'й', 'а', 'ᠭ', '1'.
So if I cut the first 2 bytes of this word, I get the letter "й".
string s1 = s.substr(0, 2); // й: bytes 0-1
string s2 = s.substr(2, 2); // а: bytes 2-3
string s3 = s.substr(4, 3); // ᠭ: bytes 4-6
string s4 = s.substr(7, 1); // 1: byte 7
The problem is how to split an arbitrary word like this one, made up of characters from several scripts, letter by letter. How do I know whether to cut 2, 3, or 1 bytes at each position?
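One possible approach, assuming the string is UTF-8 (which the 1/2/3-byte sizes above suggest): in UTF-8 the first byte of every character encodes how many bytes the character occupies, so the length can be read from that byte alone. The sketch below is a minimal illustration, not a full validator; `utf8_char_length` and `split_utf8` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Returns 1 for bytes that cannot start a sequence, so the caller always advances.
std::size_t utf8_char_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII letters, digits
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx: e.g. Cyrillic
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx: e.g. traditional Mongolian
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx: characters beyond U+FFFF
    return 1;                             // stray continuation or invalid byte
}

// Split a UTF-8 string into one std::string per code point.
// Assumption: the input is well-formed UTF-8; malformed input is not rejected.
std::vector<std::string> split_utf8(const std::string& s) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t len = utf8_char_length(static_cast<unsigned char>(s[i]));
        assert(len >= 1 && len <= 4);
        out.push_back(s.substr(i, len));  // substr clamps if the input is truncated
        i += len;
    }
    return out;
}
```

For the example word, `split_utf8` would return four strings of 2, 2, 3 and 1 bytes. Note that this splits by code point: combining marks and Mongolian variation selectors are separate code points, so a letter that is followed by one of them would come out as two pieces unless that case is handled separately.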