How do I find out which language some Unicode characters belong to?
On Facebook, there are currently messages floating around with these strange chars:
ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้
They are used to confuse the reader, because they break out of the designated text areas.
Do they really belong to a language? If so, which one?
Solution 1:
They are Thai characters: each is a base letter followed by a long string of combining marks. You can do quite similar things with Latin letters, too, e.g. â̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂̂ (which is â, a with circumflex, followed by many combining circumflexes). Or you could use a sequence of combining horns: ư̛̛̛̛̛̛̛̛̛̛̛̛̛̛̛̛̛̛ or cedillas: ç̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧̧. How such contrived sequences are displayed naturally depends on the rendering software.
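A minimal Python sketch (Python is my choice here; the answer itself contains no code) that builds such stacked sequences and inspects them with the standard `unicodedata` module. The helper name `stack` is purely illustrative. It shows that every repeated mark is a separate code point of general category `Mn` (nonspacing mark), and that NFC normalization folds only the first circumflex into the precomposed â, leaving the rest stacked:

```python
import unicodedata


def stack(base, mark, count):
    """Illustrative helper: append `count` copies of a combining mark to a base character."""
    return base + mark * count


thai = stack("\u0E01", "\u0E47", 20)   # KO KAI + 20 x MAITAIKHU
latin = stack("a", "\u0302", 5)       # a + 5 x COMBINING CIRCUMFLEX ACCENT

for s in (thai, latin):
    base, marks = s[0], s[1:]
    # Print the base letter's name, how many marks follow, and the mark's category
    print(unicodedata.name(base), "+", len(marks), "x",
          unicodedata.name(marks[0]), "category", unicodedata.category(marks[0]))

# NFC composes only the first circumflex with 'a' (into U+00E2);
# the remaining four have no precomposed form to merge into and stay stacked.
nfc = unicodedata.normalize("NFC", latin)
print(len(latin), len(nfc), nfc[0] == "\u00E2")
```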
Solution 2:
If you're on Linux, you can try the Perl script utfinfo.pl (see also the question "Program to check/look up UTF-8/Unicode characters in string on command line?"); the output I get is:
$ echo ก็็็็็็็็็็ กิิิิิิิิิิ ก้้้้้้้้้้ | perl utfinfo.pl
Got 64 uchars
Char: 'ก' u: 3585 [0x0E01] b: 224,184,129 [0xE0,0xB8,0x81] n: THAI CHARACTER KO KAI [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: '็' u: 3655 [0x0E47] b: 224,185,135 [0xE0,0xB9,0x87] n: THAI CHARACTER MAITAIKHU [Thai]
Char: ' ' u: 32 [0x0020] b: 32 [0x20] n: SPACE [Basic Latin]
Char: 'ก' u: 3585 [0x0E01] b: 224,184,129 [0xE0,0xB8,0x81] n: THAI CHARACTER KO KAI [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: 'ิ' u: 3636 [0x0E34] b: 224,184,180 [0xE0,0xB8,0xB4] n: THAI CHARACTER SARA I [Thai]
Char: ' ' u: 32 [0x0020] b: 32 [0x20] n: SPACE [Basic Latin]
Char: 'ก' u: 3585 [0x0E01] b: 224,184,129 [0xE0,0xB8,0x81] n: THAI CHARACTER KO KAI [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
Char: '้' u: 3657 [0x0E49] b: 224,185,137 [0xE0,0xB9,0x89] n: THAI CHARACTER MAI THO [Thai]
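If you'd rather not fetch utfinfo.pl, a rough Python equivalent using only the standard `unicodedata` module might look like the sketch below. It is not a port of utfinfo.pl: `describe` and `sample` are my own illustrative names, the line format only imitates the output above, and the trailing `[Thai]` block label is omitted because Python's standard library has no Unicode block lookup. The final tally uses the first word of each character name as a script hint, which is a heuristic, not the real Unicode Script property:

```python
import unicodedata
from collections import Counter


def describe(text):
    """Return one utfinfo.pl-like line per code point (format imitated, not exact)."""
    lines = []
    for ch in text:
        cp = ord(ch)
        raw = ch.encode("utf-8")
        dec = ",".join(str(b) for b in raw)          # UTF-8 bytes in decimal
        hx = ",".join(f"0x{b:02X}" for b in raw)     # UTF-8 bytes in hex
        name = unicodedata.name(ch, "<unnamed>")     # default avoids ValueError
        lines.append(f"Char: '{ch}' u: {cp} [0x{cp:04X}] b: {dec} [{hx}] n: {name}")
    return lines


# Shortened sample: KO KAI + 3 x MAITAIKHU, a space, KO KAI + 3 x SARA I
sample = "\u0E01" + "\u0E47" * 3 + " " + "\u0E01" + "\u0E34" * 3
out = describe(sample)
print(f"Got {len(sample)} uchars")
print("\n".join(out))

# Heuristic "which script?" tally: the first word of a character name often
# names the script (THAI, ARABIC, ...) but not always (e.g. COMBINING ...);
# a proper answer needs the Script property from the Unicode data files.
prefixes = Counter(unicodedata.name(ch, "?").split()[0]
                   for ch in sample if not ch.isspace())
print(prefixes)
```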