Why do certain characters draw horribly, horribly wrong in Windows?

Odd characters:

ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้

Question: Why do these characters draw so oddly when you look at them in Windows*?

Here's a snippet from Outlook for you lucky bastards who aren't forced to use Windows:

[Image: Outlook screenshot of the characters above]

Related: What's the character encoding used?

*Windows as in the OS. Applications that draw text using GTK+ and the like don't show these like something out of an LSD trip gone wrong.


As I seem to have taken a bit of a hit for thinking that Windows is "doing it right", I feel I should post an answer to justify my position.

The fact of the matter is that the text you have is telling the operating system to render an insane number of combining characters. That one operating system actually renders them while another does not comes down to a number of issues. One is how thorough the programmers were when they wrote the rendering code; another is programmers being too lazy to implement combining characters properly, if at all.
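
To see what the text is actually made of, here is a minimal Python sketch using the standard `unicodedata` module. The `describe` helper and the shortened `sample` string are my own illustration, not anything from the question: it just lists each code point with its Unicode name and general category, where `Lo` is a letter and `Mn` is a nonspacing (combining) mark.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]

# Shortened stand-in for one cluster from the question:
# the Thai letter KO KAI followed by three copies of the mark MAITAIKHU.
sample = "\u0E01" + "\u0E47" * 3

for row in describe(sample):
    print(row)
```

The first row is the base letter; every row after it is a mark that the renderer is being asked to stack on top of that one letter.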

Basically it comes down to the idea that written languages are fluid things, and that many characters in certain languages take multiple different diacritics that modify their pronunciation. How do we handle all these diacritics? Do we give each letter-plus-diacritic combination its own character (which would result in one heck of a lot of new and nearly identical characters), or do we create a set of characters specifically for diacritics and keep the overall alphabet small?
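
The two options can be seen side by side with "é", which exists both as a single precomposed character and as a plain "e" followed by a combining accent. This is only a small sketch using Python's standard `unicodedata.normalize`; the variable names are mine.

```python
import unicodedata

precomposed = "\u00E9"   # é as one code point
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

print(len(precomposed), len(combining))                          # 1 2
print(precomposed == combining)                                  # False
print(unicodedata.normalize("NFC", combining) == precomposed)    # True
print(unicodedata.normalize("NFD", precomposed) == combining)    # True
```

Both strings should look the same on screen, but they are different code point sequences until they are normalized to the same form.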

Unicode gives us the scope to do both, but in doing so the programmers who render these diacritics have to deal with the fact that some characters really do carry multiple diacritics, say one above and one below, and then they have to ask just when to stop. They could limit it to two and satisfy most people, but that would ignore those who want or need three diacritics in order to write formally in their own language.
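
For illustration, a "stop after N marks" policy could look roughly like the sketch below. To be clear, `cap_marks` is a hypothetical helper I made up for this answer, not something any operating system or text renderer is known to use; it simply drops every mark beyond `limit` in a consecutive run.

```python
import unicodedata

def cap_marks(text, limit=2):
    """Keep at most `limit` consecutive combining marks (hypothetical policy)."""
    out = []
    run = 0  # marks seen since the last non-mark character
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > limit:
                continue  # silently drop the extra mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)

stacked = "\u0E01" + "\u0E47" * 20   # one Thai letter with twenty marks
print(len(stacked), len(cap_marks(stacked)))
```

Note that the text that comes out is no longer the text that went in, which is exactly the trade-off: anyone who legitimately needs a third mark loses it without being told.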

Microsoft, whether rightly or wrongly, decided to let the user decide just how many extra marks to use. This path takes a rather good programmer and some tough rationalisation to follow through. I fully support them both in allowing this and in doing it well.

If those characters were missing, on the other hand, I would want to know just why they were missing. Was it a "we drop this on the floor after x diacritics" decision, or were the programmers too lazy to do it properly, potentially exposing me to a buffer overflow, with code hidden in the diacritics getting passed on to be executed by the system?

The simple issue here is that by actually rendering those characters I can see that the system is doing exactly what it is told to do, rather than doing what it thinks is right or, worse, doing something potentially harmful.


Why do these characters draw so oddly when you look at them in Windows*?

Because Windows attempts to render large numbers of Unicode combining characters when your text contains them, even though no actual script would ever combine that many marks together.