Why is rendering multibyte character sequences unbelievably slow?

About a week ago I noticed that the file list in µTorrent would hang for a fraction of a second whenever a file with a long Japanese file name was visible. I found it curious, but I didn't have time to look into it, especially since it seemed to be limited to µTorrent.

However, today I realized that it is not. For example, if I save a text file with a long multibyte file name and open it in Notepad, I get some strange results. When I try to resize the window, everything slows to a crawl. If I release the mouse button, my cursor splits in two: one is controlled by me, and the other is a sort of "ghost cursor" (for lack of a better word) that replays the dragging motion I originally made with the mouse. This only happens with file names of this kind, and I have reproduced it in applications other than Notepad and µTorrent as well.

I've tried searching for clues as to what is causing this strange behavior, but I cannot find anything. Does anyone here have any idea what's going on?

Unfortunately, I cannot take a screenshot of this, as every screenshot application I tried seems to hang until the resizing is complete before taking the shot...

Edit: I've recorded a video demonstrating the problem. I'm not sure whether this will help in identifying the cause but it should at least be better than my explanation above:

https://vimeo.com/58619918

Edit 2: Here's a sample file as requested. Note that it's simply an empty file with a long multibyte file name: http://goo.gl/bgnGP (And for those of you whose browser can't handle the file name, here's a zip file: https://dl.dropbox.com/u/55495248/multibyte.zip)


Solution 1:

I can explain how Unicode is handled, but I cannot directly answer your question. I have seen slowness on the first write, but once that is done, it gets fast again...

Unicode is divided into what we call planes. There are 17 planes of 65,536 code points each, and each plane is further subdivided into smaller blocks. In many situations a font covers only a handful of blocks, in part to avoid very large files but also because that is enough for many languages (English, French, German...). Asian languages, however, need much larger fonts that span many blocks. A complete Japanese character set, if I'm correct, runs to thousands of characters across the kana and CJK ideograph blocks, with some rare kanji in a second plane. Chinese needs more (especially traditional Chinese!)
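To make the plane idea concrete, here is a minimal Python sketch (not something Windows itself runs) that reports the code point, plane, and UTF-8 byte length of each character in a string. The helper names `plane_of` and `describe` are made up for this illustration.

```python
def plane_of(ch: str) -> int:
    """Return the Unicode plane (0 to 16) that contains the character ch."""
    # Each plane holds 0x10000 (65,536) code points, so the plane number
    # is simply the code point shifted right by 16 bits.
    return ord(ch) >> 16


def describe(text: str):
    """Return (code point label, plane, UTF-8 byte count) for each character."""
    return [
        (f"U+{ord(ch):04X}", plane_of(ch), len(ch.encode("utf-8")))
        for ch in text
    ]


if __name__ == "__main__":
    # "A", a common kanji (U+65E5), and a rare kanji outside plane 0 (U+20BB7).
    for row in describe("A\u65e5\U00020BB7"):
        print(row)
```

Common Japanese text stays in plane 0 but needs three bytes per character in UTF-8, which is where the "multibyte" in the question comes from.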

When rendering with such fonts, you have to select the corresponding font (if one font is not enough to handle all the characters, the operating system switches between fonts for you; that happens under the hood, but it happens). That is time consuming. Plus, the first time the system writes in that font, it needs to load it from disk. Since Asian-language fonts are large, that takes time too.
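The font switching described above can be sketched roughly as follows. This is only an illustration of the idea, not how Windows actually implements font fallback: the coverage table and the font names (`LatinFont`, `JapaneseFont`, `LastResortFont`) are hypothetical, and a real text stack consults each font's character map instead.

```python
from itertools import groupby

# Hypothetical coverage table: (first code point, last code point, font name).
COVERAGE = [
    (0x0000, 0x024F, "LatinFont"),      # Basic Latin through Latin Extended-B
    (0x3040, 0x30FF, "JapaneseFont"),   # Hiragana and Katakana blocks
    (0x4E00, 0x9FFF, "JapaneseFont"),   # CJK Unified Ideographs block
]


def font_for(ch: str) -> str:
    """Pick the first (hypothetical) font whose range covers the character."""
    cp = ord(ch)
    for first, last, font in COVERAGE:
        if first <= cp <= last:
            return font
    return "LastResortFont"


def split_into_runs(text: str):
    """Group consecutive characters that would be drawn with the same font."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=font_for)]


if __name__ == "__main__":
    for font, run in split_into_runs("abc\u65e5\u672c\u8a9e.txt"):
        print(font, repr(run))
```

Each boundary between runs is a point where the renderer has to switch fonts, and the first run that needs a large font is where the load-from-disk cost would be paid.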

Finally, and this is more likely what you are encountering, the characters (or glyphs) are generally more complex. That means more time to render each one. Although rendering could be done by the video card with OpenGL/D3D, that does not work so well for fonts: you lose a lot of quality (although font quality under MS-Windows...). So it is most often done by the processor.

One last note, although I really doubt it is a concern: by default Win7 makes the window edges semi-transparent. That could add to the problem. This part of the rendering, however, is most certainly done with accelerated 2D/3D functions on your video card.