Do most languages need more space than English?

I saw the following statement on User Experience:

Supporting multiple languages can break the user interface, because most languages need more space than English.

This seems to be a gross generalisation. Does anyone have any data on this?

Speaking as a translator, I can share a few rules of thumb that are popular in our profession:

Hebrew texts are usually shorter than their English equivalents by approximately 1/3. To a large extent, that can be attributed to cheating, what with no vowels and all.
Spanish, Portuguese and French (I guess we can just settle on Romance) texts are longer than their English counterparts by about 1/5 to 1/4.
Scandinavian languages are pretty much on par with English. Swedish is a tiny bit more compact.
Whether or not Russian (and by extension, Ukrainian and Belorussian) is more compact than English is subject to heated debate, and if you ask five people, you'll be presented with six different opinions. However, everybody seems to agree that the difference is just a couple percent, be it this way or the other.

Now that's for complete texts, on average, as a rule of thumb. Obviously, when you are working on a GUI, you mostly have to deal with translating individual words, which changes the picture dramatically. I am not aware of any universal research on the subject, but I will go out on a limb and say that it would be worthless to you, precisely because of being universal.

First of all, let's have a look at English itself. A very popular estimate for the average length of English words is 5 letters (or 5.2, or 5.3, or 5.1). I will not expressly address the validity of that estimate here, though I will link to this tiny bit of intriguing research (executive summary: "the larger the dictionary, the longer the words that are contained in it"). Much rather, I will focus on saying that your mileage will always vary.

It all depends on what application you are writing, and for what target audience. You might be writing a text editor for children, a web browser for everyone, or a worst-case execution time analyzer for the aerospace industry. Sometimes, your menu entries will read "Open", "Edit", "Save" and "Quit". Other times, they will read "Crossing reduction" and "Simulated annealing". Add into the equation that "Quit" is not necessarily short in all languages, and "simulated annealing" is not necessarily long, and you've got yourself a complete mess, no matter what the universal research says.

Secondly, there is something to be said about the units in which one measures the average word/text length. Traditional research and urban legends alike focus on the number of characters. But for a GUI designer, that kind of information is rather useless, because he measures the screen real estate in pixels.

As a simple example, in terms of letters, "猫" is 67% shorter than "cat" (which is what it means) and 75% shorter than "neko" (which is its Kun reading). But in terms of pixels, you don't save anywhere near as much space. So, whether or not your menu items in Japanese, Chinese, Arabic, Farsi or Urdu will end up being shorter than their English counterparts depends on how you define "shorter".
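To make the letters-versus-width gap a bit more concrete, here is a rough Python sketch. It uses the standard library's `unicodedata.east_asian_width` to count terminal-style cells, where wide and fullwidth characters take two cells. The helper name `display_cells` is my own, and cell counts are only a crude stand-in for pixels, which depend on the actual font.

```python
import unicodedata

def display_cells(text: str) -> int:
    """Count terminal-style cells: wide (W) and fullwidth (F) characters take two, all others one."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)

for word in ("cat", "neko", "猫"):
    # Print the word, its length in characters, and its estimated width in cells.
    print(word, len(word), display_cells(word))
```

By this measure "猫" is one character but two cells wide, so the saving over "cat" is one third rather than two thirds.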

The tricky part is that to one extent or another, this is true for every pair of languages, even for those that use the same alphabet. You have the English word "illicitly", and you translate it into Phantasese, and you get "mamwowo". Now what? It's two letters shorter, yet it no longer fits. (Unless, of course, you are using monospaced fonts everywhere, which is highly unlikely.)
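A toy illustration of the same effect, assuming a proportional font where narrow letters are about half as wide as ordinary ones and "m"/"w" are about one and a half times as wide. The width classes and the helper `estimated_width` are made up for this sketch; real advance widths have to come from the font you actually ship.

```python
# Hypothetical width classes in arbitrary units; real values depend on the font.
NARROW = set("fijlt")   # assumed width 1
WIDE = set("mw")        # assumed width 3

def estimated_width(word: str) -> int:
    """Sum assumed per-letter widths: narrow = 1, wide = 3, everything else = 2."""
    total = 0
    for ch in word.lower():
        if ch in NARROW:
            total += 1
        elif ch in WIDE:
            total += 3
        else:
            total += 2
    return total

for word in ("illicitly", "mamwowo"):
    # Print the word, its letter count, and its estimated width.
    print(word, len(word), estimated_width(word))
```

Under these assumed widths, the shorter word comes out wider, which is the whole problem in a nutshell.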

Lastly, I would like to specifically address the myth that German words are oh-so-long. All those awfully long German words are only that long because they correspond to many words in other languages. "Kontrollflußgraphvisualisierungssoftware" is no longer than its English counterpart, "control flow graph visualization software", and the famous "Donaudampfschiffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft" is considerably shorter than its English translation. Yep, you heard that right, that monster of a word actually saves space. German words can be long and succinct at the same time, and English recognizes that by borrowing (kindergarten, wunderkind, doppelganger, wanderlust, zeitgeist, schadenfreude...).

Edit: I want to add that when it comes to GUIs (or news headlines), English loves to cheat by dropping articles. "Export file" rather than "export the file", "import image" rather than "import an image", and so forth. Many languages can't do that because they don't have articles to begin with. If there's any advantage Russian does have over English in normal prose, it's not having "a"s, "an"s, "the"s (and "to"s, while we're at it) scattered all over the place. But when it comes to GUIs, Russian loses that advantage, and an English expression that was longer than its Russian equivalent might suddenly become shorter. German is even better at that game: it has lots of articles to drop, none of them shorter than 3 letters, and quite a few that are 4 or 5 letters long.

A point of reference from the website I maintain. The files where we store the translations have the following sizes:

English: 200k
Portuguese: 208k
Spanish: 209k
German: 219k

And the translations are out of date. That is, there are strings in the English file that aren't yet in the other files.
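For what it's worth, the relative overhead implied by those sizes can be worked out in a few lines of Python. The function name `overhead_percent` is just for this sketch, and since the translated files are missing some strings, the true overhead is presumably somewhat higher than what this reports.

```python
# File sizes in kilobytes, as listed above.
sizes_kb = {"English": 200, "Portuguese": 208, "Spanish": 209, "German": 219}

def overhead_percent(sizes: dict, baseline: str = "English") -> dict:
    """Return each language's size overhead relative to the baseline, in percent."""
    base = sizes[baseline]
    return {
        lang: round((size - base) * 100 / base, 1)
        for lang, size in sizes.items()
        if lang != baseline
    }

print(overhead_percent(sizes_kb))
# {'Portuguese': 4.0, 'Spanish': 4.5, 'German': 9.5}
```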

For Chinese, the situation is a bit different because the character encoding comes into play. Chinese text will have shorter strings, because most words are one or two characters, but in UTF-8 each character typically takes 3 bytes (4 for rarer characters outside the Basic Multilingual Plane), so a typical word is roughly 3 to 8 bytes long. So visually the text takes less space, but in terms of the information exchanged it can use more. This Language Log post suggests that if you account for the encoding and remove redundancy in the data using compression, you find that English is slightly more efficient than Chinese.
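The character-versus-byte difference is easy to check in Python. This is only a small sketch; the helper name `utf8_stats` is mine, and the sample words are illustrative.

```python
def utf8_stats(text: str) -> tuple:
    """Return (number of characters, number of bytes when encoded as UTF-8)."""
    return len(text), len(text.encode("utf-8"))

print(utf8_stats("cat"))          # ASCII letters: one byte each
print(utf8_stats("猫"))           # a common CJK character: three bytes
print(utf8_stats("\U00020000"))   # a character outside the Basic Multilingual Plane: four bytes
```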

This is of course a big generalisation. I would say it differently: Supporting multiple languages can break the user interface, because for almost any language there will be a string that needs more space than English.

What I mean is, "the average word length" may be close to English, but some particular words or expressions might be surprisingly long when translated, and these are the strings that might get cut off in the GUI. There might be only a few such strings in the whole project, but they will annoy users a lot.

So it's not about languages needing more space; it's about particular translated strings that cannot fit. See the "Hello/Здравствуйте" example above.
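One way to hunt for those few troublesome strings is to compare each translation against its English source and flag the outliers. The sketch below is a rough idea rather than a real tool: the function `strings_over_budget`, the 1.5 threshold, and the sample dictionary are all assumptions, and it counts characters, which is only a proxy for the pixel width that actually matters.

```python
def strings_over_budget(translations: dict, max_ratio: float = 1.5) -> list:
    """Return (source, translated, ratio) for entries whose translation is
    more than max_ratio times as long as the source, longest ratio first."""
    flagged = []
    for source, translated in translations.items():
        ratio = len(translated) / len(source)
        if ratio > max_ratio:
            flagged.append((source, translated, round(ratio, 2)))
    return sorted(flagged, key=lambda item: item[2], reverse=True)

# Illustrative English -> Russian pairs.
sample = {"Hello": "Здравствуйте", "Quit": "Выход"}

for source, translated, ratio in strings_over_budget(sample):
    print(source, translated, ratio)
```

Here only "Hello" gets flagged, even though the average over the two strings looks far less alarming.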

Just translate the terms 'Play Now', 'Instant Play', 'Visit Site' or even 'Click Here' (all common calls to action on the web) and see the results. Some are much shorter and some are much, much longer.

I am responsible for a large portfolio of multi-language sites, some carrying 29 languages, and although the design side mostly runs smoothly, there are times when stuff simply doesn't fit.

I think one of the major problems is that we always translate directly from English into all the other languages, when in fact, if we reworked the messages in our creative to properly fit each market, perhaps things would be a little different: 29 times the work, but perhaps a better level of conversion.

Semitic languages in common writing don't have written vowels, so even if the words were the same length as in English when spoken, they would take fewer characters. Hebrew, for example, uses letters that are visually no more complicated than Latin letters, and the words look shorter because they lack the vowels. For example, ירושלים contains 7 letters and is read as ye-ru-sha-la-yim: 11 sounds.

Now, when the vowels are written, as in children's books, the words are still visually shorter because the vowels are dots above or below the actual letters. So the previous example will look like יְרוּשָלַיִם.
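In Unicode terms, those vowel points are combining marks attached to the preceding letter, which is why they add code points but no extra letters side by side. A small Python sketch, assuming the pointed spelling is encoded as shown in the comments (the helper name `base_letters` is mine):

```python
import unicodedata

def base_letters(text: str) -> int:
    """Count characters that are not combining marks (i.e. the letters themselves)."""
    return sum(1 for ch in text if not unicodedata.combining(ch))

# "Jerusalem" without vowel points: seven letters.
plain = "\u05d9\u05e8\u05d5\u05e9\u05dc\u05d9\u05dd"
# The same word with points (sheva, dagesh, qamats, patah, hiriq) added as combining marks.
pointed = "\u05d9\u05b0\u05e8\u05d5\u05bc\u05e9\u05b8\u05dc\u05b7\u05d9\u05b4\u05dd"

print(len(plain), base_letters(plain))      # 7 code points, 7 letters
print(len(pointed), base_letters(pointed))  # 12 code points, still 7 letters
```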

From my experience, in GUI, Semitic texts are shorter than English.

This answer isn't very closely related to English, but I hope it is interesting anyway :)
