Why is a trailing punctuation mark rendered at the start with direction:rtl?

This is more of a curiosity. While working on a multilingual web application I noticed that certain characters such as punctuation marks (!?.;,) at the end of a block element are rendered as if they were placed at the beginning when the writing direction is right-to-left (as is the case for certain Asian languages I do not speak).

In other words, the string

Hello, World!

is rendered as

!Hello, World

when placed in a div block with direction: rtl.

This becomes even more evident if the text is split into two parts with different colors: a contiguous chunk of text at the end is rendered in two separate regions:

http://jsfiddle.net/22Qk9/

What's the point of this behavior? I guess this must be a peculiarity of (all?) right-to-left languages which is automatically handled by the browser, so I don't need to care about it, or should I?


Solution 1:

If you want to fix this behavior, add the LRM character (U+200E LEFT-TO-RIGHT MARK, written &lrm; in HTML) at the end of the text. It's a non-printing character.

Source : http://dotancohen.com/howto/rtl_right_to_left.html

Example : http://jsfiddle.net/yobjj6ed/
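When the text is produced by a script rather than typed into the markup, the same fix can be applied to the string itself. The following is a minimal sketch; the names LRM and withTrailingLrm are made up for illustration. It works because LRM is a zero-width character with strong left-to-right direction, so the trailing punctuation ends up between two left-to-right characters and stays with the text before it.

```javascript
// U+200E LEFT-TO-RIGHT MARK: invisible, zero width, strongly left-to-right.
const LRM = "\u200E";

// Hypothetical helper: append an LRM unless the text already ends with one.
function withTrailingLrm(text) {
  return text.endsWith(LRM) ? text : text + LRM;
}

const fixed = withTrailingLrm("Hello, World!");
console.log(fixed.length); // one character longer than the input
```

Note that the returned string is one character longer than it looks, which matters if you compare or measure it later.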

Solution 2:

The reason is that the exclamation mark “!” has the BiDi class ON ("Other Neutrals"), which effectively means that it adapts to the directionality of the surrounding text. In the example, the mark comes last, so it has left-to-right text on one side and only the right-to-left paragraph direction on the other; it therefore follows the paragraph direction and is placed to the left of the text before it. This is quite correct for languages written right to left: the terminating punctuation mark appears at the end, i.e. on the left.
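To make the mechanism concrete, here is a heavily simplified toy model of it, not the real Unicode bidirectional algorithm. It assumes only three classes (strong left-to-right letters, strong right-to-left letters from the Hebrew and Arabic scripts, and everything else as neutral) and ignores digits, weak types, explicit embeddings and mirroring. The function names bidiClass and visualOrder are made up for this sketch.

```javascript
// Toy classification: "R" for Hebrew/Arabic letters, "L" for other letters, "N" otherwise.
function bidiClass(ch) {
  if (!/\p{L}/u.test(ch)) return "N";
  return /[\p{Script=Hebrew}\p{Script=Arabic}]/u.test(ch) ? "R" : "L";
}

// Returns the characters in left-to-right display order for the given base direction.
function visualOrder(text, baseDir) {
  const chars = Array.from(text);
  const base = baseDir === "rtl" ? "R" : "L";
  const classes = chars.map(bidiClass);

  // A run of neutrals takes the direction of its strong neighbours if they agree,
  // otherwise the base direction. Start and end of text count as the base direction.
  const resolved = classes.slice();
  let i = 0;
  while (i < chars.length) {
    if (classes[i] !== "N") { i++; continue; }
    let j = i;
    while (j < chars.length && classes[j] === "N") j++;
    const before = i > 0 ? classes[i - 1] : base;
    const after = j < chars.length ? classes[j] : base;
    const dir = before === after ? before : base;
    for (let k = i; k < j; k++) resolved[k] = dir;
    i = j;
  }

  // Embedding levels: even = left-to-right, odd = right-to-left.
  const baseLevel = base === "R" ? 1 : 0;
  const levels = resolved.map((c) => (c === "R" ? 1 : baseLevel === 0 ? 0 : 2));

  // From the highest level down to 1, reverse every contiguous run at that level or higher.
  const out = chars.slice();
  const lv = levels.slice();
  for (let level = Math.max(0, ...levels); level >= 1; level--) {
    let s = 0;
    while (s < out.length) {
      if (lv[s] < level) { s++; continue; }
      let e = s;
      while (e < out.length && lv[e] >= level) e++;
      out.splice(s, e - s, ...out.slice(s, e).reverse());
      lv.splice(s, e - s, ...lv.slice(s, e).reverse());
      s = e;
    }
  }
  return out.join("");
}

console.log(visualOrder("Hello, World!", "ltr")); // Hello, World!
console.log(visualOrder("Hello, World!", "rtl")); // !Hello, World
```

The comma and space sit between two left-to-right letters and stay put, while the final “!” has a left-to-right letter on one side and only the paragraph direction on the other, so it follows the paragraph.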

Normally, you use the CSS code direction: rtl or, preferably, the HTML attribute dir=rtl for texts in a language that is written right to left, and only for them. For them, this behavior is a solution, not a problem.

If you instead use direction: rtl or dir=rtl just for special effects, like making table columns laid out right to left, then you need to consider the implications. For example, in the table case, you would need to set direction to ltr for each cell of the table (unless you want them to be rendered as primarily right to left text).

If you have, say, an English sentence quoted inside a block of Arabic text, then you need to set the directionality of an element containing the English text to ltr, e.g.

<blockquote dir=ltr>Hello, World!</blockquote>

A similar case (just with Arabic inside English text) is discussed as use case 6 in the W3C document What you need to know about the bidi algorithm and inline markup (which has a few oddities, though, like using cite markup for quoted text, against W3C recommendations).

Solution 3:

The accepted answer https://stackoverflow.com/a/20799360/477420 works if you can control the markup/CSS of the value. If you have no control over the HTML, the following approach could work.

If you don't know whether the page will be rendered RTL or LTR, but some text is definitely LTR (e.g. English-only), you can wrap the value in LRE/PDF marks to signify that it is an LTR region. The text will then be rendered LTR irrespective of the page's direction.

This works when you have code that renders text without being able to change the markup that determines how it shows up on the page, e.g. you are rendering the value of a "song title" or "company name" field in some nested child component (or on the server side) without control over the surrounding HTML elements.

One drawback of this and similar approaches that add marks to the text (like the LRM proposal in this question) is that copying such a value from the resulting HTML page will generally preserve the marks, even though they are invisible and zero-width. In most cases that is fine, but consider whether it is a problem for you.
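If the invisible marks do turn out to be a problem, for instance when a pasted value is later compared or stored, one option is to strip them again on input. This is only a sketch; the names BIDI_CONTROLS and stripBidiControls are made up for illustration, and whether stripping is appropriate depends on your data.

```javascript
// Directional formatting characters:
// LRM U+200E, RLM U+200F, ALM U+061C,
// LRE/RLE/PDF/LRO/RLO U+202A..U+202E,
// LRI/RLI/FSI/PDI U+2066..U+2069.
const BIDI_CONTROLS = /[\u200E\u200F\u061C\u202A-\u202E\u2066-\u2069]/g;

// Hypothetical helper: remove all directional formatting characters.
function stripBidiControls(text) {
  return text.replace(BIDI_CONTROLS, "");
}

const pasted = "\u202AAlphabet Inc.\u202C";
console.log(stripBidiControls(pasted) === "Alphabet Inc."); // true
```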

Approximate sample code (some company names end in "Inc.", and the dot ends up at the beginning when the name is rendered as-is on an RTL page):

 // companyName = "Alphabet Inc." - the dot should stay at the end, even on an RTL page
 if (stringIsDefinitelyAscii(companyName))
 {
     // wrap in LRE (U+202A) ... PDF (U+202C)
     companyName = "\u202A" + companyName + "\u202C";
 }
 return companyName;
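A self-contained variant of the snippet above might look like the following. The body of stringIsDefinitelyAscii is my assumption (the original leaves it undefined), and wrapAsLtr is a made-up name; treat it as a sketch rather than a definitive implementation.

```javascript
const LRE = "\u202A"; // LEFT-TO-RIGHT EMBEDDING
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING

// Assumed implementation: true for non-empty strings made only of printable ASCII.
function stringIsDefinitelyAscii(text) {
  return /^[\x20-\x7E]+$/.test(text);
}

// Wrap ASCII-only values in LRE ... PDF; leave everything else untouched.
function wrapAsLtr(text) {
  return stringIsDefinitelyAscii(text) ? LRE + text + PDF : text;
}

console.log(wrapAsLtr("Alphabet Inc.") === "\u202AAlphabet Inc.\u202C"); // true
```

Values containing any non-ASCII character are returned unchanged, so text in a right-to-left script keeps the page's normal behavior.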

Details on LRE/PDF symbols can be found in https://unicode.org/reports/tr9/#Explicit_Directional_Embeddings:

LRE U+202A LEFT-TO-RIGHT EMBEDDING Treat the following text as embedded left-to-right.

PDF U+202C POP DIRECTIONAL FORMATTING End the scope of the last LRE, RLE, RLO, or LRO.

Some approaches to figure out whether a string has RTL characters can be found in How to detect whether a character belongs to a Right To Left language?, JavaScript: how to check if character is RTL?, How to detect if a string contains any Right-to-Left character?.
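As a rough illustration of such a check, the sketch below tests for characters from a few right-to-left scripts using Unicode property escapes in a regular expression. It is a heuristic of my own, not taken from the linked questions: it covers only Hebrew, Arabic, Syriac and Thaana and does not consult the actual bidi class of each character. The names RTL_SCRIPTS and containsRtlCharacter are made up.

```javascript
// Matches any character from a handful of right-to-left scripts (not exhaustive).
const RTL_SCRIPTS =
  /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;

// Hypothetical helper: true if the text contains at least one such character.
function containsRtlCharacter(text) {
  return RTL_SCRIPTS.test(text);
}

console.log(containsRtlCharacter("Hello, World!")); // false
console.log(containsRtlCharacter("Hello \u05E9\u05DC\u05D5\u05DD")); // true
```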

Solution 4:

This is just speculation about why this happens.

My guess is that the direction: rtl property triggers bidirectional text handling, which also affects punctuation. This is meant for Arabic or Hebrew script, where the trailing punctuation is moved to the beginning (the left end) of the line.

source: http://www.w3.org/TR/2013/WD-css-writing-modes-3-20131126/#text-direction

But why is the word at the end?

My guess is that the Unicode characters of the word are not recognized as belonging to one of the right-to-left scripts the property takes effect on.

jsFiddle

As you can see, the Arabic text is affected.


So it is because the property is originally meant for Arabic, Hebrew, or other mixed-direction text: only the trailing punctuation was treated as one of the affected Unicode characters, whereas the word itself does not belong to one of the affected scripts.