Inserting an HTML tag in the middle of an Arabic word breaks the word connection (cursive)

From Wikipedia:

Cursive (from Latin curro, currere, cucurri, cursum, to run, hasten) is any style of handwriting that is designed for writing notes and letters quickly by hand. In the Arabic, Latin, and Cyrillic writing systems, the letters in a word are connected, making a word one single complex stroke.

In the above languages, when we want to format part of a single word with, e.g., a <span> tag to apply a custom CSS style, it breaks the word connection. Is there any solution for this?

For example, this is a normal Arabic word:

كتب

but when we color the last letter in a different color using a span tag, the letters are no longer connected,

because the first two letters are in one element and the last letter is in another so that it can be colored.

Is there something I can do to avoid breaking the word?

Here is the full HTML:

<p>كت<span style="color: Red;">ب</span></p>

Solution 1:

I'm not sure if there's any HTML way to do it, but you can fix it by adding a zero-width joiner Unicode character before the opening span tag:

<p>كت&#x200d;<span style="color: Red;">ب</span></p>

You can use the actual Unicode character instead of the HTML character entity, of course, but that wouldn't be visible here. Or you can use the prettier &zwj; entity.
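If the markup is generated from script rather than written by hand, the same fix can be applied programmatically. A minimal sketch, using a hypothetical helper named `colorLastLetter` (not part of the answer): it splits the word into letters, places U+200D (the character behind `&#x200d;` and `&zwj;`) just before the opening tag, and wraps the last letter in a colored span.

```javascript
// U+200D ZERO WIDTH JOINER, the same character as &#x200d; / &zwj; in HTML.
const ZWJ = "\u200D";

// Hypothetical helper: color the last letter of `word` by wrapping it in a
// <span>, with a zero-width joiner placed just before the opening tag.
function colorLastLetter(word, color) {
  const letters = Array.from(word); // split by code point, not UTF-16 unit
  const last = letters.pop();
  return letters.join("") + ZWJ + '<span style="color: ' + color + ';">' + last + "</span>";
}

// Same markup as the fixed <p> content above, with the joiner as a real character.
const html = colorLastLetter("كتب", "Red");
```

In a browser the result could be assigned to an element's `innerHTML`; this assumes the word itself contains no `<` or `&`.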

Here it is in action (using an invisible <b> tag, since I can't do color here), without the joiner:

كتب

and with the joiner:

كت‍ب

It's supposed to work without the joiner as far as I understand it, though, and it does in some browsers, but clearly not all of them.

Solution 2:

Update 2020/5

Google Chrome (checked with version 81.0.4044.138) and Firefox (76.0.1) have fixed this issue when rendering Arabic and Farsi words, so there is no longer any need to handle it manually. Simply wrapping the keyword in <span style="color:red">Keyword</span> works fine with both connecting and non-connecting characters.

Main post:

Seven years after the accepted answer, I would like to add a new answer with more practical details, as my native language is Farsi. I assume that we want to mark up a keyword inside a longer word. This answer considers the following details:

1- Sometimes it is not enough to add &zwj; only after the preceding character, because the next character (the first one inside the span) must also have a tail to complete the connection.

<style>
body{font-size:36pt;}
span{color:red}
</style>
Wrong: مک&zwj;<span>انیک</span>
<br>
Correct: مک&zwj;<span>&zwj;انیک</span>

2- We may also need to add &zwj; after the keyword to connect it to the next character.

<style>
body{font-size:36pt;}
span{color:red}
</style>
Wrong: مک&zwj;<span>&zwj;انیک</span>ی
<br>
Correct: مک&zwj;<span>&zwj;انیک&zwj;</span>&zwj;ی
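Points 1 and 2 together amount to placing a joiner on both sides of each tag that cuts through the word. A minimal sketch with a hypothetical `wrapRange(word, start, end)` helper (letter indexes, end exclusive). It assumes every letter involved joins to its neighbors, which point 3 below refines, and it skips the outer joiner when nothing precedes or follows the span:

```javascript
const ZWJ = "\u200D"; // zero-width joiner, same as &zwj;

// Hypothetical helper: wrap letters [start, end) of `word` in a <span>,
// with a ZWJ on both sides of the opening tag (point 1) and of the
// closing tag (point 2). Assumes every boundary is a joining one.
function wrapRange(word, start, end) {
  const letters = Array.from(word);
  const head = letters.slice(0, start).join("");
  const mid = letters.slice(start, end).join("");
  const rest = letters.slice(end).join("");
  const before = head ? ZWJ : ""; // no join needed at the start of the word
  const after = rest ? ZWJ : "";  // no join needed at the end of the word
  return head + before + "<span>" + before + mid + after + "</span>" + after + rest;
}

// Reproduces the "Correct" line of point 2, with the joiners as real characters.
const html = wrapRange("مکانیکی", 2, 6);
```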

3- Some characters accept a tail before them but not after them, so we must not add a tail after them. These are the characters that do not connect to the next character: ا آ د ذ ر ز ژ و
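Taking point 3 into account, a joiner belongs at a tag boundary only when the letter just before that boundary (in logical order) connects to the next one. A sketch of that rule as a variant of a range-wrapping helper; `joinsNext` and `wrapRange` are hypothetical names, and the only non-connecting letters it knows about are the eight listed above (hamza and similar letters are not handled):

```javascript
const ZWJ = "\u200D"; // zero-width joiner, same as &zwj;

// Letters that do not join to the following letter (list from point 3).
const NON_CONNECTING = new Set(["ا", "آ", "د", "ذ", "ر", "ز", "ژ", "و"]);

function joinsNext(letter) {
  return !NON_CONNECTING.has(letter);
}

// Hypothetical helper: wrap letters [start, end) of a single word in a <span>.
// A ZWJ goes around the opening tag only if the preceding letter joins forward,
// and around the closing tag only if the last wrapped letter joins forward.
function wrapRange(word, start, end) {
  const letters = Array.from(word);
  const head = letters.slice(0, start).join("");
  const mid = letters.slice(start, end).join("");
  const rest = letters.slice(end).join("");
  const before = start > 0 && joinsNext(letters[start - 1]) ? ZWJ : "";
  const after = end < letters.length && joinsNext(letters[end - 1]) ? ZWJ : "";
  return head + before + "<span>" + before + mid + after + "</span>" + after + rest;
}
```

For example, in "بستری" the keyword "ستر" gets a joiner only on its opening side, because ر does not connect to the following ی.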

4- Finally, to keep the page source clean for search engines and scrapers, I recommend using JavaScript (jQuery) to replace the keywords after the DOM is ready.

This is my final code, taking all of the details above into account:

$(document).ready(function () {

  var tail = "\u200D"; // zero-width joiner (&zwj;)
  var keyword = "ستر";
  var open = '<span style="color:#ff0000">';

  $(".searchableContent").each(function () {
    var htm = $(this).html();

    /*
    preserve keywords which have a space both before and after
    with a temp sign, say #fullHolder#
    */
    htm = htm.split(' ' + keyword + ' ').join(' #fullHolder# ');

    /*
    preserve keywords which have only a space after
    with a temp sign, say #preHolder#
    */
    htm = htm.split(keyword + ' ').join('#preHolder#' + ' ');

    /*
    preserve keywords which have only a space before
    with a temp sign, say #nextHolder#
    */
    htm = htm.split(' ' + keyword).join(' ' + '#nextHolder#');

    /*
    replace the remaining keywords (inside words) with a marked-up span.
    Add a tail on both sides of the span to make sure it is
    connected to the letters before and after it
    */
    htm = htm.split(keyword).join(tail + open + tail + keyword + tail + '</span>' + tail);

    // Handle #preHolder# by adding a tail only before the keyword
    htm = htm.split('#preHolder#' + ' ').join(tail + open + tail + keyword + '</span>' + ' ');

    // Handle #nextHolder# by adding a tail only after the keyword
    htm = htm.split(' ' + '#nextHolder#').join(' ' + open + keyword + tail + '</span>' + tail);

    // Handle #fullHolder# by adding the markup only, without tails
    htm = htm.split(' ' + '#fullHolder#' + ' ').join(' ' + open + keyword + '</span>' + ' ');

    // Remove every added tail that follows a non-connecting character
    var nonConnectings = ['ا', 'آ', 'د', 'ذ', 'ر', 'ز', 'ژ', 'و'];

    for (var x = 0; x < nonConnectings.length; x++) {
      var c = nonConnectings[x];
      htm = htm.split(c + tail).join(c);
      htm = htm.split(c + open + tail).join(c + open);
      htm = htm.split(c + '</span>' + tail).join(c + '</span>');
    }

    $(this).html(htm);
  });
});
<style>div{font-size:26pt}</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div class="searchableContent">
سترون - بستری - آستر - بستر - استراحت
</div>
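For comparison, here is a sketch of a different technique from the placeholder approach above: it scans plain text for the keyword and looks at the actual characters on either side of each match, so a keyword at the very start of the text or after a line break is treated the same as one after a space. The names `highlight` and `isArabicLetter` are hypothetical, the letter test is a rough Unicode range check (an assumption, not a complete definition of Arabic-script letters), and the input is assumed to be plain text with no `<` or `&`, since the result is meant to be used as HTML:

```javascript
const ZWJ = "\u200D"; // zero-width joiner, same as &zwj;

// Letters that do not join to the following letter (list from point 3).
const NON_CONNECTING = new Set(["ا", "آ", "د", "ذ", "ر", "ز", "ژ", "و"]);

// Rough check (assumption): basic Arabic letters plus the extended letters
// used by Farsi, such as پ چ ژ ک گ ی.
function isArabicLetter(ch) {
  return ch !== undefined && /[\u0620-\u064A\u066E-\u06D3]/.test(ch);
}

function joinsNext(letter) {
  return !NON_CONNECTING.has(letter);
}

// Hypothetical helper: wrap every occurrence of `keyword` in `text` in a
// colored span, adding joiners only where the letters really connect.
// UTF-16 indexing is safe here because these Arabic letters are in the BMP.
function highlight(text, keyword, color) {
  if (keyword === "") return text; // avoid an endless loop on an empty keyword
  const open = '<span style="color:' + color + '">';
  const lastLetter = keyword[keyword.length - 1];
  let out = "";
  let i = 0;
  let j;
  while ((j = text.indexOf(keyword, i)) !== -1) {
    const prev = text[j - 1];              // undefined at the start of the text
    const next = text[j + keyword.length]; // undefined at the end of the text
    // Join across the opening tag only if the preceding letter connects forward.
    const before = isArabicLetter(prev) && joinsNext(prev) ? ZWJ : "";
    // Join across the closing tag only if a letter follows and the keyword's
    // last letter connects forward.
    const after = isArabicLetter(next) && joinsNext(lastLetter) ? ZWJ : "";
    out += text.slice(i, j) + before + open + before + keyword + after + "</span>" + after;
    i = j + keyword.length;
  }
  return out + text.slice(i);
}
```

In a page it might be applied as `el.innerHTML = highlight(el.textContent, "ستر", "#ff0000");` for an element that contains only text.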