I have a problem with copying and pasting characters like "ü". When I copy and paste the name "Gereon Müller" from this book, http://langsci-press.org/catalog/book/18 (choose the download tab), the ü comes out decomposed into two characters on a Mac. This does not happen under Windows or Linux (xpdf, acroread, Sumatra), but it does happen on a Mac with both acroread and Skim. Any ideas?

Edit: These are the two characters: ü. In the text you see a ü, but it is actually a u followed by two dots that are shifted over the u. Look at this: ẗ (as you can see, I combined a t with these two dots). This is not a problem for reading the ü, but it causes problems if I want to keep working with the text, since LaTeX does not like these characters at all. In Emacs I can edit the two characters separately.
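To check which code points a pasted string actually contains, a small Python sketch like the following could help. It only uses the standard `unicodedata` module; the helper name `describe` and the sample string are made up for illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# Hypothetical sample: a decomposed ü, i.e. "u" followed by a combining diaeresis.
for code, name in describe("u\u0308"):
    print(code, name)
# U+0075 LATIN SMALL LETTER U
# U+0308 COMBINING DIAERESIS
```

A precomposed ü would instead show up as the single code point U+00FC.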

Edit II:

I played with different applications and they behave differently: Word gets one copy and paste right, while emacs gets both wrong.

Word:

[Screenshot: pasted text in Word]

Emacs:

[Screenshot: pasted text in Emacs]

Edit III:

And this is TextEdit:

[Screenshot: pasted text in TextEdit]


Solution 1:

If having such characters in decomposed form (represented by two code points, a base character plus a combining accent mark) is a problem for the further processing you need to do, you can use an app like Unicode Checker to convert the text to Unicode Normalization Form C (NFC). That turns them into the composed, single-code-point form.

http://earthlingsoft.net/UnicodeChecker/
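If you would rather script the conversion than use a GUI app, the same normalization can be sketched in Python with the standard `unicodedata` module. This is only a minimal illustration and not what Unicode Checker does internally; the function name `to_nfc` and the sample string are assumptions of mine.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Convert text to Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)

# Hypothetical sample: the name as pasted on the Mac, with "u" + combining diaeresis.
decomposed = "Gereon Mu\u0308ller"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))    # 14 13
print(composed == "Gereon M\u00fcller")  # True
```

The composed string is one code point shorter because the u and the combining diaeresis are merged into a single ü (U+00FC).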