Unicode normalization for filenames and text copied from PDFs

HFS+ stores filenames in decomposed form (LATIN SMALL LETTER A + COMBINING DIAERESIS) instead of composed form (LATIN SMALL LETTER A WITH DIAERESIS), so a name created as c3a4 is listed back as 61cc88. You can use iconv with the utf-8-mac source encoding to convert such text back to composed form:

$ echo -n ä | xxd -p
c3a4
$ touch ä
$ ls | tr -d '\n' | xxd -p
61cc88
$ ls | tr -d '\n' | iconv -f utf-8-mac -t utf-8 | xxd -p
c3a4

Strictly speaking, HFS+ does not use exact NFD (Normalization Form D, canonical decomposition) but a variant of it. From http://developer.apple.com/library/mac/#qa/qa1173/_index.html:

Important: The terms used in this Q&A, precomposed and decomposed, roughly correspond to Unicode Normal Forms C and D, respectively. However, most volume formats do not follow the exact specification for these normal forms. For example, HFS Plus (Mac OS Extended) uses a variant of Normal Form D in which U+2000 through U+2FFF, U+F900 through U+FAFF, and U+2F800 through U+2FAFF are not decomposed (this avoids problems with round trip conversions from old Mac text encodings).
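The difference between strict NFD and the variant described in the quote can be illustrated with a rough Python sketch. The function name hfs_like_nfd and the run-splitting approach are assumptions made for illustration; this is not Apple's algorithm, and it ignores that HFS+ uses decomposition tables from an older Unicode version. It only leaves the three quoted ranges untouched and applies NFD to everything else.

```python
import unicodedata

# Code point ranges that, per the quote above, HFS+ does not decompose.
EXCLUDED_RANGES = ((0x2000, 0x2FFF), (0xF900, 0xFAFF), (0x2F800, 0x2FAFF))


def is_excluded(ch):
    """Return True if ch falls in one of the ranges left undecomposed."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES)


def hfs_like_nfd(text):
    """Approximate the HFS+ variant of NFD (hypothetical helper).

    Characters in the excluded ranges are copied through unchanged;
    the runs of text between them are normalized with standard NFD.
    """
    out = []
    run = []
    for ch in text:
        if is_excluded(ch):
            # Flush the pending run through NFD, then keep ch as-is.
            out.append(unicodedata.normalize("NFD", "".join(run)))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.append(unicodedata.normalize("NFD", "".join(run)))
    return "".join(out)


if __name__ == "__main__":
    ohm = "\N{OHM SIGN}"  # U+2126, inside the first excluded range
    # Strict NFD replaces OHM SIGN with GREEK CAPITAL LETTER OMEGA.
    print(ascii(unicodedata.normalize("NFD", ohm)))
    # The HFS+-like variant keeps U+2126 but still decomposes the a-umlaut.
    print(ascii(hfs_like_nfd("\N{LATIN SMALL LETTER A WITH DIAERESIS}" + ohm)))
```

For ordinary accented Latin letters the two forms agree, so the distinction only matters for names containing characters from the excluded ranges.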

Python's unicodedata.normalize can also convert decomposed text to composed form (NFC), so something like this might work as well:

python3 -c 'import unicodedata as ud; print(ud.normalize("NFC", "\N{LATIN SMALL LETTER A}\N{COMBINING DIAERESIS}"))'
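The same idea can be stretched into a small script that lists a directory and reports each name in composed form, much like the ls | iconv pipeline earlier. This is only a sketch: the helper names to_nfc and nfc_listing are made up for illustration, and standard NFC is assumed to be an acceptable stand-in for iconv's utf-8-mac conversion.

```python
import os
import unicodedata


def to_nfc(name):
    """Return name in composed form (NFC)."""
    return unicodedata.normalize("NFC", name)


def nfc_listing(directory="."):
    """List directory entries with each name converted to NFC.

    Hypothetical helper: it only changes the returned strings,
    the files on disk are not renamed.
    """
    return [to_nfc(name) for name in os.listdir(directory)]


if __name__ == "__main__":
    for name in nfc_listing("."):
        # Show the UTF-8 bytes in hex, like xxd -p in the shell example.
        print(name, name.encode("utf-8").hex())
```

Text copied from a PDF can be passed through to_nfc in the same way before it is compared with or used as a filename.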