What is the difference between the UTF-8 and UTF-8-MAC encodings in iconv, and when is each used?

Solution 1:

As explained here, UTF-8-MAC is the UTF-8 encoding of a text after Unicode normalization form NFD has been applied (e.g., accented characters are represented by the base character plus a combining accent character), with certain codepoint ranges excluded from the decomposition.

For example, the character é can be represented in two different, equally valid ways in Unicode:

  • "\x{00E9}" - single codepoint, LATIN SMALL LETTER E WITH ACUTE, UTF-8 bytes C3 A9, "composed".
  • "\x{0065}\x{0301}" - two codepoints, LATIN SMALL LETTER E and COMBINING ACUTE ACCENT, UTF-8 bytes 65 CC 81, "decomposed".

UTF-8-MAC ensures that the second, decomposed form is always used for characters like this one, which fall outside the excluded ranges.
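
To see the two forms side by side, here is a minimal Python sketch. It uses the standard library's plain NFD normalization, which is only an approximation of UTF-8-MAC: it does not apply the excluded codepoint ranges mentioned above, so treat it as an illustration of the é case rather than a replacement for iconv. The helper name hex_bytes is made up for this example.

```python
import unicodedata

def hex_bytes(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))

# Composed form: one codepoint, LATIN SMALL LETTER E WITH ACUTE.
composed = "\u00e9"

# Plain NFD decomposition (assumption: close to, but not identical to,
# what UTF-8-MAC does, since UTF-8-MAC skips certain codepoint ranges).
decomposed = unicodedata.normalize("NFD", composed)

print(hex_bytes(composed))    # C3 A9
print(hex_bytes(decomposed))  # 65 CC 81

# The two strings look the same but do not compare equal as-is.
print(composed == decomposed)  # False

# Normalizing back to NFC recovers the composed form.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

This is why file names coming from a Mac can fail to match visually identical names elsewhere: the bytes differ until both sides are normalized to the same form.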