After setting EnableHexNumpad=1, hex Unicode alt-codes work well in Notepad++ and Excel but have a strange behavior in Word

Here is a summary of the behavior I see, bearing in mind that Windows can't handle codes above 0xFFFF:

  1. Word works flawlessly with decimal code points entered through Alt+decimal. For example, even 😀 can be inserted without problems as Alt+128512. (As a side note, Notepad++ doesn't handle this shortcut and puts \0 in the file. Excel ignores the input.)
  2. Word accepts hex codes for low code point blocks. However, for some (many?) higher Unicode blocks, if the code contains no hex digits A-F, it is always interpreted as decimal. As an example, take the "Ethiopic" code block, which begins at 0x1200, and enter in Word the sequence Alt++1200, Alt++1201, ..., Alt++120F, i.e. the first row of the block. We would expect to insert the characters ሀሁሂሃሄህሆሇለሉሊላሌልሎሏ. Instead I see ҰұҲҳҴҵҶҷҸҹሊላሌልሎሏ: the last six characters are correct, while the first ten are not. They come from code points 0x4B0-0x4B9, or 1200-1209 in decimal. The error is apparent: when the code contains no A-F digits, it is interpreted as decimal even if the + is prepended. Notepad++ and Excel work as expected in these cases. This seems to be linked to an internal association with the available font glyphs, but I didn't reach any definitive conclusion.
  3. For completely unsupported code blocks, the A-F digits are ignored and only the decimal digits count, interpreted as a decimal number. As an example, let's enter Alt++30C4. Instead of the katakana ツ I obtain İ, which is code point 0x130 or decimal 304 (the original 30C4 string without the C). The same happens with hex codes of more than 4 digits: attempting to insert the emoji from point 1 with Alt++1f600 inputs 0x640, or decimal 1600. In Excel this latter alt code inserts 0xF600 (verified with the UNICODE() function), which is invalid and shown as , but, keeping in mind the 4-digit limitation, this seems reasonable.
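
The arithmetic in case 3 can be checked with a short Python sketch. The two helper functions are hypothetical models of what I observe, not anything taken from Word or Excel: one drops the A-F letters and reads what is left as decimal, the other parses the string as hex and keeps only the low 16 bits.

```python
import unicodedata


def strip_hex_letters_then_decimal(typed: str) -> int:
    """Assumed model of case 3: drop the A-F letters, read the rest as decimal."""
    digits = "".join(ch for ch in typed if ch.isdigit())
    return int(digits)


def truncate_to_16_bits(typed: str) -> int:
    """Assumed model of the Excel result: parse as hex, keep the low 16 bits."""
    return int(typed, 16) & 0xFFFF


if __name__ == "__main__":
    for typed in ("30C4", "1f600"):
        cp = strip_hex_letters_then_decimal(typed)
        name = unicodedata.name(chr(cp), "<unnamed>")
        print(f"Alt++{typed}: decimal {cp} = U+{cp:04X} ({name})")
    print(f"Excel-like result for 1f600: U+{truncate_to_16_bits('1f600'):04X}")
```

Both models reproduce the code points reported above (0x130, 0x640 and 0xF600), but that only shows they are consistent with these few samples.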

So, is this simply a misconfigured system, or is there some option I can explore to restore the expected behavior (Word 365 MSO (16.0.14131.20278) 32-bit)?


Solution 1:

I would chalk this up to bugs in Word, which is well-known for carrying ancient methods that may work differently than newly-programmed ones.

For example, typing in Word "1200 alt x" gives ሀ as expected, while typing "alt + 1200" gives Ұ.

The interesting part here is that the Unicode hex code of Ұ is 4B0. I note that decimal 1200 converted into hex is 4B0. I also note that "alt + 4B0" also gives Ұ.

From this I conclude that Word will do the following irrational test: If after "alt +" the entered string contains only digits ("1200") it will assume that it's written in decimal, but if it contains one of the letters a-f ("4B0") it is taken as hexadecimal.

This theory of mine is borne out by your tests: when your entered codes started including the letters a-f, they were interpreted correctly as hex. As long as they only contained decimal digits, they were wrongly interpreted as being decimal.
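
To make the theory concrete, here is a minimal Python sketch of the test I suspect Word applies. The function name `guess_code_point` is made up for illustration, and the rule itself is only my guess from the observed output, not Word's actual code.

```python
def guess_code_point(typed: str) -> int:
    """Assumed heuristic: digits only means decimal, any a-f letter means hex."""
    base = 10 if typed.isdigit() else 16
    return int(typed, base)


if __name__ == "__main__":
    # First row of the Ethiopic block, as typed after "alt +": 1200 ... 120F
    for low in range(16):
        typed = f"120{low:X}"
        cp = guess_code_point(typed)
        verdict = "as intended" if cp == int(typed, 16) else "read as decimal"
        print(f"alt + {typed} -> U+{cp:04X} ({verdict})")
```

Under this rule the first ten inputs land on 0x4B0-0x4B9 and the last six on 0x120A-0x120F, which matches your Ethiopic test. It does not account for your case 3, where the letters seem to be dropped rather than triggering hex parsing, so Word is probably doing something more involved there.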

The implementation by Microsoft of the EnableHexNumpad option seems to be very flawed.

Word cannot be fixed by you or me. The most you can do is report the problem to Microsoft via the Feedback Hub (which wouldn't help much).

If you need a third-party utility that doesn't have such gotchas, you may for example use the ancient UnicodeInput which still works in Windows 10 for entering Unicode. It intercepts Alt+ and puts up a dialog box where the Unicode can be entered.