Bash how to printf UTF-8 [duplicate]

Bash's printf field widths work in bytes, not in characters. Because ý is stored as two bytes in the UTF-8 encoding, printf will think it's two columns wide – regardless of your locale setting.

(Yes, that is inconsistent with string operations like ${#var}, most likely because bash's printf builtin simply hands the formatting to the C printf() function, where the width and precision of %s are counted in bytes, so bash has no say in how it works.)
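
If you want to see the byte counting for yourself, here is a minimal sketch. The variable names are only for illustration, and ý is written as explicit UTF-8 bytes so the result does not depend on how your editor saves the file:

```bash
accented=$'\xc3\xbd'   # "ý" as its two UTF-8 bytes, c3 bd
plain='y'

# Ask for a 5-wide, left-aligned field in both cases.
padded_accented=$(printf '%-5s|' "$accented")
padded_plain=$(printf '%-5s|' "$plain")

printf '%s\n' "$padded_accented" "$padded_plain"
# ý   |    <- only 3 spaces of padding: ý already counts as 2
# y    |   <- 4 spaces of padding
```

The first line ends up one cell short on screen, which is the same misalignment you are seeing in your table.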

However, if printf worked with characters, then LC_CTYPE=C would be counterproductive – you would want the two bytes c3 bd to be treated as representing a single character, just as they're drawn in a single cell, and therefore you would want to keep LC_CTYPE set to something.UTF-8. (If bash thinks a string is wider than it's actually displayed, then it will add less padding, and so the column will appear to be narrower than it should be, just like in your situation.)
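
The effect of the locale on what counts as one character can be checked with ${#var}, which (unlike printf) does honour it. This is only a sketch: the C.UTF-8 locale name is an assumption and may not exist on every system.

```bash
s=$'\xc3\xbd'    # "ý" as its two UTF-8 bytes

# C locale: every byte is treated as a separate character.
len_c=$(LC_ALL=C; printf '%s' "${#s}")

# UTF-8 locale: the two bytes decode to a single character.
# C.UTF-8 is an assumption here; use any UTF-8 locale that `locale -a` lists.
# If the locale is missing, bash prints a warning and the count may stay at 2.
len_utf8=$(LC_ALL=C.UTF-8; printf '%s' "${#s}")

printf 'C: %s, UTF-8: %s\n' "$len_c" "$len_utf8"
# C: 2, UTF-8: 1   (when the C.UTF-8 locale is available)
```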

I would recommend using the column tool that is part of most Linux systems – it does understand UTF-8 input, and it would save you from needing to calculate the column widths in general. In fact it would even work with characters that are more than 1 cell wide – for example:

(
    printf '%s\t%s\t%s\t%s\n' "iceland" "Mýrdalssandur Iceland" "steam://rungameid/1248990" "iceland-Win64-Shipping.exe"
    printf '%s\t%s\t%s\t%s\n' "il2" "IL 2 Sturmovik: 1946" "steam://rungameid/15320" "il2fb.exe"
    printf '%s\t%s\t%s\t%s\n' "ksp" "Kerbal Sp🚀ce Pr🚀gram" "steam://rungameid/220200" "KSP_x64.exe"
    printf '%s\t%s\t%s\t%s\n' "ffxiv" "ファイナルファンタジーXIV(FF14)" "steam://rungameid/39210" "ffxiv.exe"
) | column -s $'\t' -t -N SHORT,LONG,URL,EXE -W LONG -o ' │ '

The fancy options like -N or -o require at least version 2.30 of the util-linux package, but the basic column -t -s ... will still work with older versions.

But if you do not want to use any external tools, then you'll need to manually pad the strings – making use of the ${#var} expansion which tells you the number of characters (although not the number of terminal cells, and therefore not as good as column):

printf '%s%*s | %s%*s | %s%*s | %s%*s\n' \
  "$short" $((maxshort - ${#short})) "" \
  "$long"  $((maxlong - ${#long}))   "" \
  "$url"   $((maxurl - ${#url}))     "" \
  "$exe"   $((maxexe - ${#exe}))     ""

Note: In both cases above, whether you use column or bash's ${#var}, the data has to be interpreted as UTF-8 to get the correct character count. That means the effective LC_CTYPE must be a UTF-8 locale, so do not override it by setting LC_ALL or LC_CTYPE (or LANG, if the other two are unset) to "C".
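
For completeness, here is one way the manual approach might be put together, including the calculation of the maximum widths. Treat it as a sketch: the `rows` array, the two-column tab-separated layout, and the `pad_right` helper are my own illustrative choices rather than anything bash provides, and it only lines up correctly for characters that occupy one cell each.

```bash
# Hypothetical helper: print $1 followed by enough spaces to reach $2 characters.
pad_right() {
    local text=$1 width=$2
    local fill=$(( width - ${#text} ))
    if (( fill < 0 )); then fill=0; fi
    printf '%s%*s' "$text" "$fill" ""
}

# Sample rows, tab-separated: short name, long name.
rows=(
    $'iceland\tMýrdalssandur Iceland'
    $'il2\tIL 2 Sturmovik: 1946'
)

# First pass: find the widest entry in each column, counted in characters.
maxshort=0
maxlong=0
for row in "${rows[@]}"; do
    IFS=$'\t' read -r short long <<< "$row"
    if (( ${#short} > maxshort )); then maxshort=${#short}; fi
    if (( ${#long} > maxlong )); then maxlong=${#long}; fi
done

# Second pass: print each row with the computed padding.
for row in "${rows[@]}"; do
    IFS=$'\t' read -r short long <<< "$row"
    printf '%s | %s |\n' \
        "$(pad_right "$short" "$maxshort")" \
        "$(pad_right "$long" "$maxlong")"
done
```

Doing the padding in a helper keeps the printf format string short, and clamping the fill at zero avoids a negative width (which printf would otherwise interpret as left-justification) if a maximum is ever computed wrongly.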