Setting "LANG=C LC_ALL=C" in script has no effect on padding length for non-English characters in printf

I am working on formatting the output of a Bash script into a table such that the pipe characters used to represent the bounds of each column are spaced relative to the longest string in that column (thus lining up all of the pipe characters nicely).

Below is the relevant bit of output. Note how, in both the "il2" and "ksp" rows, the pipe characters line up. In the "iceland" row, however, which contains the non-English character 'ý', the second pipe character sits one space to the left of where it is in the other rows.

iceland          | Mýrdalssandur Iceland                    | steam://rungameid/1248990                                                                   | iceland-Win64-Shipping.exe
il2              | IL 2 Sturmovik: 1946                      | steam://rungameid/15320                                                                     | il2fb.exe
ksp              | Kerbal Space Program                      | steam://rungameid/220200                                                                    | KSP_x64.exe

At the very beginning of the script, I have the following:

oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C

And at the end, right before the final "done" statement, I have:

LANG=$oLang LC_ALL=$oLcAll

There is no other change of language settings throughout the script.

If, in a shell, I set a variable $string to "Mýrdalssandur Iceland" and run printf "%-14s is %2d char length\n" "'$string'" ${#string}, I get a character length of 21. If I first run LANG=C LC_ALL=C and then run the same printf command, I get a character length of 22. This shows that the setting should in fact make a difference.
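The observation above can be reproduced with a short script. This is a minimal sketch that assumes a UTF-8 locale named C.UTF-8 is installed; the name varies between systems, for example en_US.UTF-8. Each count runs in a child bash started with a different locale, so the two settings cannot interfere with each other.

```shell
#!/bin/bash
# "ý" is spelled as its two UTF-8 bytes (c3 bd), so the result does not
# depend on how this file happens to be saved.
string=$'M\xc3\xbdrdalssandur Iceland'

# Count ${#1} in a child bash started under each locale.
# Assumption: C.UTF-8 exists. If it does not, bash prints a warning,
# falls back to C, and both counts come out as 22.
utf8_count=$(LC_ALL=C.UTF-8 bash -c 'printf %s "${#1}"' _ "$string")
c_count=$(LC_ALL=C bash -c 'printf %s "${#1}"' _ "$string")

printf 'UTF-8 locale: %s characters\n' "$utf8_count"   # 21 if C.UTF-8 exists
printf 'C locale:     %s characters\n' "$c_count"      # 22 (bytes)
```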

I created a barebones script to test the padding, and in that script, changing LANG does have the desired effect, even when the printf lines are contained in a function as they are in the main script in question.

I'm at a loss as to why the output doesn't reflect the language change, so any help is appreciated.

Here is the full script:

#!/bin/bash

oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C

#Config file location
listpath="/home/$USER/.config/play/"
config="$listpath"config
games="$listpath"games.list

if ! [ -f $config ] ; then
    echo "Config file not found at \"$config\""
    exit
fi

#Vars/arrs used for script
gamecmd="$1"
prgm=""
line=""
pathf=""
ctr=0
rctr=0
appID=0
taskname=""
result=""
task=""
answer=""
longestshort=0
longestlong=0
longestpathf=0
longestservicename=0
ip=`cat $config | sed -n -e 's/^.*ip=//p'`
wr=`cat $config | sed -n -e 's/^.*wr=//p'`
display=`cat $config | sed -n -e 's/^.*display=//p'`
ControlMyMonitor=`cat $config | sed -n -e 's/^.*ControlMyMonitor=//p'`
ControlMyMonitor=${ControlMyMonitor//\\//}
declare -a short
declare -a long
declare -a path
declare -a servicename

#Shortcut locations to make adding games easier
STEAM="steam://rungameid/"
DESKTOP=`cat $config | sed -n -e 's/^.*desktop=//p'`
APPDATA=`cat $config | sed -n -e 's/^.*appdata=//p'`

function check_for_duplicate()
{
    dctr=0

    #Set delimiter to comma
    IFS=','

    while IFS= read -r line
    do
        read -a strarr <<< "$line"

        for (( n=0; n < ${#strarr[*]}; n++))
        do
            if [ "${strarr[n]}" == "$1" ]
            then
                get_record $dctr "$line"
                echo -e "Duplicate entry for \""$1"\" found:"
                echo
                print_verbose_header
                echo
                pathf="${path[$dctr]}"
                format_pathf
                print_verbose_record "${short[$dctr]}" "${long[$dctr]}" "$pathf" "${servicename[$dctr]}"

                exit
            fi
        done
        ((dctr++))
    done < $games
}
function format_pathf()
{
    pathf="${pathf//\"/}"
    pathf="${pathf//%STEAM%/$STEAM}"
    pathf="${pathf//%DESKTOP%/$DESKTOP}"
    pathf="${pathf//%APPDATA%/$APPDATA}"
}
function get_normal_header_spacing()
{
    for (( IFS=" " i=0; i<$ctr; i++ ))
    do
        if [ "${#short[i]}" -gt "$longestshort" ]
        then
            longestshort="${#short[i]}"
        fi
        if [ "${#long[i]}" -gt "$longestlong" ]
        then
            longestlong="${#long[i]}"
        fi
    done
}
function get_verbose_header_spacing()
{
    for (( IFS=" " i=0; i<$ctr; i++ ))
    do
        if [ "${#short[i]}" -gt "$longestshort" ]
        then
            longestshort="${#short[i]}"
        fi
        if [ "${#long[i]}" -gt "$longestlong" ]
        then
            longestlong="${#long[i]}"
        fi

        pathf="${path[i]}"
        format_pathf

        if [ "${#pathf}" -gt "$longestpathf" ]
        then
            longestpathf="${#pathf}"
        fi
        if [ "${#servicename[i]}" -gt "$longestservicename" ]
        then
            longestservicename="${#servicename[i]}"
        fi
    done
}
function get_record()
{
    short[$1]=$(echo "$2" | grep -Po '([^,]+)' | head -1)
    long[$1]=$(echo "$2" | grep -Po '([^,]+)' | head -2 | tail -1)
    path[$1]=$(echo "$2" | grep -Po '([^,]+)' | head -3 | tail -1)
    servicename[$1]=$(echo "$2" | grep -Po '([^,]+)' | head -4 | tail -1)
}
function print_normal_header()
{
    printf "%-${longestshort}s   %-${longestlong}s\n" "Command" "Game Title"

    for (( i=0; i<$(($longestshort + $longestlong + 2)); i++ ))
    do
        echo -n "-"
    done
}
function print_verbose_header()
{
    printf "%-${longestshort}s   %-${longestlong}s   %-${longestpathf}s   %s\n" "Command" "Game Title" "Path" "Task Name"
        
    for (( i=0; i<$(($longestshort + $longestlong + $longestpathf + $longestservicename + 4)); i++ ))
    do
        echo -n "-"
    done
}
function print_normal_record()
{
    printf "%-${longestshort}s | %-${longestlong}s\n" "$1" "$2"
}
function print_verbose_record()
{
    printf "%-${longestshort}s | %-${longestlong}s | %-${longestpathf}s | %s\n" "$1" "$2" "$3" "$4"
}
function print_help()
{
    echo -e "Launch program on remote Windows PC and change monitor inputs automatically.\n"
    echo -e "USAGE"
    echo -e "-----"
    echo -e
    echo -e "\e[3mplay [COMMAND]\e[0m"
    echo -e "Launch the game defined by the COMMAND parameter.\n"
    echo -e "\e[3mplay -l\e[0m"
    echo -e "Print a formatted table of games saved to the games.list file excluding file paths.\n"
    echo -e "\e[3mplay -lv\e[0m"
    echo -e "Print a formatted table of games saved to the games.list file including file paths and task names.\n"
    echo -e "\e[3mplay -f [PATTERN]\e[0m"
    echo -e "Search for a matching PATTERN in the games.list file and list results excluding file paths.\n"
    echo -e "\e[3mplay -fv [PATTERN]\e[0m"
    echo -e "Search for a matching PATTERN in the games.list file and list results including file paths and task names.\n"
    echo -e "\e[3mplay -a [COMMAND] [TITLE] [PATH] [TASK]\e[0m"
    echo -e "Add a game to the list that is located at the PATH on the Windows machine, is started by using the COMMAND parameter, and can be easily referenced in the -l functionality by the game's formal TITLE, which shows up in task manager as TASK (typically a \".exe\" file).\n"
    echo -e "\tFilepath shortcuts for PATH"
    echo -e "\t---------------------------"
    printf  "\t%-10s %-1s %-50s\n" "%STEAM%" "=" "$STEAM"
    printf  "\t%-10s %-1s %-50s\n" "%DESKTOP%" "=" "$DESKTOP"
    printf  "\t%-10s %-1s %-50s\n\n" "%APPDATA%" "=" "$APPDATA"
    echo -e "\e[3mplay -r [PATTERN]\e[0m"
    echo -e "Search for a matching PATTERN in the games.list file and remove it from the games.list file. This will generate a games.list.bak file, which is the games.list file before the specified program is removed.\n"
    echo -e "\e[3mplay -?\e[0m OR \e[3mplay --help\e[0m"
    echo -e "Print this help text.\n"
}
function list_games()
{
    for (( IFS=" " i=0; i<$ctr; i++ ))
    do
        print_normal_record "${short[i]}" "${long[i]}"
    done
    echo
}
function list_games_verbose()
{
    for (( IFS=" " i=0; i<$ctr; i++ ))
    do
        #Remove/replace backslash, quote marks, and path shortcuts from path and print result in table
        pathf="${path[i]//\\/}"
        format_pathf
        print_verbose_record "${short[i]}" "${long[i]}" "$pathf" "${servicename[i]}"
    done
    echo
}
function run_game()
{
    isRunning=0

    echo "Changing monitor input and launching winrun with parameters \"start $1\""
    sudo chmon $display
    winrun "start $1"

    while [[ 1 ]]
    do
        #Wait for program to actually show up in Task Manager (accounts for user messing around in launcher)
        if [ -n "$(winrun "tasklist /fi \"SESSIONNAME eq CONSOLE\" /fo list" | grep -i "$task")" ] && (( ! isRunning )); then
            isRunning=1

        #Switch monitor inputs when program is no longer running
        elif [ -z "$(winrun "tasklist /fi \"SESSIONNAME eq CONSOLE\" /fo list" | grep -i "$task")" ] && (( isRunning )); then
            echo "No longer seeing $task running on Window system, changing display input."
            winrun "run $ControlMyMonitor /SetValue \\\\.\DISPLAY1\Monitor0 60 4"
            break;
        fi
    done
}

#Load games from games.list file
if [ -f $games ]
then

    #Read games list
    while IFS= read -r line
    do
        short[$ctr]=$(echo $line | awk -F, '{print $1}')
        long[$ctr]=$(echo $line | awk -F, '{print $2}')
        path[$ctr]=$(echo $line | awk -F, '{print $3}')
        servicename[$ctr]=$(echo $line | awk -F, '{print $4}')
        ((ctr++))
    done < $games
else
    #Make games list file if it does not exist
    echo $games file  not found, creating!
    touch $games
fi

#Check for script arguments
while [ "$#" != "" ]
do
    case "$1" in
        -l)#List games

            if [ "$2" ]
            then
                echo -e "Detected arguments after \""$1"\". Ignoring, as these are not used...\n"
            fi

            get_normal_header_spacing
            print_normal_header
            list_games | sort
            break;;

        -lv)#Display games list with more info
            
            if [ "$2" ]
            then
                echo -e "Detected arguments after \""$1"\". Ignoring, as these are not used...\n"
            fi

            get_verbose_header_spacing
            print_verbose_header
            list_games_verbose | sort
            break;;

        -f)#Find for parameter and display

            if [ ! "$2" ]
            then
                echo "Missing search parameter!" >&2
                break
            fi
            if [ "$3" ]
            then
                echo -e "Detected arguments after \""$2"\". Ignoring, as these are not used...\n"
            fi

            get_normal_header_spacing
            print_normal_header
            echo
            list_games | grep -i "$2" | sort
            break;;

        -fv)#Find for parameter and display with more info

            if [ ! "$2" ]
            then
                echo "Missing search parameter!" >&2
                break
            fi
            if [ "$3" ]
            then
                echo -e "Detected arguments after \""$2"\". Ignoring, as these are not used...\n"
            fi

            get_verbose_header_spacing
            print_verbose_header
            echo
            list_games_verbose | grep -i "$2" | sort
            break;;

        -a)#Add game to list

            if [ ! "$2" ]
            then
                echo "Missing program command string!" >&2
                break

                elif [ ! "$3" ]
                then
                    echo "Missing program title!" >&2
                    break

                    elif [ ! "$4" ]
                    then
                        echo "Missing program path!" >&2
                        break

                        elif [ ! "$5" ]
                        then
                            echo "Missing program task name!" >&2
                            break
            fi
            if [ "$6" ]
            then
                echo -e "Detected arguments after \""$5"\". Ignoring, as these are not used...\n"
            fi

            #Check for duplicate entry
            check_for_duplicate "$2"
            check_for_duplicate "$3"
            check_for_duplicate "$4"

            #If no duplicate is found, add entry
            if [ ! $match ]
            then
                pathf="${4//\\//}"
                echo ""$2","$3","$pathf","$5"" >> "$games"
                echo "Added the following record:"
                echo
                print_verbose_header
                echo
                format_pathf
                print_verbose_record "$2" "$3" "$pathf" "$5"
            fi

            break;;

        -r)#Attempt to remove game from list

            if [ ! "$2" ]
            then
                echo "Missing search parameter!" >&2
                break
            fi
            if [ "$3" ]
            then
                echo -e "Detected arguments after \""$2"\". Ignoring, as these are not used...\n"
            fi

            rctr=0
            while IFS= read -r line
            do
                if [[ -n $(echo $line | grep "$2") ]]
                then
                    match=1

                    get_record $rctr "$line"
                    echo "Discovered the following matching record:"
                    echo
                    print_verbose_header
                    echo
                    pathf="${path[$rctr]}"
                    format_pathf
                    print_verbose_record "${short[$rctr]}" "${long[$rctr]}" "$pathf" "${servicename[$rctr]}"
                    echo

                    while [[ $answer != "y" && $answer != "n" ]]
                    do
                        echo -n "Remove this record? (y/n) "
                        read -r answer </dev/tty
                    done

                    if [ $answer == "n" ]
                    then
                        echo "$games not changed."
                        break
                    fi

                    echo
                    echo "Removed the following record:"
                    print_verbose_header
                    echo
                    print_verbose_record "${short[$rctr]}" "${long[$rctr]}" "$pathf" "${servicename[$rctr]}"

                    #Shift all entries beyond the removed entry down one
                    for (( i=$rctr; i<ctr; i++ ))
                    do
                        short[i]=${short[i+1]}                      
                        long[i]=${long[i+1]}
                        path[i]=${path[i+1]}
                        servicename[i]=${servicename[i+1]}
                    done

                    #Rebuild games.list
                    cp $games $games.bak
                    rm $games
                    for (( i=0; i<(ctr-1); i++ ))
                    do
                        echo "${short[i]}","${long[i]}","${path[i]}","${servicename[i]}" >> $games
                    done

                    break
                fi

                ((rctr++))
            done < $games


            if [ ! $match ]
            then
                echo -e "Did not find match for \""$2"\" in \"$games\""
            fi
            break;;

        -? | --help)#Print help text

            if [ "$2" ]
            then
                echo -e "Detected arguments after \""$1"\". Ignoring, as these are not used...\n"
            fi

            print_help
            break;;

        *)#Search for matching game

            if [ ! "$1" ]
            then
                print_help
                break;
            fi

            if [ "$2" ]
            then
                echo -e "Detected arguments after \""$1"\". Ignoring, as these are not used...\n"
            fi

            echo -n "Checking if \""$1"\" matches a known game in games.list... "
            shopt -s nocasematch
            for (( i=0; i<$ctr; i++ ))
            do
                if [[ "$1" == "${short[i]}" ]]; then
                    path[i]="${path[i]//%STEAM%/$STEAM}"
                    path[i]="${path[i]//%DESKTOP%/$DESKTOP}"
                    path[i]="${path[i]//%APPDATA%/$APPDATA}"
                    prgm="${path[i]}"
                    task="${servicename[i]}"
                    break
                fi
            done
            shopt -u nocasematch

            #If there was a match, run program. Otherwise, run first arg as is.
            if [ "$prgm" != "" ]
            then
                echo "Found!"
                run_game "$prgm"
            else
                echo "Not found!"
                run_game "$1"
            fi

            break;;
    esac
    shift

#LANG=$oLang LC_ALL=$oLcAll

done

Solution 1:

Bash's printf field widths work in bytes, not in characters. Because ý is stored as two bytes in the UTF-8 encoding, printf treats it as two columns wide, regardless of your locale setting.

(Yes, that is inconsistent with string operations like ${#var}, which count characters in a UTF-8 locale. Most likely this is because the printf command passes %s formats straight through to the C printf() function, which is always byte-oriented, so bash has no say in how it works.)
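The byte-based padding can be seen directly in a minimal sketch like the one that follows. It requests a 25-column field for the 21-character, 22-byte title and then measures the result. byte_len is a hypothetical helper that counts bytes by evaluating ${#var} under the C locale inside a subshell.

```shell
#!/bin/bash
# "ý" spelled as its two UTF-8 bytes (c3 bd): 21 characters, 22 bytes.
long=$'M\xc3\xbdrdalssandur Iceland'

# byte_len (hypothetical helper): print the length of $1 in bytes by
# evaluating ${#1} under the C locale in a subshell, leaving the
# caller's locale untouched.
byte_len() { ( LC_ALL=C; printf '%s' "${#1}" ); }

# Request a 25-wide, left-aligned field followed by a pipe.
row=$(printf '%-25s|' "$long")

printf '%s\n' "$row"                        # only 3 spaces before the pipe, not 4
printf 'bytes: %s\n' "$(byte_len "$row")"   # 26 = 22 text bytes + 3 spaces + "|"
```

printf filled the field to 25 *bytes*, which leaves the pipe one terminal cell to the left. That matches the "iceland" row in the question.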

However, even if printf did work with characters, setting LC_CTYPE=C would be counterproductive. You would want the two bytes c3 bd to be treated as a single character, just as they are drawn in a single terminal cell, so you would want to keep LC_CTYPE set to a UTF-8 locale (something.UTF-8, such as en_US.UTF-8). (If bash thinks a string is wider than it actually appears on screen, it adds less padding, so the column looks narrower than it should. That is exactly what happens in your output.)

I would recommend using the column tool that is part of most Linux systems. It understands UTF-8 input, and it saves you from having to calculate the column widths at all. It even works with characters that are more than one cell wide, for example:

(
    printf '%s\t%s\t%s\t%s\n' "iceland" "Mýrdalssandur Iceland" "steam://rungameid/1248990" "iceland-Win64-Shipping.exe"
    printf '%s\t%s\t%s\t%s\n' "il2" "IL 2 Sturmovik: 1946" "steam://rungameid/15320" "il2fb.exe"
    printf '%s\t%s\t%s\t%s\n' "ksp" "Kerbal Sp🚀ce Pr🚀gram" "steam://rungameid/220200" "KSP_x64.exe"
    printf '%s\t%s\t%s\t%s\n' "ffxiv" "ファイナルファンタジーXIV(FF14)" "steam://rungameid/39210" "ffxiv.exe"
) | column -s $'\t' -t -N SHORT,LONG,URL,EXE -W LONG -o ' │ '

The fancy options like -N or -o require at least version 2.30 of the util-linux package, but the basic column -t -s ... will still work with older versions.
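Applied to the question's data, the column approach might look like the following sketch. It assumes column(1) is installed and the script runs in a UTF-8 locale. It uses only -t and -s, which also work with versions older than 2.30. The arrays mirror the question's short/long/path/servicename, filled with the sample rows from its output, and list_games_column is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical sample data mirroring the question's arrays.
short=(iceland il2 ksp)
long=('Mýrdalssandur Iceland' 'IL 2 Sturmovik: 1946' 'Kerbal Space Program')
path=('steam://rungameid/1248990' 'steam://rungameid/15320' 'steam://rungameid/220200')
servicename=('iceland-Win64-Shipping.exe' 'il2fb.exe' 'KSP_x64.exe')

# Emit one tab-separated line per record (header first, records sorted)
# and let column compute every width, so no printf field widths are needed.
list_games_column() {
    local i
    {
        printf '%s\t%s\t%s\t%s\n' "Command" "Game Title" "Path" "Task Name"
        for i in "${!short[@]}"; do
            printf '%s\t%s\t%s\t%s\n' \
                "${short[i]}" "${long[i]}" "${path[i]}" "${servicename[i]}"
        done | sort
    } | column -t -s $'\t'
}

list_games_column
```

With util-linux 2.30 or newer, you could add -o ' | ' to the column call to bring back the pipe separators from the original table.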

But if you do not want to use any external tools, you'll need to pad the strings manually, using the ${#var} expansion, which gives you the number of characters (though not the number of terminal cells, so it is not as good as column):

printf '%s%*s | %s%*s | %s%*s | %s%*s\n' \
  "$short" $((maxshort - ${#short})) "" \
  "$long"  $((maxlong - ${#long}))   "" \
  "$url"   $((maxurl - ${#url}))     "" \
  "$exe"   $((maxexe - ${#exe}))     "" ;
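Wired into the question's script, the manual-padding approach could look like the minimal sketch that follows. It assumes a UTF-8 locale, so the script has no LANG=C LC_ALL=C at the top. compute_widths is a hypothetical helper that does what get_normal_header_spacing does. print_normal_record keeps the question's name and arguments but pads by character count instead of using printf field widths.

```shell
#!/bin/bash
# Hypothetical sample data mirroring the question's arrays.
short=(iceland il2 ksp)
long=('Mýrdalssandur Iceland' 'IL 2 Sturmovik: 1946' 'Kerbal Space Program')

# compute_widths (hypothetical): widest entry per column, counted in
# characters by ${#var}, like get_normal_header_spacing in the question.
compute_widths() {
    local i
    longestshort=0 longestlong=0
    for i in "${!short[@]}"; do
        if (( ${#short[i]} > longestshort )); then longestshort=${#short[i]}; fi
        if (( ${#long[i]} > longestlong )); then longestlong=${#long[i]}; fi
    done
}

# Print each field followed by (width - character count) spaces; %*s
# takes its width from the argument list and pads an empty string.
print_normal_record() {
    printf '%s%*s | %s%*s\n' \
        "$1" $(( longestshort - ${#1} )) "" \
        "$2" $(( longestlong - ${#2} )) ""
}

compute_widths
for i in "${!short[@]}"; do
    print_normal_record "${short[i]}" "${long[i]}"
done
```

Because both the width calculation and the padding use ${#var}, the two always agree. In a UTF-8 locale that means characters, so the pipes line up on screen.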

Note: In both cases above, whether you use 'column' or bash's ${#var}, you need the data to be interpreted as UTF-8 to get the correct character count, and therefore you must not set LANG or LC_ALL to "C".
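Following from that note, one option is to drop the LANG=C LC_ALL=C lines from the top of the script and only warn when the locale would make ${#var} count bytes. This is a minimal sketch. is_utf8_ctype is a hypothetical helper name, and the check relies on the same byte-versus-character behaviour of ${#var} described in the question.

```shell
#!/bin/bash
# is_utf8_ctype (hypothetical helper): succeed when the current locale
# treats the two bytes c3 bd ("ý" in UTF-8) as a single character.
is_utf8_ctype() {
    local probe=$'\xc3\xbd'
    (( ${#probe} == 1 ))
}

# Leave the user's locale alone; just warn if alignment will be off.
if ! is_utf8_ctype; then
    echo "warning: locale is not UTF-8; non-ASCII titles may be misaligned" >&2
fi
```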