How to return an array in bash without using globals?

I have a function that creates an array and I want to return the array to the caller:

create_array() {
  local my_list=("a", "b", "c")
  echo "${my_list[@]}"
}

my_algorithm() {
  local result=$(create_array)
}

With this, I only get an expanded string. How can I "return" my_list without using anything global?


Solution 1:

With Bash 4.3 and above, you can use a nameref: the caller passes in the name of an array, and the callee declares a nameref to that name and populates the array indirectly.

#!/usr/bin/env bash

create_array() {
    local -n arr=$1             # use nameref for indirection
    arr=(one "two three" four)
}

use_array() {
    local my_array
    create_array my_array       # call function to populate the array
    echo "inside use_array"
    declare -p my_array         # test the array
}

use_array                       # call the main function

Produces the output:

inside use_array
declare -a my_array=([0]="one" [1]="two three" [2]="four")

You could make the function update an existing array as well:

update_array() {
    local -n arr=$1             # use nameref for indirection
    arr+=("two three" four)     # update the array
}

use_array() {
    local my_array=(one)
    update_array my_array       # call function to update the array
}

This approach is more elegant and efficient because it needs no command substitution $() to capture the standard output of the function being called. It also helps when the function has to return more than one output: simply use one nameref per output.
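
To illustrate the point about several outputs, here is a minimal sketch that fills two caller-owned arrays through two namerefs. The function name split_by_parity and the nameref names _evens and _odds are made up for this example and are not part of the original answer.

```shell
#!/usr/bin/env bash
# Requires Bash 4.3+ for namerefs (local -n).

# split_by_parity is a hypothetical helper: it sorts its numeric
# arguments into two arrays whose names the caller supplies.
split_by_parity() {
    local -n _evens=$1          # first output array, passed by name
    local -n _odds=$2           # second output array, passed by name
    shift 2                     # the remaining arguments are the numbers
    _evens=()
    _odds=()
    local n
    for n in "$@"; do
        if (( n % 2 == 0 )); then
            _evens+=("$n")
        else
            _odds+=("$n")
        fi
    done
}

use_arrays() {
    local evens odds
    split_by_parity evens odds 1 2 3 4 5
    declare -p evens odds       # show both results
}

use_arrays
```

On Bash 4.4 or later this should print declare -a evens=([0]="2" [1]="4") followed by declare -a odds=([0]="1" [1]="3" [2]="5"); Bash 4.3 quotes the array literal slightly differently.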


Here is what the Bash Manual says about nameref:

A variable can be assigned the nameref attribute using the -n option to the declare or local builtin commands (see Bash Builtins) to create a nameref, or a reference to another variable. This allows variables to be manipulated indirectly. Whenever the nameref variable is referenced, assigned to, unset, or has its attributes modified (other than using or changing the nameref attribute itself), the operation is actually performed on the variable specified by the nameref variable’s value. A nameref is commonly used within shell functions to refer to a variable whose name is passed as an argument to the function. For instance, if a variable name is passed to a shell function as its first argument, running

declare -n ref=$1

inside the function creates a nameref variable ref whose value is the variable name passed as the first argument. References and assignments to ref, and changes to its attributes, are treated as references, assignments, and attribute modifications to the variable whose name was passed as $1.
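
One practical consequence: a nameref only stores a name, so if the caller's array has the same name as the nameref itself, the nameref ends up pointing at itself and Bash warns about a circular name reference. A common workaround, shown here as a sketch rather than anything the manual prescribes, is to give the nameref a name no caller is likely to use. The names fill_array, __fill_array_out and demo_caller are hypothetical.

```shell
#!/usr/bin/env bash
# Requires Bash 4.3+ for namerefs (local -n).

# fill_array is a hypothetical helper. Its nameref has a long, prefixed
# name so that it is unlikely to match any variable name a caller passes in.
fill_array() {
    local -n __fill_array_out=$1
    __fill_array_out=(one "two three" four)
}

demo_caller() {
    local arr                   # would clash with a nameref that was also called "arr"
    fill_array arr
    printf '%s\n' "${#arr[@]}" "${arr[1]}"
}

demo_caller
```

This prints 3 and then two three, showing that the caller's local arr was populated.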

Solution 2:

What's wrong with globals?

Returning arrays is really not practical. There are lots of pitfalls.

That said, here's one technique that works if it's OK for the variable to have the same name:

$ f () { local a; a=(abc 'def ghi' jkl); declare -p a; }
$ g () { local a; eval "$(f)"; declare -p a; }
$ f; declare -p a; echo; g; declare -p a
declare -a a='([0]="abc" [1]="def ghi" [2]="jkl")'
-bash: declare: a: not found

declare -a a='([0]="abc" [1]="def ghi" [2]="jkl")'
-bash: declare: a: not found

The declare -p commands (except for the one in f()) are used to display the state of the array for demonstration purposes. In f() it's used as the mechanism to return the array.

If you need the array to have a different name, you can do something like this:

$ g () { local b r; r=$(f); r="declare -a b=${r#*=}"; eval "$r"; declare -p a; declare -p b; }
$ f; declare -p a; echo; g; declare -p a
declare -a a='([0]="abc" [1]="def ghi" [2]="jkl")'
-bash: declare: a: not found

-bash: declare: a: not found
declare -a b='([0]="abc" [1]="def ghi" [2]="jkl")'
-bash: declare: a: not found
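
The same rename technique may be easier to follow as a multi-line script. This is only a sketch: make_list and use_list are hypothetical stand-ins for f and g, and the eval is acceptable here only because the string comes from declare -p in a function you control.

```shell
#!/usr/bin/env bash

# make_list prints a declaration of its local array, as f does above.
make_list() {
    local a=(abc 'def ghi' jkl)
    declare -p a
}

# use_list rebuilds that array under a different local name, b.
use_list() {
    local decl
    decl=$(make_list)                   # text of the form: declare -a a=(...)
    eval "declare -a b=${decl#*=}"      # keep everything after the first "="
    printf '%s\n' "${#b[@]}" "${b[1]}"
}

use_list
```

This prints 3 and then def ghi, so the element with the embedded space survives the round trip.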

Solution 3:

Bash can't pass around data structures as return values. A return value must be a numeric exit status between 0 and 255. However, you can certainly use command or process substitution to pass commands to an eval statement if you're so inclined.

This is rarely worth the trouble, IMHO. If you must pass data structures around in Bash, use a global variable--that's what they're for. If you don't want to do that for some reason, though, think in terms of positional parameters.

Your example could easily be rewritten to use positional parameters instead of global variables:

use_array () {
    for idx in "$@"; do
        echo "$idx"
    done
}

create_array () {
    local array=("a" "b" "c")
    use_array "${array[@]}"
}

This all creates a certain amount of unnecessary complexity, though. Bash functions generally work best when you treat them more like procedures with side effects, and call them in sequence.

# Gather values and store them in FOO.
get_values_for_array () { :; }

# Do something with the values in FOO.
process_global_array_variable () { :; }

# Call your functions.
get_values_for_array
process_global_array_variable

If all you're worried about is polluting your global namespace, you can also use the unset builtin to remove a global variable after you're done with it. Using your original example, let my_list be global (by removing the local keyword) and add unset my_list to the end of my_algorithm to clean up after yourself.
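
A minimal sketch of that suggestion, reusing the names from the question. Note that create_array has to be called directly rather than through $(), since a command substitution runs in a subshell and the global would never reach the caller.

```shell
#!/usr/bin/env bash

create_array() {
    my_list=("a" "b" "c")       # no "local", so my_list is global
}

my_algorithm() {
    create_array                # called directly, not via $(...)
    printf '%s\n' "${my_list[@]}"
    unset my_list               # remove the global once it is no longer needed
}

my_algorithm
declare -p my_list 2>/dev/null || echo "my_list is unset"
```

The last line prints my_list is unset, confirming that nothing is left behind in the global namespace.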

Solution 4:

You were not far off with your original solution. It had two problems: you used a comma as a separator, and you failed to capture the returned items into a list. Try this:

my_algorithm() {
  local result=( $(create_array) )
}

create_array() {
  local my_list=("a" "b" "c")
  echo "${my_list[@]}"
}

Considering the comments about embedded spaces, a few tweaks using IFS can solve that:

my_algorithm() {
  local oldIFS="$IFS"   # keep the saved value out of the global namespace
  IFS=','
  local result=( $(create_array) )
  IFS="$oldIFS"
  echo "Should be 'c d': ${result[1]}"
}

create_array() {
  local IFS=','         # local, so a direct call does not change the caller's IFS
  local my_list=("a b" "c d" "e f")
  echo "${my_list[*]}"
}
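
As an alternative to changing IFS (a different technique, not part of the answer above), the callee can print one element per line and the caller can read the lines back with the mapfile builtin. This sketch assumes Bash 4.0 or later and that no element contains a newline. It also avoids the filename expansion that an unquoted $(...) is subject to.

```shell
#!/usr/bin/env bash
# Requires Bash 4.0+ for mapfile.

create_array() {
    local my_list=("a b" "c d" "e f")
    printf '%s\n' "${my_list[@]}"           # one element per line
}

my_algorithm() {
    local result
    mapfile -t result < <(create_array)     # -t drops each trailing newline
    echo "Should be 'c d': ${result[1]}"
}

my_algorithm
```

Running it prints Should be 'c d': c d.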