How would you separate fields with multiple spaces and store them in an array?

In my file mytxt:

field1                    field2
------                    -------
this are numbers          12345
this letters              abc def ghi 

Let's say I want to store the first field in an array:

i=0
while read line; do 
    field_one[$i]=$(echo $line | awk '{print $1}')
    echo ${field_one[i]}  
    ((i++))
done < mytxt

That prints "this" twice in the output.

Any ideas on how I could store them in an array and get this output:

this are numbers
this letters

I have tried changing delimiters, squeezing spaces, and using sed, but I'm stuck. Any hint would be appreciated.

My final goal is to store both fields in an array.


Solution 1:

Use colrm to remove columns from the file (here the input file is named text.txt):

#!/bin/bash

shopt -s extglob

a=()
while read; do
   a+=("${REPLY%%*( )}")
done < <(colrm 26 < text.txt)

printf %s\\n "${a[@]:2:3}"

(Bash builtin version):

#!/bin/bash

shopt -s extglob

a=()
while read; do
    b="${REPLY::26}"; a+=("${b%%*( )}")
done < text.txt

printf %s\\n "${a[@]:2:3}"
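
Since the final goal is to store both fields, the same fixed-width idea can be extended to the second column. This is only a minimal sketch, assuming the second column starts at character offset 26 as in the sample table; the array names col1 and col2 and the temporary file are illustrative, not part of the original answer.

```shell
#!/usr/bin/env bash
shopt -s extglob                      # enables the *( ) pattern used for trimming

# Sample table from the question, written to a temporary file for the demo
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
field1                    field2
------                    -------
this are numbers          12345
this letters              abc def ghi 
EOF

col1=()
col2=()
while IFS= read -r line; do
    left=${line::26}                  # first 26 characters: column 1 plus padding
    right=${line:26}                  # everything from offset 26 on: column 2
    col1+=("${left%%*( )}")           # strip trailing spaces
    col2+=("${right%%*( )}")
done < "$tmp"
rm -f "$tmp"

# Print the data rows only (indices 0 and 1 hold the two header lines)
for i in "${!col1[@]}"; do
    (( i < 2 )) && continue
    printf '[%s] [%s]\n' "${col1[i]}" "${col2[i]}"
done
```

The hard-coded 26 only fits tables laid out like the sample; a different layout needs a different offset.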

Solution 2:

Moving my comment to an answer (based on this source). To show just one column of a table whose columns are separated by multiple spaces:

awk -F '  +' '{print $2}' mytxt.txt  # Or with -F ' {2,}'

Note that the awk program must stay in single quotes: inside double quotes the shell would expand $2 itself before awk runs.

I found it particularly useful to find duplicates, using something like:

somelist... | sort | uniq -c | sort -rn | grep -vP "^ +1 " | awk -F '  +' '{print $3}'
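
To get from the awk split to the bash arrays the question asks for, the awk output can be fed to mapfile. A rough sketch, assuming the two header lines should be skipped and trailing spaces in the second column are unwanted; the array names col1 and col2 and the temporary file are made up for the example.

```shell
#!/usr/bin/env bash

# Sample table from the question, written to a temporary file for the demo
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
field1                    field2
------                    -------
this are numbers          12345
this letters              abc def ghi 
EOF

# NR > 2 skips the two header lines; -F '  +' splits on runs of 2+ spaces
mapfile -t col1 < <(awk -F '  +' 'NR > 2 { print $1 }' "$tmp")

# sub() removes trailing spaces left at the end of the second field
mapfile -t col2 < <(awk -F '  +' 'NR > 2 { sub(/ +$/, "", $2); print $2 }' "$tmp")
rm -f "$tmp"

for i in "${!col1[@]}"; do
    printf '[%s] [%s]\n' "${col1[i]}" "${col2[i]}"
done
```

Like the one-liner above, this relies on at least two spaces between columns, so single spaces inside a field such as "abc def ghi" are left alone.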

Solution 3:

You could use the bash builtin mapfile (aka readarray) with a callback that uses parameter expansion to trim the longest trailing substring starting with two spaces:

mapfile -c 1 -C 'f() { field_one[$1]="${2%%  *}"; }; f' < mytxt

For example, given

$ cat mytxt
field1                    field2
------                    -------
this are numbers          12345
this letters              abc def ghi 

then

$ mapfile -c 1 -C 'f() { field_one[$1]="${2%%  *}"; }; f' < mytxt
$
$ printf '%s\n' "${field_one[@]}" | cat -A
field1$
------$
this are numbers$
this letters$
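
The callback can be extended to keep the second field as well and to drop the header lines. What follows is a sketch under some assumptions: the function name split_line and the array field_two are hypothetical, and mapfile's -s 2 option is used to discard the first two lines.

```shell
#!/usr/bin/env bash

# Sample table from the question, written to a temporary file for the demo
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
field1                    field2
------                    -------
this are numbers          12345
this letters              abc def ghi 
EOF

field_one=()
field_two=()

# Called by mapfile as: split_line <array index> <line>
split_line() {
    local idx=$1
    local line=${2%$'\n'}             # drop the newline passed along with the line
    local first=${line%%  *}          # text before the first run of 2+ spaces
    local rest=${line#"$first"}       # remainder, starting with the padding
    rest=${rest#"${rest%%[! ]*}"}     # strip leading spaces
    rest=${rest%"${rest##*[! ]}"}     # strip trailing spaces
    field_one[idx]=$first
    field_two[idx]=$rest
}

# -s 2 discards the two header lines; -c 1 runs the callback for every line
mapfile -s 2 -c 1 -C split_line < "$tmp"
rm -f "$tmp"

for i in "${!field_one[@]}"; do
    printf '[%s] [%s]\n' "${field_one[i]}" "${field_two[i]}"
done
```

Defining the function separately instead of inline in the -C string is just a readability choice; mapfile appends the index and the line as arguments either way.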

Solution 4:

This answer focuses on removing the two heading lines from the array to match the output requirements.

$ cat fieldone.txt
field1                    field2
------                    -------
this are numbers          12345
this letters              abc def ghi 

$ fieldone
this are numbers         
this letters             

Here is the script:

#!/bin/bash

# NAME: fieldone
# PATH: $HOME/askubuntu/
# DESC: Answer for: https://askubuntu.com/questions/1194620/
# how-would-you-separate-fields-with-multiple-spaces-and-store-them-in-an-array

# DATE: December 8, 2019.

i=0                                     # Current 0-based array index number
while read line; do                     # Read all lines from input file
    ((LineNo++))                        # Current line number of input file
    [[ $LineNo -eq 1 ]] && continue     # "Field 1     Field 2" skip first line
    if [[ $LineNo -eq 2 ]] ; then       # Is this line 2?
        # Grab the second column position explained in:
        # https://unix.stackexchange.com/questions/153339/
        # how-to-find-a-position-of-a-character-using-grep
        Len="$(grep -aob ' -' <<< "$line" | \grep -oE '[0-9]+')"
        continue                        # Loop back for first field
    fi

    field_one[$i]="${line:0:$Len}"      # Extract line position 0 for Len
    echo "${field_one[i]}"              # Display array element just added
    ((i++))                             # Increment for next array element

done < fieldone.txt                     # Input filename fed into read loop

Hopefully the code and comments are self-explanatory. If not, don't hesitate to comment.

The script still works if only one space separates the two columns whereas some other answers will break:

field1         field2
------         ------
this is letter abcdef
this is number 123456
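
The same ruler-line idea can also be used to fill both arrays without calling grep. This is a sketch only, assuming the second line of the file consists of two runs of dashes separated by spaces; the variable names ruler, start and field_two are illustrative and not part of the script above.

```shell
#!/usr/bin/env bash

# Single-space variant of the table, written to a temporary file for the demo
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
field1         field2
------         ------
this is letter abcdef
this is number 123456
EOF

field_one=()
field_two=()
{
    IFS= read -r _                         # skip the heading line
    IFS= read -r ruler                     # the line of dashes
    first_run=${ruler%% *}                 # first run of dashes
    gap=${ruler#"$first_run"}              # spaces followed by the second run
    gap=${gap%%-*}                         # keep only the spaces in between
    start=$(( ${#first_run} + ${#gap} ))   # 0-based offset where column 2 begins

    while IFS= read -r line; do
        left=${line:0:start}               # column 1 plus padding
        left=${left%"${left##*[! ]}"}      # trim trailing spaces
        field_one+=("$left")
        field_two+=("${line:start}")       # column 2, from the computed offset on
    done
} < "$tmp"
rm -f "$tmp"

for i in "${!field_one[@]}"; do
    printf '[%s] [%s]\n' "${field_one[i]}" "${field_two[i]}"
done
```

Because the offset comes from the dashes rather than from counting spaces in the data rows, a single space between "this is letter" and "abcdef" is enough.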