How to split a delimited string into an array in awk?

Solution 1:

Have you tried:

echo "12|23|11" | awk '{split($0,a,"|"); print a[3],a[2],a[1]}'
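
If the number of pieces is not known in advance, the return value of split() can drive a loop instead of hard-coding a[3], a[2], a[1]. A minimal sketch, assuming a helper named reverse_fields (the name is made up for illustration) that takes the delimiter as its first argument:

```shell
# Hypothetical helper: print the pieces of each input line in reverse order.
reverse_fields() {
    awk -v d="$1" '{
        n = split($0, a, d)            # n is the number of pieces stored in a[1..n]
        for (i = n; i >= 1; i--)       # walk the array backwards
            printf "%s%s", a[i], (i > 1 ? " " : "\n")
    }'
}

echo "12|23|11" | reverse_fields "|"
# prints: 11 23 12
```

An empty input line yields n = 0, so nothing is printed for it.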

Solution 2:

To split a string into an array in awk, use the split() function:

awk '{split($0, array, ":")}'
#           \/  \___/  \_/
#           |     |     |
#       string    |     delimiter
#                 |
#               array to store the pieces

If no separator is given, split() uses FS, which defaults to a single space, so the string is split on runs of whitespace:

$ awk '{split($0, array); print array[2]}' <<< "a:b c:d e"
c:d
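
To see what splitting on the default FS does with messy whitespace, here is a small sketch. The helper name show_pieces is an assumption for illustration, not something from the answer above:

```shell
# Hypothetical helper: show each piece produced by splitting on the default FS.
show_pieces() {
    awk '{
        n = split($0, parts)       # no third argument, so FS (a single space) is used
        for (i = 1; i <= n; i++)
            printf "%d=[%s]\n", i, parts[i]
    }'
}

# Leading blanks, a tab, and repeated spaces do not create empty pieces.
printf '  a:b \t c:d   e  \n' | show_pieces
# prints:
# 1=[a:b]
# 2=[c:d]
# 3=[e]
```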

We can give a separator, for example a colon (:):

$ awk '{split($0, array, ":"); print array[2]}' <<< "a:b c:d e"
b c

Which is equivalent to setting it through the FS:

$ awk -F: '{split($0, array); print array[2]}' <<< "a:b c:d e"
b c

In GNU Awk you can also provide the separator as a regexp:

$ awk '{split($0, array, ":+"); print array[2]}' <<< "a:::b c::d e"   # note the repeated colons
b c

And even see which separator was matched at each split by using its fourth parameter:

$ awk '{split($0, array, ":+", sep); print array[2]; print sep[1]}' <<< "a:::b c::d e"
b c
:::
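
Since seps[i] sits between array[i] and array[i+1], the two arrays together are enough to rebuild the original string. A rough sketch, assuming gawk is installed (the four-argument split() is a gawk extension) and using a made-up helper name, rebuild_with_seps:

```shell
# Hypothetical helper; needs gawk because of the four-argument split().
rebuild_with_seps() {
    gawk '{
        n = split($0, piece, /:+/, sep)
        out = piece[1]
        for (i = 2; i <= n; i++)
            out = out sep[i-1] piece[i]    # sep[i-1] lies between piece[i-1] and piece[i]
        print n, out
    }'
}

# Only run the demo when gawk is actually available.
if command -v gawk >/dev/null 2>&1; then
    echo "a:::b c::d e" | rebuild_with_seps
    # prints: 3 a:::b c::d e
fi
```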

Let's quote the man page of GNU awk:

split(string, array [, fieldsep [, seps ] ])

Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
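
Two details from that description are easy to check: split() returns the number of elements, and it empties the target array before filling it. A small sketch, with count_pieces as an illustrative name and the string and delimiter passed in through -v:

```shell
# Hypothetical helper: report how many pieces split() produced and
# whether an element that was in the array beforehand survived.
count_pieces() {
    awk -v s="$1" -v d="$2" 'BEGIN {
        arr[99] = "stale"              # element that exists before the split
        n = split(s, arr, d)           # split() deletes old elements, then stores the pieces
        print n, ((99 in arr) ? "stale element kept" : "stale element cleared")
    }'
}

count_pieces "12|23|11" "|"
# prints: 3 stale element cleared
```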

Solution 3:

Please be more specific! What do you mean by "it doesn't work"? Post the exact output (or error message), your OS and awk version:

% awk -F\| '{
  for (i = 0; ++i <= NF;)
    print i, $i
  }' <<<'12|23|11'
1 12
2 23
3 11

Or, using split:

% awk '{
  n = split($0, t, "|")
  for (i = 0; ++i <= n;)
    print i, t[i]
  }' <<<'12|23|11'
1 12
2 23
3 11

Edit: on Solaris you'll need to use the POSIX awk (/usr/xpg4/bin/awk) in order to process 4000 fields correctly.