How can I get the length of an array in awk?

When you split a string into an array, split() returns the number of elements, so you can say:

echo "hello world" | awk '{n=split($0, array, " ")} END{print n }'
# ------------------------^^^--------------------------------^^

Output is:

2
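
The same return value is available with any separator. A minimal sketch, assuming a colon-separated input; count_parts and the sample string are names made up for this illustration:

```shell
# split() returns the number of pieces it stored in the array,
# so the last piece is always parts[n].
count_parts() {
    printf '%s\n' "$1" | awk '{ n = split($0, parts, ":"); print n, parts[n] }'
}

count_parts "usr:local:bin"   # prints: 3 bin
```

Since split() fills the array with indexes 1 to n, this is one of the few cases where the count and the highest index coincide.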

Mr. Ventimiglia's function requires a small adjustment to run at all (note the semicolon after the for statement):

function alen(a, i) {
    for(i in a);
    return i
}

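But it does not work in all cases. That is because of the way awk stores and "sees" array indexes: arrays are associative, and the indexes are not necessarily contiguous integers (unlike in C). The order in which for (i in a) visits the indexes is unspecified, so i does not necessarily hold the "last" element when the loop ends, and even the last index need not equal the number of elements.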

To resolve it, you need to count:

function alen(a, i, k) {
    k = 0
    for(i in a) k++
    return k
}
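
A quick way to check that counting works when the indexes are strings. This is only a sketch; alen_demo and the color array are invented for the example:

```shell
alen_demo() {
    awk '
    function alen(a,    i, k) {
        k = 0
        for (i in a) k++       # one increment per index, numeric or string
        return k
    }
    BEGIN {
        color["red"] = 1; color["green"] = 2; color["blue"] = 3
        print alen(color)      # prints 3
        delete color["green"]
        print alen(color)      # prints 2
    }'
}

alen_demo
```

Returning i instead of k here would print one of the color names rather than a number.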

Counting this way also handles other index types in "one-dimensional" arrays, where the index may be a string. Please see: http://docstore.mik.ua/orelly/unix/sedawk/ch08_04.htm. For "multidimensional" and arbitrary arrays, see http://www.gnu.org/software/gawk/manual/html_node/Walking-Arrays.html#Walking-Arrays.


I don't think the person is asking, "How do I split a string and get the length of the resulting array?" I think the command they provide is just an example of the situation where it arose. In particular, I think the person is asking 1) Why does length(array) provoke an error, and 2) How can I get the length of an array in awk?

The answer to the first question is that the length function does not operate on arrays in POSIX standard awk, though it does in GNU awk (gawk) and a few other variations. The answer to the second question is (if we want a solution that works in all variations of awk) to do a linear scan.

For example, a function like this:

function alen (a,     i, k) {
    k = 0
    for (i in a) k++    # count each index; i alone is just the last index visited
    return k
}

NOTE: The second parameter i warrants some explanation.

The way you introduce local variables in awk is as extra function parameters, and the convention is to indicate this by adding extra spaces before these parameters. This is discussed in the GNU Awk manual.
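
To see why the extra parameter matters, here is a small sketch comparing a function that declares i as an extra parameter with one that does not. The names scope_demo, leaky and tidy are made up for the illustration:

```shell
scope_demo() {
    awk '
    function leaky(a) {           # i is not a parameter, so it is global
        for (i in a) { }
    }
    function tidy(a,    i) {      # i is an extra parameter, so it is local
        for (i in a) { }
    }
    BEGIN {
        arr["x"] = 1
        i = "untouched"
        tidy(arr);  print i       # prints: untouched
        leaky(arr); print i       # prints: x
    }'
}

scope_demo
```

Without the extra parameter, the loop variable silently overwrites any global i the caller was using.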


In gawk you can use the function length():

$ gawk 'BEGIN{a[1]=1; a[2]=2; a[23]=45; print length(a)}'
3

$ gawk 'BEGIN{a[1]=1; a[2]=2; print length(a); a[23]=45; print length(a)}'
2
3

From The GNU Awk user's guide:

With gawk and several other awk implementations, when given an array argument, the length() function returns the number of elements in the array. (c.e.) This is less useful than it might seem at first, as the array is not guaranteed to be indexed from one to the number of elements in it. If --lint is provided on the command line (see Options), gawk warns that passing an array argument is not portable. If --posix is supplied, using an array argument is a fatal error (see Arrays).
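
The caveat in that quote can be illustrated with the same array as in the gawk example above. This sketch counts with a loop instead of length(), so it should not depend on length() accepting an array; alen and sparse_demo are names chosen for the illustration:

```shell
sparse_demo() {
    awk '
    function alen(a,    i, k) { k = 0; for (i in a) k++; return k }
    BEGIN {
        a[1] = 1; a[2] = 2; a[23] = 45
        n = alen(a)
        print n                              # prints 3
        status = (n in a) ? "set" : "unset"  # is there an element at index n?
        print "a[" n "] is " status          # prints: a[3] is unset
    }'
}

sparse_demo
```

So the count is 3, but a loop from 1 to 3 would miss a[23] and hit an index that does not exist.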