macOS Terminal - using wget and bash - ERROR "Argument list is too long"

I'm using wget and bash to download a number of sequentially numbered files from a URL using {1..####}, but I'm getting the error: Argument list is too long

  1. When I run getconf ARG_MAX it says 262144 - what is this limit in reference to?

  2. What command will increase the argument limit (or can I remove it or set it to unlimited)?


Solution 1:

  1. ARG_MAX is the limit, in bytes, on the combined size of the argument list and the environment variables that can be passed to a newly executed program; when it is exceeded, the exec() call fails and the shell reports "Argument list too long". See this previous question, and this more detailed explanation. A quick way to see the limit in action is sketched after this list.

  2. ARG_MAX is compiled into the kernel and can't be raised at runtime, so the practical answer is to work around it. You can use xargs to split a long argument list into small-enough-to-process groups, but depending on the form of the arguments (whether they contain troublesome characters such as whitespace, quotes, or backslashes) this can get complicated. One generally safe way is to use printf '%s\0' to print the argument list with each item terminated by a null byte, then xargs -0 to consume that list:

     printf '%s\0' https://example.com/prefix{1..100}.html | xargs -0 curl -O
    

    Note that any arguments that should be passed to every invocation of the utility (like the -O in this example) must go in the xargs invocation, not in the printf argument list. Also, if some arguments need to come after the big list, you need a more complex invocation of xargs; one way to do that is sketched after this list.

    Also, this may look like it shouldn't work because the huge argument list is still being passed to printf, but printf is a shell builtin, not a separate executable, so the list is handled inside bash itself and the ARG_MAX limit never comes into play.
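
To see the limit in action, compare an external executable with a shell builtin. This is a minimal sketch and the numbers are illustrative; how close you are to the limit also depends on how large your environment already is:

    getconf ARG_MAX                     # total byte budget for arguments + environment (262144 here)
    env | wc -c                         # bytes already used up by your environment variables

    /bin/echo {1..100000} > /dev/null   # external executable: exec() fails with "Argument list too long"
    echo {1..100000} > /dev/null        # bash builtin: no exec() happens, so ARG_MAX never applies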
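
As for passing extra arguments after the big list: BSD/macOS xargs has a -J option that substitutes each batch of input items at a marker of your choice. A sketch using a copy into a backup directory, since that's the classic trailing-argument case (the file names and the /backup/ destination are made-up placeholders):

    printf '%s\0' file{1..100000}.txt | xargs -0 -J % cp % /backup/

Note that -J is a BSD extension; GNU xargs on Linux doesn't have it.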

[BTW, I thought there must be a previous Q&A covering this, but I couldn't find one. If someone else finds a good one, please mark this question as a duplicate.]

Solution 2:

When you run

wget 'https://example.com/prefix'{1..9999}'.html'

the expansion of the {1..9999} is done by the shell, resulting in an extremely long list of arguments (run echo foo{1..10} to see what happens).
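
To get a feel for how long that list is, you can count the bytes of the expansion; echo is a bash builtin, so the check itself isn't subject to the limit (the figure below assumes the placeholder URL used in this answer):

    echo 'https://example.com/prefix'{1..9999}'.html' | wc -c
    # on the order of 360000 bytes, already well past the 262144-byte ARG_MAX
    # before the environment variables are even counted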

Instead, you can just run

for i in {1..9999}; do
    wget 'https://example.com/prefix'${i}'.html'
done

or (as a one-liner)

for i in {1..9999}; do wget 'https://example.com/prefix'${i}'.html'; done

to have the shell handle the iteration itself rather than expanding everything into the arguments passed to wget. The overall download speed is limited by the network anyway, so forking and executing 10,000 wget processes (instead of just one) doesn't have a noticeable impact.
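
If you'd rather not start one wget process per file, wget can also read its list of URLs from a file, or from standard input, with -i. A sketch, still using the placeholder URL from above (and assuming wget is installed, e.g. via Homebrew, since macOS doesn't ship it):

    printf 'https://example.com/prefix%d.html\n' {1..9999} | wget -i -

Here the shell's builtin printf generates the list (so ARG_MAX is again not an issue) and a single wget process downloads everything.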

PS: Replace 9999 with whatever the highest number is, or use something like {1,7,9,15,22,36} for specific numbers.