In Bash, are wildcard expansions guaranteed to be in order?
Is the expansion of a wildcard in Bash guaranteed to be in alphabetical order? I am forced to split a large file into 10 MB pieces so that they can be accepted by my Mercurial repository.
So I was thinking I could use:
split -b 10485760 BigFile BigFilePiece.
and then in place of:
cat BigFile | bigFileProcessor
I could do:
cat BigFilePiece.* | bigFileProcessor
However, I could not find anything that guarantees that the expansion of the asterisk (aka wildcard, aka *) will always be in alphabetical order, so that .aa comes before .ab (as opposed to timestamp ordering or something like that).
Also, are there any flaws in my plan? How great is the performance cost of using cat to join the pieces back together?
Yes, globbing expansion is alphabetical.
From the Bash man page:
Pathname Expansion
After word splitting, unless the -f option has been set, bash scans each word for the characters *, ?, and [. If one of these characters appears, then the word is regarded as a pattern, and replaced with an alphabetically sorted list of file names matching the pattern.
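To address the timestamp worry directly, here is a minimal sketch that creates files in reverse name order and then expands a glob over them. The temporary directory and the file names are made up for the demo, and the locale is pinned to C only so that the result is reproducible.

```shell
# Demo only: work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Pin collation so the demo is reproducible (assumption: the C locale is acceptable here).
LC_ALL=C

# Create the files in reverse name order, so creation order disagrees with name order.
for suffix in ac ab aa; do
    touch "BigFilePiece.$suffix"
done

# The shell sorts the matches by name, not by creation order.
expanded=$(echo BigFilePiece.*)
echo "$expanded"   # BigFilePiece.aa BigFilePiece.ab BigFilePiece.ac
```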
It is documented behavior for bash, so you can depend upon it in your scripts. It has also been true of other Bourne-compatible shells for a very long time, though there may be corner cases regarding case folding or non-alphanumeric characters.
(The resulting list, in bash, will be in almost "ASCII-betical" order under many common locales, except that lowercase and uppercase letters are collated together as if there were no case differences, with each lowercase letter placed before its uppercase equivalent. All non-alphabetic characters should collate in the same order as they appear in ASCII.)
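Applied to the plan in the question, a quick round-trip check might look like the sketch below. It is only an illustration: the sample file is 2500 random bytes and the piece size is 1000 bytes (both made up for the demo, the question uses 10485760), and cmp is used to confirm that the reassembled stream is byte-identical to the original.

```shell
# Demo only: work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the real file: 2500 random bytes.
head -c 2500 /dev/urandom > BigFile

# 1000-byte pieces for the demo; the question uses 10485760.
split -b 1000 BigFile BigFilePiece.

# Three pieces are expected, with split's default suffixes .aa, .ab, .ac.
pieces=$(echo BigFilePiece.*)
echo "$pieces"

# Reassemble via the glob and compare byte-for-byte with the original.
if cat BigFilePiece.* | cmp -s - BigFile; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"
```

Since the original pipeline already fed the file through cat, reading the pieces through cat should cost roughly the same; treat that as an expectation to measure rather than a guarantee.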
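As others have pointed out, this could be perturbed by your language-related environment settings: LANG generally and LC_COLLATE more specifically. It might be safest to run commands that depend on glob expansion ordering under an env command to clear the environment (using -i or -u as appropriate), or to pipe the results through sort to ensure robust sequencing. Keep in mind that the glob is expanded by the shell, so the cleaned environment has to apply to the shell that performs the expansion, not just to the command that receives the file names.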
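A minimal sketch of both precautions follows. The piece.* file names are hypothetical, chosen so that byte order and locale-aware order can disagree. Instead of env, the sketch sets LC_ALL with a plain assignment inside a subshell, which is one way to make sure the setting reaches the shell that actually expands the glob.

```shell
# Demo only: work in a throwaway directory with mixed-case suffixes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch piece.B piece.a piece.C piece.d

# Option 1: set LC_ALL in the shell that expands the glob (here, a subshell).
# A prefix such as `LC_ALL=C cat piece.*` would not help: the calling shell
# expands the glob before that assignment applies.
c_order=$(LC_ALL=C; echo piece.*)
echo "$c_order"   # piece.B piece.C piece.a piece.d

# Option 2: do not rely on glob order at all; sort explicitly in the C locale.
# Assumes the file names contain no newlines.
sorted=$(printf '%s\n' piece.* | LC_ALL=C sort)
echo "$sorted"
```

In the C locale uppercase letters sort before lowercase ones, so both options print B, C, a, d regardless of what LANG or LC_COLLATE were set to beforehand.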