Sort a file according to a field starting with string
Try the following bash command:
sort -t- -d -k2 -o output.txt input.txt
It has four options plus the name of the input file, input.txt. If this file is not in the current directory, you will have to provide its full path, path/to/the/folder/input.txt. The options and their arguments are as follows:
- -t marks the field separator. We use - as the separator, so that everything before and after each - is considered a separate column.
- -d indicates dictionary sort: only blanks and alphanumeric characters are compared. For example, Apple is before Berry.
- -k2 indicates the column by which to sort, in this case the second column. Note the first column is everything before the first -, for example /home/zz/BOOKS/Author. The second column is between the first and the second -, that is, Artemis. Since no end field is given, the sort key runs from the second column to the end of the line.
- -o output.txt writes the sorted output to a file rather than to the terminal.
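As a quick check of the options above, here is a minimal sketch run on a throwaway file. The three file names are hypothetical (only the /home/zz/BOOKS/Author-Artemis prefix comes from the example above), and LC_ALL=C is an added assumption to keep the ordering independent of the current locale.

```shell
# Hypothetical sample data; the titles and extensions are made up for illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  '/home/zz/BOOKS/Author-Zola-Germinal.pdf' \
  '/home/zz/BOOKS/Author-Artemis-Title.epub' \
  '/home/zz/BOOKS/Author-Homer-Iliad.txt' > "$workdir/input.txt"

# Same options as above; LC_ALL=C is an assumption added for reproducibility.
LC_ALL=C sort -t- -d -k2 -o "$workdir/output.txt" "$workdir/input.txt"

# The lines should now be ordered by author name: Artemis, Homer, Zola.
cat "$workdir/output.txt"
```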
Hope this helps
Although it's overkill for the present example because of the solution proposed in user68186's answer, you could more generally do something like this in GNU awk:
gawk -F/ '
function mycmp(i1, v1, i2, v2,    a, b, m, n) {
    # a, b, m and n are declared as extra parameters to keep them local
    m = split(v1, a)
    n = split(v2, b)
    return a[m]"" > b[n]"" ? 1 : a[m]"" < b[n]"" ? -1 : 0
}
{
    lines[NR] = $0
}
END {
    PROCINFO["sorted_in"] = "mycmp"
    for (i in lines) print lines[i]
}
' file
Note that it sorts according to the lexical value of everything after the last /, so if the format is Author-<author name>-<title>.<extension> that will be
- the fixed string Author- (which has no effect, since it has the same weight for all lines); then
- <author name>-; then
- <title>.; then
- <extension>
This is similar to how GNU sort's simple KEYDEF -t- -k2 works, i.e. the effective sort key starts from the <author name> and continues to the line end.
An explicit delimiter is omitted from the split calls so that they inherit the value of FS, making it easy to change for systems that use a different path separator. The appended empty strings "" in the mycmp function force lexical comparison even if the filenames are numerical; see for example How awk Converts Between Strings and Numbers.
If you'd rather stick with the sort command, you could leverage GNU awk's Two-Way Communications with Another Process to:
- duplicate the last /-separated field at the start of the string
- pass the result to a sort command
- read back the sorted result, remove the duplicated prefix and print
i.e.
gawk -F/ '
BEGIN { OFS = FS; cmd = "sort -d" }
{ print $NF $0 |& cmd }
END {
    close(cmd, "to")
    # the > 0 test stops the loop on an error as well as at end of input
    while ((cmd |& getline) > 0) { $1 = ""; print }
    close(cmd, "from")
}
' file
There's a bit of a cheat here in that the absolute paths (lines start with /) imply an initial empty field; to handle relative paths you'd need to change print $NF $0 to print $NF,$0 to insert the "missing" separator, and then perhaps use a regex sub() instead of the simpler $1 = "" to remove the leading element.
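A possible sketch of that relative-path variant follows. It assumes gawk is installed; the sample paths, the variable name line and the exact sub() pattern are illustrative choices rather than part of the original command, and LC_ALL=C is added only to make the ordering reproducible.

```shell
# Hypothetical relative paths, made up for illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'BOOKS/b/Author-Zola-Germinal.pdf' \
  'BOOKS/a/Author-Homer-Iliad.txt' \
  'Author-Artemis-Title.epub' > "$workdir/relative.txt"

result=$(LC_ALL=C gawk -F/ '
BEGIN { OFS = FS; cmd = "sort -d" }
{ print $NF, $0 |& cmd }              # the comma inserts OFS between key and path
END {
    close(cmd, "to")
    while ((cmd |& getline line) > 0) {
        sub("^[^/]*/", "", line)      # drop the duplicated key and its separator
        print line
    }
    close(cmd, "from")
}
' "$workdir/relative.txt")
printf '%s\n' "$result"
```

Reading into a variable and using sub() leaves the rest of the path untouched, so a line with no directory part comes back exactly as it went in.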
As well as potentially being faster and more memory-efficient than the pure gawk solution, this allows other sort options to be added straightforwardly, e.g. cmd = "sort -d -t " FS " -k1,1r".