awk prints unnecessary new line in output
print adds a newline by default. You can use printf("%s", a[1]); instead, or move the printing of a[1] to where all the other fields are being printed. I've renamed the first use of a to b so as to be able to keep the value until later:
grep " 78_" filename | sort -k 3 | \
awk '{split($3,b,"_");split($3,a,"-"); print b[1] "," $3 "," a[2] "," $5 "," $6 "," $10}'
Output:
78,78_data_store-2021.11.26,2021.11.26,1,1,478.2mb
78,78_data_store-2021.12.12,2021.12.12,1,1,4.1gb
78,78_data_store-2021.12.24,2021.12.24,1,1,372.6mb
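To see the print versus printf difference in isolation, here is a minimal sketch. The single input string is taken from the sample output above; the variable names with_print and with_printf are only for illustration.

```shell
# print appends the output record separator (a newline by default),
# so the two values end up on separate lines.
with_print=$(printf '78_data_store-2021.11.26\n' |
  awk '{split($1,b,"_"); print b[1]; print "next"}')

# printf writes only what the format string asks for, so nothing is
# added after b[1] and the following print continues on the same line.
with_printf=$(printf '78_data_store-2021.11.26\n' |
  awk '{split($1,b,"_"); printf("%s", b[1]); print ",next"}')

printf '%s\n' "$with_print"
printf '%s\n' "$with_printf"
```

The first variable holds two lines (78, then next), while the second holds the single line 78,next.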
Based on your shown samples, please try the following awk code. It uses a Schwartzian transform in awk, via an awk + sort + awk combination.
awk '
BEGIN{ OFS="," }
FNR>1 && /78_/{
split($3,arr,"[_-]")
print arr[4]"@"arr[1],$3,arr[4],$5,$6,$NF
}
' Input_file |
sort -t'@' -k1,1 |
awk '{sub(/^[^@]*@/,"")} 1'
Explanation for above code:
- Pass Input_file (OP's file) to the awk program.
- Set OFS to a comma so that every printed line is comma-separated.
- Check that the line number is greater than 1 (skipping the first line) and that the line contains 78_; only then proceed.
- Use the split function to split the 3rd field into an array named arr, with _ and - as the delimiters.
- Print arr[4]"@"arr[1],$3,arr[4],$5,$6,$NF, which is the needed output with arr[4]@ prepended so that it can be sorted easily (the prefix is removed later in the pipeline).
- Pass the awk program's output to the sort command, setting the field separator to @ and sorting on the 1st field (e.g. 2021.12.12 in the shown samples).
- Pass the sorted data to another awk program, which removes everything from the start of the line up to and including the 1st occurrence of @ (the prefix added in the earlier step).
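The three stages can be watched one at a time in the sketch below. The input rows are made up for illustration (only the 3rd, 5th, 6th and last fields matter here; the other columns are placeholders), and the variable names decorated, sorted and result are hypothetical.

```shell
# Hypothetical sample input: a first line to skip, then data rows
# whose 3rd field has the form <id>_<name>-<date>.
input='health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open 78_data_store-2021.12.12 u1 1 1 10 0 8.2gb 4.1gb
green open 12_other-2021.01.01 u2 1 1 10 0 1mb 1mb
green open 78_data_store-2021.11.26 u3 1 1 10 0 956.4mb 478.2mb'

# Stage 1 (decorate): prefix each wanted record with its sort key and "@".
decorated=$(printf '%s\n' "$input" | awk '
BEGIN{ OFS="," }
FNR>1 && /78_/{
  split($3,arr,"[_-]")
  print arr[4]"@"arr[1],$3,arr[4],$5,$6,$NF
}')

# Stage 2 (sort): order by the key in front of "@" (-k1,1 limits the key to field 1).
sorted=$(printf '%s\n' "$decorated" | sort -t'@' -k1,1)

# Stage 3 (undecorate): strip everything up to and including the first "@".
result=$(printf '%s\n' "$sorted" | awk '{sub(/^[^@]*@/,"")} 1')

printf '%s\n' "$decorated"
printf '%s\n' "$result"
```

Printing decorated shows the temporary date prefix on each line; result shows the same records in date order with the prefix gone.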
Improvements in OP's attempts:
- There is no need to use grep when using awk; awk can search for the string itself, so grep was removed from the answer.
- There is no need to call split twice; both calls can be merged into a single split by giving it multiple separators.
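Both improvements can be checked side by side with a small sketch. The test strings and the variable names two_calls, one_call and filtered are illustrative only.

```shell
name='78_data_store-2021.11.26'

# Two split calls, one per separator, as in the original attempt.
two_calls=$(printf '%s\n' "$name" |
  awk '{split($1,b,"_"); split($1,a,"-"); print b[1] "," a[2]}')

# One split call: a separator longer than one character is treated as a
# regular expression, so "[_-]" splits on either "_" or "-".
one_call=$(printf '%s\n' "$name" |
  awk '{n=split($1,arr,"[_-]"); print arr[1] "," arr[n]}')

# An awk pattern does the filtering that grep did in the original pipeline.
filtered=$(printf 'x 78_a\ny 99_b\n' | awk '/78_/{print $2}')

printf '%s\n' "$two_calls" "$one_call" "$filtered"
```

Both split variants print 78,2021.11.26, and the pattern keeps only the line containing 78_.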