Print the first column
I want to print column 1 of this file. I used this command: awk '{print $1}'
but it printed only the first word of the first column.
DATA
ABC transporters                            ABC transporters
Alanine, aspartate and glutamate metabolism Alanine, aspartate
alpha-Linolenic acid metabolism             alpha-Linolenic acid metabolism
Aminoacyl-tRNA biosynthesis                 Aminoacyl-tRNA biosynthesis
Amino sugar and nucleotide sugar metabolism Amino sugar and nucleotide
Arachidonic acid metabolism                 Arachidonic
Output:
ABC
Alanine,
alpha-Linolenic
Aminoacyl-tRNA
Amino
Arachidonic
Desired Output:
ABC transporters
Alanine, aspartate and glutamate metabolism
alpha-Linolenic acid metabolism
Aminoacyl-tRNA biosynthesis
Amino sugar and nucleotide sugar metabolism
Arachidonic acid metabolism
From what I can see, your columns are delimited by two spaces, so with awk:
awk -F '\\s\\s' '{print $1}'
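A more portable variant, as a sketch only: \s is a GNU awk extension, so on other awk implementations a literal "two or more spaces" separator may be safer. The function name first_column and the sample lines are made up for illustration, and the input is assumed to be padded with at least two spaces between columns. A row separated by only one space will not be split by this approach.

```shell
# Hypothetical helper: split on runs of two or more spaces and print field 1.
first_column() {
    awk -F '  +' '{print $1}'
}

# Sample input built with printf so the first column is padded to 44 characters.
{
    printf '%-44s%s\n' 'ABC transporters' 'ABC transporters'
    printf '%-44s%s\n' 'Arachidonic acid metabolism' 'Arachidonic'
} | first_column
```

Because the separator is a regular expression, single spaces inside the column text are left alone and only the padding counts as a delimiter.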
Since this seems to be a fixed-width column, you can just cut the corresponding characters. The widest entry, "Alanine, aspartate and glutamate metabolism", seems to be 44 characters wide, so:
$ cut -c1-44 foo
ABC transporters
Alanine, aspartate and glutamate metabolism
alpha-Linolenic acid metabolism
Aminoacyl-tRNA biosynthesis
Amino sugar and nucleotide sugar metabolism
Arachidonic acid metabolism
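Note that cut keeps the padding, so shorter entries come out with trailing spaces. A minimal sketch for stripping them, assuming the 44-character width estimated above; first_44 is a hypothetical helper name and the sample lines are generated with printf rather than read from foo:

```shell
# Hypothetical helper: keep characters 1-44, then drop trailing spaces.
first_44() {
    cut -c1-44 | sed 's/ *$//'
}

# Sample input: first column padded to 44 characters, then the second column.
{
    printf '%-44s%s\n' 'ABC transporters' 'ABC transporters'
    printf '%-44s%s\n' 'Arachidonic acid metabolism' 'Arachidonic'
} | first_44
```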
Since the second column evidently repeats the beginning of the first column, I use that as the criterion for the cut with sed, so it does not depend on the column width:
sed 's/^\(.*\)\(.*\) \1$/\1\2/'
The first group is the repeated part, backreferenced as \1 at the end of the line; the second group is the rest of the first column. You could append ;s/ *$// to the expression to remove the trailing spaces if they bother you.
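To see how the two pieces combine, here is a small sketch that runs the expression together with the trailing-space cleanup on two sample lines. The function name strip_repeat is hypothetical, and the input is assumed to follow the pattern in the question, where the second column repeats the start of the first.

```shell
# Hypothetical helper: drop the repeated second column, then trim trailing spaces.
strip_repeat() {
    sed 's/^\(.*\)\(.*\) \1$/\1\2/;s/ *$//'
}

# Sample input: one row separated by a single space, one padded row.
{
    printf '%s\n' 'Alanine, aspartate and glutamate metabolism Alanine, aspartate'
    printf '%-44s%s\n' 'Arachidonic acid metabolism' 'Arachidonic'
} | strip_repeat
```

Because \(.*\) is greedy, sed picks the longest prefix that also ends the line after a space, which is why the whole repeated part is removed rather than only its last word.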