How to prevent grep from printing the same string multiple times?
If I grep a file containing the following:
These are words
These are words
These are words
These are words
...for the word These, it will print the string These are words four times.
How can I prevent grep from printing recurring strings more than once? Failing that, how can I manipulate the output of grep to remove duplicate lines?
The Unix philosophy is to have tools that each do one thing and do it well. In this case, grep is the tool that selects text from a file. Sorting the text brings duplicate lines together, and the -u option to sort removes them. Thus:
grep These filename | sort -u
sort has many options: see man sort. If you want to count duplicates or have a more complicated scheme for determining what is or is not a duplicate, then pipe the sort output to uniq:
grep These filename | sort | uniq
and see man uniq for options.
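As a rough illustration of the two pipelines above, here is a small sketch that runs them against a throwaway file. The sample contents and the variable names (sample, unique, counts) are made up for the example, and the use of uniq -c to count duplicates is one possible option from man uniq, not something the answer prescribes.

```shell
# Build a temporary sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' \
    'These are words' \
    'These are words' \
    'Those are words' \
    'These are other words' \
    'These are words' > "$sample"

# One line per distinct match; note that the output is sorted,
# so the original order of the file is not preserved.
unique=$(grep These "$sample" | sort -u)
printf '%s\n' "$unique"

# Count how often each distinct match occurs, most frequent first.
# uniq only collapses adjacent lines, which is why sort comes first.
counts=$(grep These "$sample" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

rm -f "$sample"
```

The width of the count column printed by uniq -c varies between implementations, so scripts that parse it should not rely on a fixed amount of padding.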
Using grep with an additional switch, if you only need the first matching line:
grep -m1 'These' filename
From man grep
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless
of the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count greater
than NUM. When the -v or --invert-match option is also used, grep
stops after outputting NUM non-matching lines.
or using awk ;)
awk '/These/ {print; exit}' foo
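Both commands above stop at the first match, so they only help when a single line is wanted. The sketch below checks that on a throwaway file and then shows a variation that is not part of this answer: the common awk idiom !seen[$0]++, which prints every distinct matching line once while keeping the original order. The file contents and the variable names (sample, first_grep, first_awk, distinct) are assumptions for the example.

```shell
# Build a temporary sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' \
    'These are words' \
    'Those are words' \
    'These are more words' \
    'These are words' > "$sample"

# Both of these stop reading after the first matching line.
first_grep=$(grep -m1 'These' "$sample")
first_awk=$(awk '/These/ {print; exit}' "$sample")

# Variation (an assumption, not from the answer): seen[$0]++ is 0 the first
# time a line appears, so the pattern is true only for first occurrences.
distinct=$(awk '/These/ && !seen[$0]++' "$sample")

printf '%s\n' "$first_grep" "$first_awk" "$distinct"

rm -f "$sample"
```

Unlike sort -u, the awk variation keeps lines in the order they first appear, at the cost of holding every distinct matching line in memory.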