How to delete duplicate lines in a file without sorting it in Unix
Solution 1:
awk '!seen[$0]++' file.txt
seen
is an associative array that AWK will pass every line of the file to. If a line isn't in the array then seen[$0]
will evaluate to false. The !
is the logical NOT operator and will invert the false to true. AWK will print the lines where the expression evaluates to true.
The ++
increments seen
so that seen[$0] == 1
after the first time a line is found and then seen[$0] == 2
, and so on.
AWK evaluates everything but 0
and ""
(empty string) to true. If a duplicate line is placed in seen
then !seen[$0]
will evaluate to false and the line will not be written to the output.
Solution 2:
From http://sed.sourceforge.net/sed1line.txt: (Please don't ask me how this works ;-) )
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
Solution 3:
Perl one-liner similar to jonas's AWK solution:
perl -ne 'print if ! $x{$_}++' file
This variation removes trailing white space before comparing:
perl -lne 's/\s*$//; print if ! $x{$_}++' file
This variation edits the file in-place:
perl -i -ne 'print if ! $x{$_}++' file
This variation edits the file in-place, and makes a backup file.bak
:
perl -i.bak -ne 'print if ! $x{$_}++' file
Solution 4:
An alternative way using Vim (Vi compatible):
Delete duplicate, consecutive lines from a file:
vim -esu NONE +'g/\v^(.*)\n\1$/d' +wq
Delete duplicate, nonconsecutive and nonempty lines from a file:
vim -esu NONE +'g/\v^(.+)$\_.{-}^\1$/d' +wq