Remove duplicates in each line of a file
Solution 1:
Here is an option using awk:
awk '{ while(++i<=NF) printf (!a[$i]++) ? $i FS : ""; i=split("",a); print ""}' infile > outfile
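For example, with a two-line infile (sample contents assumed here purely for illustration), duplicates disappear per line while first-seen order is kept:

$ cat infile
1 2 3 2 3
5 4 1 2 3 4
$ awk '{ while(++i<=NF) printf (!a[$i]++) ? $i FS : ""; i=split("",a); print ""}' infile
1 2 3
5 4 1 2 3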
Edit: Updated with comments:
- while (++i<=NF)
Initializes the while loop, pre-incrementing "i". Since $0 is the full line in awk, the fields proper start at $1, so the loop begins at the first field and runs through the line until the end (less than or equal to 'NF', awk's built-in "Number of Fields"). The default field separator is a space; you could change it with the -F option or by setting FS.
- printf (!a[$i]++) ? $i FS : ""
This is a ternary operation. If the field is not yet in the array, !a[$i]++ is true and it prints $i followed by the field separator; once the value has been seen, it prints "". (You could remove the ! and swap $i FS and "" if you don't like it this way.) See the expanded sketch after this list.
- i=split("",a)
Normally, that's a null split. Here it does double duty: splitting the empty string empties the array a, and the call's return value of 0 resets i for the next line.
- print ""
Ends the line for the output. printf never emits a newline on its own, so print "" supplies the output record separator (a newline by default); without it you would get an output of
1 2 3 5 4 1 2 3
instead of
1 2 3
5 4 1 2 3
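Putting the pieces together, here is a minimal expanded sketch of the same logic, with an explicit for loop and an if in place of the while and the ternary; the behavior is identical, it is just easier to read:

awk '{
    for (i = 1; i <= NF; i++)        # walk every field of the current line
        if (!a[$i]++)                # true only the first time this value appears
            printf "%s%s", $i, FS    # print the field followed by the separator
    split("", a)                     # empty the seen-array before the next line
    print ""                         # printf adds no newline, so end the line here
}' infile > outfile

GNU awk also accepts delete a for the same array reset, but split("", a) is the portable spelling.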
Solution 2:
Since ruby comes with any Linux distribution I know of:
ruby -e 'STDIN.readlines.each { |l| l.split(" ").uniq.each { |e| print "#{e} " }; print "\n" }' < test
Here, test is the file that contains the elements.
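For instance, with the same sample data as above (file contents assumed for illustration):

$ cat test
1 2 3 2 3
5 4 1 2 3 4
$ ruby -e 'STDIN.readlines.each { |l| l.split(" ").uniq.each { |e| print "#{e} " }; print "\n" }' < test
1 2 3
5 4 1 2 3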
To explain what this command does (although Ruby can almost be read from left to right):
- Read the input (which comes from < test through your shell)
- Go through each line of the input
- Split the line into an array of items (split(" ") is special-cased in Ruby to split on runs of whitespace, awk-style)
- Get the unique elements from this array, in order of first appearance (uniq)
- For each unique element, print it followed by a space (print "#{e} ")
- Print a newline once we're done with the unique elements
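Spelled out as a multi-line script, the one-liner reads like this (a sketch of exactly the same logic; dedupe.rb is a file name chosen here, run it as ruby dedupe.rb < test):

# dedupe.rb: remove duplicate elements in each line of standard input
STDIN.readlines.each do |line|
  unique = line.split(" ").uniq       # split on whitespace, keep first occurrences in order
  unique.each { |e| print "#{e} " }   # print each unique element followed by a space
  print "\n"                          # finish the output line
end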