Remove duplicates in each line of a file

Solution 1:

Here is an option using awk:

awk '{ while(++i<=NF) printf (!a[$i]++) ? $i FS : ""; i=split("",a); print ""}' infile > outfile

Edit: updated with comments:

  1. while (++i<=NF)

    Initializes the while loop, pre-incrementing "i" (in awk, $0 is the whole line, so counting starts at $1, the first field).

    It loops through the fields of the line until the end ('NF' is awk's built-in variable for "Number of Fields"). The default field separator is whitespace; you can change it easily, e.g. with -F',' for comma-separated input.

  2. printf (!a[$i]++) ? $i FS : ""

    This is a ternary operation.

    So, if the field has not been seen before on this line, !a[$i]++ is true (a[$i] starts out zero, and the ++ only takes effect after the test), and $i is printed followed by the field separator FS; on any repeat occurrence it prints "" instead. (You could remove the ! and swap the $i FS and "" branches if you don't like it this way.)

  3. i=split("",a)

    Normally that's a null split, but it does double duty here: split("", a) empties the array a, and since it returns 0, assigning the result to i also resets the field counter for the next line.

  4. print ""

    Ends the output line. printf does not append a newline of its own, so without the final print "" the results of every input line would run together:

    1 2 3 5 4 1 2 3

    instead of

    1 2 3
    5 4 1 2 3
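
Putting it all together, here is a quick demonstration. The file name infile and its contents are just made-up sample data for illustration:

    $ cat infile
    1 2 3 2 1
    5 4 1 2 3 4
    $ awk '{ while(++i<=NF) printf (!a[$i]++) ? $i FS : ""; i=split("",a); print ""}' infile
    1 2 3
    5 4 1 2 3

Note that each output line actually ends with a trailing space, since FS is printed after every kept field.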

Solution 2:

Since Ruby comes with every Linux distribution I know of:

ruby -e 'STDIN.readlines.each { |l| l.split(" ").uniq.each { |e| print "#{e} " }; print "\n" }' < test

Here, test is the file that contains the elements.

To explain what this command does (although Ruby can almost be read from left to right):

  • Read the input (which comes from < test through your shell)
  • Go through each line of the input
  • Split the line into an array on whitespace (split(" ") is a special case in Ruby: a single-space argument splits on any run of whitespace and ignores leading whitespace)
  • Get the unique elements from this array, in order of first occurrence (uniq)
  • For each unique element, print it, including a space (print "#{e} ")
  • Print a newline once we're done with the unique elements
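
As a sketch of a run, assuming test holds the same sample lines as in the awk example above:

    $ cat test
    1 2 3 2 1
    5 4 1 2 3 4
    $ ruby -e 'STDIN.readlines.each { |l| l.split(" ").uniq.each { |e| print "#{e} " }; print "\n" }' < test
    1 2 3
    5 4 1 2 3

If you like this approach, Ruby's autosplit mode gives a more compact equivalent (this variant is my own rewrite of the same idea, not part of the original one-liner): -n loops over the input lines and -a splits each one into the $F array, so the whole thing reduces to

    ruby -ane 'puts $F.uniq.join(" ")' < test

with the small difference that join(" ") does not leave a trailing space at the end of each line.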