Deleting Duplicated Lines In TEXT File?
I am trying to clean up a text file in which, for some reason, every line is duplicated three times. Can I get rid of the duplicates with a regex or some other trick, or do you know of software that could do this? The text file looks like this:
Party Started 10:17 (89/1/2)
Party Started 10:17 (89/1/2)
Party Started 10:17 (89/1/2)
Jessica At Dinner 17:54 (89/1/2)
Jessica At Dinner 17:54 (89/1/2)
Jessica At Dinner 17:54 (89/1/2)
How can I clean it up and get rid of the duplicated lines? It's about 69,587 lines.
Solution 1:
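You could use uniq, a standard command-line utility on Unix-like systems (available wherever bash is). It collapses each run of identical adjacent lines into a single line, which is exactly the pattern in your file.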
Just type:
uniq filewithdup.txt > filenew.txt
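For anyone without uniq at hand, the same adjacent-duplicate collapse can be sketched in Python. This only illustrates the behaviour and is not how uniq itself is implemented; the function name collapse_adjacent is made up for the example, and the sample lines are taken from the question.

```python
from itertools import groupby


def collapse_adjacent(lines):
    """Keep one copy of each run of identical consecutive lines."""
    # groupby yields one (key, run) pair per run of equal neighbours.
    return [key for key, _run in groupby(lines)]


sample = [
    "Party Started 10:17 (89/1/2)",
    "Party Started 10:17 (89/1/2)",
    "Party Started 10:17 (89/1/2)",
    "Jessica At Dinner 17:54 (89/1/2)",
    "Jessica At Dinner 17:54 (89/1/2)",
    "Jessica At Dinner 17:54 (89/1/2)",
]

for line in collapse_adjacent(sample):
    print(line)
```

Like uniq, this leaves a repeated line in place if other lines separate it from its earlier copy.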
Solution 2:
Since you mention MS Office, I'll give you a native Windows solution.
If you are using Windows 7 or later, Windows PowerShell is built in (on Vista it is a separate download). You can use the Get-Unique cmdlet:
The Get-Unique cmdlet compares each item in a sorted list to the next item, eliminates duplicates, and returns only one instance of each item. The list must be sorted for the cmdlet to work properly.
Get-Content input.txt | Get-Unique | Set-Content output.txt
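If it's not sorted, you can use Sort-Object -Unique instead. It also works on already sorted input, but it reorders the lines and removes duplicates even when other lines sit between them, so do not use it if you want to keep the original order or keep those non-adjacent repeats.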
Get-Content input.txt | Sort-Object -Unique | Set-Content output.txt
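If the goal is to drop every repeat, adjacent or not, while keeping the original line order, sorting is not strictly needed. The following is a minimal Python sketch of that alternative, not a PowerShell feature; the function name unique_keep_order is hypothetical, and it relies on Python dictionaries preserving insertion order (guaranteed since Python 3.7).

```python
def unique_keep_order(lines):
    """Drop every repeated line, keeping the first occurrence in place."""
    # dict.fromkeys keeps one key per distinct line, in first-seen order.
    return list(dict.fromkeys(lines))


sample = [
    "Party Started 10:17 (89/1/2)",
    "Jessica At Dinner 17:54 (89/1/2)",
    "Party Started 10:17 (89/1/2)",
]

for line in unique_keep_order(sample):
    print(line)
```

Unlike Get-Unique, this also removes the third sample line even though a different line sits between the two copies.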
Solution 3:
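Regex was tagged, so search for:
/^(.*)(?:\r?\n\1)+$/gm
and replace each match with the first capture group ($1 or \1, depending on the editor). The anchors and the + matter: an unanchored /(.+)\n\1/g can match across partial lines and only collapses pairs, so a line repeated three times would need more than one pass.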
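As a rough illustration of the backreference approach, here is how a pattern of this kind could be applied with Python's re module. The pattern below is anchored to whole lines and allows any number of repeats, which is an assumption about what is wanted here; the names DUP_RUN and collapse_runs are invented for the example.

```python
import re

# A whole line, followed by one or more exact copies of itself.
# re.MULTILINE makes ^ and $ match at every line boundary.
DUP_RUN = re.compile(r"^(.*)(?:\r?\n\1)+$", re.MULTILINE)


def collapse_runs(text):
    """Replace each run of identical consecutive lines with one copy."""
    return DUP_RUN.sub(r"\1", text)


sample = (
    "Party Started 10:17 (89/1/2)\n"
    "Party Started 10:17 (89/1/2)\n"
    "Party Started 10:17 (89/1/2)\n"
    "Jessica At Dinner 17:54 (89/1/2)\n"
    "Jessica At Dinner 17:54 (89/1/2)\n"
    "Jessica At Dinner 17:54 (89/1/2)\n"
)

print(collapse_runs(sample), end="")
```

Because the match must start and end on line boundaries, a line such as "ab" followed by "abc" is left alone.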