How to delete words from a txt file that exist in another txt file?
Solution 1:
There is a command to do this: comm. As stated in man comm, it is plain and simple:
comm -3 file1 file2
Print lines in file1 not in file2, and vice versa.
Note that comm expects the contents of both files to be sorted, so you must sort them before calling comm on them, like this:
sort unsorted-file.txt > sorted-file.txt
So to sum up:
sort a.txt > as.txt
sort b.txt > bs.txt
comm -3 as.txt bs.txt > result.txt
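After running the above commands, result.txt will contain the lines that appear in only one of the two files; the lines unique to bs.txt are indented with a tab. If you only want the lines of b.txt that do not appear in a.txt, use comm -13 as.txt bs.txt instead.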
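To see why the choice of options matters, here is a minimal Python sketch of the three columns comm works with. The function name comm_columns and the sample words are made up for illustration, and the sketch compares plain Python strings, whereas comm itself follows the locale's collation order, so treat it as an approximation rather than a reimplementation.

```python
def comm_columns(lines1, lines2):
    """Split two sorted line lists into (only_in_1, only_in_2, in_both).

    Hypothetical helper: mirrors the three columns comm prints, using
    plain string comparison instead of locale collation.
    """
    only1, only2, both = [], [], []
    i = j = 0
    # Walk both sorted lists in step, like a merge.
    while i < len(lines1) and j < len(lines2):
        if lines1[i] == lines2[j]:
            both.append(lines1[i])
            i += 1
            j += 1
        elif lines1[i] < lines2[j]:
            only1.append(lines1[i])
            i += 1
        else:
            only2.append(lines2[j])
            j += 1
    # Whatever is left over is unique to its own list.
    only1.extend(lines1[i:])
    only2.extend(lines2[j:])
    return only1, only2, both


# Sample data standing in for a.txt and b.txt (invented for this example).
a_sorted = sorted(["banana", "apple", "cherry"])
b_sorted = sorted(["date", "banana", "elder", "apple"])
only_a, only_b, both = comm_columns(a_sorted, b_sorted)
print(only_a)  # column 1: lines only in the first file
print(only_b)  # column 2: lines only in the second file
print(both)    # column 3: lines in both files
```

With -3, comm hides only the third column, so both only_a and only_b end up in the output. With -13 it hides the first and third columns, leaving just only_b.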
Solution 2:
Here is a short python3 script, based on Germar's answer, which should accomplish this while retaining b.txt's unsorted order.
#!/usr/bin/python3
with open('a.txt', 'r') as afile:
    a = set(line.rstrip('\n') for line in afile)
with open('b.txt', 'r') as bfile:
    for line in bfile:
        line = line.rstrip('\n')
        if line not in a:
            print(line)
            # Uncomment the following if you also want to remove duplicates:
            # a.add(line)
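The same idea can be wrapped in a small function, which makes it easier to try out without touching any files. This is only a sketch: the name remove_listed_lines and the dedupe parameter are made up here, with dedupe standing in for the commented-out a.add(line) line.

```python
def remove_listed_lines(lines, blocked, dedupe=False):
    """Yield each line not found in blocked, keeping the input order.

    Hypothetical helper: lines can be any iterable of strings, including
    an open file; blocked is the collection of lines to drop.
    """
    seen = set(blocked)  # copy, so the caller's collection is not modified
    for line in lines:
        line = line.rstrip('\n')
        if line not in seen:
            yield line
            if dedupe:
                # Remember the line so later repeats are dropped too.
                seen.add(line)


# Sample data standing in for b.txt and a.txt (invented for this example).
b_lines = ["pear\n", "apple\n", "fig\n", "pear\n"]
print(list(remove_listed_lines(b_lines, {"apple"})))
# -> ['pear', 'fig', 'pear']
print(list(remove_listed_lines(b_lines, {"apple"}, dedupe=True)))
# -> ['pear', 'fig']
```

Because an open file iterates line by line, passing the file object for b.txt as lines and the set built from a.txt as blocked should behave like the script above.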
Solution 3:
#!/usr/bin/env python3
with open('a.txt', 'r') as f:
    a_txt = f.read()
a = a_txt.split('\n')
del(a_txt)
with open('b.txt', 'r') as f:
    while True:
        b = f.readline()
        if not b:  # readline() returns '' only at end of file
            break
        b = b.strip('\n ')
        if b and b not in a:  # skip blank lines instead of stopping at them
            print(b)