"Line contains NULL byte" in CSV reader (Python)

Solution 1:

I've solved a similar problem with an easier solution:

import codecs
csvReader = csv.reader(codecs.open('file.csv', 'rU', 'utf-16'))

The key was using the codecs module to open the file with the UTF-16 encoding, there are a lot more of encodings, check the documentation.

Solution 2:

I'm guessing you have a NUL byte in input.csv. You can test that with

if '\0' in open('input.csv').read():
    print "you have null bytes in your input file"
else:
    print "you don't"

if you do,

reader = csv.reader(x.replace('\0', '') for x in mycsv)

may get you around that. Or it may indicate you have utf16 or something 'interesting' in the .csv file.

Solution 3:

If you want to replace the nulls with something you can do this:

def fix_nulls(s):
    for line in s:
        yield line.replace('\0', ' ')

r = csv.reader(fix_nulls(open(...)))

Solution 4:

You could just inline a generator to filter out the null values if you want to pretend they don't exist. Of course this is assuming the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.

See the (line.replace('\0','') for line in f) below, also you'll want to probably open that file up using mode rb.

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'rb') as mycsv:
        reader = csv.reader( (line.replace('\0','') for line in mycsv) )
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)