Reading CSV with Separator in column values
on_bad_lines (added in pandas 1.3) deprecates error_bad_lines, so if you're on a version of pandas older than that, you can just use error_bad_lines:
pd.read_csv("data.csv", sep = "|", error_bad_lines = False)
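On pandas 1.3 or newer, the same thing is spelled with on_bad_lines. A minimal sketch, using a made-up in-memory sample in place of data.csv (the sample contents are an assumption for illustration only):

```python
import io
import pandas as pd

# Hypothetical sample: the row "3|4|5" has one field too many
sample = "a|b\n1|2\n3|4|5\n6|7\n"

# on_bad_lines="skip" drops malformed rows silently;
# on_bad_lines="warn" drops them and reports each one
df = pd.read_csv(io.StringIO(sample), sep="|", on_bad_lines="skip")
print(df)
```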
If you want to keep the bad lines, you can also use warn_bad_lines, extract the bad line numbers from the warnings, and read those lines separately into a single column:
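import contextlib
import re
import pandas as pd

# Capture the "Skipping line N" warnings that pandas writes to stderr
with open('log.txt', 'w') as log:
    with contextlib.redirect_stderr(log):
        df = pd.read_csv('data.csv', sep = '|', error_bad_lines = False, warn_bad_lines = True)

# Warnings use 1-based line numbers, skiprows uses 0-based indices
with open('log.txt') as f:
    bad_lines = {int(n) - 1 for n in re.findall(r'line (\d+)', f.read())}

# Read only the bad lines back in (squeeze = True returns a Series on older pandas)
df_bad_lines = pd.read_csv('data.csv', skiprows = lambda x: x not in bad_lines, squeeze = True, header = None)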
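On pandas 1.4 or newer, on_bad_lines also accepts a callable when engine="python" is used, which avoids the log file round trip. The following is only a sketch under that assumption: the sample data and the collect_bad_line helper are hypothetical names introduced for illustration.

```python
import io
import pandas as pd

# Hypothetical sample: the row "3|4|5" has one field too many
sample = "a|b\n1|2\n3|4|5\n6|7\n"

bad_rows = []

def collect_bad_line(fields):
    # pandas passes the already-split fields of the offending row
    bad_rows.append("|".join(fields))
    # Returning None tells pandas to leave the row out of the result
    return None

# Callables for on_bad_lines are only supported by the python engine
df = pd.read_csv(io.StringIO(sample), sep="|", engine="python",
                 on_bad_lines=collect_bad_line)

# The bad rows end up in a single column, as in the approach above
df_bad_lines = pd.Series(bad_rows)
print(df_bad_lines)
```

Because the fields arrive already split, the rows are rejoined with "|" here, so any quoting in the original line may not be preserved exactly.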