Python - Convert .txt file to .xls or .xlsx

I have data which came in the form .data, so I have converted it to .txt files due to opening in it Microsoft Excel not fully loading it. There are over 2 million rows.

For this reason, I decided to try converted .txt to .xls or .xlsx using python with this script:

import csv
import openpyxl

input_file = 'path/to/inputfile.txt'
output_file = 'path/to/outputfile.xls'

wb = openpyxl.Workbook()
ws = wb.worksheets[0]

with open(input_file, 'rb') as data:
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)

wb.save(output_file)

but I am getting the error for row in reader: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

txt data format


You have to set the correct mode in the second parameter when opening the file.

With rb you are opening it in binary mode, but here you should write r to use text mode.

So your code should be:

import csv
import openpyxl

input_file = 'path/to/inputfile.txt'
output_file = 'path/to/outputfile.xls'

wb = openpyxl.Workbook()
ws = wb.worksheets[0]

with open(input_file, 'r') as data:  # read in text mode
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)

wb.save(output_file)

As already mentioned in a comment, Excel is not suitable for this amount of data as its limited to 1048576 rows, but gets quite slow to handle even below that. You should really try to import as csv or directly as tsv.