How can I stop Excel from eating my delicious CSV files and excreting useless data?

I have a database which tracks sales of widgets by serial number. Users enter purchaser data and quantity, and scan each widget into a custom client program. They then finalize the order. This all works flawlessly.

Some customers want an Excel-compatible spreadsheet of the widgets they have purchased. We generate this with a PHP script which queries the database and outputs the result as a CSV with the store name and associated data. This works perfectly well too.

When opened in a text editor such as Notepad or vi, the file looks like this:

"Account Number","Store Name","S1","S2","S3","Widget Type","Date"
"4173","SpeedyCorp","268435459705526269","","268435459705526269","848 Model Widget","2011-01-17"

As you can see, the serial numbers are present (in this case twice; not all secondary serials are the same) and are long strings of digits. When this file is opened in Excel, the result becomes:

Account Number  Store Name  S1           S2  S3           Widget Type       Date
4173            SpeedyCorp  2.68435E+17      2.68435E+17  848 Model Widget  2011-01-17

As you may have observed, the serial numbers are enclosed by double quotes. Excel does not seem to respect text qualifiers in .csv files. When importing these files into Access, we have zero difficulty. When opening them as text, no trouble at all. But Excel, without fail, converts these files into useless garbage. Trying to instruct end users in the art of opening a CSV file with a non-default application is becoming, shall we say, tiresome. Is there hope? Is there a setting I've been unable to find? This seems to be the case with Excel 2003, 2007, and 2010.
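To show why the digits are gone for good rather than merely hidden, here is a rough model of what happens when Excel treats a long digit string as a number. It is an assumption, not Excel's actual parser: Excel keeps at most 15 significant digits, and for a long integer the digits past the 15th become zeros. The function name is hypothetical, and Python is used only for illustration.

```python
EXCEL_PRECISION = 15  # Excel stores at most 15 significant digits


def excel_visible_integer(digits: str) -> str:
    """Rough model of how Excel stores a long positive integer that it
    parsed as a number: every digit past the 15th is replaced by a zero.
    Assumes `digits` is a plain decimal string with no sign or leading zeros."""
    if len(digits) <= EXCEL_PRECISION:
        return digits
    return digits[:EXCEL_PRECISION] + "0" * (len(digits) - EXCEL_PRECISION)


for serial in ["4173", "268435459705526269"]:
    print(serial, "->", excel_visible_integer(serial))
# 4173 -> 4173
# 268435459705526269 -> 268435459705526000
```

The short account number survives, but the 18-digit serial cannot be recovered by changing the cell format afterwards, because the last three digits were never stored.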


But Excel, without fail, converts these files into useless garbage.

Excel is useless garbage.

Solution

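I would be a little surprised if any client wanting your data in an Excel format was unable to change the visible formatting on those three columns to "Number" with zero decimal places or to "Text". (Note, though, that reformatting after the file has been opened cannot bring back digits Excel has already discarded: Excel keeps only 15 significant digits, so an 18-digit serial opened as a number ends in 000.) But let's assume that a short how-to document is out of the question.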

Your options are:

  1. Toss a non-numeric, non-whitespace character into your serial numbers.
  2. Write out an xls file or xlsx file with some default formatting.
  3. Cheat and output those numbers as formulas: ="268435459705526269","",="268435459705526269" (you can also write ="268435459705526269",,="268435459705526269", saving yourself 2 characters). This has the advantage of displaying correctly and probably being generally useful, but it is subtly broken, since the cells contain formulas rather than values.

Be careful with option 3, because some programs (including Excel and OpenOffice Calc) will no longer treat commas inside ="" fields as escaped. That means ="abc,xyz" will span two columns and break the import.

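Using the format "=""abc,xyz""" (the whole formula wrapped in a quoted CSV field, with its inner quotes doubled) solves this problem, but this method still limits each value to 255 characters, because Excel does not accept text constants longer than 255 characters inside a formula.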
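The question's export is written in PHP, so the following is only a language-swapped illustration: a minimal Python sketch (the helper names `as_excel_text_formula` and `write_rows` are hypothetical) showing that the standard `csv` module produces the "=""abc,xyz""" form on its own. Wrapping a value as ="..." and letting `csv.writer` quote the field doubles every inner quote, which is the escaping described above.

```python
import csv
import io


def as_excel_text_formula(value: str) -> str:
    """Wrap a value as an Excel formula ="..." so Excel keeps it as text.
    Quotes inside the value are doubled, as Excel string literals require."""
    if len(value) > 255:
        # Excel rejects text constants longer than 255 characters in a formula.
        raise ValueError("value too long for an Excel formula text constant")
    return '="' + value.replace('"', '""') + '"'


def write_rows(rows, text_columns) -> str:
    """Serialize rows as CSV, wrapping non-empty cells in `text_columns`
    (a set of 0-based column indexes) as ="..." formulas."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes any field containing a quote or comma and
    # doubles the quotes inside it, giving "=""...""" for wrapped cells.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    for row in rows:
        writer.writerow([
            as_excel_text_formula(cell) if i in text_columns and cell else cell
            for i, cell in enumerate(row)
        ])
    return buf.getvalue()


print(write_rows([["4173", "268435459705526269", "", "abc,xyz"]], {1, 2, 3}), end="")
```

Only the listed columns are wrapped, and empty cells are left empty, which matches the ,, shortcut from option 3.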


We had a similar problem: our CSV files had columns containing ranges such as 3-5, and Excel would always convert them to dates (e.g. 3-5 would become 5 March or 3 May, depending on the regional date settings), after which switching back to a numeric format gave us a useless date serial number. We got around it by:

  1. Renaming the file's extension from .csv to .txt.
  2. When we then opened it in Excel, the Text Import Wizard kicked in.
  3. In step 3 of 3 of the wizard, we marked the columns in question as Text, and they imported properly.

You could do the same here, I would think.

[Screenshot: Text Import Wizard]

Cheers


A better solution is to generate an XML Spreadsheet 2003 workbook, like this:

<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
  </OfficeDocumentSettings>

  <ss:Worksheet ss:Name="Sheet 1">
    <Table>
    <Column ss:Width="100"/>
    <Column ss:Width="100"/>
    <Column ss:Width="150"/>
    <Column ss:Width="150"/>
    <Column ss:Width="150"/>
    <Column ss:Width="150"/>
    <Column ss:Width="80"/>
    <Column/>

    <Row>
      <Cell><Data ss:Type="String">Account Number</Data></Cell>
      <Cell><Data ss:Type="String">Store Name</Data></Cell>
      <Cell><Data ss:Type="String">S1</Data></Cell>
      <Cell><Data ss:Type="String">S2</Data></Cell>
      <Cell><Data ss:Type="String">S3</Data></Cell>
      <Cell><Data ss:Type="String">Widget Type</Data></Cell>
      <Cell><Data ss:Type="String">Date</Data></Cell>
    </Row>

    <Row>
      <Cell><Data ss:Type="String">4173</Data></Cell>
      <Cell><Data ss:Type="String">SpeedyCorp</Data></Cell>
      <!-- ss:Type="String" keeps Excel from converting the serials to numbers -->
      <Cell><Data ss:Type="String">268435459705526269</Data></Cell>
      <Cell/>
      <Cell><Data ss:Type="String">268435459705526269</Data></Cell>
      <Cell><Data ss:Type="String">848 Model Widget</Data></Cell>
      <Cell><Data ss:Type="String">2011-01-17</Data></Cell>
    </Row>


    </Table>
    <x:WorksheetOptions/>
  </ss:Worksheet>
</Workbook>

The file must have an .xml extension. Excel and OpenOffice both open it correctly.
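If the export is generated programmatically, a workbook like the one above can be built from plain strings. The sketch below is in Python rather than the PHP used in the question, and `build_workbook` and `cell_xml` are hypothetical names. It assumes every value is already a string, writes each non-empty value as ss:Type="String", and escapes &, < and > with the standard library's `xml.sax.saxutils.escape`. It also leaves out the optional OfficeDocumentSettings, Column and WorksheetOptions elements from the example, on the assumption that Excel treats them as optional.

```python
from xml.sax.saxutils import escape, quoteattr

SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def cell_xml(value: str) -> str:
    """One <Cell>: empty values become an empty cell, everything else text."""
    if value == "":
        return "<Cell/>"
    return f'<Cell><Data ss:Type="String">{escape(value)}</Data></Cell>'


def build_workbook(header, rows, sheet_name="Sheet 1") -> str:
    """Return an XML Spreadsheet 2003 document with every cell typed as String."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>\n',
        '<?mso-application progid="Excel.Sheet"?>\n',
        f'<Workbook xmlns="{SS_NS}" xmlns:ss="{SS_NS}">\n',
        f"  <Worksheet ss:Name={quoteattr(sheet_name)}>\n",
        "    <Table>\n",
    ]
    for values in [header, *rows]:
        cells = "".join(cell_xml(v) for v in values)
        parts.append(f"      <Row>{cells}</Row>\n")
    parts.append("    </Table>\n  </Worksheet>\n</Workbook>\n")
    return "".join(parts)


if __name__ == "__main__":
    header = ["Account Number", "Store Name", "S1", "S2", "S3", "Widget Type", "Date"]
    rows = [["4173", "SpeedyCorp", "268435459705526269", "",
             "268435459705526269", "848 Model Widget", "2011-01-17"]]
    print(build_workbook(header, rows))
```

Saving the returned string as UTF-8 to a file with an .xml extension should give the same result as the hand-written example, with the serials kept as text.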


My solution: I had the same issue with importing serial numbers. They don't have to be treated as numbers (i.e. no mathematical functions are performed on them), but we need the entire number in there. The simplest fix I have found is to insert a space into the serial number, e.g. "12345678 90123456 1234". When Excel imports it, it is treated as text instead of as a number.
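A minimal Python sketch of the space-insertion trick, with hypothetical function names. It assumes groups of 8 characters, taken from the example above; any space in the middle of the value should be enough. The reverse function strips the spaces again, for instance before looking a serial up in the database.

```python
def group_serial(serial: str, size: int = 8) -> str:
    """Insert a space every `size` characters so Excel sees text, not a number."""
    return " ".join(serial[i:i + size] for i in range(0, len(serial), size))


def ungroup_serial(grouped: str) -> str:
    """Remove all whitespace to recover the original serial."""
    return "".join(grouped.split())


print(group_serial("12345678901234561234"))  # 12345678 90123456 1234
print(group_serial("268435459705526269"))    # 26843545 97055262 69
```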