How do I get Excel to import a CSV file with commas in some of the content fields?

How do I get Excel to import my CSV file?

File

I have a file claiming to be CSV. It contains 10 fields, all of which are surrounded by double quotes (yes, even the dates and numbers.) 2 or 3 of these fields contain commas. But data in these fields is also surrounded by quotes.

Problem

When I try to open this file, Excel completely ignores the quotes and assumes that they are part of the data.

It therefore splits the fields containing commas into multiple columns.

This is bad, at least for me.

Example data

"20051", "", "2009 Sep 30 02:53:23", "SOMETEXT", "", "5000", "2000000", "2008 - 99999 - -99999, 2008 - unk - unk", "191 - SOMETEXT - SOMETEXT, 192 - SOMETEXT - SOMETEXT, 193 - SOMETEXT - SOMETEXT, 194 - SOMETEXT - SOMETEXT, 195 - SOMETEXT - SOMETEXT, 196 - SOMETEXT - SOMETEXT", "45 - SOMETEXT - SOMETEXT, 1162 - SOMETEXT - SOMETEXT, 140 - SOMETEXT - SOMETEXT, 141 - SOMETEXT - SOMETEXT"
"20052", "SOMETEXT", "2009 Sep 08 07:56:50", "SOMETEXT", "", "50000", "5000000", "2007 - 99999 - 99999", "146 - SOMETEXT - SOMETEXT, 145 - SOMETEXT - SOMETEXT, 147 - SOMETEXT - SOMETEXT, 148 - SOMETEXT - SOMETEXT", "103 - SOMETEXT - SOMETEXT, 1131 - SOMETEXT - SOMETEXT"

What have I tried?

I have used the import facility to specify delimiters and such, but this does not seem to help.

I have tried switching "Text Delimiter" from a double quote to "{none}" and back again. This appears to only affect the first column. This would be resasonable if it were possible to click the other columns and apply this setting to each. But this is not the case, and it seems this is intended to work across columns.

What else can I try?


Solution 1:

I just tried a quick test that replicated your problem.

I created a 1 line CSV in Word (which uses smart quotes) as test.csv “123”,“4,5,6” and it opened in Excel as you described.

Try replacing “ and ” with "


Having played with your sample I notice that Excel does not like the spaces between fields

e.g. instead of

"20051", "", "2009 Sep 30 02:53:23", ...

you want

"20051","","2009 Sep 30 02:53:23",...

a decent Regular Expression replacement should be able to handle it with

Find:    |("[^"]*",) |
Replace: |\1|

(pipe characters for visual cues only)

Or simply modify the .Net code if you have access to it ;-)

Also, as Arjan pointed out, you may also need to convert the file from UTF-8 to ANSI to prevent cell A1 from containing the BOM and its surrounding qoutes.

I have come across the Catch 22 of ANSI encoded CSV not handling international characters and UTF encoded CSV not being propery handled by Excel; and not found a solution while mainting the CSV. If international character support is required, the XML (or native XLS) formats seem the only way to go—at the cost of simplicity.

Solution 2:

This problem has been plaguing me for a number of years. I just discovered the solution and it's alluded to in the answers above but not explicitly spelled out.

It's the space after the comma!

This doesn't import into excel;

HEADER1, HEADER2
"1,000", "2,000"

While

HEADER1,HEADER2
"1,000","2,000"

Works!

Solution 3:

You could also try CSVEd which is free.

Solution 4:

When I try to open this file, Excel completely ignores the quotes and assumes that they are part of the data. It therefore splits the fields containing commas into multiple columns.

If you're not accidentally using smart-quotes like lumbarius suggested, and assuming you're on Windows: that might be due to your regional settings. When double-clicking or using File » Open, then Excel does not ask you for any details, but simply uses these settings.

Still, manual invoking the import function (like you did) should have worked, so I doubt changing the regional settings will help...