Excel destroying special character when saved as CSV
I am working on a project that uses a CSV file supplied by the client to populate a database. The client is to create the spreadsheet, then save it as a CSV to be uploaded, which is then parsed into a database.
The problem is, whenever the client saves the excel spreadsheet as an MS-DOS .csv file, many of the special characters get converted to question marks "?" (symbols such as ' " / ). However, if we then open the CSV file and manually replace each ? with the right character it works fine.
The problem is that the data file is HUGE, and we can't reasonably do this, so I was hoping there was a way to save it. We have tried exporting as Unicode and ASCII to no avail. We have also tried uploading to google docs, and re-saving, however, it also breaks those characters.
Ensure you are choosing to save as a CSV (Comma Delimited)
and not a CSV (MS-DOS)
, as DOS doesn't support UTF-8 characters.
I have found that the lost character issue only happens (in my case) when saving from xlsx format to csv format. I tried saving the xlsx file to xls first, then to csv. It actually worked.
A possible workaround is to save it as Unicode Text
(2007 has it, not sure about previous editions), which saves it as a tab-separated text file.
This file preserved my unicode characters (in my case I was working with asian characters) while producing some sort of delimited text file which you can then run through external tools to convert to a csv if necessary.
My input did not have tabs embedded within each cell, however, and I am not sure how that would be handled.
Here's what works for me:
- Make data corrections in Excel or CSV
- Save file As Unicode Text
- Open NOTEPAD
- Open the Unicode file you just saved using NOTEPAD
- Use your cursor to highlight a blank area that holds a single tab 5a. Use the space between the Acquire Id and the Request Type because this holds ONE TAB!
- Hit Cnrl-C to copy the tab character
- Type Cnlr-H to open the Replace function box
- Click in the Find What text box and type Cnlr-V to paste the Tab
- Click in the Replace With Text box and type a comma
- Click Replace to test it one time. Confirm the tab in the file is replaced with a comma
- Click Replace All
- Click Cancel
- Save the file and Exit
- In Windows Explorer change the file extension to .csv