Excel 2007 doesn't preserve asian characters when converting to csv [duplicate]

I am working on a project that uses a CSV file supplied by the client to populate a database. The client is to create the spreadsheet, then save it as a CSV to be uploaded, which is then parsed into a database.

The problem is, whenever the client saves the excel spreadsheet as an MS-DOS .csv file, many of the special characters get converted to question marks "?" (symbols such as ' " / ). However, if we then open the CSV file and manually replace each ? with the right character it works fine.

The problem is that the data file is HUGE, and we can't reasonably do this, so I was hoping there was a way to save it. We have tried exporting as Unicode and ASCII to no avail. We have also tried uploading to google docs, and re-saving, however, it also breaks those characters.


Solution 1:

Ensure you are choosing to save as a CSV (Comma Delimited) and not a CSV (MS-DOS), as DOS doesn't support UTF-8 characters.

Solution 2:

I have found that the lost character issue only happens (in my case) when saving from xlsx format to csv format. I tried saving the xlsx file to xls first, then to csv. It actually worked.

Solution 3:

A possible workaround is to save it as Unicode Text (2007 has it, not sure about previous editions), which saves it as a tab-separated text file.

This file preserved my unicode characters (in my case I was working with asian characters) while producing some sort of delimited text file which you can then run through external tools to convert to a csv if necessary.

My input did not have tabs embedded within each cell, however, and I am not sure how that would be handled.

Solution 4:

Here's what works for me:

  1. Make data corrections in Excel or CSV
  2. Save file As Unicode Text
  3. Open NOTEPAD
  4. Open the Unicode file you just saved using NOTEPAD
  5. Use your cursor to highlight a blank area that holds a single tab 5a. Use the space between the Acquire Id and the Request Type because this holds ONE TAB!
  6. Hit Cnrl-C to copy the tab character
  7. Type Cnlr-H to open the Replace function box
  8. Click in the Find What text box and type Cnlr-V to paste the Tab
  9. Click in the Replace With Text box and type a comma
  10. Click Replace to test it one time. Confirm the tab in the file is replaced with a comma
  11. Click Replace All
  12. Click Cancel
  13. Save the file and Exit
  14. In Windows Explorer change the file extension to .csv

Solution 5:

I've been having this issue for a while now, and finally dedicated some time to figure it out! I was able to (seemingly) fix the issue by saving as "Windows Comma Separated (.csv)". I tried it from a .xlsx and a .xls, both converted over to a .csv just fine. Hope this helps - let me know if any issues pop up with this method. I'll report back if I see anything over the next few weeks.