Excel 2007 doesn't preserve asian characters when converting to csv [duplicate]
I am working on a project that uses a CSV file supplied by the client to populate a database. The client is to create the spreadsheet, then save it as a CSV to be uploaded, which is then parsed into a database.
The problem is, whenever the client saves the excel spreadsheet as an MS-DOS .csv file, many of the special characters get converted to question marks "?" (symbols such as ' " / ). However, if we then open the CSV file and manually replace each ? with the right character it works fine.
The problem is that the data file is HUGE, and we can't reasonably do this, so I was hoping there was a way to save it. We have tried exporting as Unicode and ASCII to no avail. We have also tried uploading to google docs, and re-saving, however, it also breaks those characters.
Solution 1:
Ensure you are choosing to save as a CSV (Comma Delimited)
and not a CSV (MS-DOS)
, as DOS doesn't support UTF-8 characters.
Solution 2:
I have found that the lost character issue only happens (in my case) when saving from xlsx format to csv format. I tried saving the xlsx file to xls first, then to csv. It actually worked.
Solution 3:
A possible workaround is to save it as Unicode Text
(2007 has it, not sure about previous editions), which saves it as a tab-separated text file.
This file preserved my unicode characters (in my case I was working with asian characters) while producing some sort of delimited text file which you can then run through external tools to convert to a csv if necessary.
My input did not have tabs embedded within each cell, however, and I am not sure how that would be handled.
Solution 4:
Here's what works for me:
- Make data corrections in Excel or CSV
- Save file As Unicode Text
- Open NOTEPAD
- Open the Unicode file you just saved using NOTEPAD
- Use your cursor to highlight a blank area that holds a single tab 5a. Use the space between the Acquire Id and the Request Type because this holds ONE TAB!
- Hit Cnrl-C to copy the tab character
- Type Cnlr-H to open the Replace function box
- Click in the Find What text box and type Cnlr-V to paste the Tab
- Click in the Replace With Text box and type a comma
- Click Replace to test it one time. Confirm the tab in the file is replaced with a comma
- Click Replace All
- Click Cancel
- Save the file and Exit
- In Windows Explorer change the file extension to .csv
Solution 5:
I've been having this issue for a while now, and finally dedicated some time to figure it out! I was able to (seemingly) fix the issue by saving as "Windows Comma Separated (.csv)". I tried it from a .xlsx and a .xls, both converted over to a .csv just fine. Hope this helps - let me know if any issues pop up with this method. I'll report back if I see anything over the next few weeks.