How to write UTF-8 characters using bulk insert in SQL Server?

Solution 1:

I came here before looking for a solution for bulk inserting special characters. Didn't like the workaround with UTF-16 (that would double the size of csv file). I found out that you definitely CAN and it's very easy, you don't need a format file. This answer is for other people who are looking for the same, since it doesn't seem to be documented well anywhere, and I believe this is a very common issue for non-english speaking people. The solution is: just add CODEPAGE='65001' inside the with statement of the bulk insert. (65001=codepage number for UTF-8). Might not work for all unicode characters as suggested by Michael O, but at least it works perfect for latin-extended, greek and cyrillic, probably many others too.

Note: MSDN documentation says utf-8 is not supported, don't believe it, for me this works perfect in SQL server 2008, didn't try other versions however.

e.g.:

BULK INSERT #myTempTable 
FROM  'D:\somefolder\myCSV.txt'+
WITH 
    ( 
        CODEPAGE = '65001',
        FIELDTERMINATOR = '|',
        ROWTERMINATOR ='\n'
    );

If all your special characters are in 160-255 (iso-8859-1 or windows-1252), you could also use:

BULK INSERT #myTempTable 
FROM  'D:\somefolder\myCSV.txt'+
WITH 
    ( 
        CODEPAGE = 'ACP',
        FIELDTERMINATOR = '|',
        ROWTERMINATOR ='\n'
    );

Solution 2:

You can't. You should first use a N type data field, convert your file to UTF-16 and then import it. The database does not support UTF-8.