VBA: save a file as UTF-8 without a BOM
Solution 1:
In the best of all possible worlds, the Related list would contain a reference to this question, which I found as the first hit for "vbscript adodb.stream bom vbscript site:stackoverflow.com".
Based on the second strategy from boost's answer:
Option Explicit
Const adSaveCreateNotExist = 1
Const adSaveCreateOverWrite = 2
Const adTypeBinary = 1
Const adTypeText = 2
Dim objStreamUTF8 : Set objStreamUTF8 = CreateObject("ADODB.Stream")
Dim objStreamUTF8NoBOM : Set objStreamUTF8NoBOM = CreateObject("ADODB.Stream")
With objStreamUTF8
.Charset = "UTF-8"
.Open
.WriteText "aÃö"
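.Position = 0
' For comparison only: this copy keeps the 3-byte BOM
.SaveToFile "toto.php", adSaveCreateOverWrite
.Type = adTypeText
' Step over the BOM (EF BB BF) so that CopyTo starts right after it
.Position = 3
End With
' The binary stream receives the bytes after the BOM and saves them unchanged
With objStreamUTF8NoBOM
.Type = adTypeBinary
.Open
objStreamUTF8.CopyTo objStreamUTF8NoBOM
.SaveToFile "toto-nobom.php", adSaveCreateOverWrite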
End With
objStreamUTF8.Close
objStreamUTF8NoBOM.Close
Evidence:
chcp
Active code page: 65001
dir
...
15.07.2015 18:48 5 toto-nobom.php
15.07.2015 18:48 8 toto.php
type toto-nobom.php
aÃö
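The two-stream approach above could be wrapped in a reusable routine. This is only a sketch: the name SaveUtf8NoBom and its parameters are made up for illustration, it is written as VBA (late-bound, so no ADO reference is needed), and it assumes the text is non-empty so that the stream really does start with a 3-byte BOM.

```vba
Option Explicit

' Hypothetical helper, not part of the original answer: writes strText to
' strPath as UTF-8 without a byte order mark, using two ADODB.Stream objects.
' Assumes strText is non-empty.
Public Sub SaveUtf8NoBom(ByVal strPath As String, ByVal strText As String)
    Const adTypeBinary As Long = 1
    Const adTypeText As Long = 2
    Const adSaveCreateOverWrite As Long = 2
    Const BOM_LENGTH As Long = 3            ' EF BB BF

    Dim stmText As Object
    Dim stmBinary As Object

    Set stmText = CreateObject("ADODB.Stream")
    Set stmBinary = CreateObject("ADODB.Stream")

    ' Encode the text as UTF-8; the text stream puts a BOM at the start.
    With stmText
        .Type = adTypeText
        .Charset = "UTF-8"
        .Open
        .WriteText strText
        .Position = BOM_LENGTH              ' step over the BOM
    End With

    ' Copy everything after the BOM into a binary stream and save that.
    With stmBinary
        .Type = adTypeBinary
        .Open
        stmText.CopyTo stmBinary
        .SaveToFile strPath, adSaveCreateOverWrite
        .Close
    End With

    stmText.Close
End Sub
```

A call would then look like SaveUtf8NoBom "toto-nobom.php", strText, where strText holds whatever text you want to write.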
Solution 2:
I knew that the Scripting File System Object's stream inserted a Byte Order Mark, but I haven't seen that with the ADODB Stream.
Or at least, not yet: I rarely use the ADODB stream object...
But I do remember putting this remark into some code a few years ago:
' **** WHY THIS IS COMMENTED OUT **** **** **** **** **** **** **** ****
'
' Microsoft ODBC and OLEDB database drivers cannot read the field names from
' the header when a unicode byte order mark (&HFF & &HFE) is inserted at the
' start of the text by Scripting.FileSystemObject 'Write' methods. Trying to
' work around this by writing byte arrays will fail; FSO 'Write' detects the
' string encoding automatically, and won't let you hack around it by writing
' the header as UTF-8 (or 'Narrow' string) and appending the rest as unicode
'
' (Yes, I tried some revolting hacks to get around it: don't *ever* do that)
'
' **** **** **** **** **** **** **** **** **** **** **** **** **** **** ****
'
' With FSO.OpenTextFile(FilePath, ForWriting, True, TristateTrue)
' .Write Join(arrTemp1, EOROW)
' .Close
' End With ' textstream object from objFSO.OpenTextFile
'
' **** **** **** **** **** **** **** **** **** **** **** **** **** **** ****
You can tell I was having a bad day.
Next, using prehistoric PUT commands from the days before file-handling had emerged from the primordial C:
' **** WHY WE 'PUT' A BYTE ARRAY INSTEAD OF A VBA STRING VARIABLE **** ****
'
' Put #hndFile, , StrConv(Join(arrTemp1, EOROW), vbUnicode)
' Put #hndFile, , Join(arrTemp1, EOROW)
'
' If you pass unicode, Wide or UTF-16 string variables to PUT, it prepends a
' Unicode Byte Order Mark to the data which, when written to your file, will
' render the field names illegible to Microsoft's JET ODBC and ACE-OLEDB SQL
' drivers (which can actually read unicode field names, if the helpful label
' isn't in the way). However, the 'PUT' statement writes a Byte array as-is
'
' **** **** **** **** **** **** **** **** **** **** **** **** **** **** ****
So here's the code that actually does it:
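Dim arrByte() As Byte
Dim strText As String
Dim hndFile As Integer ' FreeFile returns an Integer file number
strText = "Y'all knew that strings are actually byte arrays?"
' Copies the string's bytes as they are held in memory; no conversion happens
arrByte = strText
hndFile = FreeFile
' FilePath is assumed to be declared and set elsewhere
Open FilePath For Binary As #hndFile
Put #hndFile, , arrByte
Close #hndFile
Erase arrByte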
I'm assuming that strText is actually UTF-8. I mean, we're in VBA, in Microsoft Office, and we absolutely know that this is always going to be UTF-8, even if we use it in a foreign country...
...Right?
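The closing remark is tongue-in-cheek: a VBA String is held as UTF-16, so assigning it straight to a Byte array gives two bytes per character, not UTF-8. One way to get real UTF-8 bytes before the Put is to borrow an ADODB.Stream for the conversion. The following is a sketch under assumptions rather than the answerer's own code: ToUtf8Bytes and PutUtf8 are hypothetical names, the text is assumed to be non-empty, and any existing file is deleted first because Open ... For Binary does not truncate.

```vba
Option Explicit

' Hypothetical helper: returns strText encoded as UTF-8 bytes, without a BOM.
' Assumes strText is non-empty and that ADODB.Stream is available.
Private Function ToUtf8Bytes(ByVal strText As String) As Byte()
    Const adTypeBinary As Long = 1
    Const adTypeText As Long = 2

    With CreateObject("ADODB.Stream")
        .Type = adTypeText
        .Charset = "UTF-8"
        .Open
        .WriteText strText
        .Position = 0
        .Type = adTypeBinary        ' Type can only be changed at position 0
        .Position = 3               ' skip the 3-byte BOM (EF BB BF)
        ToUtf8Bytes = .Read         ' the remaining bytes, as a Byte array
        .Close
    End With
End Function

' Hypothetical wrapper: writes the UTF-8 bytes with the same Put technique.
Public Sub PutUtf8(ByVal FilePath As String, ByVal strText As String)
    Dim arrByte() As Byte
    Dim hndFile As Integer

    arrByte = ToUtf8Bytes(strText)

    ' Binary mode does not truncate, so remove any existing file first.
    If Len(Dir$(FilePath)) > 0 Then Kill FilePath

    hndFile = FreeFile
    Open FilePath For Binary Access Write As #hndFile
    Put #hndFile, , arrByte         ' a Byte array is written as-is
    Close #hndFile
End Sub
```

If ADODB is not an option, the Windows API function WideCharToMultiByte with the UTF-8 code page is the usual alternative for the conversion step.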