Convert a text file from ANSI to UTF-8 in Windows batch scripting
We have a text file in the default ANSI encoding that needs to be converted to UTF-8. Is there any way to do the conversion with the standard Windows command-line (DOS) commands? PowerShell is acceptable, but the command has to be invoked from a separate batch process.
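The PowerShell syntax is rather straightforward. This command reads a file using the system's default (ANSI) encoding and saves it as UTF-8 (with a BOM in Windows PowerShell; newer PowerShell versions may omit it). Use -Encoding Oem instead if the file was written in the console's OEM code page:
Get-Content <SrcFile.txt> -Encoding Default | Out-File <DestFile.txt> -Encoding utf8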
The -Encoding parameter accepts the following values: Ascii, BigEndianUnicode, BigEndianUTF32, Byte, Default, Oem, String, Unicode, Unknown, UTF32, UTF7, UTF8
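Since the question asks for something callable from a batch process, here is a minimal sketch that wraps the pipeline in a function. The function name Convert-AnsiToUtf8 and its parameter names are my own, not part of any built-in module, and it assumes Windows PowerShell 5.1, where Default means the system ANSI code page.

```powershell
# Hypothetical helper (the name is illustrative, not a built-in cmdlet).
function Convert-AnsiToUtf8 {
    param(
        [Parameter(Mandatory = $true)][string]$Source,
        [Parameter(Mandatory = $true)][string]$Destination
    )
    # Assumption: Windows PowerShell 5.1, where 'Default' is the ANSI code page.
    # The file is read line by line and re-written as UTF-8.
    Get-Content -LiteralPath $Source -Encoding Default |
        Out-File -LiteralPath $Destination -Encoding utf8
}
```

From a batch file, the same pipeline can probably be run as a one-liner along the lines of powershell -NoProfile -Command "Get-Content 'in.txt' -Encoding Default | Out-File 'out.txt' -Encoding utf8", where in.txt and out.txt are placeholder names.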
Get-Content might not be optimal, as it processes the input file line by line (at least by default, unless you use the -Raw switch as described later), which can change the line endings (for example, when you move text files between Unix and Windows systems). I had serious problems in a script just because of that, and it took about an hour to find the exact cause. See more about that in this post. Because of this line-by-line behavior, Get-Content is also not the best choice if performance matters.
Instead, you can use PowerShell in combination with the .NET classes (as long as a version of the .NET Framework is installed on your system):
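# Read the source with the system ANSI code page (Encoding.Default on .NET Framework).
$sr = New-Object System.IO.StreamReader($infile, [System.Text.Encoding]::Default)
# Overwrite the target ($false = do not append) and write it as UTF-8.
$sw = New-Object System.IO.StreamWriter($outfile, $false, [System.Text.Encoding]::UTF8)
$sw.Write($sr.ReadToEnd())
# Close() flushes and releases the file handles; calling Dispose() afterwards is harmless.
$sw.Close()
$sr.Close()
$sw.Dispose()
$sr.Dispose()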
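A shorter variant of the same .NET approach, offered as a sketch rather than a definitive implementation, uses the static System.IO.File methods so there is nothing to close by hand. The function name Convert-FileToUtf8 and its parameters are hypothetical. The source encoding is a parameter because Encoding.Default only means the ANSI code page on the .NET Framework; the sketch also writes UTF-8 without a BOM, which is a choice I made for illustration.

```powershell
# Hypothetical helper (the name is illustrative, not a built-in cmdlet).
function Convert-FileToUtf8 {
    param(
        [Parameter(Mandatory = $true)][string]$Source,
        [Parameter(Mandatory = $true)][string]$Destination,
        # Assumption: on .NET Framework, Encoding.Default is the system ANSI code page.
        [System.Text.Encoding]$SourceEncoding = [System.Text.Encoding]::Default
    )
    # Read the whole file in one call, so line endings are kept exactly as they are.
    $text = [System.IO.File]::ReadAllText($Source, $SourceEncoding)
    # UTF8Encoding($false) produces UTF-8 without a byte order mark.
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)
    [System.IO.File]::WriteAllText($Destination, $text, $utf8NoBom)
}
```

Note that .NET methods resolve relative paths against the process working directory, which is not necessarily the current PowerShell location, so passing full paths is the safer option.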
Or, even more simply, use the -Raw switch as described here to avoid that overhead and read the text in a single block:
Get-Content $inFile -Raw
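The line above only reads the file. A complete conversion might look like the following sketch, which assumes PowerShell 5.0 or later (for the -NoNewline switch of Set-Content) and uses a hypothetical function name, Convert-AnsiToUtf8Raw. In Windows PowerShell 5.1 the output will start with a BOM.

```powershell
# Hypothetical helper (the name is illustrative, not a built-in cmdlet).
function Convert-AnsiToUtf8Raw {
    param(
        [Parameter(Mandatory = $true)][string]$Source,
        [Parameter(Mandatory = $true)][string]$Destination
    )
    # -Raw returns the file as one string, so the original line endings survive.
    # -NoNewline stops Set-Content from appending an extra line break at the end.
    Get-Content -LiteralPath $Source -Raw -Encoding Default |
        Set-Content -LiteralPath $Destination -Encoding UTF8 -NoNewline
}
```

Because the text travels through the pipeline as a single string, a file with Unix line endings should come out with the same line endings it went in with.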