How to pass UTF-8 characters to clip.exe with PowerShell without conversion to another charset?
I'm a Windows and PowerShell noobie. I'm coming from Linux Land. I used to have this little Bash function in my .bashrc that would copy a "shruggie" (¯\_(ツ)_/¯) to the clipboard for me so that I could paste it into conversations on Slack and such.
My Bash alias looked like this: alias shruggie='printf "¯\_(ツ)_/¯" | xclip -selection c && echo "¯\_(ツ)_/¯"'
I realize that this question is juvenile, but the answer does have value to me, as I'm sure that I will need to pipe odd UTF-8 characters to output in a PowerShell script at some point in the future.
I wrote this function in my PowerShell profile:
function shruggie() {
    '¯\_(ツ)_/¯' | clip
    Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}
However, this gives me: ??\_(???)_/?? (unknown UTF-8 chars are converted to ?) when I call it on the command line.
I've looked at [System.Text.Encoding]::UTF8 and some other questions, but I don't know how to cast my string as UTF-8, pass that through clip.exe, and receive UTF-8 out on the other side (on the clipboard).
Solution 1:
There are two distinct, independent aspects:
- copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe
- writing (echoing) ¯\_(ツ)_/¯ to the console
Prerequisite: PowerShell must properly recognize your source code's encoding in order for the solutions below to work: if your source code is UTF-8-encoded, be sure to save the enclosing files as UTF-8 with BOM for Windows PowerShell to recognize it.
Windows PowerShell, in the absence of a BOM, interprets source as "ANSI"-encoded, referring to the legacy, single-byte, extended-ASCII code page in effect, such as Windows-1252 on US-English systems, and would therefore interpret UTF-8-encoded source code incorrectly.
Note that, by contrast, PowerShell Core uses UTF-8 as the default, so the BOM is no longer necessary (but still recognized).
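To check whether a script file actually carries the UTF-8 BOM that Windows PowerShell looks for, a small helper along these lines may be useful. This is only a sketch: the function name Test-Utf8Bom is made up for illustration, and it inspects nothing but the first three bytes (EF BB BF).

```powershell
# Hypothetical helper: returns $true if the file starts with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom {
    param([Parameter(Mandatory = $true)][string] $Path)

    # Convert-Path resolves the path against PowerShell's current location,
    # which .NET file APIs would otherwise not know about.
    $fullPath = Convert-Path -LiteralPath $Path
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}
```

For example, Test-Utf8Bom $PROFILE would report whether the profile file was saved with a BOM.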
Copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe:
- In Windows PowerShell v5.1+, you can use the built-in Set-Clipboard cmdlet to copy text to the clipboard from within PowerShell; given that PowerShell uses the .NET System.String type, which is capable of representing all Unicode characters, there are no encoding issues.
  - Note that PowerShell Core, even when run on Windows, does NOT have this cmdlet (as of PowerShell Core v6.0.0-rc.2).
  - See this answer of mine for clipboard functions that work in earlier PowerShell versions as well as in PowerShell Core.
- In earlier versions of Windows PowerShell and in PowerShell Core, use of clip.exe is a viable alternative, but its use requires additional work:
function shruggie() {
    $OutputEncoding = (New-Object System.Text.UnicodeEncoding $False, $False).psobject.BaseObject
    '¯\_(ツ)_/¯' | clip
    Write-Verbose -Verbose "Shruggie copied to clipboard." # see section about console output
}
- New-Object System.Text.UnicodeEncoding $False, $False creates a BOM-less UTF-16LE encoding, which clip.exe understands.
  - The magic .psobject.BaseObject incantation is, unfortunately, required to work around a bug; in PSv5+, you can bypass this bug by using the following instead: [System.Text.UnicodeEncoding]::new($False, $False)
- Assigning that encoding to preference variable $OutputEncoding ensures that PowerShell uses that encoding to pipe data to external utility clip.exe.
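To make the effect of that encoding object more concrete, the following sketch inspects the bytes it produces for the katakana character in the shruggie. The variable names are illustrative, and the character is built from its code point so the snippet does not depend on how the script file itself is encoded.

```powershell
# Same constructor arguments as in the function above: little-endian, no BOM.
$utf16NoBom = New-Object System.Text.UnicodeEncoding $false, $false

# Number of preamble (BOM) bytes this encoding instance would emit.
$preambleLength = $utf16NoBom.GetPreamble().Length

# Encode U+30C4 and render the resulting bytes as hex, in the order they are sent.
$bytes = $utf16NoBom.GetBytes([string][char]0x30C4)
$hex = ($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '

"Preamble length: $preambleLength; bytes: $hex"
```

This should report a preamble length of 0 and the bytes C4 30, that is, the 16-bit value 0x30C4 with its low byte first.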
Writing ¯\_(ツ)_/¯ to the console:
Note: PowerShell Core on Unix platforms generally uses consoles (terminals) with a default encoding of (BOM-less) UTF-8, so no additional work is needed there.
To merely echo (print) Unicode characters (beyond the 8-bit range), it is sufficient to switch to a font that can display Unicode characters (beyond the extended ASCII range), because, as PetSerAl points out, PowerShell uses the Unicode version of the WriteConsole Windows API function to print to the console.
To support (most) Unicode characters, you must switch to one of the "TT" (TrueType) fonts.
PetSerAl points out in a comment that console windows on Windows are currently limited to a single 16-bit code unit per output character (cell); given that only (most of) the characters in the BMP (Basic Multilingual Plane) are self-contained 16-bit code units, the (rare) characters beyond the BMP cannot be represented.
Sadly, even that may not be enough for some (BMP) Unicode characters, given that the Unicode standard is versioned and font representations / implementations may lag.
Indeed, as of Windows 10 release ID 1703, only a select few fonts can render ツ (Unicode character KATAKANA LETTER TU, U+30C4, UTF-8: E3 83 84):
- MS Gothic
- NSimSum
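The code-unit limitation and the byte values mentioned above can be checked from PowerShell itself. The sketch below is only an illustration; U+1F600 is an arbitrary example of a character outside the BMP and is not taken from the original discussion.

```powershell
# Build the characters from code points so this snippet does not depend on file encoding.
$tu    = [string][char]0x30C4               # KATAKANA LETTER TU, inside the BMP
$emoji = [char]::ConvertFromUtf32(0x1F600)  # arbitrary character outside the BMP

# .Length counts UTF-16 code units, not user-perceived characters.
$tuUnits    = $tu.Length
$emojiUnits = $emoji.Length

# UTF-8 byte sequence of the BMP character, rendered as hex.
$utf8Hex = ([System.Text.Encoding]::UTF8.GetBytes($tu) |
    ForEach-Object { $_.ToString('X2') }) -join ' '

"TU: $tuUnits code unit(s), UTF-8 $utf8Hex; non-BMP: $emojiUnits code unit(s)"
```

The BMP character occupies a single code unit, while the non-BMP character needs a surrogate pair, which is why a console cell limited to one code unit cannot represent it.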
Note that if you want to (also) change how other applications interpret such output, you must again set $OutputEncoding:
For instance, to make PowerShell expect UTF-8 input from external utilities as well as output UTF-8-encoded data to external utilities, use the following:
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
The above implicitly changes the code page to 65001 (UTF-8), as reflected in chcp (chcp.com).
Note that, for backward compatibility, Windows console windows still default to the single-byte, extended-ASCII legacy OEM code page, such as 437 on US-English systems.
Unfortunately, as of v6.0.0-rc.2, this also applies to PowerShell Core, even though it has otherwise switched to BOM-less UTF-8 as the default encoding, as also reflected in $OutputEncoding.
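Before changing any of these settings, it can help to see what is currently in effect. The helper below is a sketch with a made-up name (Get-EncodingReport); it only reads the three settings discussed above and changes nothing.

```powershell
# Hypothetical helper: summarizes the encodings discussed above without changing them.
function Get-EncodingReport {
    New-Object PSObject -Property @{
        OutputEncodingPref = $OutputEncoding.WebName                # used when piping to external programs
        ConsoleInput       = [Console]::InputEncoding.WebName       # used when reading console input
        ConsoleOutput      = [Console]::OutputEncoding.WebName      # used when decoding external programs' output
        ConsoleCodePage    = [Console]::OutputEncoding.CodePage     # the number that chcp reports on Windows
    }
}
```

Running Get-EncodingReport before and after the assignment shown earlier should make the switch to code page 65001 visible on Windows.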
Solution 2:
If you cannot use PowerShell 5's Set-Clipboard cmdlet (which is IMO the go-to solution), you can convert/encode your output in a way that clip.exe understands correctly.
There are two ways to achieve what you want here:
- Feed clip.exe a UTF-16 file: clip < UTF16-Shruggie.txt
  The important part here is to save the file encoded as Unicode (which means UTF-16, little-endian byte order, with BOM). Note that the < redirection works in cmd.exe; PowerShell itself does not support that operator.
- Encode the string appropriately (the following works in a PoSh editor like the ISE, but unfortunately not in a regular console; see mklement0's answer for how to achieve this there):
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
function shruggie() {
    [System.Text.Encoding]::Default.GetString(
        [System.Text.Encoding]::UTF8.GetBytes('¯\_(ツ)_/¯')
    ) | clip.exe
    Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}
shruggie
This works for me. Here is an MSDN blog post that gives further explanations about $OutputEncoding / [Console]::OutputEncoding.
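For the first option above, the UTF-16 file can also be produced from PowerShell rather than from an editor. The following is a minimal sketch: the temporary file location is an assumption, and a single character stands in for the full shruggie so that the snippet contains no literal non-ASCII text.

```powershell
# Write the text as UTF-16LE with a BOM, the format Windows editors call "Unicode".
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'UTF16-Shruggie.txt'
$text = [string][char]0x30C4   # stand-in for the full shruggie
[System.IO.File]::WriteAllText($path, $text, [System.Text.Encoding]::Unicode)

# The first two bytes of the file are the byte order mark.
$firstBytes = [System.IO.File]::ReadAllBytes($path)[0..1]

# On Windows, the redirection then has to go through cmd.exe, for example:
#   cmd /c "clip < `"$path`""
```

The file should start with FF FE, the UTF-16LE byte order mark.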
Solution 3:
The Set-Clipboard option posted above is the most direct answer, but, as noted, it requires PoSH v5 or higher. Also, depending on which OS the OP is on, not all cmdlets are available on all OS/PoSH versions. This is not to say that Set-Clipboard is unavailable, but since the OP says they're new, it's just a heads-up.
If you can't go there for whatever reason, you can create your own and/or use add-on modules. See this post:
Convert Keith Hill's PowerShell Get-Clipboard and Set-Clipboard to a PSM1 script
The results from using the Set-Clipboard function from the above post and modifying the OP's post for its use:
(Get-CimInstance -ClassName Win32_OperatingSystem).Caption
Microsoft Windows Server 2012 R2 Standard
$PSVersionTable
Name Value
---- -----
PSVersion 4.0
WSManStackVersion 3.0
SerializationVersion 1.1.0.1
CLRVersion 4.0.30319.42000
BuildVersion 6.3.9600.18773
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0}
PSRemotingProtocolVersion 2.2
function Set-ClipBoard
{
    Param
    (
        [Parameter(ValueFromPipeline=$true)]
        [string] $text
    )
    Add-Type -AssemblyName System.Windows.Forms
    $tb = New-Object System.Windows.Forms.TextBox
    $tb.Multiline = $true
    $tb.Text = $text
    $tb.SelectAll()
    $tb.Copy()
}
function New-Shruggie
{
    Set-ClipBoard -text '¯\_(ツ)_/¯'
    Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}
New-Shruggie
¯\_(ツ)_/¯ copied to clipboard.
Results pasted from clipboard
¯\_(ツ)_/¯
There are options however, such as the following, but the above are still the best route.
First, remember that output is controlled by both the OS code page and the interpreter (PoSH); in Windows PowerShell, $OutputEncoding defaults to ASCII, while the console uses a legacy OEM code page.
You can see the PoSH default setting by looking at the output of the built-in variable $OutputEncoding
As PoSH creator Jeffrey Snover says:
The reason we convert to ASCII when piping to existing executables is that most commands today do not process UNICODE correctly.
Some do, most don’t.
So, all that being said ... You can change the CodePage, by doing items like...
[Console]::OutputEncoding
Or ...
$OutputEncoding = New-Object -typename System.Text.UTF8Encoding
If sending output to a file...
$OutPutData | Out-File $outFile -Encoding UTF8
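One caveat on that last command: in Windows PowerShell, Out-File -Encoding UTF8 writes a BOM, whereas PowerShell Core writes BOM-less UTF-8 for the same value. If a BOM-less file is needed in either edition, going through .NET directly is one option. The sketch below assumes a temporary file path and uses a single character as stand-in data; both are illustrative only.

```powershell
# $false = do not emit a BOM (byte order mark) when writing.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-demo.txt'
$data    = [string][char]0x30C4   # illustrative data

[System.IO.File]::WriteAllText($outFile, $data, $utf8NoBom)

# Inspect the raw bytes that ended up on disk.
$fileHex = ([System.IO.File]::ReadAllBytes($outFile) |
    ForEach-Object { $_.ToString('X2') }) -join ' '
```

Here $fileHex should contain only the three UTF-8 bytes of the character, with no EF BB BF prefix.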