How do I remove non-ASCII characters from filenames?
I have several files with names containing various Unicode characters. I'd like to rename them to only contain the "printable" ASCII characters (32-126).
E.g.,
Läsmig.txt //Before
L_smig.txt //After
Mike’s Project.zip //Before
Mike_s Project.zip //After
Or, for bonus points, transliterate to the closest ASCII character:
Läsmig.txt //Before
Lasmig.txt //After
Mike’s Project.zip //Before
Mike's Project.zip //After
Ideally looking for an answer that doesn't require 3rd party tools.
(Edit: Scripts encouraged; I'm just trying to avoid niche shareware apps that need to be installed to work)
PowerShell snippet that finds the files I'm interested in renaming:
gci -recurse | where {$_.Name -match "[^\u0020-\u007E]"}
A similar, unanswered Python question: https://stackoverflow.com/questions/17870055/how-to-rename-a-file-with-non-ascii-character-encoding-to-ascii
I found a similar topic here on Stack Overflow.
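With the following code, most characters are translated to their "closest character". However, I couldn't get the ’ translated (maybe it does work; I can't create a filename containing it at the prompt ;). The ß is not translated either. Neither character has a canonical Unicode decomposition into a base letter plus a combining mark, which is what this approach relies on.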
function Remove-Diacritics {
    param ([String]$src = [String]::Empty)
    # FormD splits an accented character into its base letter plus combining mark(s)
    $normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
    $sb = new-object Text.StringBuilder
    $normalized.ToCharArray() | % {
        # Keep everything except the combining marks
        if( [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$sb.Append($_)
        }
    }
    $sb.ToString()
}
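# Match anything outside printable ASCII; \u007E (~) is the last printable character, \u007F is DEL
# Longest paths first, so renaming a folder never invalidates the stored paths of items inside it
$files = gci -recurse | where {$_.Name -match "[^\u0020-\u007E]"} | Sort-Object {$_.FullName.Length} -Descending
$files | ForEach-Object {
    $newname = Remove-Diacritics $_.Name
    if ($_.Name -ne $newname) {
        $num = 1
        # Build the target from the parent folder; replacing the name inside FullName
        # would also change a parent folder that happens to contain the same text
        $parent = [IO.Path]::GetDirectoryName($_.FullName)
        $nextname = Join-Path $parent $newname
        while (Test-Path -LiteralPath $nextname) {
            $next = ([io.fileinfo]$newname).BaseName + " ($num)" + ([io.fileinfo]$newname).Extension
            $nextname = Join-Path $parent $next
            $num += 1
        }
        echo $nextname
        # -LiteralPath keeps characters such as [ and ] from being treated as wildcards
        ren -LiteralPath $_.FullName -NewName ([IO.Path]::GetFileName($nextname))
    }
}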
Edit:
I added some code to check whether the target filename already exists and, if it does, to append (1), (2), etc. (It's not smart enough to detect an existing (1) in the filename being renamed, so in that case you would get (1) (1).) But as always... everything is programmable ;)
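The (1) (1) limitation could be handled by checking for an existing numeric suffix before adding a new one. The following is only a sketch of that idea: Get-UniqueName is a hypothetical helper, not part of the script above, and it assumes the suffix always has the form " (n)" directly before the extension.

```powershell
# Hypothetical helper (an assumption, not part of the original script):
# returns the first free name in $Directory, continuing an existing " (n)"
# suffix instead of stacking a second one.
function Get-UniqueName {
    param(
        [string]$Directory,
        [string]$Name
    )
    $base = [IO.Path]::GetFileNameWithoutExtension($Name)
    $ext  = [IO.Path]::GetExtension($Name)
    $num  = 1
    # If the base name already ends in " (n)", strip it and continue from n + 1
    if ($base -match '^(.*) \((\d+)\)$') {
        $base = $Matches[1]
        $num  = [int]$Matches[2] + 1
    }
    $candidate = $Name
    while (Test-Path -LiteralPath (Join-Path $Directory $candidate)) {
        $candidate = "$base ($num)$ext"
        $num += 1
    }
    $candidate
}
```

Inside the rename loop, the while block could then be replaced by a single call that passes the file's folder and the cleaned-up name.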
Edit 2:
Here is the last one for tonight...
This one uses a different function for replacing the characters. I also added a line that changes unknown characters, such as ß and ┤, to _.
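function Convert-ToLatinCharacters {
    param([string]$inputString)
    # Encoding to the Cyrillic code page maps many accented Latin letters to their base letter;
    # decoding those bytes as ASCII then turns every remaining non-ASCII byte into '?'
    [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($inputString))
}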
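# Match anything outside printable ASCII; \u007E (~) is the last printable character, \u007F is DEL
# Longest paths first, so renaming a folder never invalidates the stored paths of items inside it
$files = gci -recurse | where {$_.Name -match "[^\u0020-\u007E]"} | Sort-Object {$_.FullName.Length} -Descending
$files | ForEach-Object {
    $newname = Convert-ToLatinCharacters $_.Name
    # Characters that could not be converted come back as '?'
    $newname = $newname.replace('?','_')
    if ($_.Name -ne $newname) {
        $num = 1
        # Build the target from the parent folder; replacing the name inside FullName
        # would also change a parent folder that happens to contain the same text
        $parent = [IO.Path]::GetDirectoryName($_.FullName)
        $nextname = Join-Path $parent $newname
        while (Test-Path -LiteralPath $nextname) {
            $next = ([io.fileinfo]$newname).BaseName + " ($num)" + ([io.fileinfo]$newname).Extension
            $nextname = Join-Path $parent $next
            $num += 1
        }
        echo $nextname
        # -LiteralPath keeps characters such as [ and ] from being treated as wildcards
        ren -LiteralPath $_.FullName -NewName ([IO.Path]::GetFileName($nextname))
    }
}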
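Neither version above turns ’ into ' or ß into plain letters, because those characters have no base-letter decomposition. One possible workaround is to replace such characters explicitly before stripping the combining marks. This is a rough sketch only: Convert-ToAsciiName is a hypothetical name, and the three replacements are an illustrative assumption rather than a complete table.

```powershell
# Hypothetical helper (an assumption, not from the answer above).
function Convert-ToAsciiName {
    param([string]$Name)
    # Explicit replacements for characters that have no canonical decomposition.
    # Illustrative only; extend the list for the characters you actually meet.
    $s = $Name.Replace([string][char]0x2019, "'")   # right single quotation mark
    $s = $s.Replace([string][char]0x2018, "'")      # left single quotation mark
    $s = $s.Replace([string][char]0x00DF, 'ss')     # sharp s
    # Decompose accented letters, then drop the combining marks
    $sb = New-Object Text.StringBuilder
    foreach ($c in $s.Normalize([Text.NormalizationForm]::FormD).ToCharArray()) {
        if ([Globalization.CharUnicodeInfo]::GetUnicodeCategory($c) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$sb.Append($c)
        }
    }
    # Anything still outside printable ASCII (32-126) becomes an underscore
    $sb.ToString() -replace '[^\u0020-\u007E]', '_'
}
```

The characters are written as code points so the script does not depend on the encoding of the file it is saved in.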
I believe this will work...
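$Files = gci | where {$_.Name -match "[^\u0020-\u007E]"}
$Files | ForEach-Object {
    $OldName = $_.Name
    # Replace every character outside printable ASCII (32-126) with an underscore
    $NewName = $OldName -replace "[^\u0020-\u007E]", "_"
    ren -LiteralPath $_.FullName -NewName $NewName
}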
I don't have filenames with that range of non-ASCII characters to test against, though.
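Since the snippet is untested, it may be worth listing the planned renames before changing anything. A minimal sketch, assuming a hypothetical helper named Get-AsciiRenamePlan that applies the same -replace expression but only reports the old and new names:

```powershell
# Hypothetical helper (an assumption, not from the answer above):
# lists the planned renames for one folder without touching any file.
function Get-AsciiRenamePlan {
    param([string]$Path = '.')
    Get-ChildItem -LiteralPath $Path |
        Where-Object { $_.Name -match '[^\u0020-\u007E]' } |
        ForEach-Object {
            [pscustomobject]@{
                OldName = $_.Name
                NewName = $_.Name -replace '[^\u0020-\u007E]', '_'
            }
        }
}
```

Calling Get-AsciiRenamePlan with no arguments shows the plan for the current folder. Two different names can map to the same result (for example, two names that differ only in one accented letter), so the list is also a chance to spot collisions before renaming.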