Unaccent special characters and bulk-rename files
Background
I have a workflow that collects documents from various sources, converts them to PDF, OCRs them, compresses them, extracts their contents and annotations, uploads the files to a server and creates the corresponding MySQL entries, in order to provide a full-featured index for my web-based search engine.
To display PDFs within the search engine, I use PDF.js from Mozilla, which in some cases fails to load documents with certain characters in the filename. These critical characters include German umlauts (Ä,Ö,Ü,ä,ö,ü), brackets ((),[],{}), French accents (é,è,à) and Spanish accents (Ñ,ñ,Ó,ó,Á,á,É,é,Í,í,Ú,ú).
Every file that gets processed in AppleScript 'checks in' to PHP/MySQL using curl. Determining the new filename with PHP would be no big deal, but I have trouble using AppleScript to rename files that have the mentioned characters in their filename to a standardized name.
Question
I would like to implement a function in AppleScript that standardises filenames containing the special characters mentioned above.
The following filenames should become the corresponding values on the right:
- Riñón.pdf --> Rinon.pdf
- Ergänzung.pdf --> Ergaenzung.pdf
- Übersicht.pdf --> Uebersicht.pdf
- Système impérmeable.pdf --> Systeme impermeable.pdf
In short, German umlauts are expanded (Ä --> Ae, ü --> ue), all other accents are replaced by their unaccented base letter (ñ --> n, é --> e), and brackets become spaces ((Ergänzung).pdf --> Ergaenzung .pdf).
Thanks for any advice
Solution 1:
Renaming a file in AppleScript can be performed through the Finder:
tell application "Finder"
    -- "Monterey" is the current name of the file, "Eden" the new one
    set the name of file "Monterey" to "Eden"
end tell
Reducing a filename to a-z is tricky. If you are comfortable with Perl, there is an ideal module called Text::Unidecode. Other approaches, such as using regular expressions, are discussed by Perl Monks in removing accents.
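If neither Perl nor a module install is an option, the rules from the question can also be sketched with Unicode normalisation alone. The following is a minimal Python sketch, not a definitive implementation: the function name standardise_filename is hypothetical, the ß mapping is an assumption that goes beyond what the question asks for, and Python itself is a swapped-in tool that AppleScript would have to call externally.

```python
import unicodedata

# Rules taken from the question: German umlauts are expanded and
# brackets become spaces. The "ß" entry is NOT in the question; it is
# an assumption added because "ß" has no accent that could be stripped.
UMLAUTS = {"Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
           "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
BRACKETS = "()[]{}"
_TABLE = str.maketrans({**UMLAUTS, **{b: " " for b in BRACKETS}})


def standardise_filename(name: str) -> str:
    """Return `name` with umlauts expanded, brackets blanked, accents removed."""
    # Compose first, so that a decomposed "a" + U+0308 (a form in which
    # file systems can hand back names) is seen as the single letter "ä".
    name = unicodedata.normalize("NFC", name)
    name = name.translate(_TABLE)
    # Decompose what is left and drop the combining marks:
    # "ñ" becomes "n" + U+0303, and only the "n" is kept.
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(standardise_filename("Riñón.pdf"))                # Rinon.pdf
    print(standardise_filename("Ergänzung.pdf"))            # Ergaenzung.pdf
    print(standardise_filename("Übersicht.pdf"))            # Uebersicht.pdf
    print(standardise_filename("Système impérmeable.pdf"))  # Systeme impermeable.pdf
```

Note that a leading bracket turns into a leading space, so "(Ergänzung).pdf" comes out with a space on both sides of the word. Trim the result if that is not wanted.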
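You mention using PHP, so the question "How do I remove accents from characters in a PHP string?" may provide answers that are easier to integrate into your workflow. The highest-voted answer suggests a strtr() character map. The version below follows that idea but builds the map as an array, because the three-argument form of strtr() replaces byte by byte and would mangle UTF-8 filenames:
function stripAccents($str) {
    // Split the accented characters on UTF-8 boundaries and pair each with its ASCII replacement.
    $from = preg_split('//u', 'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ', -1, PREG_SPLIT_NO_EMPTY);
    $to = str_split('aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
    return strtr($str, array_combine($from, $to));
}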
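For the bulk-rename part of the question, renaming one Finder item at a time is not the only option. Below is a rough Python sketch that applies a name-cleaning function to every file in a folder. The helper name bulk_rename and its dry_run parameter are hypothetical, and refusing to continue when two files would end up with the same name is a design assumption, not something the question specifies.

```python
import os
from typing import Callable, List, Tuple


def bulk_rename(directory: str,
                transform: Callable[[str], str],
                dry_run: bool = True) -> List[Tuple[str, str]]:
    """Rename each regular file in `directory` to transform(old_name).

    Returns the list of (old, new) pairs. With dry_run=True (the default)
    nothing on disk is touched, so the plan can be inspected first.
    """
    names = sorted(os.listdir(directory))
    taken = set(names)  # every name that exists or is already planned
    renames = []
    for old in names:
        if not os.path.isfile(os.path.join(directory, old)):
            continue  # leave sub-folders alone
        new = transform(old)
        if new == old:
            continue  # already clean
        if new in taken:
            # Two names would collapse into one, or an existing file
            # would be overwritten: stop before anything is renamed.
            raise FileExistsError(f"{old!r} -> {new!r} collides with another name")
        taken.add(new)
        renames.append((old, new))
    if not dry_run:
        for old, new in renames:
            os.rename(os.path.join(directory, old),
                      os.path.join(directory, new))
    return renames
```

Pass any function that maps an old name to a new one as transform. Run it once with the default dry_run=True and check the returned pairs, then repeat with dry_run=False to perform the renames.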