Unaccent special characters and bulk-rename

Background

I have a workflow that collects documents from various sources, converts them to PDF, OCRs and compresses them, extracts their contents and annotations, uploads the files to a server, and creates the corresponding MySQL entries to provide a full-featured index for my web-based search engine.

To display PDFs within the search engine, I use PDF.js from Mozilla, which in some cases cannot load documents with certain characters in the filename. These critical characters include German umlauts (Ä, Ö, Ü, ä, ö, ü), brackets ((), [], {}), French accents (é, è, à) and Spanish accents (Ñ, ñ, Ó, ó, Á, á, É, é, Í, í, Ú, ú).

Every file that gets processed in AppleScript 'checks in' to PHP/MySQL using curl. Determining the new filename with PHP would be no big deal, but I have trouble using AppleScript to rename files whose names contain the characters mentioned above to a standardized name.

Question

I would like to implement a function in AppleScript that standardises filenames containing the special characters mentioned above.

The following filenames should be converted to the values on the right:

  • Riñón.pdf --> Rinon.pdf
  • Ergänzung.pdf --> Ergaenzung.pdf
  • Übersicht.pdf --> Uebersicht.pdf
  • Système impérmeable.pdf --> Systeme impermeable.pdf

In short, German umlauts are expanded (Ä --> Ae, ü --> ue), all other accented characters are replaced by their unaccented equivalents (ñ --> n, é --> e), and brackets become spaces ((Ergänzung).pdf --> Ergaenzung .pdf).

Thanks for any advice


Solution 1:

Renaming a file in AppleScript can be performed through the Finder:

tell application "Finder"
   set the name of file "Monterey" to "Eden"
end tell

Reducing a filename to a-z is tricky. If you are comfortable with Perl, there is an ideal module called Text::Unidecode. Other approaches, such as using regular expressions, are discussed on PerlMonks under "removing accents".
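One common way to strip accents without a hand-written table is Unicode decomposition: split each accented letter into a base letter plus combining marks, then drop the marks. The sketch below is in Python, which is my own choice rather than anything from the question (it could be called from AppleScript with do shell script); the function name strip_accents is hypothetical.

```python
import unicodedata

def strip_accents(text):
    """Remove accents by dropping combining marks after decomposition.

    Hypothetical helper for illustration; not part of any library.
    """
    # NFD splits e.g. 'é' into 'e' followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero combining class for marks such as accents and diaereses.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Note that this turns ä into a, not ae, so the German umlaut expansion from the question still needs its own rule. Letters without a decomposition, such as ß or ø, pass through unchanged.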

You mention using PHP, so this question may provide answers that are easier to integrate into your workflow: "How do I remove accents from characters in a PHP string?" The highest-voted answer suggests:

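// Caveat: three-argument strtr() translates byte by byte, so this works only
// when both this source file and the input use a single-byte encoding such as
// ISO-8859-1. For UTF-8 input, pass an array of 'from' => 'to' pairs instead.
function stripAccents($stripAccents){
  return strtr($stripAccents,'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ','aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
}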
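To cover the exact rules from the question (umlauts expanded, brackets turned into spaces, every other accent stripped), a translation table can be combined with Unicode normalization. This is only a sketch in Python, not AppleScript or PHP, and the names standardise_filename and bulk_rename are hypothetical. It assumes that leading and trailing spaces should be trimmed, so that (Ergänzung).pdf becomes Ergaenzung .pdf as in the question, and that an existing target file should be skipped rather than overwritten.

```python
import os
import unicodedata

# Explicit replacements, applied before generic accent stripping.
SPECIAL = str.maketrans({
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "(": " ", ")": " ", "[": " ", "]": " ", "{": " ", "}": " ",
})

def standardise_filename(name):
    """Apply the question's renaming rules to a single filename."""
    # Filenames may arrive decomposed ('a' + combining diaeresis), so
    # compose first to make sure the umlaut table can match 'ä'.
    composed = unicodedata.normalize("NFC", name)
    expanded = composed.translate(SPECIAL)
    # Decompose what is left and drop the combining marks (ñ -> n, é -> e).
    decomposed = unicodedata.normalize("NFD", expanded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: trim spaces left at either end by replaced brackets.
    return stripped.strip()

def bulk_rename(directory):
    """Rename every entry in `directory`; return the (old, new) pairs."""
    renamed = []
    for old in os.listdir(directory):
        new = standardise_filename(old)
        if new == old:
            continue
        target = os.path.join(directory, new)
        if os.path.exists(target):
            continue  # assumption: never overwrite an existing file
        os.rename(os.path.join(directory, old), target)
        renamed.append((old, new))
    return renamed
```

The four example filenames from the question map to the expected values with this function. Since the new name is computed independently of the rename itself, standardise_filename could also be used on its own to report the name back to the PHP/MySQL side.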