Search for diacritics / accented characters with the `locate` command

Solution 1:

If we take a look at updatedb.conf(5), we'll find that there is not much we can do with its configuration items.

So we are going to write a script around locate. In the end we will be able to run something like my-locate.sh liberacion or my-locate.sh liberâciòn, and it will bring us all the possible combinations.


Let's start

First, create a simple file to serve as our database, anywhere you like, e.g. ~/.mydb. Then add your accented characters to that file, one line per base letter, like this:

aâàáäÂÀÄ
eêèéëÊÈÉË
iîïíÎÏ
uûùüÛÜÙ
cçÇ
oôöóòÔÖ
...
...
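
One way to create such a file is a here-document. This is only a sketch: write_accent_db is a hypothetical helper name, and the character list is shortened for illustration, so extend it with whatever letters you need.

```shell
#!/bin/bash
# write_accent_db is a hypothetical helper name used only in this sketch.
# It writes one line per base letter: the plain letter, then its variants.
write_accent_db() {
  cat > "$1" <<'EOF'
aâàáä
eêèéë
cçÇ
EOF
}

# Try it on a throwaway file first; for real use: write_accent_db ~/.mydb
db=$(mktemp)
write_accent_db "$db"

# Look up the line that lists a given character (-F matches it literally)
grep -F -- 'é' "$db"
```

Keeping every variant of a letter on a single line is what lets a plain grep return the whole group for any one of its members.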

Next we need a small script to do the job for us. I wrote a simple one:

#!/bin/bash

# Final search term (an extended regex, built one character at a time)
STR=""

# Loop over every character of the search string
for (( i=0; i<${#1}; i++ )); do

  # Extract the character at position i
  CH="${1:$i:1}"

  # Find the database line listing the variants of this character
  # (-F matches it literally, -m 1 keeps only the first matching line)
  CHARS=$(grep -F -m 1 -- "$CH" ~/.mydb)

  if [ -z "$CHARS" ]; then
    # No variants known: keep the character as it is
    STR=$STR$CH
  else
    # Put an "or" operator between the variants, e.g. "cçÇ" -> "(c|ç|Ç)",
    # dropping the trailing "|" so the group cannot match an empty string
    REG=$(printf '%s\n' "$CHARS" | sed 's/./&|/g; s/|$//')
    STR="$STR($REG)"
  fi

done

# Locate it using the regex; the trailing "$" anchors the match
# at the end of the path
locate --regex "${STR}\$"
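
To see what pattern such a script hands to locate, the same construction can be wrapped in a function that only prints the regex. This is a minimal sketch, not the script itself: build_regex and its database argument are hypothetical names, it assumes a UTF-8 locale so that accented letters count as single characters, and it leaves out the trailing "|" inside each group.

```shell
#!/bin/bash
# build_regex is a hypothetical helper: it prints the regex for a search
# term instead of running locate. Usage: build_regex DB_FILE TERM
build_regex() {
  local db="$1" term="$2" out="" ch line i
  for (( i=0; i<${#term}; i++ )); do
    ch="${term:$i:1}"
    # First database line that contains this character, if any
    line=$(grep -F -m 1 -- "$ch" "$db")
    if [ -z "$line" ]; then
      # No variants known: keep the character as it is
      out+="$ch"
    else
      # Join the variants with "|", e.g. "aâà" -> "(a|â|à)"
      out+="($(printf '%s\n' "$line" | sed 's/./&|/g; s/|$//'))"
    fi
  done
  printf '%s\n' "$out"
}

# Demo against a throwaway database with two groups
db=$(mktemp)
printf '%s\n' 'aâàáä' 'oôöóò' > "$db"
regex=$(build_regex "$db" "liberacion")
printf '%s\n' "$regex"

# Check which sample names the regex accepts, without touching locate
printf '%s\n' liberacion liberación liberâciòn liberation |
  grep -E -- "${regex}\$"
rm -f "$db"
```

Printing the pattern first makes it easy to spot a database line that is missing a variant before blaming locate for a missing result.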

Now save it under any name you like in a directory that is on your PATH, e.g. ~/bin (which should already be in your PATH), and make it executable with chmod +x.

After that, simply run something like this to search for all possible combinations:

my-locate.sh liberacion

This finds all of these for me:

~/lab/liberacion
~/lab/liberaciòn
~/lab/liberación
~/lab/liberâciòn
~/lab/liberäciòn
~/lab/libÈrâciòn

Solution 2:

With mlocate 0.26, available on Ubuntu 18.04+, locate has a -t / --transliterate option (see the man page), so no workaround is needed:

Creating some test files:

$ touch liberación liberacion liberaciôn

Update the database (run updatedb as root, e.g. with sudo, if it reports a permission error) and search:

$ updatedb
$ locate --transliterate liberacion 
/home/pablo/liberacion
/home/pablo/liberación
/home/pablo/liberaciôn

So now locate -t liberación also searches for files containing the string liberacion, and even liberaciòn!

Finally, I created an alias in my .bashrc :-)

$ alias locate="locate --transliterate"
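
If the same .bashrc is shared with machines that have an older or different locate, the alias can be made conditional. The following is a sketch under the assumption that the installed locate mentions --transliterate in its --help output when it supports the option; has_transliterate is a hypothetical helper name.

```shell
#!/bin/bash
# has_transliterate is a hypothetical helper for this sketch. It reads help
# text on stdin and succeeds if the --transliterate option is mentioned.
has_transliterate() {
  grep -q -e '--transliterate'
}

# Only define the alias when the installed locate advertises the option
# (assumption: supported versions list it in their --help output).
if locate --help 2>/dev/null | has_transliterate; then
  alias locate='locate --transliterate'
else
  echo "this locate does not list --transliterate; keeping the default" >&2
fi
```

On a system without the option, the script from Solution 1 remains the fallback.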