GNU version of locate - gupdatedb fails with "gfind: failed to read file names from file system"
I am trying to test the GNU version of the locate command. First, I have to create the database like this:
sudo gupdatedb --prunepaths=/Volumes --output=$HOME/locatedb_gupdatedb
Unfortunately, about a minute after launching the command, I get the following error:
gfind: failed to read file names from file system at or below '/': No such file or directory
I don't understand where this error comes from.
UPDATE 1: I replaced gfind with the find command and corrected the path to find, which is /usr/bin/. Unfortunately, I get error messages like the ones below when I launch the gupdatedb command like this:
sudo gupdatedb --prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /Volumes /System' --output=$HOME/locatedb_gupdatedb
Here are the error messages:
find: /System/Volumes/Data/.Spotlight-V100: No such file or directory
find: /System/Volumes/Data/.PKInstallSandboxManager: No such file or directory
find: /System/Volumes/Data/.PKInstallSandboxManager-SystemSoftware: No such file or directory
find: /System/Volumes/Data/.cleverfiles: No such file or directory
find: /System/Volumes/Data/mnt: No such file or directory
find: /System/Volumes/Data/.DocumentRevisions-V100: No such file or directory
etc ...
I tried to modify the following option in the /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb file:
: ${FINDOPTIONS="2 > /dev/null"}
But the problem is that this option is inserted at the front of the find command, not at the end, so it does not work as a redirection in the rest of the script. Since the script contains many find commands, I can't manually append 2 > /dev/null to each one.
Can anyone see how to suppress all these error messages from the find command when I launch gupdatedb?
UPDATE 2: I finally managed to create a database with gupdatedb (the GNU version of the macOS updatedb command) by doing:
sudo gupdatedb --prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /System /Volumes' --output=$HOME/locatedb_gupdatedb
The issue now is that when I search for a substring of a file or directory name, the results seem to be duplicated (sub_string is simply part of a file or directory name).
For example, if I run glocate -d ~/locatedb_gupdatedb sub_string, I get duplicate results like:
/System/Volumes/Data/Users/fab/sub_string.dat
/Users/fab/sub_string.dat
I don't know how to exclude '/System/Volumes/Data/' from these results. I did specify the /System directory in the --prunepaths option, so why isn't it taken into account in the database created by gupdatedb?
Or maybe I should run:
sudo gupdatedb --prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /System/Volumes/Data /Volumes' --output=$HOME/locatedb_gupdatedb
??
Any help excluding the /System/Volumes/Data directory from the database index is welcome.
UPDATE 3: Here is an example of quickly generating a database with updatedb on Debian 10 Buster. Only a few changes from normal use occurred between the timestamps of the two updatedb commands.
So I conclude there really is a difference between the GNU/macOS and GNU/Linux implementations.
Any explanation is welcome.
The problem here is that you are running GNU's updatedb with macOS's find. gupdatedb is a shell script that expects to be running a GNU-compatible find. In particular, it converts --prunepaths to a GNU-compatible basic regular expression.
--prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /Volumes /System'
converts to
PRUNEREGEX="\(^/private/tmp$\)\|\(^/private/var/folders$\)\|\(^/private/var/tmp$\)\|\(^*/Backups.backupdb$\)\|\(^/Volumes$\)\|\(^/System$\)"
GNU basic regular expressions (BRE) treat \| as an alternation operator:
‘foo\|bar’ matches either ‘foo’ or ‘bar’
This is a GNU extension to the BRE syntax, and the macOS find command does not accept it. So the whole prune string fails to match anything.
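The conversion described above can be reproduced with a short sed pipeline. This is only a sketch of the idea, not necessarily the exact code in gupdatedb, and the function name prunepaths_to_regex is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: turn a space-separated path list into a GNU-style
# BRE of the form \(^path1$\)\|\(^path2$\)...
prunepaths_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's,^,\\(^,' \
        -e 's, ,$\\)\\|\\(^,g' \
        -e 's,$,$\\),'
}

# Print the regex for two of the paths from the question.
prunepaths_to_regex '/private/tmp /Volumes'
```

The \| separators in the result are exactly the part that a non-GNU find does not understand.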
The macOS (and POSIX) BRE syntax has no alternation operator, so this is not an easy fix. The macOS updatedb script converts the alternatives into explicit -or operators in the find command. You can modify the GNU script to do that, or get gfind to work.
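As a rough sketch of that approach (not the actual macOS script), the prune list can be turned into -path ... -prune tests joined with -o, which avoids regex alternation entirely. The function name build_prune_expr is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: build "-path P1 -prune -o -path P2 -prune ..."
# from a space-separated list of paths.
build_prune_expr() {
    out=""
    sep=""
    # Turn off globbing so patterns like */Backups.backupdb stay literal
    # while the list is split on whitespace.
    set -f
    for p in $1; do
        out="$out$sep-path $p -prune"
        sep=" -o "
    done
    set +f
    printf '%s\n' "$out"
}

# Print the expression for two of the paths from the question.
build_prune_expr '/private/tmp /Volumes'
```

The resulting string would have to be spliced into the script's find calls in place of the -regex test, again with globbing disabled so that */Backups.backupdb is not expanded by the shell.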
By the way, GNU's updatedb script builds a completely new database from scratch on every run, just like macOS's.
Your Debian installation is using mlocate, which is a completely different implementation from GNU locate, not as widely ported, and not available on macOS AFAIK. And even though mlocate runs quickly on your Debian installation, that does not mean it runs a lot faster than GNU locate. Both build the entire database from scratch in under a second on my Debian installation when all the disk metadata is in RAM (which is usually the case).
Apple provides mdfind, which uses the incremental database created by mds, which is largely triggered by file system events. This makes it (theoretically) much more efficient than even mlocate, which still has to traverse the entire directory structure looking for changed directories. The problem is that incremental builds accumulate errors. That is why locate, with its full rebuild every time, is still around and preferred by many.
The Apple version is run via /usr/libexec/locate.updatedb, which is a shell script. Some additional paths are excluded from the search there. In Mojave these are:
: ${PRUNEPATHS:="/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb"} # unwanted directories
PS: I don't have Catalina so you may need to check whether other paths need to be excluded as well.
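The : ${PRUNEPATHS:=...} idiom only assigns the default when the variable is unset or empty, so a value already set in the environment wins. The sketch below shows just that behaviour in isolation. The function name effective_prunepaths is made up, and it assumes the real script does not overwrite the variable somewhere else:

```shell
#!/bin/sh
# Hypothetical helper: print the prune list the idiom would end up with.
effective_prunepaths() {
    # Assign the Mojave default only if PRUNEPATHS is unset or empty.
    # Quoted so that */Backups.backupdb is not glob-expanded.
    : "${PRUNEPATHS:=/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb}"
    printf '%s\n' "$PRUNEPATHS"
}

# Prints the default list unless PRUNEPATHS is already set.
effective_prunepaths
```

Under that assumption, extra directories could be excluded by setting PRUNEPATHS before the script runs instead of editing the script itself.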