How can I view updatedb database content, and then exclude certain files/paths?

updatedb on my Debian (squeeze) server is quite slow.

  • Where is the database located?
  • How can I view its content and find out whether some paths contain useless stuff that I could add to PRUNEPATHS?
  • How can I prune all paths that contain */.git/*, */.svn/* and similar?
  • Why don't the paths I defined in PRUNEPATHS get excluded?

My /etc/updatedb.conf looks like this:

...
# filesystems which are pruned from updatedb database
PRUNEFS="NFS nfs nfs4 afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre_lite tmpfs usbfs udf"
export PRUNEFS
# paths which are pruned from updatedb database
PRUNEPATHS="/tmp /usr/tmp /var/tmp /afs /amd /alex /var/spool /sfs /media /var/backups/rsnapshot /var/mod_pagespeed/"
...

EDIT:

  • The locate database is in /var/cache/locate/locatedb
  • locate / lists all files and directories in the database. (I went through the results by exporting them to a file with locate / > /tmp/locatedb.txt, then downloaded this text file and found a large amount of useless stuff.)
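To spot the bulky paths in such a dump without reading it line by line, the entries can be grouped by their leading directories. This is only a rough sketch: the function name summarize_paths is made up for illustration, and it assumes one absolute path per line on stdin.

```shell
# Count entries per two-level directory prefix (e.g. /var/cache) and
# print the most frequent prefixes first. The optional first argument
# limits how many prefixes are shown (default 20).
summarize_paths() {
    awk -F/ 'NF >= 3 { count["/" $2 "/" $3]++ }
             END { for (dir in count) print count[dir], dir }' |
        sort -rn | head -n "${1:-20}"
}

# Possible usage, assuming the dump created above exists:
#   summarize_paths 20 < /tmp/locatedb.txt
```

Prefixes with very high counts are the first candidates for PRUNEPATHS.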

Solution 1:

You are probably using the GNU findutils version of locate, which doesn't support the PRUNENAMES option. Installing mlocate will provide these configuration options:

apt-get remove locate
mv /etc/updatedb.conf /etc/updatedb.conf-GNU.old
apt-get install mlocate

Now, with the mlocate package installed, you can edit or create /etc/updatedb.conf and add these lines:

PRUNENAMES=".git .bzr .hg .svn"
PRUNEPATHS="/tmp /var/spool /var/cache /media /usr/tmp /var/tmp /sfs /afs /amd /alex /var/backups/rsnapshot /var/mod_pagespeed"
# the paths in `PRUNEPATHS` must be without trailing slashes
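The trailing-slash rule is easy to miss; the PRUNEPATHS value in the question, for instance, ends with /var/mod_pagespeed/. A small check like the following could flag such entries. It is a sketch only: trailing_slash_entries is a hypothetical helper name, and the list is passed in as a plain string rather than read from the config file.

```shell
# Print every entry of a whitespace-separated path list that ends in "/".
# The bare root directory "/" is not reported.
trailing_slash_entries() {
    printf '%s\n' "$1" | tr -s '[:space:]' '\n' | grep './$'
}

# Possible usage with a value copied from /etc/updatedb.conf:
#   trailing_slash_entries "/tmp /var/spool /var/mod_pagespeed/"
```

No output (and a non-zero exit status from grep) means no entry ends in a slash.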

Then update the database with:

updatedb

You can probably remove the huge old locate database:

rm /var/cache/locate/locatedb

(The mlocate database is stored at /var/lib/mlocate/mlocate.db)

Check out https://apps.ubuntu.com/cat/applications/mlocate/ for more information about the package.

(I spent a ridiculous amount of time trying to solve a similar issue!)

Solution 2:

Use PRUNENAMES, as described in man updatedb.conf:

A whitespace-separated list of directory names (without paths) which should not be scanned by updatedb(8). By default, no directory names are skipped.

The use of

PRUNENAMES=".git .hg .svn"

should do the trick (the above line is the default value on Fedora 18).
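To see whether the setting took effect, one option is to count how many database entries still sit inside version-control directories, before and after re-running updatedb. The helper below is an illustrative sketch (count_vcs_entries is not part of any package) that reads a path list from stdin.

```shell
# Count paths that contain a .git, .svn, .hg or .bzr directory component.
# Names that merely start the same way (such as .github) are not counted.
count_vcs_entries() {
    grep -cE '/\.(git|svn|hg|bzr)(/|$)'
}

# Possible usage, run once before and once after updatedb:
#   locate / | count_vcs_entries
```

If the count does not drop to 0 after updatedb, the installed locate probably ignores PRUNENAMES.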

Solution 3:

locate / will list all files and directories in the database.

Solution 4:

why don't the files get excluded, I defined in PRUNEPATHS

Although the OP's problem turned out to be the locate version and PRUNENAMES, there is an alternative (or addition) to trawling through the locate database output: running updatedb manually with the --debug-pruning flag prints the individual pruning decisions to stderr, which is really useful for tracking down pruning problems.

For example, to write the output to a file (as root in this case):

updatedb --debug-pruning > ~/updatedb_debug.log 2>&1 &

Sample output:

Matching bind_mount_paths:
...done
Checking whether filesystem `/boot' is excluded:
 `/', type `rootfs'
 `/proc', type `proc'
 => type matches, dir `/proc'
 `/run', type `tmpfs'
...
Checking whether filesystem `/mnt/windows' is excluded:
Checking whether filesystem `/proc' is excluded:
Checking whether filesystem `/run' is excluded:
...
Skipping `/dev/mqueue': in prunefs
Skipping `/dev/pts': in prunefs

etc.

(I am using mlocate.)
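The log gets long, so it can help to keep only the lines that record an actual skip. A minimal sketch, assuming the log lines look like the sample output above (the function name skipped_entries is made up here):

```shell
# Keep only the "Skipping ..." lines of a --debug-pruning log read from stdin.
skipped_entries() {
    grep '^Skipping '
}

# Possible usage with the log file written above:
#   skipped_entries < ~/updatedb_debug.log
```

If a path you expected to be pruned does not appear in that list, its entry in /etc/updatedb.conf is worth a second look.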