How do I clean up Graphite's whisper data?

Solution 1:

Currently, deleting files from /opt/graphite/storage/whisper/ is the correct way to clean up whisper data.

As for the tedious side of the process, you could use the find command if there is a certain pattern that you're trying to remove.

find /opt/graphite/storage/whisper -name loadavg.wsp -delete
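
Because -delete cannot be undone, it can help to list the matches before removing anything. A minimal sketch, assuming a find that supports -delete (GNU and BSD find do); the helper name delete_metric and its --force argument are hypothetical, not part of Graphite:

```shell
# Hypothetical helper: list matching whisper files, delete only on request.
# Usage: delete_metric <whisper_dir> <file_name> [--force]
delete_metric() {
  if [ "${3:-}" = "--force" ]; then
    # Print each matching file, then remove it
    find "$1" -type f -name "$2" -print -delete
  else
    # Dry run: only list what would be removed
    find "$1" -type f -name "$2" -print
  fi
}
```

Running delete_metric /opt/graphite/storage/whisper loadavg.wsp prints the candidates; adding --force as a third argument removes them.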

Similar Question on answers.launchpad.net/graphite

Solution 2:

I suppose that this is going into Server Fault territory, but I added the following cron job to delete old metrics of ours that haven't been written to for over 30 days (e.g. metrics of cloud instances that have been disposed of):

find /mnt/graphite/storage -mtime +30 | grep -E \
"/mnt/graphite/storage/whisper/collectd/app_name/[^/]*" -o \
| uniq | xargs rm -rf

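Note that the command above can delete directories that still contain valid data: a single file older than 30 days is enough for its whole directory to be removed, recently written files included. It is safer to work in two steps.

First, delete only the stale files: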

find whisperDir -type f -mtime +30 -exec rm {} +

Then delete the empty directories:

find whisperDir -type d -empty -exec rmdir {} +

Repeat this last step until nothing more is removed, because deleting a directory may leave its parent empty.
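
Both steps can also be done without the repetition. A minimal sketch, assuming a find that supports -delete and -mindepth (GNU and BSD find do); the helper name prune_whisper is hypothetical:

```shell
# Hypothetical helper: remove stale files, then prune empty directories.
# Usage: prune_whisper <whisper_dir> <days>
prune_whisper() {
  # Delete regular files not modified for more than <days> days
  find "$1" -type f -mtime "+$2" -delete
  # -delete implies -depth, so children are handled before their parents and
  # nested empty directories go in a single pass; -mindepth 1 keeps the
  # top-level directory itself
  find "$1" -mindepth 1 -type d -empty -delete
}
```

For example, prune_whisper /mnt/graphite/storage/whisper 30 applies the same 30-day cutoff as above.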

Solution 3:

As people have pointed out, removing the files is the way to go. Expanding on previous answers, I made this script that removes any file that has exceeded its max retention age. Run it as a cronjob fairly regularly.

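#!/bin/bash
# Remove whisper files that have not been updated within their max retention.
d=$1
now=$(date +%s)

# Files younger than this many seconds are skipped without calling whisper-info
MINRET=86400

if [ -z "$d" ]; then
  echo "Must specify a directory to clean" >&2
  exit 1
fi

find "$d" -name '*.wsp' | while IFS= read -r w; do
  age=$((now - $(stat -c '%Y' "$w")))
  if [ "$age" -gt "$MINRET" ]; then
    retention=$(whisper-info.py "$w" maxRetention)
    if [ "$age" -gt "$retention" ]; then
      echo "Removing $w ($age > $retention)"
      rm -- "$w"
    fi
  fi
done

# Clean up directories left empty by the removals
find "$d" -empty -type d -delete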

A couple of bits to be aware of - the whisper-info call is quite heavyweight. To reduce the number of calls to it I've put the MINRET constant in, so that no file will be considered for deletion until it is 1 day old (24*60*60 seconds) - adjust to fit your needs. There are probably other things that can be done to shard the job or generally improve its efficiency, but I haven't had need to as yet.
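
One possible efficiency tweak is to let find apply the minimum-age filter itself, so that recently written files never reach the loop. This is only a sketch, not the script above: it assumes GNU find and stat, it prints the expired files instead of deleting them, and both the function name list_expired and the WHISPER_INFO override are hypothetical:

```shell
# Assumption: whisper-info.py is on PATH. WHISPER_INFO is a hypothetical
# override that allows a different lookup command to be substituted.
WHISPER_INFO=${WHISPER_INFO:-whisper-info.py}

# Usage: list_expired <whisper_dir> [min_age_minutes]
# Prints every .wsp file whose age exceeds its maximum retention.
list_expired() {
  le_now=$(date +%s)
  # -mmin skips recently written files before any per-file command runs
  find "$1" -type f -name '*.wsp' -mmin "+${2:-1440}" |
    while IFS= read -r le_file; do
      le_age=$((le_now - $(stat -c '%Y' "$le_file")))
      # Skip files whose retention cannot be read
      le_ret=$("$WHISPER_INFO" "$le_file" maxRetention) || continue
      if [ "$le_age" -gt "$le_ret" ]; then
        printf '%s\n' "$le_file"
      fi
    done
}
```

The default of 1440 minutes matches the one-day MINRET value; reviewing the printed list before removing anything keeps the run non-destructive.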