How can I remove IP addresses from log files after some time?
PCRE! (Perl-Compatible Regular Expression)
s/\b(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\b/REMOVED IP/g
Use that as a filter in a Perl script, or in any other suitable language (quite a few use PCRE or a regex dialect close enough to work), to rewrite your log files once they are 7 days old.
$ cat > file_with_ip
some text from 192.168.1.1
^D
$ perl -p -i -e 's/\b(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\b/REMOVED IP/g' file_with_ip
$ cat file_with_ip
some text from REMOVED IP
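To apply the same filter only to logs that have reached the 7-day mark, the perl one-liner can be combined with find. The following is a minimal sketch, not a tested production script: the function name scrub_old_logs, the '*.log*' name pattern, and the default of 7 days are assumptions for illustration, and it only handles uncompressed files.

```shell
# Hypothetical helper: redact IPv4 addresses in plain-text logs whose
# modification time is more than DAYS days in the past.
# Usage: scrub_old_logs DIR [DAYS]
scrub_old_logs() {
    dir=$1
    days=${2:-7}
    # $o holds the per-octet pattern (0-255) so it is written only once;
    # the substitution itself is the one from the answer above.
    find "$dir" -type f -name '*.log*' ! -name '*.gz' -mtime +"$days" \
        -exec perl -p -i -e '
            BEGIN { $o = qr/(?:1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])/ }
            s/\b$o\.$o\.$o\.$o\b/REMOVED IP/g
        ' {} +
}
```

Because perl -i rewrites each file, its modification time is reset, so a file that has already been scrubbed is not picked up again on the next run.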
On Ubuntu > 12.04 / Apache 2.4 with the default config, you could use something like this:
for file in $(find /var/log/apache2 -type f -name "*.gz" ! -name "*.ano.*" -mtime +7)
do
datestamp=$(date +"%Y%m%d%H%M%S")
# echo "Process $file"
# The trailing "g" makes sed anonymize every IP on a line, not only the first one.
zcat "$file" | sed -E "s/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.0.0/g" | gzip > "${file%.*}.ano.${datestamp}.gz"
# rm -f "$file" # Only call this if you are sure that the command before succeeds, otherwise you will lose data.
done
This creates a copy of every *.gz file older than 7 days, with an ano suffix added to its name, and replaces the last two bytes of every IP address in the copy with 0.0.
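The commented-out rm in the loop points at the main risk: deleting the original before the anonymized copy is known to be good. One possible guard is sketched below, assuming GNU or BSD gzip and a sed that supports -E. The function name anonymize_gz is hypothetical, and gzip -dc is used as an equivalent of zcat.

```shell
# Hypothetical helper: anonymize one gzip-compressed log and delete the
# original only after the anonymized copy has been verified.
# Usage: anonymize_gz /path/to/access.log.2.gz
anonymize_gz() {
    src=$1
    dst="${src%.*}.ano.$(date +%Y%m%d%H%M%S).gz"

    # Refuse to start if the source archive is unreadable or corrupt.
    gzip -t "$src" || return 1

    # Zero the last two octets of every IPv4-looking address.
    gzip -dc "$src" \
        | sed -E 's/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.0.0/g' \
        | gzip > "$dst" || { rm -f "$dst"; return 1; }

    # Remove the original only if the new archive passes an integrity check.
    if gzip -t "$dst"; then
        rm -f "$src"
    else
        rm -f "$dst"
        return 1
    fi
}
```

Inside the loop, the zcat line and the rm line could then be replaced by a single call such as anonymize_gz "$file".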
If you don't use compression, or use a different compression such as bz2, you have to change the commands accordingly, e.g. zcat -> bzcat and gzip -> bzip2.
Finally, you can call this routine via cron once per day or week.
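As a rough illustration of the cron step, a crontab entry might look like the one below. The script path /usr/local/bin/anonymize-apache-logs.sh is hypothetical and stands for wherever you saved the loop; the 03:30 schedule is only an example.

```crontab
# m  h  dom mon dow  command
# Run once per day at 03:30; the script path is a hypothetical example.
30   3  *   *   *    /usr/local/bin/anonymize-apache-logs.sh
```

The entry belongs in the crontab of a user that can write to /var/log/apache2, which on a default Ubuntu install typically means root.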