Deleting everything except newest files

Let's say I have a directory foo/ which contains a lot of files in some sort of directory structure. I need to keep some of them, but not all.

Is there a way to (in place) delete all of them except, say, the 500 newest?


Solution 1:

I do this task regularly, and I use variants of the following. It is a pipeline combining various simple tools: find all files, prepend each file's modification time, sort newest first, strip the modification time again, skip the first 500 lines, and remove the files that remain:

find foo/ -type f | perl -wple 'printf "%12u ", (stat)[9]' | \
    sort -r | cut -c14- | tail -n +501 | \
    while read file; do rm -f -- "$file"; done

A few comments:

  • If you are using “bash”, you ought to use “read -r file”, not just “read file”.

  • Using “perl” to remove the files is faster (and also handles “weird” characters in the file names better than the while-loop does, unless you are using “read -r file”):

    ... | tail -n +501 | perl -wnle 'unlink() or warn "$_: unlink failed: $!\n"'
    
  • Some versions of “tail” do not support the “-n” option, so you must use “tail +501” instead. A portable way to skip the first 500 lines is

     ... | perl -wnle 'print if $. > 500' | ...
    
  • It won’t work if your file names contain newlines (see the NUL-delimited sketch after the combined pipeline below).

  • It does not require GNU find.

Combining the above gives you:

find foo/ -type f | perl -wple 'printf "%12u ", (stat)[9]' | \
    sort -r | cut -c14- | perl -wnle 'print if $. > 500' | \
    perl -wnle 'unlink() or warn "$_: unlink failed: $!\n"'
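
The pipelines above are line-oriented, so they break on newlines in file names. Here is a NUL-delimited sketch of the same approach, assuming your find supports -print0 (GNU and BSD find do); the sorting and skipping move inside perl, since sort, cut, and tail are line-based:

find foo/ -type f -print0 | perl -0 -wne '
    chomp;                        # strip the trailing NUL
    push @files, $_;
    END {
        # sort newest first by mtime, then unlink everything past the 500 newest
        @files = sort { (stat $b)[9] <=> (stat $a)[9] } @files;
        for (@files[500 .. $#files]) {
            unlink() or warn "$_: unlink failed: $!\n";
        }
    }'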

Solution 2:

This is how I would do it in Python 3, which should also work on other OSes. After testing this, make sure to uncomment the line that actually removes the files.

import os, os.path
from collections import defaultdict

FILES_TO_KEEP = 500
ROOT_PATH = r'/tmp/'

tree = defaultdict(list)

# create a dictionary containing file names with their date as the key
for root, dirs, files in os.walk(ROOT_PATH):
    for name in files:
        fname = os.path.join(root, name)
        fdate = os.path.getmtime(fname)
        tree[fdate].append(fname)

# sort this dictionary by date
# locate where the newer files (that you want to keep) end
count = 0
last_key = None  # stays None when there are at most FILES_TO_KEEP files
inorder = sorted(tree.keys(), reverse=True)
for key in inorder:
    count += len(tree[key])
    if count >= FILES_TO_KEEP:
        last_key = key
        break

# files dated last_key or newer are kept; strictly older ones are removed
# (every file sharing last_key's mtime is kept, so ties may leave
# slightly more than FILES_TO_KEEP files)
for key in inorder:
    if last_key is not None and key < last_key:
        for f in tree[key]:
            print("remove ", f)
            # uncomment this next line to actually remove files
            #os.remove(f)
    else:
        for f in tree[key]:
            print("keep    ", f)

Solution 3:

I don't know about the "500 newest", but with find you can delete files older than X minutes/days. An example for files older than 2 days:

find foo/ -mtime +2 -a -type f -exec rm -fv \{\} \;

Test first with:

find foo/ -mtime +2 -a -type f -exec ls -al \{\} \;

Mind the backslashes and the space before "\;". See the find man page for more info.
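
If your find is GNU find (FreeBSD's find also works), the -delete action does the same without spawning an rm process per file:

find foo/ -mtime +2 -type f -delete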

Solution 4:

If you can make do with keeping files up to x days/hours old instead of the newest x files, you can do it with just tmpwatch:

tmpwatch --ctime 7d foo/

Solution 5:

I think the -mtime and -newer options of the find command will be useful to you. See man find for more info.
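
For example, -newer compares each file against the modification time of a reference file, so you can delete everything that is not newer than a chosen cutoff. A sketch, where /tmp/marker is a hypothetical reference file created only for the comparison:

# give the marker the cutoff timestamp (example date: 2024-01-01 00:00)
touch -t 202401010000 /tmp/marker
# remove regular files under foo/ that are not newer than the marker
find foo/ -type f ! -newer /tmp/marker -exec rm -f -- {} \;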