Delete oldest file in a directory, when there are more than 7 files?

I have to create a backup script (bash) of a MySQL database. When I execute the script, an sql file will be created in "/home/user/Backup". The problem is, I also have to make a script that deletes the oldest file if there are more than 7 files in ".../Backup". Does someone know how to do this? I tried everything, but it failed everytime to count the files in the directory and detect the oldest one...


Solution 1:

Introduction

Lets review the problem: the task is to check if number of files in particular directory is over certain number, and delete the oldest file among those. At first it may seem that we need to traverse the directory tree once counting files, then traverse it again to find last modification time of all files, sort them, and extract the oldest for deletion. But considering that in this particular case OP mentioned deleting files if and only if the number of files is above 7, it suggests that we can simply get list of all files with their timestamps once, and store them into a variable.

Problem with this approach is the danger associated with filenames. As has been mentioned in the comments, it's never recommended to parse ls command, since output can contain special characters and break a script. But , as some of you may know, in Unix-like systems (and Ubuntu as well), each file has inode number associated with it. Thus creating a list of entries with timestamps (in seconds for easy numeric sorting) plus inode number separated by newline will guarantee we safely parse the filenames. Deleting the oldest filename can also be done that way.

The script presented below does exactly as described above.

Script

Important: Please read the comments, especially in delete_oldest function.

#!/bin/bash
# Uncomment line below for debugging
#set -xv
delete_oldest(){
     # reads a line from stdin, extracts file inode number
     # and deletes file to which inode belongs
     # !!! VERY IMPORTANT !!!
     # The actual command to delete file is commented out.
     # Once you verify correct execution, feel free to remove
     # leading # to uncomment it
     read timestamp file_inode
     find "$directory" -type f -inum "$file_inode"  -printf "Deleted %f\n" 
     # find "$directory" -type f -inum "$file_inode"  -printf "Deleted %f\n"  -delete
}

get_files(){
    # Wrapper function around get files. Ensures we're working
    # with files and only on one specific level of directory tree
    find "$directory" -maxdepth 1 -type f  -printf "%Ts\t%i\n" 
}

filecount_above_limit(){
    # This function counts number of files obtained
    # by get_files function. Returns true if file
    # count is greater than what user specified as max
    # value 
    num_files=$(wc -l <<< "$file_inodes"  )
    if [ $num_files -gt "$max_files" ];
    then
        return 0
    else
        return 1
    fi
}

exit_error(){
    # Print error string and quit
    printf ">>> Error: %s\n"  "$1" > /dev/stderr 
    exit 1
}

main(){
    # Entry point of the program. 
    local directory=$2
    local max_files=$1

    # If directory is not given
    if [ "x$directory" == "x"  ]; then
        directory="."
    fi

    # check arguments for errors
    [ $# -lt 1  ] && exit_error "Must at least have max number of files"
    printf "%d" $max_files &>/dev/null || exit_error "Argument 1 not numeric"
    readlink -e "$directory" || exit_error "Argument 2, path doesn't exist"

    # This is where actual work is being done
    # We traverse directory once, store files into variable.
    # If number of lines (representing file count) in that variable
    # is above max value, we sort numerically the inodes and pass them
    # to delete_oldest, which removes topmost entry from the sorted list
    # of lines.
    local file_inodes=$(get_files)
    if filecount_above_limit 
    then
        printf  "@@@ File count in %s is above %d." "$directory" $max_files
        printf "Will delete oldest\n"
        sort -k1 -n <<< "$file_inodes" | delete_oldest
    else
        printf "@@@ File count in %s is below %d."  "$directory" $max_files
        printf "Exiting normally"
    fi
}

main "$@"

Usage examples

$ ./delete_oldest.sh 7 ~/bin/testdir                                                                                     
/home/xieerqi/bin/testdir
@@@ File count in /home/xieerqi/bin/testdir is below 7.Exiting normally
$ ./delete_oldest.sh 7 ~/bin                                                                                             
/home/xieerqi/bin
@@@ File count in /home/xieerqi/bin is above 7.Will delete oldest
Deleted typescript

Additional discussion

This is probably scary. . .and lengthy . . .and looks like it does far too much. And it might be. In fact, everything can be shoved onto a single line of command-line (a very much modified version of muru's suggestion posted in chat which deals with filenames. echo is used instead of rm for demonstration purposes):

find /home/xieerqi/bin/testdir/ -maxdepth 1 -type f -printf "%T@ %p\0" | sort -nz | { f=$(awk  'BEGIN{RS=" "}NR==2{print;next}'  ); echo "$f" ; }

However, I've several things I don't like about this:

  • it deletes oldest file unconditionally, without checking number of files in the directory
  • it deals with filenames directly (which required me to use awkward awk command, which will probably break with filenames that have spaces)
  • too much plumbing (too many pipes)

So, while my script looks horribly gigantic for the simple task, it does whole lot more checks and aims to solve issue with complex filenames. It probably would be shorter and more idiomatic to implement in Perl or Python ( which I absolutely can do, I just happened to choose bash for this question ).

Solution 2:

I think @Serg's answer is good and I am learning from him and from @muru. I made this answer because I wanted to explore and learn how to create a shellscript file based on the output from find with the 'action' -print to sort the files according to the time they were created/modified. Please suggest improvements and bugfixes (if necessary).

As you will notice, the programming style is very different. We can do things in many ways in linux :-)

I made a bash shell-script to match the requirements of the OP, @beginner27_, but it is not too difficult to modify it for other but similar purposes.

The following screenshot shows how it was tested: Eleven files were created, and the script (which resides in ~/bin and has execute permissions) is run. I have removed the # character from the line

# bash "$cmd"

to make it

bash "$cmd"

The first time the script discovers and prints the eleven files, the seven newest files with blue background and the four oldest files with red background. The four oldest files are removed. The script is run a second time (just for the demo). It discovers and prints the remaining seven files, and is satisfied, 'There is no backup file to remove'.

enter image description here

The crucial find command, that sorts the files according to time, looks like this,

find "$bupdir" -type f -printf "%T+ %p\0"|sort -nrz > "$srtlst"

Here is the script file. I saved it in ~/bin with the name rm_old_backups, but you can give it any name, as long as it does not interfere with some already existing name of an executable program.

#!/bin/bash

keep=7  # set the number of files to keep

# variables and temporary files

inversvid="\0033[7m"
resetvid="\0033[0m"
redback="\0033[1;37;41m"
greenback="\0033[1;37;42m"
blueback="\0033[1;37;44m"

bupdir="$HOME/Backup"
cmd=$(mktemp)
srtlst=$(mktemp)
rmlist=$(mktemp)

# output to the screen

echo -e "$inversvid$0:
keep $keep backup files, remove the oldest files, if more than $keep are found $resetvid"

echo "Security fix: You must edit this script and remove the # character from
a line near the end of the script '# bash \"\$cmd\"' --> 'bash \"\$cmd\"'
otherwise the script will only show what it can do. Please test that it
works correctly before you remove that # character!"

# the crucial find command, that sorts the files according to time

find "$bupdir" -type f -printf "%T+ %p\0"|sort -nrz > "$srtlst"

# more output

echo -e "${inversvid}time-stamp                     file-name                               $resetvid"
echo -en "$blueback"
sed -nz -e 1,"$keep"p "$srtlst" | tr '\0' '\n'
echo -en "$resetvid"

echo -en "$redback"
sed -z -e 1,"$keep"d "$srtlst" | tr '\0' '\n' | tee "$rmlist"
echo -en "$resetvid"

# remove oldest files if more files than specified are found

if test -s "$rmlist"
then
 echo rm '"'$(sed -z -e 1,"$keep"d -e 's/[^ ]* //' -e 's/$/" "/' "$srtlst")'"'\
 | sed 's/" ""/"/' > "$cmd"
 cat "$cmd"

# uncomment the following line to really remove files 
# bash "$cmd"

 echo "The oldest backup files are removed"
else
 echo "There is no old backup file to remove"
fi

# remove temporary files

rm $cmd $srtlst $rmlist