Deleting a huge 500 GB file does not free disk space

I have a remote server here running Ubuntu (Server Edition).

Yesterday I noticed that my hard disk was 100% full. A log file had been growing bigger and bigger, so I deleted it with rm file.foo.

Then I ran df -h, but the partition where the file had been stored was still 100% full.

So I thought a reboot might help and ran sudo shutdown -r now.

After waiting a few minutes I still couldn't connect to the server via SSH, so I asked the guys at the data center to restart it manually.

That worked and the server booted.

So I ran df -h again, and now 80% of the partition is in use (at least that's something).

Next, I wanted to see what was using so much disk space, so I ran sudo du -h --max-depth 1 / and got this result:

16K     /lost+found
942M    /home
52K     /tmp
4.0K    /mnt
236K    /dev
du: cannot access `/proc/17189/task/17189/fd/4': No such file or directory
du: cannot access `/proc/17189/task/17189/fdinfo/4': No such file or directory
du: cannot access `/proc/17189/fd/4': No such file or directory
du: cannot access `/proc/17189/fdinfo/4': No such file or directory
0       /proc
4.0K    /media
4.0K    /opt
4.0K    /srv
32K     /root
3.0G    /var
393M    /lib
37M     /boot
6.9M    /etc
681M    /usr
4.0K    /selinux
8.0M    /bin
9.0M    /sbin
4.0K    /cdrom
0       /sys
5.0G    /

As the last line shows, only 5 GB is in use, so the file cannot be sitting in a trash folder or in lost+found (and it wouldn't be there anyway, since I used rm).
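To make the df/du mismatch concrete, here is a minimal Python sketch (the function names df_used_bytes and du_bytes are hypothetical, chosen for illustration) that computes both figures roughly the way the tools do: df_used_bytes asks the kernel via statvfs, as df does, while du_bytes walks the tree and sums the blocks allocated to each inode, counting hard-linked inodes once and staying on one filesystem like du -x. Space held by a deleted but still-referenced inode shows up only in the first number.

```python
import os


def df_used_bytes(path):
    """Used space as df reports it: (total blocks - free blocks) * fragment size."""
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def du_bytes(root):
    """Allocated space under root, similar to du -sx.

    Assumes Linux, where st_blocks is in 512-byte units. Each inode is counted
    once (so hard links are not double-counted) and other mounted filesystems
    such as /proc or /sys are skipped.
    """
    root_dev = os.lstat(root).st_dev
    seen = set()
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Count the directory itself plus every non-directory entry in it.
        for path in [dirpath] + [os.path.join(dirpath, n) for n in filenames]:
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable, like du's "cannot access" errors
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512
        # Like du -x: do not descend into directories on another device.
        kept = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                    kept.append(d)
            except OSError:
                pass
        dirnames[:] = kept
    return total
```

On a server like the one above, comparing df_used_bytes("/") with du_bytes("/") (run as root) would show the gap. A large difference usually means blocks are held by inodes that no path reaches, or by files hidden underneath a mount point.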

So, what's wrong?

My personal guess is that while the server was restarting, it was somehow still cleaning up the huge 500 GB file I had removed, and that forcing the manual restart interrupted this, so it only managed to free 20% of the space.

If my guess is true, what could I do to repair this?

If my guess is wrong, what is going on with my system?


Solution 1:

My first guess would be that whatever program was writing to file.foo is still alive and holding the file handle open. The kernel only considers the disk space "free" once the last reference to the inode (file) is gone, and programs that have the file open count as references. For the future: when you move or delete a log file, remember to let the program using it know. If you want to be really safe, restart the program in question.
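A minimal, Linux-only sketch of how to check for that case, written in Python (the helper name find_deleted_open_files is my own, hypothetical choice). It walks /proc/&lt;pid&gt;/fd, where the kernel appends " (deleted)" to the link target of a file that has been unlinked but is still open, and reports the size each such file is still holding. This is roughly what lsof +L1 shows.

```python
import os


def find_deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, target, size) for every open fd whose file was unlinked.

    Relies on Linux /proc/<pid>/fd/<n> symlinks, whose target gains a
    ' (deleted)' suffix once the last directory entry for the file is gone.
    """
    results = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip entries like "self", "meminfo", ...
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or no permission (run as root to see all)
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(" (deleted)"):
                    continue
                st = os.stat(link)  # follows the link to the still-live inode
            except OSError:
                continue
            results.append((int(pid), int(fd), target, st.st_size))
    return results


if __name__ == "__main__":
    import tempfile

    # Reproduce the situation: write 1 MiB, keep the file open, then "rm" it.
    f = tempfile.NamedTemporaryFile(delete=False)
    f.write(b"\0" * (1 << 20))
    f.flush()
    os.unlink(f.name)  # the name is gone, but the inode and its blocks live on

    for pid, fd, target, size in find_deleted_open_files():
        if pid == os.getpid():
            print(pid, fd, target, size)

    f.close()  # only now can the kernel release the space
```

Run as root it sees every process; as a normal user it only sees your own. Closing the descriptor (or restarting the program that holds it) is what actually releases the blocks.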

Since you rebooted, though, that's theoretically impossible: all programs should have been killed off, so any references they held would have gone away too. That leaves two possibilities I can think of:

  1. You have a hard link to the file that you don't know about.
    If this is the case, du and df should agree about the amount of space you're using on the system, because du will still find and count the data under its other name.

  2. Your filesystem is corrupted, most likely in such a way that an inode has a positive reference count but isn't actually pointed to by any filesystem objects.
    This is relatively easy (though time-consuming) to check: on most Linux systems you can force a filesystem check on the next boot by creating a file called /forcefsck (running touch /forcefsck as root will do the trick). Then just reboot and wait (a while!) while your system scans its filesystems looking for things like "lost" inodes with screwy reference counts.
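For the first possibility, a hedged Python sketch (the function name find_hard_linked_files is hypothetical) that looks for regular files sharing an inode. It stays on one filesystem, since hard links cannot cross filesystems, and is similar in spirit to running find / -xdev -type f -links +1.

```python
import os
import stat
from collections import defaultdict


def _same_dev(path, dev):
    """True if path (not following symlinks) lives on device dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def find_hard_linked_files(root, min_size=0):
    """Group regular files under root that have more than one name.

    Returns {(st_dev, st_ino): [paths found]}. Only files with st_nlink > 1
    and st_size >= min_size are reported; other filesystems are not entered.
    """
    root_dev = os.lstat(root).st_dev
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that are mount points of other filesystems.
        dirnames[:] = [d for d in dirnames if _same_dev(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1 and st.st_size >= min_size:
                groups[(st.st_dev, st.st_ino)].append(path)
    return dict(groups)
```

For example, find_hard_linked_files("/", min_size=1 << 30) run as root would list files of 1 GiB or more that have more than one name. If a group lists fewer paths than the inode's link count (os.stat(path).st_nlink), the remaining names are outside the scanned tree or in directories the script could not read, since os.walk skips those silently.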